Commits · c05c2ec96bb8b7310da1055c7b9d786a3ec6dc0c · openeuler / raspberrypi-kernel

23 Mar, 2016 1 commit

ext4: in ext4_dir_llseek, check syscall bitness directly · 121cef8f

Authored by Andy Lutomirski on Mar 22, 2016

ext4 treats directory offsets differently for 32-bit and 64-bit callers.
Check the caller type using in_compat_syscall, not is_compat_task.  This
changes behavior on SPARC slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Mar, 2016 3 commits

ext4: clean up error handling in the MMP support · 03046886

Authored by vikram.jadhav07 on Mar 13, 2016

There is a memory leak because neither the caller, kmmpd(), nor the
callee, read_mmp_block(), releases bh_check (i.e. the buffer_head).
This patch fixes the problem.

[ Additional changes suggested by Andreas Dilger -- TYT ]
Signed-off-by: Jadhav Vikram <vikramjadhavpucsd2007@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: use __GFP_NOFAIL in ext4_free_blocks() · adb7ef60

Authored by Konstantin Khlebnikov on Mar 13, 2016

This might be unexpected, but pages allocated for sbi->s_buddy_cache are
charged to the current memory cgroup. So a GFP_NOFS allocation could fail
if the current task has been killed by OOM or if the current memory cgroup
has no free memory left. The block allocator cannot handle such failures
here yet.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix compile error while opening the macro DOUBLE_CHECK · a2821e34

Authored by Aihua Zhang on Mar 13, 2016

The error is:

    fs/ext4/mballoc.c:475:43: error: 'struct ext4_group_info' has
    no member named 'bb_bitmap'.

So the definition of the macro DOUBLE_CHECK should come before
'struct ext4_group_info'. I fixed this and moved the macro
AGGRESSIVE_CHECK along with it, because I think they should be together.

Signed-off-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

13 Mar, 2016 2 commits

ext4: print ext4 mount option data_err=abort correctly · 7915a861

Authored by Ales Novak on Mar 12, 2016

If the data_err=abort option is specified for an ext3/ext4 mount,
/proc/mounts shows it as "(null)". This is caused by token2str()
returning NULL for Opt_data_err_abort (due to its pattern containing
'=').

Signed-off-by: Ales Novak <alnovak@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() · 5e1021f2

Authored by Eryu Guan on Mar 12, 2016

ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on
error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is
ignored in the following "if" condition and ext4_expand_extra_isize()
might be called with NULL iloc.bh set, which triggers a NULL pointer
dereference.

This was uncovered by commit 8b4953e1 ("ext4: reserve code points for
the project quota feature"), which enlarges the ext4_inode size, when
running the following script on a new kernel with an old mke2fs:

  #!/bin/bash
  mnt=/mnt/ext4
  devname=ext4-error
  dev=/dev/mapper/$devname
  fsimg=/home/fs.img

  trap cleanup 0 1 2 3 9 15

  cleanup()
  {
          umount $mnt >/dev/null 2>&1
          dmsetup remove $devname
          losetup -d $backend_dev
          rm -f $fsimg
          exit 0
  }

  rm -f $fsimg
  fallocate -l 1g $fsimg
  backend_dev=`losetup -f --show $fsimg`
  devsize=`blockdev --getsz $backend_dev`

  good_tab="0 $devsize linear $backend_dev 0"
  error_tab="0 $devsize error $backend_dev 0"

  dmsetup create $devname --table "$good_tab"

  mkfs -t ext4 $dev
  mount -t ext4 -o errors=continue,strictatime $dev $mnt

  dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
  echo 3 > /proc/sys/vm/drop_caches
  ls -l $mnt
  exit 0

[ Patch changed to simplify the function a tiny bit. -- Ted ]
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

10 Mar, 2016 8 commits

ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry() · a8ed9b86

Authored by Geliang Tang on Mar 10, 2016

BUFFER_TRACE info "call ext4_handle_dirty_metadata" doesn't match the
code, so drop it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix misspellings in comments. · b8a07463

Authored by Adam Buchbinder on Mar 09, 2016

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: more efficient SEEK_DATA implementation · 2d90c160

Authored by Jan Kara on Mar 09, 2016

Using SEEK_DATA in a huge sparse file can easily lead to softlockups as
ext4_seek_data() iterates over a hole block by block. Fix the problem by
using the hole size returned from ext4_map_blocks() and thus skipping the
hole in one go.

Also update the SEEK_HOLE implementation to follow the same pattern as
SEEK_DATA to make future maintenance easier.

Furthermore, we add cond_resched() to both ext4_seek_data() and
ext4_seek_hole() to avoid softlockups in case a malicious user creates a
huge fragmented file and we have to go through lots of extents.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
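
The difference between stepping through a hole block by block and skipping it in one step can be illustrated with a small user-space model. This is only a sketch, not the kernel code: struct extent, map_block() and seek_data() are hypothetical names, and the extent array stands in for the real extent tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one mapped range [start, start + len). */
struct extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Look up blk in a sorted, non-overlapping extent array. Returns 1 and the
 * remaining mapped length if blk is mapped; returns 0 and the length of the
 * hole starting at blk (capped at end) otherwise. Requires blk < end.
 */
static int map_block(const struct extent *ext, size_t n, uint64_t blk,
		     uint64_t end, uint64_t *len)
{
	for (size_t i = 0; i < n; i++) {
		if (blk < ext[i].start) {
			/* Hole in front of extent i. */
			uint64_t stop = ext[i].start < end ? ext[i].start : end;

			*len = stop - blk;
			return 0;
		}
		if (blk < ext[i].start + ext[i].len) {
			*len = ext[i].start + ext[i].len - blk;
			return 1;
		}
	}
	*len = end - blk;	/* hole up to the end of the file */
	return 0;
}

/*
 * Return the first mapped block at or after blk, or end if there is none.
 * *lookups counts how many times map_block() was called.
 */
static uint64_t seek_data(const struct extent *ext, size_t n, uint64_t blk,
			  uint64_t end, unsigned int *lookups)
{
	*lookups = 0;
	while (blk < end) {
		uint64_t len;

		(*lookups)++;
		if (map_block(ext, n, blk, end, &len))
			return blk;
		blk += len;	/* skip the whole hole in one step */
	}
	return end;
}
```

With a single extent starting at block 1000000, seek_data() from block 0 needs two lookups in this model, where a block-by-block walk would need about a million.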

ext4: cleanup handling of bh->b_state in DAX mmap · e3fb8eb1

Authored by Jan Kara on Mar 09, 2016

ext4_dax_mmap_get_block() updates bh->b_state directly instead of using
ext4_update_bh_state(). This is mostly a cosmetic issue since the DAX code
always passes an on-stack buffer_head, but clean this up to make the code
more uniform.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: return hole from ext4_map_blocks() · facab4d9

Authored by Jan Kara on Mar 09, 2016

Currently, ext4_map_blocks() just returns 0 when it finds a hole and
allocation is not requested. However, we have all the information
available to tell how large the hole actually is, and there are callers
of ext4_map_blocks() which would save some block-by-block hole iteration
if they knew this information. So fill in struct ext4_map_blocks even
for holes with the information we have. We keep returning 0 for holes to
maintain backward compatibility of the function.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: factor out determining of hole size · 140a5250

Authored by Jan Kara on Mar 09, 2016

ext4_ext_put_gap_in_cache() determines hole size in the extent tree,
then trims this with possible delayed allocated blocks, and inserts the
result into the extent status tree. Factor out determination of the size
of the hole in the extent tree as we will need this information in
ext4_ext_map_blocks() as well.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix setting of referenced bit in ext4_es_lookup_extent() · 87d8a74b

Authored by Jan Kara on Mar 09, 2016

We were setting the referenced bit on the extent structure we return from
ext4_es_lookup_extent(), which is just a private structure on the stack.
Thus the setting had no effect. Set the bit in the structure in the status
tree instead.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
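
The class of bug described above, marking a private copy instead of the shared entry, can be shown with a minimal user-space sketch. The names struct entry, lookup_mark_copy() and lookup_mark_tree() are hypothetical, and a plain array stands in for the status tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached entry; the array below stands in for the tree. */
struct entry {
	unsigned long key;
	bool referenced;
};

/* Buggy pattern: only the caller's private copy gets the mark. */
static bool lookup_mark_copy(struct entry *tree, size_t n, unsigned long key,
			     struct entry *out)
{
	for (size_t i = 0; i < n; i++) {
		if (tree[i].key == key) {
			*out = tree[i];
			out->referenced = true;	/* lost when out goes away */
			return true;
		}
	}
	return false;
}

/* Fixed pattern: mark the shared entry first, then hand out a copy. */
static bool lookup_mark_tree(struct entry *tree, size_t n, unsigned long key,
			     struct entry *out)
{
	for (size_t i = 0; i < n; i++) {
		if (tree[i].key == key) {
			tree[i].referenced = true;
			*out = tree[i];
			return true;
		}
	}
	return false;
}
```

After lookup_mark_copy() the entry in the array is still unreferenced, so anything that later inspects the shared structure would not see the access; lookup_mark_tree() leaves the mark where it can be observed.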

ext4: iterate over buffer heads correctly in move_extent_per_page() · 6ffe77ba

Authored by Eryu Guan on Feb 21, 2016

In commit bcff2488 ("ext4: don't read blocks from disk after extents
being swapped") bh is not updated correctly in the for loop and wrong
data has been written to disk. generic/324 catches this on sub-page
block size ext4.

Fixes: bcff2488 ("ext4: don't read blocks from disk after extents being swapped")
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

09 Mar, 2016 6 commits

ext4: remove i_ioend_count · 600be30a

Authored by Jan Kara on Mar 08, 2016

Remove counter of pending io ends as it is unused.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: simplify io_end handling for AIO DIO · 109811c2

Authored by Jan Kara on Mar 08, 2016

When mapping blocks for direct IO, we allocate io_end structure before
mapping blocks and store pointer to it in the inode. This creates a
requirement that any AIO DIO using io_end must be protected by i_mutex.
This created problems in the past with dioread_nolock mode which was
corrupting io_end pointers. Also io_end is allocated unnecessarily in
case where we don't need to convert any extents (which is a common case
for example when overwriting file).

We fix the problem by allocating io_end only once we return unwritten
extent from block mapping function for AIO DIO (so we can save some
pointless io_end allocations) and we pass pointer to it in bh->b_private
which generic DIO code later passes to our end IO callback. That way we
remove any need for global pointer to io_end structure and thus fix the
races.

The downside of this change is that the checking for unwritten IO in
flight in ext4_extents_can_be_merged() is more racy since we now
increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
i_data_sem. However, the check was already racy before because
ext4_writepages() already increments i_unwritten after dropping
i_data_sem, and reserved blocks save us from hitting ENOSPC in the worst
case.

Signed-off-by: Jan Kara <jack@suse.cz>

ext4: move trans handling and completion deferal out of _ext4_get_block · efe70c29

Authored by Jan Kara on Mar 08, 2016

There is no need to handle starting of a transaction and deferral of DIO
completion in the _ext4_get_block() function. We can move this out to the
get block functions for direct IO that need it. That way we can add
stricter checks verifying things work as we expect.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: rename and split get blocks functions · 705965bd

Authored by Jan Kara on Mar 08, 2016

Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better
describe what it does. Also split out get blocks functions for direct
IO. Later we move functionality from _ext4_get_blocks() there. There's no
functional change in this patch.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: use i_mutex to serialize unaligned AIO DIO · e142d052

Authored by Jan Kara on Mar 08, 2016

Currently we use a hashed aio_mutex to serialize unaligned AIO DIO.
However, the code cleanups that happened after 2011, when the lock was
introduced, made aio_mutex acquired at almost the same places where we
already have exclusion using i_mutex. So just use i_mutex for the
exclusion of unaligned AIO DIO.

The change moves waiting for pending unwritten extent conversion under
i_mutex. That makes special handling of O_APPEND writes unnecessary and
also avoids possible livelocking of unaligned AIO DIO with aligned one
(nothing prevented a continuous stream of aligned AIO DIOs from making an
unaligned AIO DIO wait forever).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: pack ioend structure better · 3bd6ad7b

Authored by Jan Kara on Mar 08, 2016

On 64-bit architectures we have two 4-byte holes in struct ext4_io_end.
Order entries better to avoid this and thus make the structure occupy
64 instead of 72 bytes for 64-bit architectures.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
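
The effect of member ordering on structure size can be reproduced in user space. The two structures below are hypothetical and do not mirror the real fields of struct ext4_io_end; they only show how two 4-byte members, each followed by an 8-byte member, leave two 4-byte holes. The sizes in the comments assume a target where uint64_t is 8-byte aligned (for example x86-64 or arm64).

```c
#include <stdint.h>

/* Two 4-byte members, each followed by padding: 72 bytes in total. */
struct io_end_loose {
	uint64_t list[2];	/* 16 bytes */
	uint32_t flag;		/* 4 bytes + 4-byte hole */
	uint64_t inode;		/* 8 bytes */
	uint32_t count;		/* 4 bytes + 4-byte hole */
	uint64_t offset;	/* 8 bytes */
	uint64_t size;		/* 8 bytes */
	uint64_t handle;	/* 8 bytes */
	uint64_t bio;		/* 8 bytes */
};

/* Same members, with the 4-byte ones adjacent: 64 bytes, no holes. */
struct io_end_packed {
	uint64_t list[2];	/* 16 bytes */
	uint64_t inode;		/* 8 bytes */
	uint64_t offset;	/* 8 bytes */
	uint64_t size;		/* 8 bytes */
	uint64_t handle;	/* 8 bytes */
	uint64_t bio;		/* 8 bytes */
	uint32_t flag;		/* 4 bytes */
	uint32_t count;		/* 4 bytes, shares one 8-byte slot with flag */
};
```

Comparing sizeof(struct io_end_loose) with sizeof(struct io_end_packed) on such a target shows the same 72 versus 64 byte difference the commit message reports.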

29 Feb, 2016 1 commit

ext4: Fix data exposure after failed AIO DIO · 74c66bcb

Authored by Jan Kara on Feb 29, 2016

When AIO DIO fails, e.g. due to an IO error, we must not convert unwritten
extents as that will expose uninitialized data. Handle this case
by clearing the unwritten flag from io_end in case of error and thus
preventing extent conversion.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

28 Feb, 2016 4 commits

ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite() · 1e9d180b

Authored by Ross Zwisler on Feb 27, 2016

As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry.  For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect.  The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.

Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().

Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

dax: move writeback calls into the filesystems · 7f6d5b52

Authored by Ross Zwisler on Feb 26, 2016

Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev.  This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device.  This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext4: online defrag not supported with DAX · 73f34a5e

Authored by Ross Zwisler on Feb 26, 2016

Online defrag operations for ext4 are hard coded to use the page cache.
See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()

When combined with DAX I/O, which circumvents the page cache, this can
result in data corruption.  This was observed with xfstests ext4/307 and
ext4/308.

Fix this by only allowing online defrag for non-DAX files.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext2, ext4: only set S_DAX for regular inodes · 0a6cf913

Authored by Ross Zwisler on Feb 26, 2016

When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes.  Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that are written back via
dax_writeback_mapping_range().

With the current code, though, this isn't always true.  For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries.  For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.

Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4.  This allows us to have strict DAX vs non-DAX paths in the
writeback code.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

23 Feb, 2016 6 commits

ext4: trim unused parameter from convert_initialized_extent() · 29c6eaff

Authored by Eric Whitney on Feb 22, 2016

The flags parameter is also unused.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

mbcache: add reusable flag to cache entries · 6048c64b

Authored by Andreas Gruenbacher on Feb 22, 2016

To reduce amount of damage caused by single bad block, we limit number
of inodes sharing an xattr block to 1024. Thus there can be more xattr
blocks with the same contents when there are lots of files with the same
extended attributes. These xattr blocks naturally result in hash
collisions and can form long hash chains and we unnecessarily check each
such block only to find out we cannot use it because it is already
shared by too many inodes.

Add a reusable flag to cache entries which is cleared when a cache entry
has reached its maximum refcount.  Cache entries which are not marked
reusable are skipped by mb_cache_entry_find_{first,next}. This
significantly speeds up mbcache when there are many same xattr blocks.
For example for xattr-bench with 5 values and each process handling
20000 files, the run for 64 processes is 25x faster with this patch.
Even for 8 processes the speedup is almost 3x. We have also verified
that for situations where there is only one xattr block of each kind,
the patch doesn't have a measurable cost.

[JK: Remove handling of setting the same value since it is not needed
anymore, check for races in e_reusable setting, improve changelog,
add measurements]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
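
The idea of skipping saturated entries during lookup can be sketched in user space as follows. This is a simplified model, not the mbcache code: struct cache_entry, find_reusable() and add_ref() are hypothetical names, a singly linked list stands in for one hash chain, and only the 1024 limit is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 1024u	/* sharing limit quoted in the commit message */

/* Hypothetical entry on one hash chain. */
struct cache_entry {
	unsigned int key;
	unsigned int refs;
	bool reusable;
	struct cache_entry *next;
};

/* Return the first entry with a matching key that can still be shared. */
static struct cache_entry *find_reusable(struct cache_entry *head,
					 unsigned int key)
{
	for (struct cache_entry *e = head; e != NULL; e = e->next) {
		if (e->reusable && e->key == key)
			return e;
	}
	return NULL;
}

/* Take one more reference; clear the flag once the limit is reached. */
static void add_ref(struct cache_entry *e)
{
	if (++e->refs >= MAX_REFS)
		e->reusable = false;
}
```

In this model the caller never has to examine the block behind a saturated entry just to learn that it is full; the flag check rejects it during the chain walk.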

ext4: shortcut setting of xattr to the same value · 3fd16462

Authored by Jan Kara on Feb 22, 2016

When someone tried to set an xattr to the same value (i.e., not changing
anything) we did all the work of removing the original xattr, possibly
breaking references to a shared xattr block, inserting the new xattr, and
merging xattr blocks again. Since this is not such a rare operation and it
is relatively cheap for us to detect this case, check for this and
shortcut xattr setting in that case.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
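
A cheap "is this the same value" test of the kind described above might look like the following user-space sketch. The helper name value_unchanged() is hypothetical and the real ext4 check may be structured differently; the point is only that a length comparison plus memcmp() is far cheaper than a remove-and-reinsert cycle.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the new value is byte-for-byte identical to the stored
 * one, in which case the caller can skip the update entirely.
 */
static bool value_unchanged(const void *old_val, size_t old_len,
			    const void *new_val, size_t new_len)
{
	if (old_len != new_len)
		return false;
	/* Two empty values are equal; avoid calling memcmp with length 0. */
	return new_len == 0 || memcmp(old_val, new_val, new_len) == 0;
}
```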

ext4: kill ext4_mballoc_ready · 2335d05f

Authored by Andreas Gruenbacher on Feb 22, 2016

This variable, introduced in commit 9c191f70, is unnecessary: it is set
once the module has been initialized correctly, and ext4_fill_super
cannot run unless the module has been initialized correctly.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

mbcache2: rename to mbcache · 7a2508e1

Authored by Jan Kara on Feb 22, 2016

Since the old mbcache code is gone, let's rename the new code to mbcache
since the number 2 is now meaningless. This is just a mechanical
replacement.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: convert to mbcache2 · 82939d79

Authored by Jan Kara on Feb 22, 2016

The conversion is generally straightforward. The only tricky part is
that the xattr block corresponding to a found mbcache entry can get freed
before we get the buffer lock for that block. So we have to check whether
the entry is still valid after getting the buffer lock.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

22 Feb, 2016 2 commits

ext4: iterate over buffer heads correctly in move_extent_per_page() · 87f9a031

Authored by Eryu Guan on Feb 21, 2016

In commit bcff2488 ("ext4: don't read blocks from disk after extents
being swapped") bh is not updated correctly in the for loop and wrong
data has been written to disk. generic/324 catches this on sub-page
block size ext4.

Fixes: bcff2488 ("ext4: don't read blocks from disk after extents being swapped")
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

ext4: make sure to revoke all the freeable blocks in ext4_free_blocks · f96c450d

Authored by Daeho Jeong on Feb 21, 2016

Currently, ext4_free_blocks() doesn't revoke data blocks of a per-file
data journalled inode, and this can cause file data inconsistency
problems. Even though data blocks of a per-file data journalled inode are
already forgotten by jbd2_journal_invalidatepage() in advance of invoking
ext4_free_blocks(), we still need to revoke the data blocks here.
Moreover, some of the metadata blocks, which are not found by
sb_find_get_block(), still need to be revoked, but this is also
missing here.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

20 Feb, 2016 1 commit

ext4: Make Q_GETNEXTQUOTA work for quota in hidden inodes · 6332b9b5

Authored by Eric Sandeen on Feb 19, 2016

We forgot to set the .get_nextdqblk operation in the quotactl_ops
structure used by ext4 when quota is using a hidden inode, so the
operation was not really supported. Fix the omission.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Jan Kara <jack@suse.cz>

19 Feb, 2016 2 commits

ext4: fix crashes in dioread_nolock mode · 74dae427

Authored by Jan Kara on Feb 19, 2016

Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.

Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix bh->b_state corruption · ed8ad838

Authored by Jan Kara on Feb 19, 2016

ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.

Fix the problem by always updating bh->b_state bits atomically.

CC: stable@vger.kernel.org
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
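
An atomic update that touches only a subset of state bits is commonly written as a compare-and-exchange loop. The sketch below uses C11 atomics in user space rather than the kernel's own primitives; MAP_FLAGS and update_state() are hypothetical names, and the mask value is made up for illustration.

```c
#include <stdatomic.h>

/* Hypothetical mask: the bits the mapping code is allowed to change. */
#define MAP_FLAGS 0x0fUL

/*
 * Replace the MAP_FLAGS bits of *state with flags, leaving every other
 * bit exactly as it was even if another thread changes it concurrently.
 */
static void update_state(_Atomic unsigned long *state, unsigned long flags)
{
	unsigned long old = atomic_load(state);
	unsigned long next;

	do {
		next = (old & ~MAP_FLAGS) | (flags & MAP_FLAGS);
		/* On failure, old is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(state, &old, next));
}
```

A plain read-modify-write of the whole word could overwrite a bit that another thread set between the read and the write; the loop instead retries with the fresh value, so a concurrent change to a bit outside MAP_FLAGS is preserved.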

16 Feb, 2016 1 commit

ext4: fix memleak in ext4_readdir() · c906f38e

Authored by Kirill Tkhai on Feb 16, 2016

When ext4_bread() fails, fname_crypto_str remains
allocated after return. Fix that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>

12 Feb, 2016 3 commits

ext4: remove unused parameter "newblock" in convert_initialized_extent() · 56263b4c

Authored by Eryu Guan on Feb 12, 2016

The "newblock" parameter is not used in convert_initialized_extent(),
so remove it.

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: don't read blocks from disk after extents being swapped · bcff2488

Authored by Eryu Guan on Feb 12, 2016

I notice ext4/307 fails occasionally on ppc64 host, reporting md5
checksum mismatch after moving data from original file to donor file.

The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned.  At this time the physical block number in the mapped buffer
head points to the donor file, not the original file, which results in
reading the wrong data into the page; that data then gets written to disk
in the following block_commit_write() call.

This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:

    mnt=/mnt/ext4
    donorfile=$mnt/donor
    testfile=$mnt/testfile
    e4compact=~/xfstests/src/e4compact

    rm -f $donorfile $testfile

    # reserve space for donor file, written by 0xaa and sync to disk to
    # avoid EBUSY on EXT4_IOC_MOVE_EXT
    xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile

    # create test file written by 0xbb
    xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile

    # compute initial md5sum
    md5sum $testfile | tee md5sum.txt
    # drop cache, force e4compact to read data from disk
    echo 3 > /proc/sys/vm/drop_caches

    # test defrag
    echo "$testfile" | $e4compact -i -v -f $donorfile
    # check md5sum
    md5sum -c md5sum.txt

Fix it by creating & mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().

Cc: stable@vger.kernel.org
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix potential integer overflow · 46901760

Authored by Insu Yun on Feb 12, 2016

Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
an integer overflow could happen.
Therefore, the integer overflow check needs to be fixed.

Cc: stable@vger.kernel.org
Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
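
Why the bound must be computed from the larger element size can be shown with a generic overflow-checked size calculation. This is a user-space sketch with a hypothetical helper, array_bytes(), and made-up element sizes (16 and 64 bytes); it does not reproduce the ext4 resize code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store count * elem_size in *bytes and return true, or return false if
 * the product would wrap around SIZE_MAX. elem_size must be the size of
 * the element type that is actually being allocated.
 */
static bool array_bytes(size_t count, size_t elem_size, size_t *bytes)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return false;
	*bytes = count * elem_size;
	return true;
}
```

For count = SIZE_MAX / 32, a check made against a 16-byte element passes, yet the same count multiplied by a 64-byte element wraps. A guard derived from the smaller structure therefore does not protect an allocation of the larger one.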