Commits · c27e43a10c9755231f8a1c618efc1ac299dd5007 · openeuler / raspberrypi-kernel

Jun 22, 2015 · 2 commits

ext4: minor cleanup of ext4_da_reserve_space() · c27e43a1

Committed by Eric Whitney on Jun 21, 2015

Remove outdated comments and dead code from ext4_da_reserve_space().
Clean up its tracepoint, and relocate it to make it more useful.

While we're at it, fix a nearby conditional used to determine whether
we have a non-bigalloc file system.  It doesn't match usage elsewhere
in the code, and misleadingly suggests that an s_cluster_ratio value
of 0 would be legal.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: don't retry file block mapping on bigalloc fs with non-extent file · 292db1bc

Committed by Darrick J. Wong on Jun 21, 2015

ext4 isn't willing to map clusters to a non-extent file.  Don't signal
this with an out-of-space error, since the FS will retry the
allocation (which didn't fail) forever.  Instead, return EUCLEAN so
that the operation fails immediately, all the way back to userspace.

(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
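The following minimal userspace sketch shows why the choice of error code matters. try_map_blocks(), map_with_retry() and the MAX_RETRIES cap are hypothetical stand-ins, not ext4 functions. The cap exists only so the sketch terminates; the commit message describes the real caller retrying forever. The caller retries only on ENOSPC, so EUCLEAN goes straight back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cap so the sketch terminates; not an ext4 value. */
#define MAX_RETRIES 3

/* Hypothetical stand-in for the block-mapping call: a bigalloc file
 * system refuses to map clusters for a non-extent file. */
static int try_map_blocks(bool bigalloc, bool extent_file, bool use_euclean)
{
	if (bigalloc && !extent_file)
		return use_euclean ? -EUCLEAN : -ENOSPC;
	return 0;
}

/* Retry only on ENOSPC (space might be freed by a journal commit);
 * any other result, including -EUCLEAN, is returned immediately.
 * *attempts reports how many times the mapping was tried. */
static int map_with_retry(bool bigalloc, bool extent_file, bool use_euclean,
			  int *attempts)
{
	int ret;

	*attempts = 0;
	do {
		(*attempts)++;
		ret = try_map_blocks(bigalloc, extent_file, use_euclean);
	} while (ret == -ENOSPC && *attempts <= MAX_RETRIES);
	return ret;
}
```

With ENOSPC every attempt is wasted. With EUCLEAN the first failure is final.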

Jun 21, 2015 · 3 commits

ext4: prevent ext4_quota_write() from failing due to ENOSPC · c5e298ae

Committed by Theodore Ts'o on Jun 21, 2015

To prevent quota block tracking from becoming inaccurate when
ext4_quota_write() fails with ENOSPC, we make two changes.  The quota
file can now use the reserved blocks (since the quota file is arguably
file system metadata), and ext4_quota_write() now uses
ext4_should_retry_alloc() to retry the block allocation after a commit
has completed and released some blocks for allocation.

This fixes failures of xfstests generic/270:

Quota error (device vdc): write_blk: dquota write failed
Quota error (device vdc): qtree_write_dquot: Error -28 occurred while creating quota
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: call sync_blockdev() before invalidate_bdev() in put_super() · 89d96a6f

Committed by Theodore Ts'o on Jun 20, 2015

Normally all of the buffers will have been forced out to disk before
we call invalidate_bdev(), but in some cases, such as when a file
system operation was aborted due to an ext4_error(), there may
still be some dirty buffers in the buffer cache for the device.  So
try to force them out to disk before calling invalidate_bdev().

This fixes a warning triggered by generic/081:

WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

jbd2: speedup jbd2_journal_dirty_metadata() · 2143c196

Committed by Jan Kara on Jun 20, 2015

It is often the case that we mark a buffer as having dirty metadata when
the buffer is already in that state (frequent for bitmaps, inode table
blocks, and the superblock).  Thus it is unnecessary to contend on grabbing
the journal head reference and the bh_state lock.  Avoid that by checking
whether any modification to the buffer is needed before grabbing any locks
or references.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
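The fast path described above follows a check-before-locking pattern. Below is a minimal sketch with C11 atomics. It assumes a hypothetical struct buf whose single dirty_meta flag stands in for the buffer and journal-head state that jbd2 actually checks. The spinlock and lock counter exist only to show that a second call on an already-dirty buffer never takes the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical buffer: dirty_meta stands in for the "already dirty
 * metadata" state, lock for the bh_state lock. Not jbd2's layout. */
struct buf {
	atomic_bool dirty_meta;
	atomic_flag lock;
	int lock_acquisitions;	/* how often the slow path ran */
};

static void buf_lock(struct buf *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
	b->lock_acquisitions++;
}

static void buf_unlock(struct buf *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static void mark_dirty_metadata(struct buf *b)
{
	/* Fast path: nothing to change, so take no lock at all. */
	if (atomic_load_explicit(&b->dirty_meta, memory_order_acquire))
		return;

	/* Slow path: lock, then recheck, since another thread may have
	 * changed the state between the check and the lock. */
	buf_lock(b);
	if (!atomic_load_explicit(&b->dirty_meta, memory_order_relaxed))
		atomic_store_explicit(&b->dirty_meta, true, memory_order_release);
	buf_unlock(b);
}
```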

Jun 16, 2015 · 3 commits

jbd2: get rid of open coded allocation retry loop · 7b506b10

Committed by Michal Hocko on Jun 15, 2015

insert_revoke_hash() does an open-coded endless allocation loop if
journal_oom_retry is true.  It doesn't implement any allocation fallback
strategy between the retries, though.  The memory allocator doesn't know
about the never-fail requirement, so it cannot help the allocation make
progress (e.g. by using memory reserves).

Get rid of the retry loop and use __GFP_NOFAIL instead.  We will lose the
debugging message, but I am not sure it was ever helpful.

Do the same for journal_alloc_journal_head(), which does a similar
thing.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: improve warning directory handling messages · b03a2f7e

Committed by Andreas Dilger on Jun 15, 2015

Several ext4_warning() messages in the directory handling code do not
report the inode number of the (potentially corrupt) directory where a
problem is seen, and others report this in an ad-hoc manner.  Add an
ext4_warning_inode() helper to print the inode number and command name
consistently with ext4_error_inode().

Consolidate the place in ext4.h where these macros are defined.

Clean up some other directory error and warning messages to print the
calling function name.

Minor code style fixes in nearby lines.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: fix ocfs2 corrupt when updating journal superblock fails · 6f6a6fda

Committed by Joseph Qi on Jun 15, 2015

If updating the journal superblock fails after journal data has been
flushed, the error is ignored, which misleads the caller into treating
it as the normal case.  In ocfs2, the checkpoint will be treated as
successful and the other node can get the lock to update.  Since sb_start
still points to the old log block, the other node will rewrite the
journal data during journal recovery.  Thus the new updates will be
overwritten and ocfs2 becomes corrupt.  So in the above case we have to
return the error, and ocfs2_commit_cache() will take care of the error
and prevent the other node from updating first.  Only after recovering
the journal can it do the new updates.

The issue discussion mail can be found at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html
http://comments.gmane.org/gmane.comp.file-systems.ext4/48841

[ Fixed bug in patch which allowed a non-negative error return from
  jbd2_cleanup_journal_tail() to leak out of jbd2_journal_flush(); this
  was causing xfstests ext4/306 to fail. -- Ted ]
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
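The fix in Ted's note amounts to keeping a positive "nothing to do" value from escaping the flush path, while still passing real errors up. The minimal sketch below assumes, as the note implies, that the tail-cleanup helper can return a positive value on success. cleanup_tail(), cleanup_tail_result and journal_flush() are hypothetical stand-ins for jbd2_cleanup_journal_tail() and jbd2_journal_flush().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result the stand-in helper returns:
 *   1  = nothing to do, 0 = tail updated, <0 = error (e.g. -EIO). */
static int cleanup_tail_result;

static int cleanup_tail(void)
{
	return cleanup_tail_result;
}

/* Only 0 or a negative errno may leave the flush path. */
static int journal_flush(void)
{
	int err = cleanup_tail();

	if (err < 0)
		return err;	/* superblock update failed: report it */
	return 0;		/* 0 and 1 both mean success to callers */
}
```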

Jun 15, 2015 · 4 commits

ext4: mballoc: avoid 20-argument function call · 97b4af2f

Committed by Rasmus Villemoes on Jun 15, 2015

Making a function call with 20 arguments is rather expensive in both
stack and .text.  In this case, doing the formatting manually doesn't
make it any less readable, so we might as well save 155 bytes of .text
and 112 bytes of stack.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
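The following sketches the "format it manually" approach. It assumes a hypothetical per-group line made of four header values and NR_COUNTERS counters. NR_COUNTERS is chosen so that one printf-style call with an output handle and a format string would take 20 arguments. The exact layout ext4 prints is not reproduced here. The line is built piecewise, so no single call takes more than seven arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative count: handle + format + 4 header values + 14 = 20 args. */
#define NR_COUNTERS 14

/* Build the line in pieces instead of one 20-argument call.
 * Returns the number of characters written (snprintf semantics). */
static int format_group(char *buf, size_t size, unsigned group,
			unsigned free, unsigned frags, unsigned first,
			const unsigned counters[NR_COUNTERS])
{
	int n = snprintf(buf, size, "#%-5u: %-5u %-5u %-5u [",
			 group, free, frags, first);

	/* One short call per counter; stop if the buffer is full. */
	for (int i = 0; i < NR_COUNTERS && n >= 0 && (size_t)n < size; i++)
		n += snprintf(buf + n, size - n, " %-5u", counters[i]);
	if (n >= 0 && (size_t)n < size)
		n += snprintf(buf + n, size - n, " ]");
	return n;
}
```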

ext4: wait for existing dio workers in ext4_alloc_file_blocks() · 0d306dcf

Committed by Lukas Czerner on Jun 15, 2015

Currently, existing dio workers can jump in and potentially increase the
extent tree depth while we're allocating blocks in
ext4_alloc_file_blocks().  This may cause us to underestimate the
number of credits needed for the transaction, because the extent tree
depth can change after our estimation.

Fix this by waiting for all the existing dio workers in the same way
as we do in ext4_punch_hole().  We've seen errors caused by this in
xfstests generic/299; however, it's really hard to reproduce.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: recalculate journal credits as inode depth changes · 4134f5c8

Committed by Lukas Czerner on Jun 15, 2015

Currently in ext4_alloc_file_blocks() the number of credits is
calculated only once, before we enter the allocation loop.  However,
within the allocation loop the extent tree depth can change, hence the
number of credits needed can increase, potentially exceeding the number
of credits reserved in the handle, which can cause journal failures.

Fix this by recalculating the number of credits when the inode depth
changes.  Note that even though ext4_alloc_file_blocks() is currently
only used by extent-based inodes, we will avoid recalculating the number
of credits unnecessarily in the case of indirect-block-based inodes.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
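The recalculation can be sketched as follows. The sketch assumes a hypothetical credits_for_depth() whose formula is invented for illustration; ext4's real estimate differs. It only shows when the estimate is refreshed: once up front, and again whenever the observed extent tree depth changes. Later handles therefore do not reuse an estimate computed for a shallower tree.

```c
#include <assert.h>

/* Hypothetical estimate that grows with extent tree depth. */
static int credits_for_depth(int depth)
{
	return 2 * (depth + 1) + 3;
}

/* depth_after[i] is the tree depth observed after allocating chunk i.
 * used[i] receives the credits reserved for chunk i's handle.
 * Returns the estimate in effect after the loop. */
static int alloc_loop(const int *depth_after, int nchunks, int start_depth,
		      int *used)
{
	int depth = start_depth;
	int credits = credits_for_depth(depth);	/* initial estimate */

	for (int i = 0; i < nchunks; i++) {
		used[i] = credits;	/* reserve for this chunk's handle */
		/* Recompute only when the depth actually changed. */
		if (depth_after[i] != depth) {
			depth = depth_after[i];
			credits = credits_for_depth(depth);
		}
	}
	return credits;
}
```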

jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail() · b4f1afcd

Committed by Dmitry Monakhov on Jun 15, 2015

jbd2_cleanup_journal_tail() can be invoked by jbd2__journal_start(),
so allocations should be done with GFP_NOFS.

[Full stack trace snipped from 3.10-rh7]
[<ffffffff815c4bd4>] dump_stack+0x19/0x1b
[<ffffffff8105dba1>] warn_slowpath_common+0x61/0x80
[<ffffffff8105dcca>] warn_slowpath_null+0x1a/0x20
[<ffffffff815c2142>] slab_pre_alloc_hook.isra.31.part.32+0x15/0x17
[<ffffffff8119c045>] kmem_cache_alloc+0x55/0x210
[<ffffffff811477f5>] ? mempool_alloc_slab+0x15/0x20
[<ffffffff811477f5>] mempool_alloc_slab+0x15/0x20
[<ffffffff81147939>] mempool_alloc+0x69/0x170
[<ffffffff815cb69e>] ? _raw_spin_unlock_irq+0xe/0x20
[<ffffffff8109160d>] ? finish_task_switch+0x5d/0x150
[<ffffffff811f1a8e>] bio_alloc_bioset+0x1be/0x2e0
[<ffffffff8127ee49>] blkdev_issue_flush+0x99/0x120
[<ffffffffa019a733>] jbd2_cleanup_journal_tail+0x93/0xa0 [jbd2] -->GFP_KERNEL
[<ffffffffa019aca1>] jbd2_log_do_checkpoint+0x221/0x4a0 [jbd2]
[<ffffffffa019afc7>] __jbd2_log_wait_for_space+0xa7/0x1e0 [jbd2]
[<ffffffffa01952d8>] start_this_handle+0x2d8/0x550 [jbd2]
[<ffffffff811b02a9>] ? __memcg_kmem_put_cache+0x29/0x30
[<ffffffff8119c120>] ? kmem_cache_alloc+0x130/0x210
[<ffffffffa019573a>] jbd2__journal_start+0xba/0x190 [jbd2]
[<ffffffff811532ce>] ? lru_cache_add+0xe/0x10
[<ffffffffa01c9549>] ? ext4_da_write_begin+0xf9/0x330 [ext4]
[<ffffffffa01f2c77>] __ext4_journal_start_sb+0x77/0x160 [ext4]
[<ffffffffa01c9549>] ext4_da_write_begin+0xf9/0x330 [ext4]
[<ffffffff811446ec>] generic_file_buffered_write_iter+0x10c/0x270
[<ffffffff81146918>] __generic_file_write_iter+0x178/0x390
[<ffffffff81146c6b>] __generic_file_aio_write+0x8b/0xb0
[<ffffffff81146ced>] generic_file_aio_write+0x5d/0xc0
[<ffffffffa01bf289>] ext4_file_write+0xa9/0x450 [ext4]
[<ffffffff811c31d9>] ? pipe_read+0x379/0x4f0
[<ffffffff811b93f0>] do_sync_write+0x90/0xe0
[<ffffffff811b9b6d>] vfs_write+0xbd/0x1e0
[<ffffffff811ba5b8>] SyS_write+0x58/0xb0
[<ffffffff815d4799>] system_call_fastpath+0x16/0x1b
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

Jun 13, 2015 · 4 commits

ext4: use swap() in mext_page_double_lock() · bf865467

Committed by Fabian Frederick on Jun 12, 2015

Use the swap() macro defined in kernel.h.

Thanks to Julia Lawall for Coccinelle scripting support.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
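mext_page_double_lock() must lock its two pages in a fixed order, and swap() is the idiomatic way to reorder a pair. The minimal sketch below models the macro on the kernel.h definition, written with GNU C __typeof__ so it builds in userspace. order_for_locking() and its plain integer keys are hypothetical. The kernel function bases its order on the two inodes; integers are used here only to keep the sketch self-contained.

```c
#include <assert.h>

/* Modeled on the kernel.h swap() macro (GNU C __typeof__). */
#define swap(a, b) \
	do { __typeof__(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

/* Hypothetical: put the lower key first so every caller locks the pair
 * in the same order, avoiding an ABBA deadlock. Returns the key to lock
 * first. */
static unsigned long order_for_locking(unsigned long *first,
				       unsigned long *second)
{
	if (*first > *second)
		swap(*first, *second);
	return *first;
}
```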

ext4: use swap() in memswap() · 4b7e2db5

Committed by Fabian Frederick on Jun 12, 2015

Use the swap() macro defined in kernel.h.

Thanks to Julia Lawall for Coccinelle scripting support.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix race between truncate and __ext4_journalled_writepage() · bdf96838

Committed by Theodore Ts'o on Jun 12, 2015

Commit cf108bca ("ext4: Invert the locking order of page_lock
and transaction start") caused __ext4_journalled_writepage() to drop
the page lock before the page was written back, as part of changing
the locking order to jbd2_journal_start -> page_lock.  However, this
introduced a potential race if a truncate raced with data=journal
writeback.

Fix this by grabbing the page lock after starting the journal handle,
and then checking whether the page had been truncated out from under
us.

This fixes a number of different warnings and BUG_ONs when running
xfstests generic/086 in data=journal mode, including:

jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7c0, 164), jh->b_transaction (  (null), 0), jh->b_next_transaction (  (null), 0), jlist 0

- and -

kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
    ...
Call Trace:
 [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
 [<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
 [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
 [<c027d883>] ? lock_buffer+0x36/0x36
 [<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
 [<c0229139>] do_invalidatepage+0x22/0x26
 [<c0229198>] truncate_inode_page+0x5b/0x85
 [<c022934b>] truncate_inode_pages_range+0x156/0x38c
 [<c0229592>] truncate_inode_pages+0x11/0x15
 [<c022962d>] truncate_pagecache+0x55/0x71
 [<c02b913b>] ext4_setattr+0x4a9/0x560
 [<c01ca542>] ? current_kernel_time+0x10/0x44
 [<c026c4d8>] notify_change+0x1c7/0x2be
 [<c0256a00>] do_truncate+0x65/0x85
 [<c0226f31>] ? file_ra_state_init+0x12/0x29

- and -

WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396 jbd2_journal_dirty_metadata+0x14a/0x1ae()
    ...
Call Trace:
 [<c01b879f>] ? console_unlock+0x3a1/0x3ce
 [<c082cbb4>] dump_stack+0x48/0x60
 [<c0178b65>] warn_slowpath_common+0x89/0xa0
 [<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
 [<c0178bef>] warn_slowpath_null+0x14/0x18
 [<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
 [<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
 [<c02b2f44>] write_end_fn+0x40/0x53
 [<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
 [<c02b59e7>] ext4_writepage+0x354/0x3b8
 [<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
 [<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
 [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
 [<c02b5a5b>] __writepage+0x10/0x2e
 [<c0225956>] write_cache_pages+0x22d/0x32c
 [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
 [<c02b6ee8>] ext4_writepages+0x102/0x607
 [<c019adfe>] ? sched_clock_local+0x10/0x10e
 [<c01a8a7c>] ? __lock_is_held+0x2e/0x44
 [<c01a8ad5>] ? lock_is_held+0x43/0x51
 [<c0226dff>] do_writepages+0x1c/0x29
 [<c0276bed>] __writeback_single_inode+0xc3/0x545
 [<c0277c07>] writeback_sb_inodes+0x21f/0x36d
    ...
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

ext4 crypto: fail the mount if blocksize != pagesize · 1cb767cd

Committed by Theodore Ts'o on Jun 12, 2015

We currently don't correctly handle the case where blocksize !=
pagesize, so disallow the mount in those cases.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Jun 09, 2015 · 6 commits

ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate · 331573fe

Committed by Namjae Jeon on Jun 09, 2015

This patch implements fallocate's FALLOC_FL_INSERT_RANGE for ext4.

1) Make sure that both offset and len are block-size aligned.
2) Increase the inode's i_size by len bytes.
3) Compute the file's logical block number corresponding to offset.  If
   the computed block number is not the starting block of an extent,
   split that extent so that the block number becomes the starting block
   of an extent.
4) Shift all the extents lying between [offset, last allocated extent]
   to the right by len bytes.  This step makes a hole of len bytes
   at offset.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
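The four steps can be sketched on a hypothetical in-memory extent array. struct extent, insert_range() and the caller-provided spare slot are assumptions. The real ext4 code works on the on-disk extent tree under journal handles and performs checks not shown here. Offsets and lengths are in bytes, as in fallocate(2); extents are in blocks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical extent: logical start, length, physical start (blocks). */
struct extent {
	unsigned long lblk, len, pblk;
};

/* ext[] is sorted by lblk and has room for one extra entry.
 * Returns the new extent count, or -EINVAL on misalignment. */
static long insert_range(struct extent *ext, size_t n, unsigned long *i_size,
			 unsigned long offset, unsigned long len,
			 unsigned long blocksize)
{
	/* 1) offset and len must be block aligned. */
	if (offset % blocksize || len % blocksize)
		return -EINVAL;

	unsigned long off_blk = offset / blocksize;
	unsigned long len_blk = len / blocksize;

	/* 2) grow i_size by len bytes. */
	*i_size += len;

	/* 3) split the extent that straddles off_blk, if any. */
	for (size_t i = 0; i < n; i++) {
		struct extent *e = &ext[i];

		if (e->lblk < off_blk && off_blk < e->lblk + e->len) {
			unsigned long head = off_blk - e->lblk;

			for (size_t j = n; j > i + 1; j--)
				ext[j] = ext[j - 1];	/* make room */
			ext[i + 1] = (struct extent){ off_blk, e->len - head,
						      e->pblk + head };
			e->len = head;
			n++;
			break;
		}
	}

	/* 4) shift extents at or past off_blk right by len_blk blocks;
	 *    physical locations are unchanged, leaving a hole at offset. */
	for (size_t i = 0; i < n; i++)
		if (ext[i].lblk >= off_blk)
			ext[i].lblk += len_blk;
	return (long)n;
}
```

For example, inserting 3 blocks at block 2 of a file with extents [0,4) and [8,10) yields [0,2), [5,7) and [11,13).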

jbd2: speedup jbd2_journal_get_[write|undo]_access() · de92c8ca

Committed by Jan Kara on Jun 08, 2015

jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are
frequently called for buffers that are already part of the running
transaction; most frequently this is the case for bitmaps, inode table
blocks, and the superblock.  Since in such cases we have nothing to do,
it is unfortunate that we still grab a reference to the journal head,
lock the bh, and lock bh_state only to find out there's nothing to do.

Improving this is a bit subtle, though, since until we find out that the
journal head is attached to the running transaction, it can disappear
from under us because checkpointing / commit decided it's no longer
needed.  We deal with this by protecting the journal_head slab with RCU.
We still have to be careful about the journal head being freed and
reallocated within the slab, and about exposing the journal head only in
a consistent state (in particular, b_modified and b_frozen_data must be
in the correct state before we allow the user to touch the buffer).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: more simplifications in do_get_write_access() · 8b00f400

Committed by Jan Kara on Jun 08, 2015

Check for the simple case of an unjournaled buffer first, handle it, and
bail out.  This allows us to remove one if statement and unindent the
difficult case by one tab.  The result is easier to read.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: simplify error path on allocation failure in do_get_write_access() · d012aa59

Committed by Jan Kara on Jun 08, 2015

We were acquiring bh_state_lock when allocation of a buffer failed in
do_get_write_access() only to be able to jump to a label that releases
the lock and does all the other checks that don't make sense for this
error path.  Just jump to the right label instead.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: simplify code flow in do_get_write_access() · ee57aba1

Committed by Jan Kara on Jun 08, 2015

needs_copy is set in only one place in do_get_write_access(); just move
the frozen buffer copying into that place and factor it out into a
separate function to make do_get_write_access() slightly more readable.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: fix sparse warnings in fs/ext4/ioctl.c · b4ab9e29

Committed by Fabian Frederick on Jun 08, 2015

[ Added another sparse fix for EXT4_IOC_GET_ENCRYPTION_POLICY while
  we're at it. --tytso ]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Jun 08, 2015 · 6 commits

ext4: BUG_ON assertion repeated for inode1, not done for inode2 · 8bc3b1e6

Committed by David Moore on Jun 08, 2015

During a source code review of fs/ext4/extents.c I noticed two identical
consecutive lines.  An assertion is repeated for inode1 and never done
for inode2.  This is not in keeping with the rest of the code in the
ext4_swap_extents() function and appears to be a bug.

Assert that the inode2 mutex is not locked.
Signed-off-by: David Moore <dmoorefo@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>

ext4 crypto: fix ext4_get_crypto_ctx()'s calling convention in ext4_decrypt_one · ad0a0ce8

Committed by Theodore Ts'o on Jun 08, 2015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: return error code from ext4_mb_good_group() · 42ac1848

Committed by Lukas Czerner on Jun 08, 2015

Currently ext4_mb_good_group() only returns 0 or 1 depending on whether
the allocation group is suitable for use or not.  However, we might get
various errors and fail while initializing a new group, including -EIO,
which would never get propagated up the call chain.  This might lead to
an endless loop at writeback when we're trying to find a good group to
allocate from and we fail to initialize a new group (on a read error,
for example).

Fix this by returning a proper error code from ext4_mb_good_group() and
using it in ext4_mb_regular_allocator().  In ext4_mb_regular_allocator()
we will always return only the first error that occurred in
ext4_mb_good_group(), and we only propagate it back to the caller if we
do not get any other errors and we fail to allocate any blocks.

Note that with errors= modes other than errors=continue, we will fail
immediately in ext4_mb_good_group() in case of error; however, with
errors=continue we should try to continue using the file system.  That's
why we're not going to fail immediately when we see an error from
ext4_mb_good_group(), but rather when we fail to find a suitable block
group to allocate from due to a problem in group initialization.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
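The allocator loop's error policy can be sketched like this. find_group() is hypothetical. It takes precomputed per-group results (1 = suitable, 0 = unsuitable, negative = error) instead of calling ext4_mb_good_group(). Returning -ENOSPC when no group fits and no error was seen is also an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* results[g]: 1 = suitable, 0 = not suitable, <0 = error for group g.
 * Returns the first suitable group; otherwise the first error seen;
 * otherwise -ENOSPC. */
static int find_group(const int *results, int ngroups)
{
	int first_err = 0;

	for (int g = 0; g < ngroups; g++) {
		int ret = results[g];

		if (ret < 0) {
			if (!first_err)
				first_err = ret;	/* keep only the first error */
			continue;			/* errors=continue: keep scanning */
		}
		if (ret > 0)
			return g;			/* suitable group found */
	}
	return first_err ? first_err : -ENOSPC;
}
```

An error is reported only when the scan finds nothing usable, so a single bad group cannot stall allocation from good ones.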

ext4: try to initialize all groups we can in case of failure on ppc64 · bbdc322f

Committed by Lukas Czerner on Jun 08, 2015

Currently, on machines with page size > block size, when initializing
the block group buddy cache we initialize it for all the block group
bitmaps in the page.  However, in the case of a read error, a checksum
error, or if a single bitmap is in any way corrupted, we would fail to
initialize all of the bitmaps.  This is problematic because we will not
have access to the other allocation groups even though those might be
perfectly fine and usable.

Fix this by reading all the bitmaps instead of erroring out on the first
problem, and simply skip the bitmaps which were either not read properly
or are not valid.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
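The change means one bad bitmap no longer aborts the whole page. Below is a minimal sketch with hypothetical read_ok[] and valid[] inputs describing the groups that share one page. Every group whose bitmap was read and validated is marked usable, and the rest are skipped rather than failing the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs for the groups whose bitmaps share one page:
 * read_ok[i] is false on a read error, valid[i] is false on a checksum
 * or consistency failure. Mark each group usable or not and keep going
 * instead of returning on the first bad bitmap. */
static int init_page_groups(const bool *read_ok, const bool *valid,
			    bool *usable, int ngroups)
{
	int count = 0;

	for (int i = 0; i < ngroups; i++) {
		usable[i] = read_ok[i] && valid[i];
		if (usable[i])
			count++;	/* this group's buddy data can be built */
	}
	return count;			/* number of usable groups */
}
```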

ext4: verify block bitmap even after fresh initialization · 41e5b7ed

Committed by Lukas Czerner on Jun 08, 2015

If we want to rely on the buffer_verified() flag of the block bitmap
buffer, we have to set it consistently.  However, currently if we're
initializing an uninitialized block bitmap in
ext4_read_block_bitmap_nowait(), we're not going to set buffer verified
at all.

We could do this by simply setting the flag on the buffer, but I think
it's actually better to run ext4_validate_block_bitmap() to make sure
that what we did in ext4_init_block_bitmap() is right.

So run ext4_validate_block_bitmap() even after the block bitmap
initialization.  Also bail out early from ext4_validate_block_bitmap()
if we see a corrupt bitmap, since we already know it's corrupt and we do
not need to verify that.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL · 6ccaf3e2

Committed by Michal Hocko on Jun 08, 2015

This basically reverts 47def826 ("jbd2: Remove __GFP_NOFAIL from jbd2
layer").  The deprecation of __GFP_NOFAIL was a bad choice because it
led to open-coding the endless loop around the allocator rather than
removing the dependency on the non-failing allocation.  So the
deprecation was a clear failure, and reality tells us that
__GFP_NOFAIL is not even close to going away.

It is still true that __GFP_NOFAIL allocations are generally
discouraged, that new uses should be evaluated, and that an alternative
(pre-allocations or reservations) should be considered, but it doesn't
make any sense to lie to the allocator about the requirements.  The
allocator can take steps to help make progress if it knows the
requirements.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: David Rientjes <rientjes@google.com>

Jun 03, 2015 · 1 commit

ext4 crypto: allocate bounce pages using GFP_NOWAIT · 3dbb5eb9

Committed by Theodore Ts'o on Jun 03, 2015

Previously we allocated bounce pages using a combination of
alloc_page() and mempool_alloc() with the __GFP_WAIT bit set.
Instead, use mempool_alloc() with GFP_NOWAIT.  The mempool_alloc()
function will try using alloc_pages() initially, and then only use the
mempool reserve of pages if alloc_pages() is unable to fulfill the
request.

This minimizes the impact on the mm layer when we need to do a
large amount of writeback of encrypted files.  Jaegeuk Kim had
reported that under a heavy fio workload on a system with restricted
amounts of memory (which, unfortunately, includes many mobile handsets),
he had observed the OOM killer getting triggered several times.
Using GFP_NOWAIT

If the mempool_alloc() function fails, we will retry the page
writeback at a later time; the function of the mempool is to ensure
that we can write back at least 32 pages at a time, so we can more
efficiently dispatch I/O under high memory pressure.  In the future
we should make this a tunable so we can determine the best tradeoff
between permanently sequestering memory and the ability to quickly
launder pages so we can free up memory quickly when necessary.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
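The allocation order described in this commit can be sketched in userspace: normal allocator first, reserve second, never sleep. struct pool, try_alloc() and pool_alloc_nowait() are hypothetical. malloc() stands in for alloc_pages(), a flag simulates memory pressure, and the 32-slot reserve mirrors the 32 pages mentioned in the commit message. A NULL return corresponds to deferring the writeback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reserve pool in the spirit of a mempool: preallocated
 * elements used only when the normal allocator fails. */
struct pool {
	void *reserve[32];
	int nr_free;
};

/* Stand-in for a non-blocking page allocation that may fail. */
static void *try_alloc(size_t size, int simulate_failure)
{
	return simulate_failure ? NULL : malloc(size);
}

/* Normal allocator first, then the reserve; never waits.
 * NULL means "retry the writeback later". */
static void *pool_alloc_nowait(struct pool *p, size_t size,
			       int simulate_failure)
{
	void *obj = try_alloc(size, simulate_failure);

	if (obj)
		return obj;
	if (p->nr_free > 0)
		return p->reserve[--p->nr_free];
	return NULL;
}
```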

Jun 01, 2015 · 11 commits

ext4 crypto: release crypto resource on module exit · e298e73b

Committed by Chao Yu on May 31, 2015

Crypto resources should be released when the ext4 module exits;
otherwise they will be leaked.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: handle unexpected lack of encryption keys · abdd438b

Committed by Theodore Ts'o on May 31, 2015

Fix up attempts by users to write to a file when they don't
have access to the encryption key.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: allocate the right amount of memory for the on-disk symlink · 4d3c4e5b

Committed by Theodore Ts'o on May 31, 2015

Previously we were not taking the required padding into account when
allocating space for the on-disk symlink.  This caused a buffer overrun
which could trigger a kernel crash when running fsstress.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: clean up error handling in ext4_fname_setup_filename · 82d0d3e7

Committed by Theodore Ts'o on May 31, 2015

Fix a potential memory leak where fname->crypto_buf.name wouldn't get
freed in some error paths, and also make the error handling easier to
understand/audit.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: policies may only be set on directories · d87f6d78

Committed by Theodore Ts'o on May 31, 2015

Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out we were
missing this check.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: enforce crypto policy restrictions on cross-renames · c2faccaf

Committed by Theodore Ts'o on May 31, 2015

Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out the need for
this check.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: encrypt tmpfile located in encryption protected directory · e709e9df

Committed by Theodore Ts'o on May 31, 2015

Factor out calls to ext4_inherit_context() and move them to
__ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't
calling ext4_inherit_context(), so the temporary file wasn't
getting protected.  Since the blocks for the tmpfile could end up on
disk, they really should be protected if the tmpfile is created within
the context of an encrypted directory.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: make sure the encryption info is initialized on opendir(2) · 6bc445e0

Committed by Theodore Ts'o on May 31, 2015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: set up encryption info for new inodes in ext4_inherit_context() · 55557029

Committed by Theodore Ts'o on May 31, 2015

Set up the encryption information for newly created inodes immediately
after they inherit their encryption context from their parent
directories.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: fix memory leaks in ext4_encrypted_zeroout · 95ea68b4

Committed by Theodore Ts'o on May 31, 2015

ext4_encrypted_zeroout() could end up leaking a bio and bounce page.
Fortunately it's not used much.  While we're fixing things up,
refactor out common code into the static function alloc_bounce_page()
and fix up error handling if mempool_alloc() fails.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: use per-inode tfm structure · c936e1ec

Committed by Theodore Ts'o on May 31, 2015

As suggested by Herbert Xu, we shouldn't allocate a new tfm each time
we read or write a page.  Instead we can use a single tfm hanging off
the inode's crypt_info structure for all of our encryption needs for
that inode, since the tfm can be used by multiple crypto requests in
parallel.

Also use cmpxchg() to avoid races that could result in the crypt_info
structure getting doubly allocated or doubly freed.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
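The cmpxchg() idea maps directly onto C11 atomics. The minimal sketch below assumes hypothetical struct crypt_info and struct inode_sketch types. Each caller may build a candidate, but only one is published on the inode. A caller that loses the compare-and-swap frees its own copy and uses the winner's. The published structure is therefore never installed twice or freed twice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-inode crypto state shared by readers and writers. */
struct crypt_info {
	int key_id;
};

struct inode_sketch {
	_Atomic(struct crypt_info *) ci;
};

/* Return the inode's crypt_info, publishing a new one if none exists. */
static struct crypt_info *get_crypt_info(struct inode_sketch *inode,
					 int key_id)
{
	struct crypt_info *cur = atomic_load(&inode->ci);

	if (cur)
		return cur;		/* already set up */

	struct crypt_info *ci = malloc(sizeof(*ci));
	if (!ci)
		return NULL;
	ci->key_id = key_id;

	struct crypt_info *expected = NULL;
	if (!atomic_compare_exchange_strong(&inode->ci, &expected, ci)) {
		free(ci);		/* lost the race: drop our copy */
		return expected;	/* use the winner's structure */
	}
	return ci;
}
```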