提交 · ab74c7b23f3770935016e3eb3ecdf1e42b73efaa · openeuler / Kernel

08 8月, 2020 3 次提交

ext4: indicate via a block bitmap read is prefetched via a tracepoint · ab74c7b2

由 Theodore Ts'o 提交于 7月 15, 2020

Modify the ext4_read_block_bitmap_load tracepoint so that it tells us
whether a block bitmap is being prefetched.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NArtem Blagodarenko <artem.blagodarenko@gmail.com>

ab74c7b2

jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers() · 529a781e

由 zhangyi (F) 提交于 6月 20, 2020

Parameter gfp_mask in jbd2_journal_try_to_free_buffers() is no longer
used after commit <536fc240> ("jbd2: clean up
jbd2_journal_try_to_free_buffers()"), so just remove it.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200620025427.1756360-6-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

529a781e

ext4: abort the filesystem if failed to async write metadata buffer · bc71726c

由 zhangyi (F) 提交于 6月 20, 2020

There is a risk of filesystem inconsistency if we failed to async write
back metadata buffer in the background. Because of current buffer's end
io procedure is handled by end_buffer_async_write() in the block layer,
and it only clear the buffer's uptodate flag and mark the write_io_error
flag, so ext4 cannot detect such failure immediately. In most cases of
getting metadata buffer (e.g. ext4_read_inode_bitmap()), although the
buffer's data is actually uptodate, it may still read data from disk
because the buffer's uptodate flag has been cleared. Finally, it may
lead to on-disk filesystem inconsistency if reading old data from the
disk successfully and write them out again.

This patch detect bdev mapping->wb_err when getting journal's write
access and mark the filesystem error if bdev's mapping->wb_err was
increased, this could prevent further writing and potential
inconsistency.
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Suggested-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200620025427.1756360-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

bc71726c

06 8月, 2020 14 次提交

ext4: skip non-loaded groups at cr=0/1 when scanning for good groups · c1d2c7d4

由 Alex Zhuravlev 提交于 6月 19, 2020

cr=0 is supposed to be an optimization to save CPU cycles, but if
buddy data (in memory) is not initialized then all this makes no sense
as we have to do sync IO taking a lot of cycles.  Also, at cr=0
mballoc doesn't choose any available chunk.  cr=1 also skips groups
using heuristic based on avg. fragment size.  It's more useful to skip
such groups and switch to cr=2 where groups will be scanned for
available chunks.  However, we always read the first block group in a
flex_bg so metadata blocks will get read into the first flex_bg if
possible.

Using sparse image and dm-slow virtual device of 120TB was
simulated, then the image was formatted and filled using debugfs to
mark ~85% of available space as busy.  mount process w/o the patch
couldn't complete in half an hour (according to vmstat it would take
~10-11 hours).  With the patch applied mount took ~20 seconds.

Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988Signed-off-by: NAlex Zhuravlev <azhuravlev@whamcloud.com>
Reviewed-by: NAndreas Dilger <adilger@whamcloud.com>
Reviewed-by: NArtem Blagodarenko <artem.blagodarenko@gmail.com>

c1d2c7d4

ext4: add prefetching for block allocation bitmaps · cfd73237

由 Alex Zhuravlev 提交于 4月 21, 2020

This should significantly improve bitmap loading, especially for flex
groups as it tries to load all bitmaps within a flex.group instead of
one by one synchronously.

Prefetching is done in 8 * flex_bg groups, so it should be 8
read-ahead reads for a single allocating thread. At the end of
allocation the thread waits for read-ahead completion and initializes
buddy information so that read-aheads are not lost in case of memory
pressure.

At cr=0 the number of prefetching IOs is limited per allocation
context to prevent a situation when mballoc loads thousands of bitmaps
looking for a perfect group and ignoring groups with good chunks.

Together with the patch "ext4: limit scanning of uninitialized groups"
the mount time (which includes few tiny allocations) of a 1PB
filesystem is reduced significantly:

               0% full    50%-full unpatched    patched
  mount time       33s                9279s       563s

[ Restructured by tytso; removed the state flags in the allocation
  context, so it can be used to lazily prefetch the allocation bitmaps
  immediately after the file system is mounted.  Skip prefetching
  block groups which are uninitialized.  Finally pass in the
  REQ_RAHEAD flag to the block layer while prefetching. ]
Signed-off-by: NAlex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: NAndreas Dilger <adilger@whamcloud.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

cfd73237

ext4: handle read only external journal device · 273108fa

由 Lukas Czerner 提交于 7月 17, 2020

Ext4 uses blkdev_get_by_dev() to get the block_device for journal device
which does check to see if the read-only block device was opened
read-only.

As a result ext4 will hapily proceed mounting the file system with
external journal on read-only device. This is bad as we would not be
able to use the journal leading to errors later on.

Instead of simply failing to mount file system in this case, treat it in
a similar way we treat internal journal on read-only device. Allow to
mount with -o noload in read-only mode.

This can be reproduced easily like this:

mke2fs -F -O journal_dev $JOURNAL_DEV 100M
mkfs.$FSTYPE -F -J device=$JOURNAL_DEV $FS_DEV
blockdev --setro $JOURNAL_DEV
mount $FS_DEV $MNT
touch $MNT/file
umount $MNT

leading to error like this

[ 1307.318713] ------------[ cut here ]------------
[ 1307.323362] generic_make_request: Trying to write to read-only block-device dm-2 (partno 0)
[ 1307.331741] WARNING: CPU: 36 PID: 3224 at block/blk-core.c:855 generic_make_request_checks+0x2c3/0x580
[ 1307.341041] Modules linked in: ext4 mbcache jbd2 rfkill intel_rapl_msr intel_rapl_common isst_if_commd
[ 1307.419445] CPU: 36 PID: 3224 Comm: jbd2/dm-2 Tainted: G        W I       5.8.0-rc5 #2
[ 1307.427359] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 2.3.10 08/15/2019
[ 1307.434932] RIP: 0010:generic_make_request_checks+0x2c3/0x580
[ 1307.440676] Code: 94 03 00 00 48 89 df 48 8d 74 24 08 c6 05 cf 2b 18 01 01 e8 7f a4 ff ff 48 c7 c7 50e
[ 1307.459420] RSP: 0018:ffffc0d70eb5fb48 EFLAGS: 00010286
[ 1307.464646] RAX: 0000000000000000 RBX: ffff9b33b2978300 RCX: 0000000000000000
[ 1307.471780] RDX: ffff9b33e12a81e0 RSI: ffff9b33e1298000 RDI: ffff9b33e1298000
[ 1307.478913] RBP: ffff9b7b9679e0c0 R08: 0000000000000837 R09: 0000000000000024
[ 1307.486044] R10: 0000000000000000 R11: ffffc0d70eb5f9f0 R12: 0000000000000400
[ 1307.493177] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 1307.500308] FS:  0000000000000000(0000) GS:ffff9b33e1280000(0000) knlGS:0000000000000000
[ 1307.508396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1307.514142] CR2: 000055eaf4109000 CR3: 0000003dee40a006 CR4: 00000000007606e0
[ 1307.521273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1307.528407] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1307.535538] PKRU: 55555554
[ 1307.538250] Call Trace:
[ 1307.540708]  generic_make_request+0x30/0x340
[ 1307.544985]  submit_bio+0x43/0x190
[ 1307.548393]  ? bio_add_page+0x62/0x90
[ 1307.552068]  submit_bh_wbc+0x16a/0x190
[ 1307.555833]  jbd2_write_superblock+0xec/0x200 [jbd2]
[ 1307.560803]  jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
[ 1307.566557]  jbd2_journal_commit_transaction+0x2ae/0x1860 [jbd2]
[ 1307.572566]  ? check_preempt_curr+0x7a/0x90
[ 1307.576756]  ? update_curr+0xe1/0x1d0
[ 1307.580421]  ? account_entity_dequeue+0x7b/0xb0
[ 1307.584955]  ? newidle_balance+0x231/0x3d0
[ 1307.589056]  ? __switch_to_asm+0x42/0x70
[ 1307.592986]  ? __switch_to_asm+0x36/0x70
[ 1307.596918]  ? lock_timer_base+0x67/0x80
[ 1307.600851]  kjournald2+0xbd/0x270 [jbd2]
[ 1307.604873]  ? finish_wait+0x80/0x80
[ 1307.608460]  ? commit_timeout+0x10/0x10 [jbd2]
[ 1307.612915]  kthread+0x114/0x130
[ 1307.616152]  ? kthread_park+0x80/0x80
[ 1307.619816]  ret_from_fork+0x22/0x30
[ 1307.623400] ---[ end trace 27490236265b1630 ]---
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200717090605.2612-1-lczerner@redhat.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

273108fa

ext4: fix spelling typos in ext4_mb_initialize_context · 3cb77bd2

由 brookxu 提交于 7月 15, 2020

Fix spelling typos in ext4_mb_initialize_context.
Signed-off-by: NChunguang Xu <brookxu@tencent.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/883b523c-58ec-7f38-0bb8-cd2ea4393684@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

3cb77bd2

ext4: use generic names for generic ioctls · cb29a02d

由 Eric Biggers 提交于 7月 14, 2020

Don't define EXT4_IOC_* aliases to ioctls that already have a generic
FS_IOC_* name. These aliases are unnecessary, and they make it unclear
which ioctls are ext4-specific and which are generic.

Exception: leave EXT4_IOC_GETVERSION_OLD and EXT4_IOC_SETVERSION_OLD
as-is for now, since renaming them to FS_IOC_GETVERSION and
FS_IOC_SETVERSION would probably make them more likely to be confused
with EXT4_IOC_GETVERSION and EXT4_IOC_SETVERSION which also exist.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200714230909.56349-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

cb29a02d

ext4: don't hardcode bit values in EXT4_FL_USER_* · 2a12e147

由 Eric Biggers 提交于 7月 12, 2020

Define the EXT4_FL_USER_* constants by OR-ing together the appropriate
flags, rather than hard-coding a numeric value.  This makes it much
easier to see which flags are listed.

No change in the actual values.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200713031012.192440-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

2a12e147

ext4: don't BUG on inconsistent journal feature · 11215630

由 Jan Kara 提交于 7月 10, 2020

A customer has reported a BUG_ON in ext4_clear_journal_err() hitting
during an LTP testing. Either this has been caused by a test setup
issue where the filesystem was being overwritten while LTP was mounting
it or the journal replay has overwritten the superblock with invalid
data. In either case it is preferable we don't take the machine down
with a BUG_ON. So handle the situation of unexpectedly missing
has_journal feature more gracefully. We issue warning and fail the mount
in the cases where the race window is narrow and the failed check is
most likely a programming error. In cases where fs corruption is more
likely, we do full ext4_error() handling before failing mount / remount.
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200710140759.18031-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

11215630

ext4: do not block RWF_NOWAIT dio write on unallocated space · 0b3171b6

由 Jan Kara 提交于 7月 08, 2020

Since commit 378f32ba ("ext4: introduce direct I/O write using iomap
infrastructure") we don't properly bail out of RWF_NOWAIT direct IO
write if underlying blocks are not allocated. Also
ext4_dio_write_checks() does not honor RWF_NOWAIT when re-acquiring
i_rwsem. Fix both issues.

Fixes: 378f32ba ("ext4: introduce direct I/O write using iomap infrastructure")
Cc: stable@kernel.org
Reported-by: NFilipe Manana <fdmanana@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200708153516.9507-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

0b3171b6

ext4: replace HTTP links with HTTPS ones · e65bf6e4

由 Alexander A. Klimov 提交于 7月 06, 2020

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions
          return 200 OK and serve the same content:
            Replace HTTP with HTTPS.
Signed-off-by: NAlexander A. Klimov <grandmaster@al2klimov.de>
Link: https://lore.kernel.org/r/20200706190339.20709-1-grandmaster@al2klimov.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

e65bf6e4

ext4: lost matching-pair of trace in ext4_unlink · e5f78159

由 Yi Zhuang 提交于 6月 29, 2020

If dquot_initialize() return non-zero and trace of ext4_unlink_enter/exit
enabled then the matching-pair of trace_exit will lost in log.
Signed-off-by: NYi Zhuang <zhuangyi1@huawei.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200629122621.129953-1-zhuangyi1@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

e5f78159

ext4: lost matching-pair of trace in ext4_truncate · 9a5d265f

由 zhengliang 提交于 7月 01, 2020

It should call trace exit in all return path for ext4_truncate.
Signed-off-by: Nzhengliang <zhengliang6@huawei.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200701083027.45996-1-zhengliang6@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9a5d265f

ext4: fix potential negative array index in do_split() · 5872331b

由 Eric Sandeen 提交于 6月 17, 2020

If for any reason a directory passed to do_split() does not have enough
active entries to exceed half the size of the block, we can end up
iterating over all "count" entries without finding a split point.

In this case, count == move, and split will be zero, and we will
attempt a negative index into map[].

Guard against this by detecting this case, and falling back to
split-to-half-of-count instead; in this case we will still have
plenty of space (> half blocksize) in each split block.

Fixes: ef2b02d3 ("ext34: ensure do_split leaves enough free space in both blocks")
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/f53e246b-647c-64bb-16ec-135383c70ad7@redhat.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5872331b

ext4: fix coding style in file.c · e030a288

由 Dio Putra 提交于 6月 14, 2020

Fixed a few coding style issues in file.c
Signed-off-by: NDio Putra <dioput12@gmail.com>
Link: https://lore.kernel.org/r/239fcd8f-d33f-8621-9e82-0416dd3f9c94@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

e030a288

ext4: delete unnecessary checks before brelse() · e0f49d27

由 Markus Elfring 提交于 6月 13, 2020

The brelse() function tests whether its argument is NULL
and then returns immediately.
Thus remove the tests which are not needed around the shown calls.

This issue was detected by using the Coccinelle software.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/0d713702-072f-a89c-20ec-ca70aa83a432@web.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

e0f49d27

30 7月, 2020 1 次提交

ext4: fix spelling mistakes in extents.c · e4d7f2d3

由 Keyur Patel 提交于 6月 10, 2020

Fix spelling issues over the comments in the code.

requsted ==> requested
deterimined ==> determined
insde ==> inside
neet ==> need
somthing ==> something
Signed-off-by: NKeyur Patel <iamkeyur96@gmail.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200611031947.165079-1-iamkeyur96@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

e4d7f2d3

13 6月, 2020 2 次提交

ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers · 7b97d868

由 zhangyi (F) 提交于 6月 09, 2020

In the ext4 filesystem with errors=panic, if one process is recording
errno in the superblock when invoking jbd2_journal_abort() due to some
error cases, it could be raced by another __ext4_abort() which is
setting the SB_RDONLY flag but missing panic because errno has not been
recorded.

jbd2_journal_commit_transaction()
 jbd2_journal_abort()
  journal->j_flags |= JBD2_ABORT;
  jbd2_journal_update_sb_errno()
                                    | ext4_journal_check_start()
                                    |  __ext4_abort()
                                    |   sb->s_flags |= SB_RDONLY;
                                    |   if (!JBD2_REC_ERR)
                                    |        return;
  journal->j_flags |= JBD2_REC_ERR;

Finally, it will no longer trigger panic because the filesystem has
already been set read-only. Fix this by introduce j_abort_mutex to make
sure journal abort is completed before panic, and remove JBD2_REC_ERR
flag.

Fixes: 4327ba52 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

7b97d868

ext4: support xattr gnu.* namespace for the Hurd · 88ee9d57

由 Jan (janneke) Nieuwenhuizen 提交于 5月 25, 2020

The Hurd gained[0] support for moving the translator and author
fields out of the inode and into the "gnu.*" xattr namespace.

In anticipation of that, an xattr INDEX was reserved[1].  The Hurd has
now been brought into compliance[2] with that.

This patch adds support for reading and writing such attributes from
Linux; you can now do something like

    mkdir -p hurd-root/servers/socket
    touch hurd-root/servers/socket/1
    setfattr --name=gnu.translator --value='"/hurd/pflocal\0"' \
        hurd-root/servers/socket/1
    getfattr --name=gnu.translator hurd-root/servers/socket/1
    # file: 1
    gnu.translator="/hurd/pflocal"

to setup a pipe translator, which is being used to create[3] a
vm-image for the Hurd from GNU Guix.

[0] https://summerofcode.withgoogle.com/projects/#5869799859027968
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3980bd3b406addb327d858aebd19e229ea340b9a
[2] https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=a04c7bf83172faa7cb080fbe3b6c04a8415ca645
[3] https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-hurd-vmSigned-off-by: NJan Nieuwenhuizen <janneke@gnu.org>
Link: https://lore.kernel.org/r/20200525193940.878-1-janneke@gnu.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

88ee9d57

11 6月, 2020 5 次提交

ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr · 81198536

由 Ritesh Harjani 提交于 6月 09, 2020

Simplify reading a seq variable by directly using this_cpu_read API
instead of doing this_cpu_ptr and then dereferencing it.

This also avoid the below kernel BUG: which happens when
CONFIG_DEBUG_PREEMPT is enabled

BUG: using smp_processor_id() in preemptible [00000000] code: syz-fuzzer/6927
caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
CPU: 1 PID: 6927 Comm: syz-fuzzer Not tainted 5.7.0-next-20200602-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
 ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
 ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
 ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
 ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
 ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
 ext4_append+0x153/0x360 fs/ext4/namei.c:67
 ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
 ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
 vfs_mkdir+0x419/0x690 fs/namei.c:3632
 do_mkdirat+0x21e/0x280 fs/namei.c:3655
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 42f56b7a4a7d ("ext4: mballoc: introduce pcpu seqcnt for freeing PA
to improve ENOSPC handling")
Suggested-by: NBorislav Petkov <bp@alien8.de>
Tested-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reported-by: syzbot+82f324bb69744c5f6969@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/534f275016296996f54ecf65168bb3392b6f653d.1591699601.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

81198536

ext4: avoid utf8_strncasecmp() with unstable name · 2ce3ee93

由 Eric Biggers 提交于 6月 01, 2020

If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
it may be concurrently modified by a rename.  This can cause undefined
behavior (possibly out-of-bounds memory accesses or crashes) in
utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
that may be concurrently modified.

Fix this by first copying the filename to a stack buffer if needed.
This way we get a stable snapshot of the filename.

Fixes: b886ee3e ("ext4: Support case-insensitive file name lookups")
Cc: <stable@vger.kernel.org> # v5.2+
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Rosenberg <drosen@google.com>
Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200601200543.59417-1-ebiggers@kernel.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

2ce3ee93

ext4: stop overwrite the errcode in ext4_setup_super · 5adaccac

由 yangerkun 提交于 6月 01, 2020

Now the errcode from ext4_commit_super will overwrite EROFS exists in
ext4_setup_super. Actually, no need to call ext4_commit_super since we
will return EROFS. Fix it by goto done directly.

Fixes: c89128a0 ("ext4: handle errors on ext4_commit_super")
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200601073404.3712492-1-yangerkun@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5adaccac

ext4: fix partial cluster initialization when splitting extent · cfb3c85a

由 Jeffle Xu 提交于 5月 22, 2020

Fix the bug when calculating the physical block number of the first
block in the split extent.

This bug will cause xfstests shared/298 failure on ext4 with bigalloc
enabled occasionally. Ext4 error messages indicate that previously freed
blocks are being freed again, and the following fsck will fail due to
the inconsistency of block bitmap and bg descriptor.

The following is an example case:

1. First, Initialize a ext4 filesystem with cluster size '16K', block size
'4K', in which case, one cluster contains four blocks.

2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent
tree of this file is like:

...
36864:[0]4:220160
36868:[0]14332:145408
51200:[0]2:231424
...

3. Then execute PUNCH_HOLE fallocate on this file. The hole range is
like:

..
ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
...

4. Then the extent tree of this file after punching is like

...
49507:[0]37:158047
49547:[0]58:158087
...

5. Detailed procedure of punching hole [49544, 49546]

5.1. The block address space:
```
lblk        ~49505  49506   49507~49543     49544~49546    49547~
	  ---------+------+-------------+----------------+--------
	    extent | hole |   extent	|	hole	 | extent
	  ---------+------+-------------+----------------+--------
pblk       ~158045  158046  158047~158083  158084~158086   158087~
```

5.2. The detailed layout of cluster 39521:
```
		cluster 39521
	<------------------------------->

		hole		  extent
	<----------------------><--------

lblk      49544   49545   49546   49547
	+-------+-------+-------+-------+
	|	|	|	|	|
	+-------+-------+-------+-------+
pblk     158084  1580845  158086  158087
```

5.3. The ftrace output when punching hole [49544, 49546]:
- ext4_ext_remove_space (start 49544, end 49546)
  - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
    - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2]
      - ext4_free_blocks: (block 158084 count 4)
        - ext4_mballoc_free (extent 1/6753/1)

5.4. Ext4 error message in dmesg:
EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters

In this case, the whole cluster 39521 is freed mistakenly when freeing
pblock 158084~158086 (i.e., the first three blocks of this cluster),
although pblock 158087 (the last remaining block of this cluster) has
not been freed yet.

The root cause of this isuue is that, the pclu of the partial cluster is
calculated mistakenly in ext4_ext_remove_space(). The correct
partial_cluster.pclu (i.e., the cluster number of the first block in the
next extent, that is, lblock 49597 (pblock 158086)) should be 39521 rather
than 39522.

Fixes: f4226d9e ("ext4: fix partial cluster initialization")
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: NEric Whitney <enwlinux@gmail.com>
Cc: stable@kernel.org # v3.19+
Link: https://lore.kernel.org/r/1590121124-37096-1-git-send-email-jefflexu@linux.alibaba.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

cfb3c85a

ext4: avoid race conditions when remounting with options that change dax · 829b37b8

由 Theodore Ts'o 提交于 6月 10, 2020

Trying to change dax mount options when remounting could allow mount
options to be enabled for a small amount of time, and then the mount
option change would be reverted.

In the case of "mount -o remount,dax", this can cause a race where
files would temporarily treated as DAX --- and then not.

Cc: stable@kernel.org
Reported-by: syzbot+bca9799bf129256190da@syzkaller.appspotmail.com
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

829b37b8

10 6月, 2020 1 次提交

mmap locking API: convert mmap_sem comments · c1e8d7c6

由 Michel Lespinasse 提交于 6月 08, 2020

Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]
Signed-off-by: NMichel Lespinasse <walken@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c1e8d7c6

04 6月, 2020 14 次提交

ext4: avoid unnecessary transaction starts during writeback · 6b8ed620

由 Jan Kara 提交于 5月 25, 2020

ext4_writepages() currently works in a loop like:
  start a transaction
  scan inode for pages to write
  map and submit these pages
  stop the transaction

This loop results in starting transaction once more than is needed
because in the last iteration we start a transaction only to scan the
inode and find there are no pages to write. This can be significant
increase in number of transaction starts for single-extent files or
files that have all blocks already mapped. Furthermore we already know
from previous iteration whether there are more pages to write or not. So
propagate the information from mpage_prepare_extent_to_map() and avoid
unnecessary looping in case there are no more pages to write.
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200525081215.29451-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

6b8ed620

ext4: don't block for O_DIRECT if IOCB_NOWAIT is set · 6e014c62

由 Jens Axboe 提交于 5月 24, 2020

Running with some debug patches to detect illegal blocking triggered the
extend/unaligned condition in ext4. If ext4 needs to extend the file (and
hence go to buffered IO), or if the app is doing unaligned IO, then ext4
asks the iomap code to wait for IO completion. If the caller asked for
no-wait semantics by setting IOCB_NOWAIT, then ext4 should return -EAGAIN
instead.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

Link: https://lore.kernel.org/r/76152096-2bbb-7682-8fce-4cb498bcd909@kernel.dkSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

6e014c62

ext4: remove the access_ok() check in ext4_ioctl_get_es_cache · ba988903

由 Christoph Hellwig 提交于 5月 23, 2020

access_ok just checks we are fed a proper user pointer. We also do that
in copy_to_user itself, so no need to do this early.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-10-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

ba988903

fs: handle FIEMAP_FLAG_SYNC in fiemap_prep · 45dd052e

由 Christoph Hellwig 提交于 5月 23, 2020

By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is
handled once instead of duplicated, but can still be done under fs locks,
like xfs/iomap intended with its duplicate handling. Also make sure the
error value of filemap_write_and_wait is propagated to user space.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

45dd052e

fs: move fiemap range validation into the file systems instances · cddf8a2c

由 Christoph Hellwig 提交于 5月 23, 2020

Replace fiemap_check_flags with a fiemap_prep helper that also takes the
inode and mapped range, and performs the sanity check and truncation
previously done in fiemap_check_range. This way the validation is inside
the file system itself and thus properly works for the stacked overlayfs
case as well.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-7-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

cddf8a2c

fs: move the fiemap definitions out of fs.h · 10c5db28

由 Christoph Hellwig 提交于 5月 23, 2020

No need to pull the fiemap definitions into almost every file in the
kernel build.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

10c5db28

ext4: remove the call to fiemap_check_flags in ext4_fiemap · da565e79

由 Christoph Hellwig 提交于 5月 23, 2020

iomap_fiemap already calls fiemap_check_flags first thing, so this
additional check is redundant.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-3-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

da565e79

ext4: split _ext4_fiemap · 03a5ed24

由 Christoph Hellwig 提交于 5月 23, 2020

The fiemap and EXT4_IOC_GET_ES_CACHE cases share almost no code, so split
them into entirely separate functions.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-2-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

03a5ed24

ext4: fix fiemap size checks for bitmap files · 328e24ae

由 Christoph Hellwig 提交于 5月 05, 2020

Add an extra validation of the len parameter, as for ext4 some files
might have smaller file size limits than others. This also means the
redundant size check in ext4_ioctl_get_es_cache can go away, as all
size checking is done in the shared fiemap handler.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200505154324.3226743-3-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

328e24ae

ext4: fix EXT4_MAX_LOGICAL_BLOCK macro · 175efa81

由 Ritesh Harjani 提交于 5月 05, 2020

ext4 supports max number of logical blocks in a file to be 0xffffffff.
(This is since ext4_extent's ee_block is __le32).
This means that EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting
from 0 logical offset). This patch fixes this.

The issue was seen when ext4 moved to iomap_fiemap API and when
overlayfs was mounted on top of ext4. Since overlayfs was missing
filemap_check_ranges(), so it could pass a arbitrary huge length which
lead to overflow of map.m_len logic.

This patch fixes that.

Fixes: d3b6f23f ("ext4: move ext4_fiemap to use iomap framework")
Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

175efa81

add comment for ext4_dir_entry_2 file_type member · 9f364e1d

由 Jonathan Grant 提交于 5月 22, 2020

Signed-off-by: NJonathan Grant <jg@jguk.org>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/ad3290d5-86af-99c1-f9d5-cd1bab710429@jguk.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

9f364e1d

ext4: drop ext4_journal_free_reserved() · dfcd4489

由 Jan Kara 提交于 5月 20, 2020

Remove ext4_journal_free_reserved() function. It is never used.
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200520133119.1383-2-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

dfcd4489

ext4: mballoc: use lock for checking free blocks while retrying · 99377830

由 Ritesh Harjani 提交于 5月 20, 2020

Currently while doing block allocation grp->bb_free may be getting
modified if discard is happening in parallel.
For e.g. consider a case where there are lot of threads who have
preallocated lot of blocks and there is a thread which is trying
to discard all of this group's PA. Now it could happen that
we see all of those group's bb_free is zero and fail the allocation
while there is sufficient space if we free up all the PA.

So this patch adds another flag "EXT4_MB_STRICT_CHECK" which will be set
if we are unable to allocate any blocks in the first try (since we may
not have considered blocks about to be discarded from PA lists).
So during retry attempt to allocate blocks we will use ext4_lock_group()
for checking if the group is good or not.
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/9cb740a117c958c36596f167b12af1beae9a68b7.1589955723.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

99377830

ext4: mballoc: refactor ext4_mb_good_group() · 8ef123fe

由 Ritesh Harjani 提交于 5月 20, 2020

ext4_mb_good_group() definition was changed some time back
and now it even initializes the buddy cache (via ext4_mb_init_group()),
if in case the EXT4_MB_GRP_NEED_INIT() is true for a group.
Note that ext4_mb_init_group() could sleep and so should not be called
under a spinlock held.
This is fine as of now because ext4_mb_good_group() is called before
loading the buddy bitmap without ext4_lock_group() held
and again called after loading the bitmap, only this time with
ext4_lock_group() held.
But still this whole thing is confusing.

So this patch refactors out ext4_mb_good_group_nolock() which should be
called when without holding ext4_lock_group().
Also in further patches we hold the spinlock (ext4_lock_group()) while
doing any calculations which involves grp->bb_free or grp->bb_fragments.
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/d9f7d031a5fbe1c943fae6bf1ff5cdf0604ae722.1589955723.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8ef123fe

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功