提交 · 512edfac85d243ed6a5a5f42f513ebb7c2d32863 · openeuler / Kernel

15 10月, 2021 2 次提交

xfs: port the defer ops capture and continue to resource capture · 512edfac

由 Darrick J. Wong 提交于 9月 16, 2021

When log recovery tries to recover a transaction that had log intent
items attached to it, it has to save certain parts of the transaction
state (reservation, dfops chain, inodes with no automatic unlock) so
that it can finish single-stepping the recovered transactions before
finishing the chains.

This is done with the xfs_defer_ops_capture and xfs_defer_ops_continue
functions.  Right now they open-code this functionality, so let's port
this to the formalized resource capture structure that we introduced in
the previous patch.  This enables us to hold up to two inodes and two
buffers during log recovery, the same way we do for regular runtime.

With this patch applied, we'll be ready to support atomic extent swap
which holds two inodes; and logged xattrs which holds one inode and one
xattr leaf buffer.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>

512edfac

xfs: formalize the process of holding onto resources across a defer roll · c5db9f93

由 Darrick J. Wong 提交于 9月 16, 2021

Transaction users are allowed to flag up to two buffers and two inodes
for ownership preservation across a deferred transaction roll. Hoist
the variables and code responsible for this out of xfs_defer_trans_roll
so that we can use it for the defer capture mechanism.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>

c5db9f93

12 10月, 2021 2 次提交

xfs: use kmem_cache_free() for kmem_cache objects · c30a0cbd

由 Rustam Kovhaev 提交于 10月 11, 2021

For kmalloc() allocations SLOB prepends the blocks with a 4-byte header,
and it puts the size of the allocated blocks in that header.
Blocks allocated with kmem_cache_alloc() allocations do not have that
header.

SLOB explodes when you allocate memory with kmem_cache_alloc() and then
try to free it with kfree() instead of kmem_cache_free().
SLOB will assume that there is a header when there is none, read some
garbage to size variable and corrupt the adjacent objects, which
eventually leads to hang or panic.

Let's make XFS work with SLOB by using proper free function.

Fixes: 9749fee8 ("xfs: enable the xfs_defer mechanism to process extents to free")
Signed-off-by: NRustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>

c30a0cbd

xfs: Use kvcalloc() instead of kvzalloc() · a785fba7

由 Gustavo A. R. Silva 提交于 10月 11, 2021

Use 2-factor argument multiplication form kvcalloc() instead of
kvzalloc().

Link: https://github.com/KSPP/linux/issues/162Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>

a785fba7

04 10月, 2021 1 次提交

elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings · 9b2f72cc

由 Chen Jingwen 提交于 9月 28, 2021

In commit b212921b ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.

Unfortunately, this will cause kernel to fail to start with:

    1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
    Failed to execute /init (error -17)

The reason is that the elf interpreter (ld.so) has overlapping segments.

  readelf -l ld-2.31.so
  Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                   FileSiz            MemSiz              Flags  Align
    LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                   0x000000000002c94c 0x000000000002c94c  R E    0x10000
    LOAD           0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
                   0x00000000000021e8 0x0000000000002320  RW     0x10000
    LOAD           0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
                   0x00000000000011ac 0x0000000000001328  RW     0x10000

The reason for this problem is the same as described in commit
ad55eac7 ("elf: enforce MAP_FIXED on overlaying elf segments").

Not only executable binaries, elf interpreters (e.g. ld.so) can have
overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go
back to MAP_FIXED in load_elf_interp.

Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: NChen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b2f72cc

02 10月, 2021 1 次提交

io_uring: kill fasync · 3f008385

由 Pavel Begunkov 提交于 10月 01, 2021

We have never supported fasync properly, it would only fire when there
is something polling io_uring making it useless. The original support came
in through the initial io_uring merge for 5.1. Since it's broken and
nobody has reported it, get rid of the fasync bits.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3f008385

01 10月, 2021 7 次提交

ext4: recheck buffer uptodate bit under buffer lock · f2c77973

由 Zhang Yi 提交于 9月 10, 2021

Commit 8e33fadf ("ext4: remove an unnecessary if statement in
__ext4_get_inode_loc()") forget to recheck buffer's uptodate bit again
under buffer lock, which may overwrite the buffer if someone else have
already brought it uptodate and changed it.

Fixes: 8e33fadf ("ext4: remove an unnecessary if statement in __ext4_get_inode_loc()")
Cc: stable@kernel.org
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210910080316.70421-1-yi.zhang@huawei.com

f2c77973

ext4: fix potential infinite loop in ext4_dx_readdir() · 42cb4474

由 yangerkun 提交于 9月 14, 2021

When ext4_htree_fill_tree() fails, ext4_dx_readdir() can run into an
infinite loop since if info->last_pos != ctx->pos this will reset the
directory scan and reread the failing entry.  For example:

1. a dx_dir which has 3 block, block 0 as dx_root block, block 1/2 as
   leaf block which own the ext4_dir_entry_2
2. block 1 read ok and call_filldir which will fill the dirent and update
   the ctx->pos
3. block 2 read fail, but we has already fill some dirent, so we will
   return back to userspace will a positive return val(see ksys_getdents64)
4. the second ext4_dx_readdir will reset the world since info->last_pos
   != ctx->pos, and will also init the curr_hash which pos to block 1
5. So we will read block1 too, and once block2 still read fail, we can
   only fill one dirent because the hash of the entry in block1(besides
   the last one) won't greater than curr_hash
6. this time, we forget update last_pos too since the read for block2
   will fail, and since we has got the one entry, ksys_getdents64 can
   return success
7. Latter we will trapped in a loop with step 4~6

Cc: stable@kernel.org
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210914111415.3921954-1-yangerkun@huawei.com

42cb4474

ext4: flush s_error_work before journal destroy in ext4_fill_super · bb9464e0

由 yangerkun 提交于 9月 24, 2021

The error path in ext4_fill_super forget to flush s_error_work before
journal destroy, and it may trigger the follow bug since
flush_stashed_error_work can run concurrently with journal destroy
without any protection for sbi->s_journal.

[32031.740193] EXT4-fs (loop66): get root inode failed
[32031.740484] EXT4-fs (loop66): mount failed
[32031.759805] ------------[ cut here ]------------
[32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
[32031.760075] invalid opcode: 0000 [#1] SMP PTI
[32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded
4.18.0
[32031.765112] Call Trace:
[32031.765375]  ? __switch_to_asm+0x35/0x70
[32031.765635]  ? __switch_to_asm+0x41/0x70
[32031.765893]  ? __switch_to_asm+0x35/0x70
[32031.766148]  ? __switch_to_asm+0x41/0x70
[32031.766405]  ? _cond_resched+0x15/0x40
[32031.766665]  jbd2__journal_start+0xf1/0x1f0 [jbd2]
[32031.766934]  jbd2_journal_start+0x19/0x20 [jbd2]
[32031.767218]  flush_stashed_error_work+0x30/0x90 [ext4]
[32031.767487]  process_one_work+0x195/0x390
[32031.767747]  worker_thread+0x30/0x390
[32031.768007]  ? process_one_work+0x390/0x390
[32031.768265]  kthread+0x10d/0x130
[32031.768521]  ? kthread_flush_work_fn+0x10/0x10
[32031.768778]  ret_from_fork+0x35/0x40

static int start_this_handle(...)
    BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this

Besides, after we enable fast commit, ext4_fc_replay can add work to
s_error_work but return success, so the latter journal destroy in
ext4_load_journal can trigger this problem too.

Fix this problem with two steps:
1. Call ext4_commit_super directly in ext4_handle_error for the case
   that called from ext4_fc_replay
2. Since it's hard to pair the init and flush for s_error_work, we'd
   better add a extras flush_work before journal destroy in
   ext4_fill_super

Besides, this patch will call ext4_commit_super in ext4_handle_error for
any nojournal case too. But it seems safe since the reason we call
schedule_work was that we should save error info to sb through journal
if available. Conversely, for the nojournal case, it seems useless delay
commit superblock to s_error_work.

Fixes: c92dc856 ("ext4: defer saving error info from atomic context")
Fixes: 2d01ddc8 ("ext4: save error info to sb through journal if available")
Cc: stable@kernel.org
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210924093917.1953239-1-yangerkun@huawei.com

bb9464e0

ext4: fix loff_t overflow in ext4_max_bitmap_size() · 75ca6ad4

由 Ritesh Harjani 提交于 6月 05, 2021

We should use unsigned long long rather than loff_t to avoid
overflow in ext4_max_bitmap_size() for comparison before returning.
w/o this patch sbi->s_bitmap_maxbytes was becoming a negative
value due to overflow of upper_limit (with has_huge_files as true)

Below is a quick test to trigger it on a 64KB pagesize system.

sudo mkfs.ext4 -b 65536 -O ^has_extents,^64bit /dev/loop2
sudo mount /dev/loop2 /mnt
sudo echo "hello" > /mnt/hello 	-> This will error out with
				"echo: write error: File too large"
Signed-off-by: NRitesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/594f409e2c543e90fd836b78188dfa5c575065ba.1622867594.git.riteshh@linux.ibm.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

75ca6ad4

ext4: fix reserved space counter leakage · 6fed8395

由 Jeffle Xu 提交于 8月 23, 2021

When ext4_insert_delayed block receives and recovers from an error from
ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the
space it has reserved for that block insertion as it should. One effect
of this bug is that s_dirtyclusters_counter is not decremented and
remains incorrectly elevated until the file system has been unmounted.
This can result in premature ENOSPC returns and apparent loss of free
space.

Another effect of this bug is that
/sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even
after syncfs has been executed on the filesystem.

Besides, add check for s_dirtyclusters_counter when inode is going to be
evicted and freed. s_dirtyclusters_counter can still keep non-zero until
inode is written back in .evict_inode(), and thus the check is delayed
to .destroy_inode().

Fixes: 51865fda ("ext4: let ext4 maintain extent status tree")
Cc: stable@kernel.org
Suggested-by: NGao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210823061358.84473-1-jefflexu@linux.alibaba.com

6fed8395

ext4: limit the number of blocks in one ADD_RANGE TLV · a2c2f082

由 Hou Tao 提交于 8月 20, 2021

Now EXT4_FC_TAG_ADD_RANGE uses ext4_extent to track the
newly-added blocks, but the limit on the max value of
ee_len field is ignored, and it can lead to BUG_ON as
shown below when running command "fallocate -l 128M file"
on a fast_commit-enabled fs:

  kernel BUG at fs/ext4/ext4_extents.h:199!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 3 PID: 624 Comm: fallocate Not tainted 5.14.0-rc6+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:ext4_fc_write_inode_data+0x1f3/0x200
  Call Trace:
   ? ext4_fc_write_inode+0xf2/0x150
   ext4_fc_commit+0x93b/0xa00
   ? ext4_fallocate+0x1ad/0x10d0
   ext4_sync_file+0x157/0x340
   ? ext4_sync_file+0x157/0x340
   vfs_fsync_range+0x49/0x80
   do_fsync+0x3d/0x70
   __x64_sys_fsync+0x14/0x20
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Simply fixing it by limiting the number of blocks
in one EXT4_FC_TAG_ADD_RANGE TLV.

Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
Cc: stable@kernel.org
Signed-off-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210820044505.474318-1-houtao1@huawei.com

a2c2f082

ksmbd: missing check for NULL in convert_to_nt_pathname() · 87ffb310

由 Dan Carpenter 提交于 9月 30, 2021

The kmalloc() does not have a NULL check.  This code can be re-written
slightly cleaner to just use the kstrdup().

Fixes: 265fd199 ("ksmbd: use LOOKUP_BENEATH to prevent the out of share access")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NNamjae Jeon <linkinjeon@kernel.org>
Acked-by: NHyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

87ffb310

30 9月, 2021 6 次提交

ksmbd: fix transform header validation · 4227f811

由 Namjae Jeon 提交于 9月 29, 2021

Validate that the transform and smb request headers are present
before checking OriginalMessageSize and SessionId fields.

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: NTom Talpey <tom@talpey.com>
Acked-by: NHyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

4227f811

ksmbd: add buffer validation for SMB2_CREATE_CONTEXT · 8f77150c

由 Hyunchul Lee 提交于 9月 24, 2021

Add buffer validation for SMB2_CREATE_CONTEXT.

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Reviewed-by: NRalph Boehme <slow@samba.org>
Signed-off-by: NHyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

8f77150c

ksmbd: add validation in smb2 negotiate · 442ff9eb

由 Namjae Jeon 提交于 9月 29, 2021

This patch add validation to check request buffer check in smb2
negotiate and fix null pointer deferencing oops in smb3_preauth_hash_rsp()
that found from manual test.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: NRalph Boehme <slow@samba.org>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

442ff9eb

ksmbd: add request buffer validation in smb2_set_info · 9496e268

由 Namjae Jeon 提交于 9月 29, 2021

Add buffer validation in smb2_set_info, and remove unused variable
in set_file_basic_info. and smb2_set_info infolevel functions take
structure pointer argument.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: NHyunchul Lee <hyc.lee@gmail.com>
Reviewed-by: NRalph Boehme <slow@samba.org>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

9496e268

ksmbd: use correct basic info level in set_file_basic_info() · 88d30052

由 Namjae Jeon 提交于 9月 29, 2021

Use correct basic info level in set/get_file_basic_info().
Reviewed-by: NRalph Boehme <slow@samba.org>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

88d30052

ksmbd: remove NTLMv1 authentication · ce812992

由 Namjae Jeon 提交于 9月 27, 2021

Remove insecure NTLMv1 authentication.

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Reviewed-by: NTom Talpey <tom@talpey.com>
Acked-by: NSteve French <smfrench@gmail.com>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

ce812992

29 9月, 2021 2 次提交

ksmbd: fix documentation for 2 functions · 1018bf24

由 Enzo Matsumiya 提交于 9月 28, 2021

ksmbd_kthread_fn() and create_socket() returns 0 or error code, and not
task_struct/ERR_PTR.
Signed-off-by: NEnzo Matsumiya <ematsumiya@suse.de>
Acked-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

1018bf24

kernfs: also call kernfs_set_rev() for positive dentry · df38d852

由 Hou Tao 提交于 9月 28, 2021

A KMSAN warning is reported by Alexander Potapenko:

BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840
fs/kernfs/dir.c:1053
 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053
 d_revalidate fs/namei.c:854
 lookup_dcache fs/namei.c:1522
 __lookup_hash+0x3a6/0x590 fs/namei.c:1543
 filename_create+0x312/0x7c0 fs/namei.c:3657
 do_mkdirat+0x103/0x930 fs/namei.c:3900
 __do_sys_mkdir fs/namei.c:3931
 __se_sys_mkdir fs/namei.c:3929
 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929
 do_syscall_x64 arch/x86/entry/common.c:51

It seems a positive dentry in kernfs becomes a negative dentry directly
through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized
when accessing it in kernfs_dop_revalidate(), because it is only
initialized when created as negative dentry in kernfs_iop_lookup().

The problem can be reproduced by the following command:

  cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi

A simple fixes seems to be initializing d->d_time for positive dentry
in kernfs_iop_lookup() as well. The downside is the negative dentry
will be revalidated again after it becomes negative in d_delete(),
because the revison of its parent must have been increased due to
its removal.

Alternative solution is implement .d_iput for kernfs, and assign d_time
for the newly-generated negative dentry in it. But we may need to
take kernfs_rwsem to protect again the concurrent kernfs_link_sibling()
on the parent directory, it is a little over-killing. Now the simple
fix is chosen.

Link: https://marc.info/?l=linux-fsdevel&m=163249838610499
Fixes: c7e7c042 ("kernfs: use VFS negative dentry caching")
Reported-by: NAlexander Potapenko <glider@google.com>
Signed-off-by: NHou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20210928140750.1274441-1-houtao1@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

df38d852

28 9月, 2021 2 次提交

vboxfs: fix broken legacy mount signature checking · 9b3b353e

由 Linus Torvalds 提交于 9月 27, 2021

Commit 9d682ea6 ("vboxsf: Fix the check for the old binary
mount-arguments struct") was meant to fix a build error due to sign
mismatch in 'char' and the use of character constants, but it just moved
the error elsewhere, in that on some architectures characters and signed
and on others they are unsigned, and that's just how the C standard
works.

The proper fix is a simple "don't do that then".  The code was just
being silly and odd, and it should never have cared about signed vs
unsigned characters in the first place, since what it is testing is not
four "characters", but four bytes.

And the way to compare four bytes is by using "memcmp()".

Which compilers will know to just turn into a single 32-bit compare with
a constant, as long as you don't have crazy debug options enabled.

Link: https://lore.kernel.org/lkml/20210927094123.576521-1-arnd@kernel.org/
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b3b353e

io-wq: exclusively gate signal based exit on get_signal() return · 78f8876c

由 Jens Axboe 提交于 9月 27, 2021

io-wq threads block all signals, except SIGKILL and SIGSTOP. We should not
need any extra checking of signal_pending or fatal_signal_pending, rely
exclusively on whether or not get_signal() tells us to exit.

The original debugging of this issue led to the false positive that we
were exiting on non-fatal signals, but that is not the case. The issue
was around races with nr_workers accounting.

Fixes: 87c16966 ("io-wq: ensure we exit if thread group is exiting")
Fixes: 15e20db2 ("io-wq: only exit on fatal signals")
Reported-by: NEric W. Biederman <ebiederm@xmission.com>
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

78f8876c

27 9月, 2021 2 次提交

ksmbd: fix invalid request buffer access in compound · d72a9c15

由 Namjae Jeon 提交于 9月 24, 2021

Ronnie reported invalid request buffer access in chained command when
inserting garbage value to NextCommand of compound request.
This patch add validation check to avoid this issue.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Tested-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NSteve French <smfrench@gmail.com>
Acked-by: NHyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

d72a9c15

ksmbd: remove RFC1002 check in smb2 request · 18d46769

由 Ronnie Sahlberg 提交于 9月 21, 2021

In smb_common.c you have this function :   ksmbd_smb_request() which
is called from connection.c once you have read the initial 4 bytes for
the next length+smb2 blob.

It checks the first byte of this 4 byte preamble for valid values,
i.e. a NETBIOSoverTCP SESSION_MESSAGE or a SESSION_KEEP_ALIVE.

We don't need to check this for ksmbd since it only implements SMB2
over TCP port 445.
The netbios stuff was only used in very old servers when SMB ran over
TCP port 139.
Now that we run over TCP port 445, this is actually not a NB header anymore
and you can just treat it as a 4 byte length field that must be less
than 16Mbyte. and remove the references to the RFC1002 constants that no
longer applies.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: NHyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

18d46769

25 9月, 2021 12 次提交

ksmbd: use LOOKUP_BENEATH to prevent the out of share access · 265fd199

由 Hyunchul Lee 提交于 9月 25, 2021

instead of removing '..' in a given path, call
kern_path with LOOKUP_BENEATH flag to prevent
the out of share access.

ran various test on this:
smb2-cat-async smb://127.0.0.1/homes/../out_of_share
smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share
smbclient //127.0.0.1/homes -c "mkdir ../foo2"
smbclient //127.0.0.1/homes -c "rename bar ../bar"

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Boehme <slow@samba.org>
Tested-by: NSteve French <smfrench@gmail.com>
Tested-by: NNamjae Jeon <linkinjeon@kernel.org>
Acked-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NHyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

265fd199

mm: fs: invalidate bh_lrus for only cold path · 243418e3

由 Minchan Kim 提交于 9月 24, 2021

The kernel test robot reported the regression of fio.write_iops[1] with
commit 8cc621d2 ("mm: fs: invalidate BH LRU during page migration").

Since lru_add_drain is called frequently, invalidate bh_lrus there could
increase bh_lrus cache miss ratio, which needs more IO in the end.

This patch moves the bh_lrus invalidation from the hot path( e.g.,
zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all,
lru_cache_disable).

Zhengjun Xing confirmed
 "I test the patch, the regression reduced to -2.9%"

[1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/
[2] 8cc621d2, mm: fs: invalidate BH LRU during page migration

Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
Reported-by: Nkernel test robot <oliver.sang@intel.com>
Reviewed-by: NChris Goldsworthy <cgoldswo@codeaurora.org>
Tested-by: N"Xing, Zhengjun" <zhengjun.xing@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

243418e3

ocfs2: drop acl cache for directories too · 9c0f0a03

由 Wengang Wang 提交于 9月 24, 2021

ocfs2_data_convert_worker() is currently dropping any cached acl info
for FILE before down-converting meta lock.  It should also drop for
DIRECTORY.  Otherwise the second acl lookup returns the cached one (from
VFS layer) which could be already stale.

The problem we are seeing is that the acl changes on one node doesn't
get refreshed on other nodes in the following case:

  Node 1                    Node 2
  --------------            ----------------
  getfacl dir1

                            getfacl dir1    <-- this is OK

  setfacl -m u:user1:rwX dir1
  getfacl dir1   <-- see the change for user1

                            getfacl dir1    <-- can't see change for user1

Link: https://lkml.kernel.org/r/20210903012631.6099-1-wen.gang.wang@oracle.comSigned-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9c0f0a03

io_uring: make OP_CLOSE consistent with direct open · 7df778be

由 Pavel Begunkov 提交于 9月 24, 2021

From recently open/accept are now able to manipulate fixed file table,
but it's inconsistent that close can't. Close the gap, keep API same as
with open/accept, i.e. via sqe->file_slot.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7df778be

io_uring: kill extra checks in io_write() · 9f3a2cb2

由 Pavel Begunkov 提交于 9月 24, 2021

We don't retry short writes and so we would never get to async setup in
io_write() in that case. Thus ret2 > 0 is always false and
iov_iter_advance() is never used. Apparently, the same is found by
Coverity, which complains on the code.

Fixes: cd658695 ("io_uring: use iov_iter state save/restore helpers")
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9f3a2cb2

io_uring: don't punt files update to io-wq unconditionally · cdb31c29

由 Jens Axboe 提交于 9月 24, 2021

There's no reason to punt it unconditionally, we just need to ensure that
the submit lock grabbing is conditional.

Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cdb31c29

io_uring: put provided buffer meta data under memcg accounting · 9990da93

由 Jens Axboe 提交于 9月 24, 2021

For each provided buffer, we allocate a struct io_buffer to hold the
data associated with it. As a large number of buffers can be provided,
account that data with memcg.

Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9990da93

io_uring: allow conditional reschedule for intensive iterators · 8bab4c09

由 Jens Axboe 提交于 9月 24, 2021

If we have a lot of threads and rings, the tctx list can get quite big.
This is especially true if we keep creating new threads and rings.
Likewise for the provided buffers list. Be nice and insert a conditional
reschedule point while iterating the nodes for deletion.

Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/
Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8bab4c09

io_uring: fix potential req refcount underflow · 5b7aa38d

由 Hao Xu 提交于 9月 22, 2021

For multishot mode, there may be cases like:

iowq                                 original context
io_poll_add
  _arm_poll()
  mask = vfs_poll() is not 0
  if mask
(2)  io_poll_complete()
  compl_unlock
   (interruption happens
    tw queued to original
    context)
                                     io_poll_task_func()
                                     compl_lock
                                 (3) done = io_poll_complete() is true
                                     compl_unlock
                                     put req ref
(1) if (poll->flags & EPOLLONESHOT)
      put req ref

EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple
combinations that can cause ref underfow.
Let's address it by:
- check the return value in (2) as done
- change (1) to if (done)
    in this way, we only do ref put in (1) if 'oneshot flag' is from
    (2)
- do poll.done check in io_poll_task_func(), so that we won't put ref
  for the second time.
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-4-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

5b7aa38d

io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow · a62682f9

由 Hao Xu 提交于 9月 22, 2021

We should set EPOLLONESHOT if cqring_fill_event() returns false since
io_poll_add() decides to put req or not by it.

Fixes: 5082620f ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a62682f9

io_uring: fix race between poll completion and cancel_hash insertion · bd99c71b

由 Hao Xu 提交于 9月 22, 2021

If poll arming and poll completion runs in parallel, there maybe races.
For instance, run io_poll_add in iowq and io_poll_task_func in original
context, then:

  iowq                                      original context
  io_poll_add
    vfs_poll
     (interruption happens
      tw queued to original
      context)                              io_poll_task_func
                                              generate cqe
                                              del from cancel_hash[]
    if !poll.done
      insert to cancel_hash[]

The entry left in cancel_hash[], similar case for fast poll.
Fix it by set poll.done = true when del from cancel_hash[].

Fixes: 5082620f ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: NHao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

bd99c71b

io-wq: ensure we exit if thread group is exiting · 87c16966

由 Jens Axboe 提交于 9月 21, 2021

Dave reports that a coredumping workload gets stuck in 5.15-rc2, and
identified the culprit in the Fixes line below. The problem is that
relying solely on fatal_signal_pending() to gate whether to exit or not
fails miserably if a process gets eg SIGILL sent. Don't exclusively
rely on fatal signals, also check if the thread group is exiting.

Fixes: 15e20db2 ("io-wq: only exit on fatal signals")
Reported-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

87c16966

24 9月, 2021 3 次提交

cifs: fix incorrect check for null pointer in header_assemble · 9ed38fd4

由 Steve French 提交于 9月 23, 2021

Although very unlikely that the tlink pointer would be null in this case,
get_next_mid function can in theory return null (but not an error)
so need to check for null (not for IS_ERR, which can not be returned
here).

Address warning:

        fs/smbfs_client/connect.c:2392 cifs_match_super()
        warn: 'tlink' isn't an ERR_PTR

Pointed out by Dan Carpenter via smatch code analysis tool

CC: stable@vger.kernel.org
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

9ed38fd4

smb3: correct server pointer dereferencing check to be more consistent · 1db1aa98

由 Steve French 提交于 9月 23, 2021

Address warning:

    fs/smbfs_client/misc.c:273 header_assemble()
    warn: variable dereferenced before check 'treeCon->ses->server'

Pointed out by Dan Carpenter via smatch code analysis tool

Although the check is likely unneeded, adding it makes the code
more consistent and easier to read, as the same check is
done elsewhere in the function.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

1db1aa98

smb3: correct smb3 ACL security descriptor · b06d893e

由 Steve French 提交于 9月 23, 2021

Address warning:

        fs/smbfs_client/smb2pdu.c:2425 create_sd_buf()
        warn: struct type mismatch 'smb3_acl vs cifs_acl'

Pointed out by Dan Carpenter via smatch code analysis tool
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

b06d893e

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功