1. 30 November 2022 (1 commit)
    • KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4
      Authored by David Woodhouse
      Closer inspection of the Xen code shows that we aren't supposed to be
      using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
      explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
      If we randomly set the top bit of ->state_entry_time for a guest that
      hasn't asked for it and doesn't expect it, that could make the runtimes
      fail to add up and confuse the guest. Without the flag it's perfectly
      safe for a vCPU to read its own vcpu_runstate_info; just not for one
      vCPU to read *another's*.
      
      I briefly pondered adding a word for the whole set of VMASST_TYPE_*
      flags but the only one we care about for HVM guests is this, so it
      seemed a bit pointless.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 29 November 2022 (1 commit)
  3. 23 November 2022 (2 commits)
    • KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE · 8c516b25
      Authored by Claudio Imbrenda
      Add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE to signal that the
      KVM_PV_ASYNC_DISABLE and KVM_PV_ASYNC_DISABLE_PREPARE commands for the
      KVM_S390_PV_COMMAND ioctl are available.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-4-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-4-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    • KVM: s390: pv: asynchronous destroy for reboot · fb491d55
      Authored by Claudio Imbrenda
      Until now, destroying a protected guest was an entirely synchronous
      operation that could potentially take a very long time, depending on
      the size of the guest, due to the time needed to clean up the address
      space from protected pages.
      
      This patch implements an asynchronous destroy mechanism that allows a
      protected guest to reboot significantly faster than before.
      
      This is achieved by clearing the pages of the old guest in the
      background. In case of reboot, the new guest will be able to run in
      the same address space almost immediately.
      
      The old protected guest is then destroyed only when all of its memory
      has been destroyed or otherwise made non-protected.
      
      Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:
      
      KVM_PV_ASYNC_CLEANUP_PREPARE: set aside the current protected VM for
      later asynchronous teardown. The current KVM VM will then continue
      immediately as non-protected. If a protected VM had already been
      set aside for asynchronous teardown, but without starting the teardown
      process, this call will fail. There can be at most one VM set aside at
      any time. Once it is set aside, the protected VM exists only in the
      context of the Ultravisor and is no longer associated with the KVM
      VM. Its protected CPUs have already been destroyed, but not its
      memory. This command can be issued again immediately after starting
      KVM_PV_ASYNC_CLEANUP_PERFORM, without having to wait for completion.
      
      KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously
      set aside using KVM_PV_ASYNC_CLEANUP_PREPARE. Ideally the
      KVM_PV_ASYNC_CLEANUP_PERFORM PV command should be issued by userspace
      from a separate thread. If a fatal signal is received (or if the
      process terminates naturally), the command will terminate immediately
      without completing. All protected VMs whose teardown was interrupted
      will be put in the need_cleanup list. The rest of the normal KVM
      teardown process will take care of properly cleaning up all remaining
      protected VMs, including the ones on the need_cleanup list.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-2-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-2-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
  4. 19 November 2022 (2 commits)
  5. 10 November 2022 (6 commits)
    • KVM: replace direct irq.h inclusion · d663b8a2
      Authored by Paolo Bonzini
      virt/kvm/irqchip.c includes "irq.h" from the arch-specific KVM source
      directory (i.e. not from arch/*/include) for the sole purpose of
      retrieving irqchip_in_kernel.
      
      Making the function inline in a header that is already included,
      such as asm/kvm_host.h, is not possible because it needs to look at
      struct kvm which is defined after asm/kvm_host.h is included.  So add a
      kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is
      only performance critical on arm64 and x86, and the non-inline function
      is enough on all other architectures.
      
      irq.h can then be deleted from all architectures except x86.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add a VALID_MASK for the MSR exit reason flags · db205f7e
      Authored by Aaron Lewis
      Add the mask KVM_MSR_EXIT_REASON_VALID_MASK for the MSR exit reason
      flags.  This simplifies checks that validate these flags, and makes it
      easier to introduce new flags in the future.
      
      No functional change intended.
      Signed-off-by: Aaron Lewis <aaronlewis@google.com>
      Message-Id: <20220921151525.904162-3-aaronlewis@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: Add interruptible flag to __gfn_to_pfn_memslot() · c8b88b33
      Authored by Peter Xu
      Add a new "interruptible" flag showing that the caller is willing to be
      interrupted by signals during the __gfn_to_pfn_memslot() request.  Wire it
      up with a FOLL_INTERRUPTIBLE flag that we've just introduced.
      
      This prepares KVM to be able to respond to SIGUSR1 (for QEMU, that's
      the SIGIPI) even while handling e.g. a userfaultfd page fault.
      
      No functional change intended.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221011195809.557016-4-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: Add KVM_PFN_ERR_SIGPENDING · fe5ed56c
      Authored by Peter Xu
      Add a new pfn error to indicate that a signal is pending during the
      hva_to_pfn_slow() procedure (i.e. a -EINTR return value).
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221011195809.557016-3-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • mm/gup: Add FOLL_INTERRUPTIBLE · 93c5c61d
      Authored by Peter Xu
      We have had FAULT_FLAG_INTERRUPTIBLE, but it was never applied to
      GUPs. One issue is that not all GUP paths are able to handle signal
      delivery other than SIGKILL.
      
      That's not ideal for the GUP users who are actually able to handle these
      cases, like KVM.
      
      KVM uses GUP extensively to fault in guest pages, and it already has
      the infrastructure to retry a page fault at a later time. Allowing
      GUP to be interrupted by generic signals can make KVM-related
      threads more responsive. For example:
      
        (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
            e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
            generated to kick the vcpus out of kernel context immediately,
      
        (2) SIGINT: which can be used with interactive hypervisor users to stop a
            virtual machine with Ctrl-C without any delays/hangs,
      
        (3) SIGTRAP: which grants GDB capability even during page faults that are
            stuck for a long time.
      
      Normally the hypervisor will be able to receive these signals
      properly, but not if it is stuck in a GUP for a long time for
      whatever reason. This happens easily with a stuck postcopy migration
      when e.g. a temporary network failure occurs; some vCPU threads can
      then hang indefinitely waiting for the pages. With the new
      FOLL_INTERRUPTIBLE, GUP users like KVM can selectively enable the
      ability to trap these signals.
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20221011195809.557016-2-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • bug: introduce ASSERT_STRUCT_OFFSET · 07a368b3
      Authored by Maxim Levitsky
      ASSERT_STRUCT_OFFSET allows asserting at build time that a field in
      a struct has an expected offset.
      
      KVM used to have such a macro, but there is almost nothing
      KVM-specific in it, so move it to build_bug.h so that it can be used
      in other places in KVM.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20221025124741.228045-10-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 02 November 2022 (1 commit)
  7. 01 November 2022 (1 commit)
  8. 29 October 2022 (5 commits)
  9. 28 October 2022 (1 commit)
    • net/mlx5: Fix possible use-after-free in async command interface · bacd22df
      Authored by Tariq Toukan
      mlx5_cmd_cleanup_async_ctx should return only after all its callback
      handlers have completed. Before this patch, the race below between
      mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible
      and led to a use-after-free:
      
      1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e.
         elevated by 1, a single inflight callback).
      2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1.
      3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and
         is about to call wake_up().
      4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns
         immediately as the condition (num_inflight == 0) holds.
      5. mlx5_cmd_cleanup_async_ctx returns.
      6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx
         object.
      7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed
         object.
      
      Fix it by synchronizing with a completion object. Mark it completed
      when num_inflight reaches 0.
      
      Trace:
      
      BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270
      Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0
      
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x57/0x7d
       print_report.cold+0x2d5/0x684
       ? do_raw_spin_lock+0x23d/0x270
       kasan_report+0xb1/0x1a0
       ? do_raw_spin_lock+0x23d/0x270
       do_raw_spin_lock+0x23d/0x270
       ? rwlock_bug.part.0+0x90/0x90
       ? __delete_object+0xb8/0x100
       ? lock_downgrade+0x6e0/0x6e0
       _raw_spin_lock_irqsave+0x43/0x60
       ? __wake_up_common_lock+0xb9/0x140
       __wake_up_common_lock+0xb9/0x140
       ? __wake_up_common+0x650/0x650
       ? destroy_tis_callback+0x53/0x70 [mlx5_core]
       ? kasan_set_track+0x21/0x30
       ? destroy_tis_callback+0x53/0x70 [mlx5_core]
       ? kfree+0x1ba/0x520
       ? do_raw_spin_unlock+0x54/0x220
       mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core]
       ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
       ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
       mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core]
       ? dump_command+0xcc0/0xcc0 [mlx5_core]
       ? lockdep_hardirqs_on_prepare+0x400/0x400
       ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
       cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
       atomic_notifier_call_chain+0xd7/0x1d0
       mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core]
       atomic_notifier_call_chain+0xd7/0x1d0
       ? irq_release+0x140/0x140 [mlx5_core]
       irq_int_handler+0x19/0x30 [mlx5_core]
       __handle_irq_event_percpu+0x1f2/0x620
       handle_irq_event+0xb2/0x1d0
       handle_edge_irq+0x21e/0xb00
       __common_interrupt+0x79/0x1a0
       common_interrupt+0x78/0xa0
       </IRQ>
       <TASK>
       asm_common_interrupt+0x22/0x40
      RIP: 0010:default_idle+0x42/0x60
      Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00
      RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242
      RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110
      RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc
      RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3
      R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005
      R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000
       ? default_idle_call+0xcc/0x450
       default_idle_call+0xec/0x450
       do_idle+0x394/0x450
       ? arch_cpu_idle_exit+0x40/0x40
       ? do_idle+0x17/0x450
       cpu_startup_entry+0x19/0x20
       start_secondary+0x221/0x2b0
       ? set_cpu_sibling_map+0x2070/0x2070
       secondary_startup_64_no_verify+0xcd/0xdb
       </TASK>
      
      Allocated by task 49502:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x81/0xa0
       kvmalloc_node+0x48/0xe0
       mlx5e_bulk_async_init+0x35/0x110 [mlx5_core]
       mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core]
       mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
       mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
       mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
       mlx5e_suspend+0xdb/0x140 [mlx5_core]
       mlx5e_remove+0x89/0x190 [mlx5_core]
       auxiliary_bus_remove+0x52/0x70
       device_release_driver_internal+0x40f/0x650
       driver_detach+0xc1/0x180
       bus_remove_driver+0x125/0x2f0
       auxiliary_driver_unregister+0x16/0x50
       mlx5e_cleanup+0x26/0x30 [mlx5_core]
       cleanup+0xc/0x4e [mlx5_core]
       __x64_sys_delete_module+0x2b5/0x450
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Freed by task 49502:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       kasan_set_free_info+0x20/0x30
       ____kasan_slab_free+0x11d/0x1b0
       kfree+0x1ba/0x520
       mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core]
       mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
       mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
       mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
       mlx5e_suspend+0xdb/0x140 [mlx5_core]
       mlx5e_remove+0x89/0x190 [mlx5_core]
       auxiliary_bus_remove+0x52/0x70
       device_release_driver_internal+0x40f/0x650
       driver_detach+0xc1/0x180
       bus_remove_driver+0x125/0x2f0
       auxiliary_driver_unregister+0x16/0x50
       mlx5e_cleanup+0x26/0x30 [mlx5_core]
       cleanup+0xc/0x4e [mlx5_core]
       __x64_sys_delete_module+0x2b5/0x450
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: e355477e ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  10. 27 October 2022 (3 commits)
  11. 26 October 2022 (1 commit)
  12. 25 October 2022 (4 commits)
  13. 24 October 2022 (2 commits)
  14. 22 October 2022 (2 commits)
  15. 21 October 2022 (4 commits)
  16. 20 October 2022 (4 commits)