提交 · 66558b730f2533cc2bf2b74d51f5f80b81e2bad0 · openeuler / Kernel

15 10月, 2021 3 次提交

sched: Add cluster scheduler level for x86 · 66558b73

由 Tim Chen 提交于 9月 24, 2021

There are x86 CPU architectures (e.g. Jacobsville) where L2 cahce is
shared among a cluster of cores instead of being exclusive to one
single core.

To prevent oversubscription of L2 cache, load should be balanced
between such L2 clusters, especially for tasks with no shared data.
On benchmark such as SPECrate mcf test, this change provides a boost
to performance especially on medium load system on Jacobsville.  on a
Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4
cores each, the benchmark number is as follow:

 Improvement over baseline kernel for mcf_r
 copies		run time	base rate
 1		-0.1%		-0.2%
 6		25.1%		25.1%
 12		18.8%		19.0%
 24		0.3%		0.3%

So this looks pretty good. In terms of the system's task distribution,
some pretty bad clumping can be seen for the vanilla kernel without
the L2 cluster domain for the 6 and 12 copies case. With the extra
domain for cluster, the load does get evened out between the clusters.

Note this patch isn't an universal win as spreading isn't necessarily
a win, particually for those workload who can benefit from packing.
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NBarry Song <song.bao.hua@hisilicon.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210924085104.44806-4-21cnbao@gmail.com

66558b73

sched: Add wrapper for get_wchan() to keep task blocked · 42a20f86

由 Kees Cook 提交于 9月 29, 2021

Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org

42a20f86

x86: Fix get_wchan() to support the ORC unwinder · bc9bbb81

由 Qi Zheng 提交于 9月 29, 2021

Currently, the kernel CONFIG_UNWINDER_ORC option is enabled by default
on x86, but the implementation of get_wchan() is still based on the frame
pointer unwinder, so the /proc/<pid>/wchan usually returned 0 regardless
of whether the task <pid> is running.

Reimplement get_wchan() by calling stack_trace_save_tsk(), which is
adapted to the ORC and frame pointer unwinders.

Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
Signed-off-by: NQi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211008111626.271115116@infradead.org

bc9bbb81

05 10月, 2021 1 次提交

x86/sched: Decrease further the priorities of SMT siblings · 183b8ec3

由 Ricardo Neri 提交于 9月 10, 2021

When scheduling, it is better to prefer a separate physical core rather
than the SMT sibling of a high priority core. The existing formula to
compute priorities takes such fact in consideration. There may exist,
however, combinations of priorities (i.e., maximum frequencies) in which
the priority of high-numbered SMT siblings of high-priority cores collides
with the priority of low-numbered SMT siblings of low-priority cores.

Consider for instance an SMT2 system with CPUs [0, 1] with priority 60 and
[2, 3] with priority 30(CPUs in brackets are SMT siblings. In such a case,
the resulting priorities would be [120, 60], [60, 30]. Thus, to ensure
that CPU2 has higher priority than CPU1, divide the raw priority by the
squared SMT iterator. The resulting priorities are [120, 30]. [60, 15].
Originally-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-2-ricardo.neri-calderon@linux.intel.com

183b8ec3

04 10月, 2021 1 次提交

kvm: fix objtool relocation warning · 291073a5

由 Linus Torvalds 提交于 10月 03, 2021

The recent change to make objtool aware of more symbol relocation types
(commit 24ff6525: "objtool: Teach get_alt_entry() about more
relocation types") also added another check, and resulted in this
objtool warning when building kvm on x86:

    arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception

The reason seems to be that kvm_fastop_exception() is marked as a global
symbol, which causes the relocation to ke kept around for objtool.  And
at the same time, the kvm_fastop_exception definition (which is done as
an inline asm statement) doesn't actually set the type of the global,
which then makes objtool unhappy.

The minimal fix is to just not mark kvm_fastop_exception as being a
global symbol.  It's only used in that one compilation unit anyway, so
it was always pointless.  That's how all the other local exception table
labels are done.

I'm not entirely happy about the kinds of games that the kvm code plays
with doing its own exception handling, and the fact that it confused
objtool is most definitely a symptom of the code being a bit too subtle
and ad-hoc.  But at least this trivial one-liner makes objtool no longer
upset about what is going on.

Fixes: 24ff6525 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

291073a5

01 10月, 2021 2 次提交

perf/x86/intel: Update event constraints for ICX · ecc2123e

由 Kan Liang 提交于 9月 28, 2021

According to the latest event list, the event encoding 0xEF is only
available on the first 4 counters. Add it into the event constraints
table.

Fixes: 60176089 ("perf/x86/intel: Add Icelake support")
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com

ecc2123e

perf/x86: Reset destroy callback on event init failure · 02d029a4

由 Anand K Mistry 提交于 9月 29, 2021

perf_init_event tries multiple init callbacks and does not reset the
event state between tries. When x86_pmu_event_init runs, it
unconditionally sets the destroy callback to hw_perf_event_destroy. On
the next init attempt after x86_pmu_event_init, in perf_try_init_event,
if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy
callback will be run. However, if the next init didn't set the destroy
callback, hw_perf_event_destroy will be run (since the callback wasn't
reset).

Looking at other pmu init functions, the common pattern is to only set
the destroy callback on a successful init. Resetting the callback on
failure tries to replicate that pattern.

This was discovered after commit f11dd0d8 ("perf/x86/amd/ibs: Extend
PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second)
run of the perf tool after a reboot results in 0 samples being
generated. The extra run of hw_perf_event_destroy results in
active_events having an extra decrement on each perf run. The second run
has active_events == 0 and every subsequent run has active_events < 0.
When active_events == 0, the NMI handler will early-out and not record
any samples.
Signed-off-by: NAnand K Mistry <amistry@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid

02d029a4

30 9月, 2021 2 次提交

KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks · e8a747d0

由 Sean Christopherson 提交于 9月 29, 2021

Check whether a CPUID entry's index is significant before checking for a
matching index to hack-a-fix an undefined behavior bug due to consuming
uninitialized data.  RESET/INIT emulation uses kvm_cpuid() to retrieve
CPUID.0x1, which does _not_ have a significant index, and fails to
initialize the dummy variable that doubles as EBX/ECX/EDX output _and_
ECX, a.k.a. index, input.

Practically speaking, it's _extremely_  unlikely any compiler will yield
code that causes problems, as the compiler would need to inline the
kvm_cpuid() call to detect the uninitialized data, and intentionally hose
the kernel, e.g. insert ud2, instead of simply ignoring the result of
the index comparison.

Although the sketchy "dummy" pattern was introduced in SVM by commit
66f7b72e ("KVM: x86: Make register state after reset conform to
specification"), it wasn't actually broken until commit 7ff6c035
("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the
order of operations such that "index" was checked before the significant
flag.

Avoid consuming uninitialized data by reverting to checking the flag
before the index purely so that the fix can be easily backported; the
offending RESET/INIT code has been refactored, moved, and consolidated
from vendor code to common x86 since the bug was introduced.  A future
patch will directly address the bad RESET/INIT behavior.

The undefined behavior was detected by syzbot + KernelMemorySanitizer.

  BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68
  BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103
  BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
   cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline]
   kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline]
   kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
   kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885
   kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
   vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534
   vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788
   kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020

  Local variable ----dummy@kvm_vcpu_reset created at:
   kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812
   kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923

Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com
Reported-by: NAlexander Potapenko <glider@google.com>
Fixes: 2a24be79 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping")
Fixes: 7ff6c035 ("KVM: x86: Remove stateful CPUID handling")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <20210929222426.1855730-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e8a747d0

x86/kvmclock: Move this_cpu_pvti into kvmclock.h · ad9af930

由 Zelin Deng 提交于 9月 29, 2021

There're other modules might use hv_clock_per_cpu variable like ptp_kvm,
so move it into kvmclock.h and export the symbol to make it visiable to
other modules.
Signed-off-by: NZelin Deng <zelin.deng@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ad9af930

28 9月, 2021 1 次提交

bpf, x86: Fix bpf mapping of atomic fetch implementation · ced18582

由 Johan Almbladh 提交于 9月 27, 2021

Fix the case where the dst register maps to %rax as otherwise this produces
an incorrect mapping with the implementation in 981f94c3 ("bpf: Add
bitwise atomic instructions") as %rax is clobbered given it's part of the
cmpxchg as operand.

The issue is similar to b29dd96b ("bpf, x86: Fix BPF_FETCH atomic and/or/
xor with r0 as src") just that the case of dst register was missed.

Before, dst=r0 (%rax) src=r2 (%rsi):

  [...]
  c5:   mov    %rax,%r10
  c8:   mov    0x0(%rax),%rax       <---+ (broken)
  cc:   mov    %rax,%r11                |
  cf:   and    %rsi,%r11                |
  d2:   lock cmpxchg %r11,0x0(%rax) <---+
  d8:   jne    0x00000000000000c8       |
  da:   mov    %rax,%rsi                |
  dd:   mov    %r10,%rax                |
  [...]                                 |
                                        |
After, dst=r0 (%rax) src=r2 (%rsi):     |
                                        |
  [...]                                 |
  da:	mov    %rax,%r10                |
  dd:	mov    0x0(%r10),%rax       <---+ (fixed)
  e1:	mov    %rax,%r11                |
  e4:	and    %rsi,%r11                |
  e7:	lock cmpxchg %r11,0x0(%r10) <---+
  ed:	jne    0x00000000000000dd
  ef:	mov    %rax,%rsi
  f2:	mov    %r10,%rax
  [...]

The remaining combinations were fine as-is though:

After, dst=r9 (%r15) src=r0 (%rax):

  [...]
  dc:	mov    %rax,%r10
  df:	mov    0x0(%r15),%rax
  e3:	mov    %rax,%r11
  e6:	and    %r10,%r11
  e9:	lock cmpxchg %r11,0x0(%r15)
  ef:	jne    0x00000000000000df      _
  f1:	mov    %rax,%r10                | (unneeded, but
  f4:	mov    %r10,%rax               _|  not a problem)
  [...]

After, dst=r9 (%r15) src=r4 (%rcx):

  [...]
  de:	mov    %rax,%r10
  e1:	mov    0x0(%r15),%rax
  e5:	mov    %rax,%r11
  e8:	and    %rcx,%r11
  eb:	lock cmpxchg %r11,0x0(%r15)
  f1:	jne    0x00000000000000e1
  f3:	mov    %rax,%rcx
  f6:	mov    %r10,%rax
  [...]

The case of dst == src register is rejected by the verifier and
therefore not supported, but x86 JIT also handles this case just
fine.

After, dst=r0 (%rax) src=r0 (%rax):

  [...]
  eb:	mov    %rax,%r10
  ee:	mov    0x0(%r10),%rax
  f2:	mov    %rax,%r11
  f5:	and    %r10,%r11
  f8:	lock cmpxchg %r11,0x0(%r10)
  fe:	jne    0x00000000000000ee
 100:	mov    %rax,%r10
 103:	mov    %r10,%rax
  [...]

Fixes: 981f94c3 ("bpf: Add bitwise atomic instructions")
Reported-by: NJohan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: NJohan Almbladh <johan.almbladh@anyfinetworks.com>
Co-developed-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NBrendan Jackman <jackmanb@google.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>

ced18582

27 9月, 2021 1 次提交

KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue · 5c49d185

由 Zhenzhong Duan 提交于 9月 26, 2021

When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
Modifying guest_uret_msrs directly is completely broken as 'i' does not
point at the MSR_IA32_TSX_CTRL entry. In fact, it's guaranteed to be an
out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior
loop. By sheer dumb luck, the fallout is limited to "only" failing to
preserve the host's TSX_CTRL_CPUID_CLEAR. The out-of-bounds access is
benign as it's guaranteed to clear a bit in a guest MSR value, which are
always zero at vCPU creation on both x86-64 and i386.

Cc: stable@vger.kernel.org
Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5c49d185

24 9月, 2021 2 次提交

x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses · 5ba1071f

由 Numfor Mbiziwo-Tiapo 提交于 9月 23, 2021

Don't perform unaligned loads in __get_next() and __peek_nbyte_next() as
these are forms of undefined behavior:

"A pointer to an object or incomplete type may be converted to a pointer
to a different object or incomplete type. If the resulting pointer
is not correctly aligned for the pointed-to type, the behavior is
undefined."

(from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)

These problems were identified using the undefined behavior sanitizer
(ubsan) with the tools version of the code and perf test.

 [ bp: Massage commit message. ]
Signed-off-by: NNumfor Mbiziwo-Tiapo <nums@google.com>
Signed-off-by: NIan Rogers <irogers@google.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210923161843.751834-1-irogers@google.com

5ba1071f

crypto: x86/sm4 - Fix frame pointer stack corruption · 0e14ef38

由 Josh Poimboeuf 提交于 9月 21, 2021

sm4_aesni_avx_crypt8() sets up the frame pointer (which includes pushing
RBP) before doing a conditional sibling call to sm4_aesni_avx_crypt4(),
which sets up an additional frame pointer. Things will not go well when
sm4_aesni_avx_crypt4() pops only the innermost single frame pointer and
then tries to return to the outermost frame pointer.

Sibling calls need to occur with an empty stack frame. Do the
conditional sibling call *before* setting up the stack pointer.

This fixes the following warning:

arch/x86/crypto/sm4-aesni-avx-asm_64.o: warning: objtool: sm4_aesni_avx_crypt8()+0x8: sibling call from callable instruction with modified stack frame

Fixes: a7ee22ee ("crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation")
Reported-by: Nkernel test robot <lkp@intel.com>
Reported-by: NArnd Bergmann <arnd@kernel.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NTianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0e14ef38

23 9月, 2021 6 次提交

KVM: X86: Synchronize the shadow pagetable before link it · 65855ed8

由 Lai Jiangshan 提交于 9月 18, 2021

If gpte is changed from non-present to present, the guest doesn't need
to flush tlb per SDM.  So the host must synchronze sp before
link it.  Otherwise the guest might use a wrong mapping.

For example: the guest first changes a level-1 pagetable, and then
links its parent to a new place where the original gpte is non-present.
Finally the guest can access the remapped area without flushing
the tlb.  The guest's behavior should be allowed per SDM, but the host
kvm mmu makes it wrong.

Fixes: 4731d4c7 ("KVM: MMU: out of sync shadow core")
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

65855ed8

KVM: X86: Fix missed remote tlb flush in rmap_write_protect() · f8160295

由 Lai Jiangshan 提交于 9月 18, 2021

When kvm->tlbs_dirty > 0, some rmaps might have been deleted
without flushing tlb remotely after kvm_sync_page().  If @gfn
was writable before and it's rmaps was deleted in kvm_sync_page(),
and if the tlb entry is still in a remote running VCPU,  the @gfn
is not safely protected.

To fix the problem, kvm_sync_page() does the remote flush when
needed to avoid the problem.

Fixes: a4ee1ca4 ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f8160295

KVM: x86: nSVM: don't copy virt_ext from vmcb12 · faf6b755

由 Maxim Levitsky 提交于 9月 14, 2021

These field correspond to features that we don't expose yet to L2

While currently there are no CVE worthy features in this field,
if AMD adds more features to this field, that could allow guest
escapes similar to CVE-2021-3653 and CVE-2021-3656.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210914154825.104886-6-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

faf6b755

KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround · d1cba6c9

由 Maxim Levitsky 提交于 9月 14, 2021

GP SVM errata workaround made the #GP handler always emulate
the SVM instructions.

However these instructions #GP in case the operand is not 4K aligned,
but the workaround code didn't check this and we ended up
emulating these instructions anyway.

This is only an emulation accuracy check bug as there is no harm for
KVM to read/write unaligned vmcb images.

Fixes: 82a11e9c ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210914154825.104886-4-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d1cba6c9

KVM: x86: nSVM: restore int_vector in svm_clear_vintr · aee77e11

由 Maxim Levitsky 提交于 9月 14, 2021

In svm_clear_vintr we try to restore the virtual interrupt
injection that might be pending, but we fail to restore
the interrupt vector.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210914154825.104886-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

aee77e11

x86/asm: Fix SETZ size enqcmds() build failure · d81ff5fe

由 Kees Cook 提交于 9月 10, 2021

When building under GCC 4.9 and 5.5:

  arch/x86/include/asm/special_insns.h: Assembler messages:
  arch/x86/include/asm/special_insns.h:286: Error: operand size mismatch for `setz'

Change the type to "bool" for condition code arguments, as documented.

Fixes: 7f5933f8 ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction")
Co-developed-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210910223332.3224851-1-keescook@chromium.org

d81ff5fe

22 9月, 2021 21 次提交

kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[] · e1fc1553

由 Fares Mehanna 提交于 9月 15, 2021

Intel PMU MSRs is in msrs_to_save_all[], so add AMD PMU MSRs to have a
consistent behavior between Intel and AMD when using KVM_GET_MSRS,
KVM_SET_MSRS or KVM_GET_MSR_INDEX_LIST.

We have to add legacy and new MSRs to handle guests running without
X86_FEATURE_PERFCTR_CORE.
Signed-off-by: NFares Mehanna <faresx@amazon.de>
Message-Id: <20210915133951.22389-1-faresx@amazon.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e1fc1553

KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit · dbab610a

由 Maxim Levitsky 提交于 9月 13, 2021

If L1 had invalid state on VM entry (can happen on SMM transactions
when we enter from real mode, straight to nested guest),

then after we load 'host' state from VMCS12, the state has to become
valid again, but since we load the segment registers with
__vmx_set_segment we weren't always updating emulation_required.

Update emulation_required explicitly at end of load_vmcs12_host_state.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dbab610a

KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry · c8607e4a

由 Maxim Levitsky 提交于 9月 13, 2021

It is possible that when non root mode is entered via special entry
(!from_vmentry), that is from SMM or from loading the nested state,
the L2 state could be invalid in regard to non unrestricted guest mode,
but later it can become valid.

(for example when RSM emulation restores segment registers from SMRAM)

Thus delay the check to VM entry, where we will check this and fail.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c8607e4a

KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state · c42dec14

由 Maxim Levitsky 提交于 9月 13, 2021

Since no actual VM entry happened, the VM exit information is stale.
To avoid this, synthesize an invalid VM guest state VM exit.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c42dec14

KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm · 136a55c0

由 Maxim Levitsky 提交于 9月 22, 2021

Use return statements instead of nested if, and fix error
path to free all the maps that were allocated.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

136a55c0

KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode · e85d3e7b

由 Maxim Levitsky 提交于 9月 13, 2021

Currently the KVM_REQ_GET_NESTED_STATE_PAGES on SVM only reloads PDPTRs,
and MSR bitmap, with former not really needed for SMM as SMM exit code
reloads them again from SMRAM'S CR3, and later happens to work
since MSR bitmap isn't modified while in SMM.

Still it is better to be consistient with VMX.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e85d3e7b

KVM: x86: reset pdptrs_from_userspace when exiting smm · 37687c40

由 Maxim Levitsky 提交于 9月 13, 2021

When exiting SMM, pdpts are loaded again from the guest memory.

This fixes a theoretical bug, when exit from SMM triggers entry to the
nested guest which re-uses some of the migration
code which uses this flag as a workaround for a legacy userspace.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-4-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

37687c40

KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit · e2e6e449

由 Maxim Levitsky 提交于 9月 13, 2021

Otherwise guest entry code might see incorrect L1 state (e.g paging state).

Fixes: 37be407b ("KVM: nSVM: Fix L1 state corruption upon return from SMM")
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-3-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e2e6e449

KVM: nVMX: Filter out all unsupported controls when eVMCS was activated · 8d68bad6

由 Vitaly Kuznetsov 提交于 9月 07, 2021

Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when
enlightened VMCS is advertised. Debugging revealed there are two exposed
secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and
SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported,
as there are no corresponding fields in eVMCSv1 (see the comment above
EVMCS1_UNSUPPORTED_2NDEXEC definition).

Previously, commit 31de3d25 ("x86/kvm/hyper-v: move VMX controls
sanitization out of nested_enable_evmcs()") introduced the required
filtering mechanism for VMX MSRs but for some reason put only known
to be problematic (and not full EVMCS1_UNSUPPORTED_* lists) controls
there.

Note, Windows Server 2022 seems to have gained some sanity check for VMX
MSRs: it doesn't even try to launch a guest when there's something it
doesn't like, nested_evmcs_check_controls() mechanism can't catch the
problem.

Let's be bold this time and instead of playing whack-a-mole just filter out
all unsupported controls from VMX MSRs.

Fixes: 31de3d25 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210907163530.110066-1-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8d68bad6

KVM: x86: Fix stack-out-of-bounds memory access from ioapic_write_indirect() · 2f9b68f5

由 Vitaly Kuznetsov 提交于 8月 27, 2021

KASAN reports the following issue:

 BUG: KASAN: stack-out-of-bounds in kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
 Read of size 8 at addr ffffc9001364f638 by task qemu-kvm/4798

 CPU: 0 PID: 4798 Comm: qemu-kvm Tainted: G               X --------- ---
 Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RYM0081C 07/13/2020
 Call Trace:
  dump_stack+0xa5/0xe6
  print_address_description.constprop.0+0x18/0x130
  ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
  __kasan_report.cold+0x7f/0x114
  ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
  kasan_report+0x38/0x50
  kasan_check_range+0xf5/0x1d0
  kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
  kvm_make_scan_ioapic_request_mask+0x84/0xc0 [kvm]
  ? kvm_arch_exit+0x110/0x110 [kvm]
  ? sched_clock+0x5/0x10
  ioapic_write_indirect+0x59f/0x9e0 [kvm]
  ? static_obj+0xc0/0xc0
  ? __lock_acquired+0x1d2/0x8c0
  ? kvm_ioapic_eoi_inject_work+0x120/0x120 [kvm]

The problem appears to be that 'vcpu_bitmap' is allocated as a single long
on stack and it should really be KVM_MAX_VCPUS long. We also seem to clear
the lower 16 bits of it with bitmap_zero() for no particular reason (my
guess would be that 'bitmap' and 'vcpu_bitmap' variables in
kvm_bitmap_or_dest_vcpus() caused the confusion: while the later is indeed
16-bit long, the later should accommodate all possible vCPUs).

Fixes: 7ee30bc1 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Fixes: 9a2ae9f6 ("KVM: x86: Zero the IOAPIC scan request dest vCPUs bitmap")
Reported-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210827092516.1027264-7-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2f9b68f5

KVM: SEV: Allow some commands for mirror VM · 5b92b6ca

由 Peter Gonda 提交于 9月 21, 2021

A mirrored SEV-ES VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to
setup its vCPUs and have them measured, and their VMSAs encrypted. Without
this change, it is impossible to have mirror VMs as part of SEV-ES VMs.

Also allow the guest status check and debugging commands since they do
not change any guest state.
Signed-off-by: NPeter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
Message-Id: <20210921150345.2221634-3-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5b92b6ca

KVM: SEV: Update svm_vm_copy_asid_from for SEV-ES · f43c887c

由 Peter Gonda 提交于 9月 21, 2021

For mirroring SEV-ES the mirror VM will need more then just the ASID.
The FD and the handle are required to all the mirror to call psp
commands. The mirror VM will need to call KVM_SEV_LAUNCH_UPDATE_VMSA to
setup its vCPUs' VMSAs for SEV-ES.
Signed-off-by: NPeter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Nathan Tempelman <natet@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context", 2021-04-21)
Message-Id: <20210921150345.2221634-2-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f43c887c

KVM: nVMX: Fix nested bus lock VM exit · 24a996ad

由 Chenyi Qiang 提交于 9月 14, 2021

Nested bus lock VM exits are not supported yet. If L2 triggers bus lock
VM exit, it will be directed to L1 VMM, which would cause unexpected
behavior. Therefore, handle L2's bus lock VM exits in L0 directly.

Fixes: fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210914095041.29764-1-chenyi.qiang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

24a996ad

KVM: x86: Identify vCPU0 by its vcpu_idx instead of its vCPUs array entry · 94c245a2

由 Sean Christopherson 提交于 9月 10, 2021

Use vcpu_idx to identify vCPU0 when updating HyperV's TSC page, which is
shared by all vCPUs and "owned" by vCPU0 (because vCPU0 is the only vCPU
that's guaranteed to exist).  Using kvm_get_vcpu() to find vCPU works,
but it's a rather odd and suboptimal method to check the index of a given
vCPU.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210910183220.2397812-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

94c245a2

KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor · 4eeef242

由 Sean Christopherson 提交于 9月 10, 2021

Read vcpu->vcpu_idx directly instead of bouncing through the one-line
wrapper, kvm_vcpu_get_idx(), and drop the wrapper.  The wrapper is a
remnant of the original implementation and serves no purpose; remove it
before it gains more users.

Back when kvm_vcpu_get_idx() was added by commit 497d72d8 ("KVM: Add
kvm_vcpu_get_idx to get vcpu index in kvm->vcpus"), the implementation
was more than just a simple wrapper as vcpu->vcpu_idx did not exist and
retrieving the index meant walking over the vCPU array to find the given
vCPU.

When vcpu_idx was introduced by commit 8750e72a ("KVM: remember
position in kvm->vcpus array"), the helper was left behind, likely to
avoid extra thrash (but even then there were only two users, the original
arm usage having been removed at some point in the past).

No functional change intended.
Suggested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210910183220.2397812-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4eeef242

kvm: fix wrong exception emulation in check_rdtsc · e9337c84

由 Hou Wenlong 提交于 8月 18, 2021

According to Intel's SDM Vol2 and AMD's APM Vol3, when
CR4.TSD is set, use rdtsc/rdtscp instruction above privilege
level 0 should trigger a #GP.

Fixes: d7eb8203 ("KVM: SVM: Add intercept checks for remaining group7 instructions")
Signed-off-by: NHou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <1297c0dd3f1bb47a6d089f850b629c7aa0247040.1629257115.git.houwenlong93@linux.alibaba.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e9337c84

KVM: SEV: Pin guest memory for write for RECEIVE_UPDATE_DATA · 50c03801

由 Sean Christopherson 提交于 9月 14, 2021

Require the target guest page to be writable when pinning memory for
RECEIVE_UPDATE_DATA.  Per the SEV API, the PSP writes to guest memory:

  The result is then encrypted with GCTX.VEK and written to the memory
  pointed to by GUEST_PADDR field.

Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210914210951.2994260-2-seanjc@google.com>
Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com>
Reviewed-by: NPeter Gonda <pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

50c03801

KVM: SVM: fix missing sev_decommission in sev_receive_start · f1815e0a

由 Mingwei Zhang 提交于 9月 12, 2021

DECOMMISSION the current SEV context if binding an ASID fails after
RECEIVE_START.  Per AMD's SEV API, RECEIVE_START generates a new guest
context and thus needs to be paired with DECOMMISSION:

     The RECEIVE_START command is the only command other than the LAUNCH_START
     command that generates a new guest context and guest handle.

The missing DECOMMISSION can result in subsequent SEV launch failures,
as the firmware leaks memory and might not able to allocate more SEV
guest contexts in the future.

Note, LAUNCH_START suffered the same bug, but was previously fixed by
commit 934002cd ("KVM: SVM: Call SEV Guest Decommission if ASID
binding fails").

Cc: Alper Gun <alpergun@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: David Rienjes <rientjes@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: John Allen <john.allen@amd.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: NMarc Orr <marcorr@google.com>
Acked-by: NBrijesh Singh <brijesh.singh@amd.com>
Fixes: af43cbbf ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command")
Signed-off-by: NMingwei Zhang <mizhang@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210912181815.3899316-1-mizhang@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f1815e0a

KVM: SEV: Acquire vcpu mutex when updating VMSA · bb18a677

由 Peter Gonda 提交于 9月 15, 2021

The update-VMSA ioctl touches data stored in struct kvm_vcpu, and
therefore should not be performed concurrently with any VCPU ioctl
that might cause KVM or the processor to use the same data.

Adds vcpu mutex guard to the VMSA updating code. Refactors out
__sev_launch_update_vmsa() function to deal with per vCPU parts
of sev_launch_update_vmsa().

Fixes: ad73109a ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Signed-off-by: NPeter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20210915171755.3773766-1-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bb18a677

KVM: nVMX: fix comments of handle_vmon() · ed7023a1

由 Yu Zhang 提交于 9月 09, 2021

"VMXON pointer" is saved in vmx->nested.vmxon_ptr since
commit 3573e22c ("KVM: nVMX: additional checks on
vmxon region"). Also, handle_vmptrld() & handle_vmclear()
now have logic to check the VMCS pointer against the VMXON
pointer.

So just remove the obsolete comments of handle_vmon().
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed7023a1

KVM: x86: Handle SRCU initialization failure during page track init · eb7511bf

由 Haimin Zhang 提交于 9月 03, 2021

Check the return of init_srcu_struct(), which can fail due to OOM, when
initializing the page track mechanism.  Lack of checking leads to a NULL
pointer deref found by a modified syzkaller.
Reported-by: NTCS Robot <tcs_robot@tencent.com>
Signed-off-by: NHaimin Zhang <tcs_kernel@tencent.com>
Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com>
[Move the call towards the beginning of kvm_arch_init_vm. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eb7511bf

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功