1. 03 Nov 2022 (8 commits)
  2. 02 Nov 2022 (1 commit)
  3. 27 Oct 2022 (2 commits)
  4. 08 Oct 2022 (8 commits)
  5. 29 Sep 2022 (2 commits)
  6. 21 Sep 2022 (3 commits)
    • KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog · 5898671f
      Committed by Like Xu
      mainline inclusion
      from mainline-v5.18
      commit 75189d1d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5RD6Y
      CVE: NA
      
      -------------
      
      The NMI watchdog is one of kernel developers' favorite features,
      but it does not work in an AMD guest even with vPMU enabled; worse,
      the system misrepresents this capability via /proc.
      
      This is a PMC emulation error: KVM does not pass the latest valid
      counter value to the perf_event in time while the guest NMI watchdog
      is running, so the perf_event backing the watchdog counter reverts
      to a stale state at some point after the first guest NMI injection,
      leaving the hardware register PMC0 constantly rewritten to
      0x800000000001.
      
      Meanwhile, the running counter should accurately reflect its new value
      based on the latest coordinated pmc->counter (from vPMC's point of view)
      rather than the value written directly by the guest.
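      
      The shape of the fix, as a minimal sketch (the helper and the
      get_sample_period() accessor follow the upstream patch; the exact
      call sites are an assumption here): after a guest WRMSR to a
      counter, recompute the perf_event sample period from the freshly
      written pmc->counter, so the backing event never keeps sampling
      with a stale period.
      
      /*
       * Minimal sketch, not the verbatim upstream diff: refresh the
       * perf_event sample period from the current pmc->counter after a
       * guest WRMSR, e.g. right after the set_msr handler has done
       * pmc->counter = data & pmc_bitmask(pmc);
       */
      static inline void pmc_update_sample_period(struct kvm_pmc *pmc)
      {
      	if (!pmc->perf_event || pmc->is_paused)
      		return;
      
      	perf_event_period(pmc->perf_event,
      			  get_sample_period(pmc, pmc->counter));
      }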
      
      Fixes: 168d918f ("KVM: x86: Adjust counter sample period after a wrmsr")
      Reported-by: Dongli Cao <caodongli@kingsoft.com>
      Signed-off-by: Like Xu <likexu@tencent.com>
      Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
      Tested-by: Yanan Wang <wangyanan55@huawei.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20220409015226.38619-1-likexu@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • KVM: x86/pmu: Introduce pmc->is_paused to reduce the call time of perf interfaces · e7815ebf
      Committed by Like Xu
      mainline inclusion
      from mainline-v5.14
      commit e79f49c3
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5RD6Y
      CVE: NA
      
      -------------
      
      Based on our observations, after any vm-exit associated with vPMU,
      at least two perf interfaces have to be called for guest counter
      emulation, such as perf_event_{pause, read_value, period}(), and
      each one will {lock, unlock} the same perf_event_ctx. The calls
      become even more frequent when the guest uses counters in a
      multiplexed manner.
      
      Holding a lock once and completing the KVM request operations in the perf
      context would introduce a set of impractical new interfaces. So we can
      further optimize the vPMU implementation by avoiding repeated calls to
      these interfaces in the KVM context for at least one pattern:
      
      After we call perf_event_pause() once, the event is disabled and its
      internal count is reset to 0, so there is no need to pause it again
      or read its value. Once the event is paused, its period will not be
      updated until the next time it is resumed or reprogrammed. There is
      also no need to call perf_event_period() twice for a non-running
      counter, since the perf_event for a running counter is never paused.
      
      Based on this implementation, for the following common usage of
      sampling 4 events with perf in a 4-vCPU/8GiB (4u8g) guest:
      
        echo 0 > /proc/sys/kernel/watchdog
        echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
        echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate
        echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
        for i in `seq 1 1 10`
        do
            taskset -c 0 perf record \
                -e cpu-cycles -e instructions -e branch-instructions -e cache-misses \
                /root/br_instr a
        done
      
      the average latency of the guest NMI handler is reduced from
      37646.7 ns to 32929.3 ns (~1.14x speedup) on an Intel ICX server.
      In addition to collecting more samples, no loss of sampling
      accuracy was observed compared to before the optimization.
      
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20210728120705.6855-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • x86/speculation: Add RSB VM Exit protections · 30b1e4cb
      Committed by Daniel Sneddon
      stable inclusion
      from stable-v5.10.136
      commit 509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5N1SO
      CVE: CVE-2022-26373
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      
      --------------------------------
      
      commit 2b129932 upstream.
      
      tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
      documented for RET instructions after VM exits. Mitigate it with a new
      one-entry RSB stuffing mechanism and a new LFENCE.
      
      == Background ==
      
      Indirect Branch Restricted Speculation (IBRS) was designed to help
      mitigate Branch Target Injection and Speculative Store Bypass, i.e.
      Spectre, attacks. IBRS prevents software run in less privileged modes
      from affecting branch prediction in more privileged modes. IBRS requires
      the MSR to be written on every privilege level change.
      
      To overcome some of the performance issues of IBRS, Enhanced IBRS was
      introduced.  eIBRS is an "always on" IBRS, in other words, just turn
      it on once instead of writing the MSR on every privilege level change.
      When eIBRS is enabled, more privileged modes should be protected from
      less privileged modes, including protecting VMMs from guests.
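      
      To make the difference concrete, a schematic sketch (the function
      names are hypothetical; MSR_IA32_SPEC_CTRL and SPEC_CTRL_IBRS are
      the kernel's real constants from asm/msr-index.h):
      
      #include <asm/msr.h>
      
      /* Legacy IBRS: the bit must be re-asserted on every transition
       * into a more privileged mode (kernel entry, VM exit, ...). */
      static void legacy_ibrs_on_entry(void)
      {
      	wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
      }
      
      /* eIBRS: a single write at boot; the protection then stays in
       * effect across privilege changes, with no per-transition writes. */
      static void eibrs_setup_once(void)
      {
      	wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
      }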
      
      == Problem ==
      
      Here's a simplification of how guests are run on Linux' KVM:
      
      void run_kvm_guest(void)
      {
      	// Prepare to run guest
      	VMRESUME();
      	// Clean up after guest runs
      }
      
      The execution flow for that would look something like this to the
      processor:
      
      1. Host-side: call run_kvm_guest()
      2. Host-side: VMRESUME
      3. Guest runs, does "CALL guest_function"
      4. VM exit, host runs again
      5. Host might make some "cleanup" function calls
      6. Host-side: RET from run_kvm_guest()
      
      Now, when back on the host, there are a couple of possible scenarios of
      post-guest activity the host needs to do before executing host code:
      
      * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
      touched and Linux has to do a 32-entry stuffing.
      
      * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
      IBRS=1 shortly after VM exit, has a documented side effect of flushing
      the RSB except in this PBRSB situation where the software needs to stuff
      the last RSB entry "by hand".
      
      IOW, with eIBRS supported, host RET instructions should no longer be
      influenced by guest behavior after the host retires a single CALL
      instruction.
      
      However, if the RET instructions are "unbalanced" with CALLs after a VM
      exit as is the RET in #6, it might speculatively use the address for the
      instruction after the CALL in #3 as an RSB prediction. This is a problem
      since the (untrusted) guest controls this address.
      
      Balanced CALL/RET instruction pairs such as in step #5 are not affected.
      
      == Solution ==
      
      The PBRSB issue affects a wide variety of Intel processors which
      support eIBRS. But not all of them need mitigation. Today,
      X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
      PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
      eIBRS systems which enable legacy IBRS explicitly.
      
      However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
      and most of them need a new mitigation.
      
      Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
      which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
      
      The lighter-weight mitigation performs a CALL instruction which is
      immediately followed by a speculative execution barrier (INT3). This
      steers speculative execution to the barrier -- just like a retpoline
      -- which ensures that speculation can never reach an unbalanced RET.
      Then, ensure this CALL is retired before continuing execution with an
      LFENCE.
      
      In other words, the window of exposure is opened at VM exit where RET
      behavior is troublesome. While the window is open, force RSB predictions
      sampling for RET targets to a dead end at the INT3. Close the window
      with the LFENCE.
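      
      A schematic of that sequence in GCC inline assembly for x86-64 (an
      illustration of the CALL/INT3/LFENCE pattern described above, not
      the kernel's actual FILL_RETURN_BUFFER macro):
      
      /*
       * The CALL pushes one benign entry onto the RSB; the INT3 is a
       * dead end for any RET that mispredicts through that entry; the
       * LFENCE ensures the CALL retires before execution continues.
       */
      static inline void rsb_vmexit_lite_sketch(void)
      {
      	asm volatile("call 1f\n\t"
      		     "int3\n"                /* speculation trap     */
      		     "1: add $8, %%rsp\n\t"  /* drop return address  */
      		     "lfence"                /* force CALL to retire */
      		     : : : "memory");
      }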
      
      There is a subset of eIBRS systems which are not vulnerable to PBRSB.
      Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
      Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
      
        [ bp: Massage, incorporate review comments from Andy Cooper. ]
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      conflict:
          arch/x86/include/asm/cpufeatures.h
      Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Reviewed-by: Liao Chang <liaochang1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  7. 20 Sep 2022 (16 commits)