Commits · f8c6f24f47972a607ace88fa75df692dcbb8c752 · openeuler / Kernel

07 Nov, 2022 · 1 commit

config: add HW_RANDOM_ZHAOXIN for Zhaoxin CPUs · e12535ad

Committed by LeoLiuoc on Sep 22, 2022

zhaoxin inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SMFS
CVE: NA

--------------------------------------------

Set CONFIG_HW_RANDOM_ZHAOXIN to 'm' by default in openeuler_configs
Signed-off-by: leoliuoc <leoliu-oc@zhaoxin.com>

e12535ad

04 Nov, 2022 · 1 commit

x86/cpufeatures: Add Zhaoxin feature bits · cf891721

Committed by LeoLiu-oc on Aug 24, 2022

zhaoxin inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NYQF
CVE: NA

--------------------------------------------

Add Zhaoxin feature bits on Zhaoxin CPUs.
Signed-off-by: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>

cf891721

03 Nov, 2022 · 14 commits

x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry · e930096f

Committed by Chen Zhongjin on Nov 03, 2022

mainline inclusion
from mainline-v6.0-rc3
commit fc2e426b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q4RA
CVE: NA

--------------------------------

When the ORC unwinder meets an ftrace trampoline, it uses the address
of ftrace_{regs_}call to find the ORC entry, which places the next frame
at sp+176.

If an IRQ hits at sub $0xa8,%rsp, the next frame is at sp+8, not
sp+176. The unwinder then skips the correct frame and emits warnings
such as "wrong direction" or "can't access registers", depending on the
content of the incorrect frame address.

By adding the base address ftrace_{regs_}caller with the offset
*ip - ops->trampoline*, we can get the correct address to find the ORC entry.
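
The address translation described above can be sketched in plain C. This is an illustrative model, not the kernel's code: struct tramp_ops, orc_lookup_addr() and the numeric addresses are hypothetical stand-ins. Only the arithmetic, base plus (ip - ops->trampoline), comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-ops trampoline bookkeeping. */
struct tramp_ops {
    uintptr_t trampoline;      /* start of the dynamically allocated copy */
    uintptr_t trampoline_size; /* length of that copy in bytes */
};

/*
 * Map an instruction pointer inside a trampoline copy back to the
 * matching address in the original ftrace_{regs_}caller text, so the
 * ORC lookup sees the entry for the exact instruction that was hit.
 */
static uintptr_t orc_lookup_addr(const struct tramp_ops *ops,
                                 uintptr_t caller_base, uintptr_t ip)
{
    assert(ip >= ops->trampoline);
    assert(ip < ops->trampoline + ops->trampoline_size);
    return caller_base + (ip - ops->trampoline);
}
```

With the offset preserved, an IP that sits a few bytes into the trampoline resolves to the ORC entry for that same instruction rather than to the entry for the later call site.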

Also change "caller" to "tramp_addr" to make variable name conform to
its content.

[ mingo: Clarified the changelog a bit. ]

Fixes: 6be7fa3c ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e930096f

x86/bus_lock: Don't assume the init value of DEBUGCTLMSR.BUS_LOCK_DETECT to be zero · 4f38b7f4

Committed by Chenyi Qiang on Aug 02, 2022

mainline inclusion
from mainline-v6.0-rc1
commit ffa6482e
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffa6482e

Intel-SIG: commit ffa6482e ("x86/bus_lock: Don't assume the init value of DEBUGCTLMSR.BUS_LOCK_DETECT to be zero")

-------------------------------------

x86/bus_lock: Don't assume the init value of DEBUGCTLMSR.BUS_LOCK_DETECT to be zero

It's possible that this kernel has been kexec'd from a kernel that
enabled bus lock detection, or (hypothetically) BIOS/firmware has set
DEBUGCTLMSR_BUS_LOCK_DETECT.

Disable bus lock detection explicitly if not wanted.
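
A minimal sketch of the idea, assuming the enable bit is bit 2 of IA32_DEBUGCTL (as the related KVM commit in this log states). The helper name debugctl_apply_bus_lock() is hypothetical, and the real kernel code reads and writes the MSR rather than a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of IA32_DEBUGCTL enables bus lock detection. */
#define DEBUGCTL_BUS_LOCK_DETECT (UINT64_C(1) << 2)

static_assert(DEBUGCTL_BUS_LOCK_DETECT == 0x4, "bus lock detect is bit 2");

/*
 * Compute the value to write back to the MSR. The bit is set or
 * cleared explicitly, so a stale value left behind by a previous
 * kernel or by firmware cannot leak through.
 */
static uint64_t debugctl_apply_bus_lock(uint64_t current, bool enable)
{
    if (enable)
        return current | DEBUGCTL_BUS_LOCK_DETECT;
    return current & ~DEBUGCTL_BUS_LOCK_DETECT;
}
```

The point of the explicit else-branch is that "feature not wanted" results in a write that clears the bit, instead of leaving whatever value was inherited.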

Fixes: ebb1064e ("x86/traps: Handle #DB for bus lock")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20220802033206.21333-1-chenyi.qiang@intel.com
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

4f38b7f4

KVM: X86: Expose bus lock debug exception to guest · b9ddddea

Committed by Paolo Bonzini on May 06, 2021

mainline inclusion
from mainline-v5.13-rc2
commit 76ea438b
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76ea438b

Intel-SIG: commit 76ea438b ("KVM: X86: Expose bus lock debug exception to guest")

-------------------------------------

KVM: X86: Expose bus lock debug exception to guest

The bus lock debug exception notifies the kernel with a #DB trap after
an instruction executed at CPL>0 acquires a bus lock. This allows the
kernel to enforce user application throttling or mitigations.

Existence of bus lock debug exception is enumerated via
CPUID.(EAX=7,ECX=0).ECX[24]. Software can enable these exceptions by
setting bit 2 of the MSR_IA32_DEBUGCTL. Expose the CPUID to guest and
emulate the MSR handling when guest enables it.
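
The enumeration and enable bits named above can be sketched as follows. The function names are hypothetical and the write check is a simplification: it looks only at the bus lock bit and lets every other bit through, which is not what a real hypervisor does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_7_0_ECX_BUS_LOCK_DETECT (UINT32_C(1) << 24)
#define DEBUGCTL_BUS_LOCK_DETECT      (UINT64_C(1) << 2)

static_assert(CPUID_7_0_ECX_BUS_LOCK_DETECT == 0x01000000u, "ECX bit 24");

/* True if CPUID.(EAX=7,ECX=0).ECX advertises bus lock debug exceptions. */
static bool cpuid_has_bus_lock_detect(uint32_t leaf7_ecx)
{
    return (leaf7_ecx & CPUID_7_0_ECX_BUS_LOCK_DETECT) != 0;
}

/*
 * Hypothetical check for a guest write to IA32_DEBUGCTL: the bus lock
 * bit is accepted only when the guest's CPUID exposes the feature.
 */
static bool guest_debugctl_write_ok(uint64_t data, uint32_t guest_leaf7_ecx)
{
    if (!(data & DEBUGCTL_BUS_LOCK_DETECT))
        return true;
    return cpuid_has_bus_lock_detect(guest_leaf7_ecx);
}
```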

Support for this feature was originally developed by Xiaoyao Li and
Chenyi Qiang, but code has since changed enough that this patch has
nothing in common with theirs, except for this commit message.
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-4-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

b9ddddea

KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit · f5e2ac5e

Committed by Chenyi Qiang on Feb 02, 2021

mainline inclusion
from mainline-v5.13-rc2
commit e8ea85fb
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e8ea85fb

Intel-SIG: commit e8ea85fb ("KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit")

-------------------------------------

KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit

Bus lock debug exception introduces a new bit DR6_BUS_LOCK (bit 11 of
DR6) to indicate that bus lock #DB exception is generated. The set/clear
of DR6_BUS_LOCK is similar to the DR6_RTM. The processor clears
DR6_BUS_LOCK when the exception is generated. For all other #DB, the
processor sets this bit to 1. Software #DB handler should set this bit
before returning to the interrupted task.

In VMM, to avoid breaking the CPUs without bus lock #DB exception
support, activate the DR6_BUS_LOCK conditionally in DR6_FIXED_1 bits.
When intercepting the #DB exception caused by bus locks, bit 11 of the
exit qualification is set to identify it. The VMM should emulate the
exception by clearing the bit 11 of the guest DR6.
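
The active-low behaviour can be illustrated with a small sketch. It assumes the XOR-mask approach that the DR6_ACTIVE_LOW rename commit in this log describes. The function dr6_apply_payload() and the reduced mask used in the usage note are hypothetical, not KVM's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_BUS_LOCK (UINT64_C(1) << 11)

static_assert(DR6_BUS_LOCK == 0x800, "DR6_BUS_LOCK is bit 11");

/*
 * Fold a #DB payload into a guest DR6 value. The payload uses exit
 * qualification polarity, where a set bit means "this happened".
 * Bits listed in active_low are stored inverted in DR6, so they end
 * up cleared when the payload reports them and set otherwise.
 */
static uint64_t dr6_apply_payload(uint64_t dr6, uint64_t payload,
                                  uint64_t active_low)
{
    dr6 |= active_low;           /* default: no active-low condition seen */
    dr6 |= payload;              /* latch the reported bits */
    dr6 ^= payload & active_low; /* flip the active-low ones back to 0 */
    return dr6;
}
```

For example, with active_low set to DR6_BUS_LOCK only, a payload with bit 11 set leaves bit 11 of DR6 clear, while a payload reporting only a breakpoint (bit 0) leaves bit 11 set.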
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-3-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

f5e2ac5e

KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW · 232e5522

Committed by Chenyi Qiang on Feb 02, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 9a3ecd5e
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a3ecd5e

Intel-SIG: commit 9a3ecd5e ("KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW")

-------------------------------------

KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW

DR6_INIT contains the 1-reserved bits as well as the bit that is cleared
to 0 when the condition (e.g. RTM) happens. The value can be used to
initialize dr6 and also be the XOR mask between the #DB exit
qualification (or payload) and DR6.

Since DR6_INIT is used as an initial value only once, rename it
to DR6_ACTIVE_LOW and use it in other places as well, which simplifies
the upcoming changes for the bus lock debug exception.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com>
[Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

232e5522

KVM: nSVM: set fixed bits by hand · fe26a9fc

Committed by Paolo Bonzini on Nov 27, 2020

mainline inclusion
from mainline-v5.11-rc1
commit 8cce12b3
category: feature
feature: KVM bus lock debug exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8cce12b3

Intel-SIG: commit 8cce12b3 ("KVM: nSVM: set fixed bits by hand")

-------------------------------------

KVM: nSVM: set fixed bits by hand

SVM generally ignores fixed-1 bits.  Set them manually so that we
do not end up by mistake without those bits set in struct kvm_vcpu;
it is part of userspace API that KVM always returns value with the
bits set.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

fe26a9fc

KVM: VMX: Enable Notify VM exit · 0cbdfd9b

Committed by Tao Xu on May 24, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 2f4073e0
category: feature
feature: Notify VM exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f4073e0

Intel-SIG: commit 2f4073e0 ("KVM: VMX: Enable Notify VM exit")

-------------------------------------

KVM: VMX: Enable Notify VM exit

A malicious virtual machine can make a CPU get stuck because no event
window opens up, e.g., an infinite loop in microcode on a nested #AC
(CVE-2015-5307). Without an event window, no event (NMI, SMI, or IRQ)
can be delivered, so the CPU becomes unavailable to the host and to
other VMs.

The VMM can enable notify VM exit, which generates a VM exit if no event
window occurs in VM non-root mode for a specified amount of time (the
notify window).

Feature enabling:
- The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to
  enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust
  the expected notify window.
- Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
  can query and enable this feature in per-VM scope. The argument is a
  64bit value: bits 63:32 are used for notify window, and bits 31:0 are
  for flags. Current supported flags:
  - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
    window provided.
  - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen.
- It's safe to even set notify window to zero since an internal hardware
  threshold is added to vmcs.notify_window.
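
The 64-bit argument layout described in the list above can be sketched as follows. The helper names are hypothetical, and the two flag values are assumptions made for this sketch; the authoritative values live in the KVM uapi header.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values are assumptions for this sketch; check the uapi header. */
#define NOTIFY_VMEXIT_ENABLED (UINT32_C(1) << 0)
#define NOTIFY_VMEXIT_USER    (UINT32_C(1) << 1)

static_assert((NOTIFY_VMEXIT_ENABLED & NOTIFY_VMEXIT_USER) == 0,
              "flags must not overlap");

/* Pack the capability argument: bits 63:32 window, bits 31:0 flags. */
static uint64_t notify_vmexit_arg(uint32_t window, uint32_t flags)
{
    return ((uint64_t)window << 32) | flags;
}

/* Extract the notify window from bits 63:32. */
static uint32_t notify_vmexit_window(uint64_t arg)
{
    return (uint32_t)(arg >> 32);
}

/* Extract the flags from bits 31:0. */
static uint32_t notify_vmexit_flags(uint64_t arg)
{
    return (uint32_t)arg;
}
```

User space would pass a value built this way as the argument when enabling KVM_CAP_X86_NOTIFY_VMEXIT on a VM.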

VM exit handling:
- Introduce a vcpu state notify_window_exits to record the count of
  notify VM exits and expose it through the debugfs.
- Notify VM exit can happen incident to delivery of a vectored event.
  Allow it in KVM.
- Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
  bit is set.

Nested handling:
- Nested notify VM exits are not supported yet. Keep the same notify
  window control in vmcs02 as vmcs01, so that L1 can't escape the
  restriction of notify VM exits through launching L2 VM.

Notify VM exit is defined in the latest Intel Architecture Instruction Set
Extensions Programming Reference, chapter 9.2.
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

0cbdfd9b

KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault · af5a4488

Committed by Chenyi Qiang on May 24, 2022

mainline inclusion
from mainline-v6.0-rc1
commit ed235117
category: feature
feature: Notify VM exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed235117

Intel-SIG: commit ed235117 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")

-------------------------------------

KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault

For a triple fault synthesized by KVM, e.g. in the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.

Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a
new event KVM_VCPUEVENT_VALID_TRIPLE_FAULT so that userspace can save and
restore the triple fault event. This extension is guarded by a new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.

Note that in the set_vcpu_events path, userspace is able to set/clear
the triple fault request through triple_fault.pending field.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

af5a4488

KVM: VMX: Remove redundant handling of bus lock vmexit · 265cc29f

Committed by Hao Xiang on Oct 15, 2021

mainline inclusion
from mainline-v5.15-rc7
commit d61863c6
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d61863c6

Intel-SIG: commit d61863c6 ("KVM: VMX: Remove redundant handling of bus lock vmexit")

-------------------------------------

KVM: VMX: Remove redundant handling of bus lock vmexit

Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK
VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit
could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK.

We can remove the redundant handling of bus lock vmexit. Unconditionally
set exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal
with KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit().
Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com>
Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

265cc29f

KVM: nVMX: Fix nested bus lock VM exit · 78e6cda7

Committed by Chenyi Qiang on Sep 14, 2021

mainline inclusion
from mainline-v5.15-rc4
commit 24a996ad
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=24a996ad

Intel-SIG: commit 24a996ad ("KVM: nVMX: Fix nested bus lock VM exit")

-------------------------------------

KVM: nVMX: Fix nested bus lock VM exit

Nested bus lock VM exits are not supported yet. If L2 triggers bus lock
VM exit, it will be directed to L1 VMM, which would cause unexpected
behavior. Therefore, handle L2's bus lock VM exits in L0 directly.

Fixes: fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210914095041.29764-1-chenyi.qiang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

78e6cda7

KVM: VMX: Enable bus lock VM exit · 26bba696

Committed by Chenyi Qiang on Nov 06, 2020

mainline inclusion
from mainline-v5.12-rc1
commit fe6b6bc8
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe6b6bc8

Intel-SIG: commit fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")

-------------------------------------

KVM: VMX: Enable bus lock VM exit

A virtual machine can exploit bus locks to degrade system performance.
A bus lock can be caused by a split-locked access to writeback (WB)
memory or by a locked access to uncacheable (UC) memory. The bus lock is
typically >1000 cycles slower than an atomic operation within a cache
line. It also disrupts performance on other cores (which must wait for
the bus lock to be released before their memory operations can
complete).

To address the threat, bus lock VM exit is introduced to notify the VMM
when a bus lock was acquired, allowing it to enforce throttling or other
policy based mitigations.

A VMM can enable VM exit due to bus locks by setting a new "Bus Lock
Detection" VM-execution control(bit 30 of Secondary Processor-based VM
execution controls). If delivery of this VM exit was preempted by a
higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC
access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of
exit reason in vmcs field is set to 1.
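
The two bit positions named above can be sketched in C. The macro and function names are hypothetical; the 16-bit mask for the basic exit reason reflects the usual VMX exit reason layout, in which bits 15:0 carry the basic reason.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECONDARY_EXEC_BUS_LOCK_DETECTION (UINT32_C(1) << 30)
#define EXIT_REASON_BUS_LOCK_DETECTED     (UINT32_C(1) << 26)
#define EXIT_REASON_BASIC_MASK            UINT32_C(0xffff)

static_assert((EXIT_REASON_BUS_LOCK_DETECTED & EXIT_REASON_BASIC_MASK) == 0,
              "flag bit lies outside the basic exit reason");

/* Turn on bus lock VM exits in the secondary execution controls. */
static uint32_t enable_bus_lock_exiting(uint32_t secondary_ctls)
{
    return secondary_ctls | SECONDARY_EXEC_BUS_LOCK_DETECTION;
}

/* Basic exit reason with the flag bits stripped. */
static uint32_t exit_reason_basic(uint32_t exit_reason)
{
    return exit_reason & EXIT_REASON_BASIC_MASK;
}

/* True if a bus lock was seen while a higher priority exit was delivered. */
static bool exit_saw_bus_lock(uint32_t exit_reason)
{
    return (exit_reason & EXIT_REASON_BUS_LOCK_DETECTED) != 0;
}
```

Because the flag travels alongside another basic exit reason, a handler has to test bit 26 on every exit, not only on the dedicated bus lock exit.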

In the current implementation, KVM exposes this capability through
KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap
(i.e. off and exit) and enable it explicitly (it is disabled by default).
If KVM detects bus locks in the guest, it exits to user space even when
the current exit reason is handled by KVM internally. A new flag,
KVM_RUN_X86_BUS_LOCK, is set in vcpu->run->flags to inform user space
that a bus lock was detected in the guest.

Documentation for bus lock VM exit is available in the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

26bba696

KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run · 580aa8e4

Committed by Chenyi Qiang on Nov 06, 2020

mainline inclusion
from mainline-v5.12-rc1
commit 15aad3be
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=15aad3be

Intel-SIG: commit 15aad3be ("KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run")

-------------------------------------

KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run

Reset vcpu->run->flags at the beginning of kvm_arch_vcpu_ioctl_run().
Otherwise every chunk of code that sets a flag would also have to clear
it, which increases the odds of missing a case and ending up with a flag
in an undefined state.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-3-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

580aa8e4

KVM: Expose AVX_VNNI instruction to guset · b733e0a1

Committed by Yang Zhong on Jan 05, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 1085a6b5
category: feature
feature: SPR New Instructions Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5O6WB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1085a6b5

Intel-SIG: commit 1085a6b5 ("KVM: Expose AVX_VNNI instruction to guset")

-------------------------------------

KVM: Expose AVX_VNNI instruction to guset

Expose AVX (VEX-encoded) versions of the Vector Neural Network
Instructions to guest.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 4] AVX_VNNI
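
As a rough illustration of what exposing a CPUID bit means, the sketch below masks the host's register value with a set of bits the hypervisor is willing to pass through. The names are hypothetical and KVM's real CPUID handling is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_7_1_EAX_AVX_VNNI (UINT32_C(1) << 4)

static_assert(CPUID_7_1_EAX_AVX_VNNI == 0x10, "EAX bit 4");

/*
 * The guest sees a feature bit only if the host hardware reports it
 * and the hypervisor's supported mask includes it.
 */
static uint32_t guest_cpuid_7_1_eax(uint32_t host_eax, uint32_t supported_mask)
{
    return host_eax & supported_mask;
}
```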

The following instructions are available when this feature is
present in the guest.
  1. VPDPBUSD: Multiply and Add Unsigned and Signed Bytes
  2. VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
  3. VPDPWSSD: Multiply and Add Signed Word Integers
  4. VPDPWSSDS: Multiply and Add Signed Word Integers with Saturation

These instructions are currently documented in the latest "extensions"
manual (ISE). They will appear in the "main" manual (SDM) in the future.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Message-Id: <20210105004909.42000-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

b733e0a1

KVM: x86: Expose AVX512_FP16 for supported CPUID · f723ffab

Committed by Cathy Zhang on Dec 07, 2020

mainline inclusion
from mainline-v5.11-rc1
commit 2224fc9e
category: feature
feature: SPR New Instructions Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5O6WB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2224fc9e

Intel-SIG: commit 2224fc9e ("KVM: x86: Expose AVX512_FP16 for supported CPUID")

-------------------------------------

KVM: x86: Expose AVX512_FP16 for supported CPUID

AVX512_FP16 is supported by Intel processors such as Sapphire Rapids.
It can improve performance because FP16 operations are faster than FP32
ones, provided the precision and magnitude requirements are met. Its
availability is indicated by CPUID.(EAX=7,ECX=0):EDX[bit 23].

Expose it in KVM supported CPUID so that the guest can make use of it; no
new registers are used, only new instructions.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Message-Id: <20201208033441.28207-3-kyung.min.park@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aichun Shi <aichun.shi@intel.com>

f723ffab

02 Nov, 2022 · 9 commits

x86/tsc: use topology_max_packages() in tsc watchdog check · 2558e3b3

Committed by Feng Tang on Oct 17, 2022

Commit b50db709 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve the problem that
the TSC clocksource is sometimes wrongly judged unstable by
watchdogs such as 'jiffies', HPET, etc.

In it, the hardware socket count is a key factor in deciding
whether to disable the watchdog for TSC. 'nr_online_nodes' was
chosen as an estimate because the value is needed in the early boot
phase, before the 'tsc-early' clocksource is registered and before
any non-boot CPU has been brought up.

In a recent patch review, Dave Hansen pointed out that there are many
cases where 'nr_online_nodes' can be wrong, such as:
* numa emulation (numa=fake=4 etc.)
* numa=off
* platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
  persistent memory nodes.

Peter Zijlstra suggested using logical package ids, but they are
only usable after smp_init(), once all CPUs are initialized.

One solution is to skip the watchdog for the 'tsc-early' clocksource
and move the check after smp_init() but before the 'tsc'
clocksource is registered, where topology_max_packages() can
be used as a much more accurate socket count.
Signed-off-by: Feng Tang <feng.tang@intel.com>

2558e3b3

x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit · 428bc200

Committed by Jiri Slaby on Nov 02, 2022

stable inclusion
from stable-v5.10.133
commit ecc0d92a9f6cc3f74b67d2c9887d0c800018e661
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ecc0d92a9f6cc3f74b67d2c9887d0c800018e661

--------------------------------

commit 3131ef39 upstream.

The build on x86_32 currently fails after commit

  9bb2ec60 (objtool: Update Retpoline validation)

with:

  arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe'

ANNOTATE_UNRET_SAFE is defined in nospec-branch.h. And head_32.S is
missing this include. Fix this.

Fixes: 9bb2ec60 ("objtool: Update Retpoline validation")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/63e23f80-033f-f64e-7522-2816debbc367@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

428bc200

x86/alternative: Add debug prints to apply_retpolines() · d11b3bef

Committed by Peter Zijlstra on Nov 02, 2022

stable inclusion
from stable-v5.10.133
commit 38a80a3ca2cb069dd5608703b015a206a672aae5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=38a80a3ca2cb069dd5608703b015a206a672aae5

--------------------------------

commit d4b5a5c9 upstream.

Make sure we can see the text changes when booting with
'debug-alternative'.

Example output:

 [ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
 [ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
 [ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
 [ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.422273830@infradead.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

d11b3bef

x86/alternative: Try inline spectre_v2=retpoline,amd · 35148712

Committed by Peter Zijlstra on Nov 02, 2022

stable inclusion
from stable-v5.10.133
commit 3d13ee0d411a078ca1538d823c2c759b8b266fb1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3d13ee0d411a078ca1538d823c2c759b8b266fb1

--------------------------------

commit bbe2df3f upstream.

Try and replace retpoline thunk calls with:

  LFENCE
  CALL    *%\reg

for spectre_v2=retpoline,amd.

Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compilers pad calls through the higher registers with prefixes, this
replacement will fail.

Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.

Note: it will also generate a correct replacement for the Jcc.d32
case, but unless the compilers start to pad that with prefixes, it'll
never fit. Specifically:

  Jncc.d8 1f
  LFENCE
  JMP     *%\reg
1:

is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.
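
The 5-byte versus 6-byte figures can be checked with a small encoder sketch. The function emit_lfence_indirect() is hypothetical and only models the instruction bytes; the slot it has to fit into is the original 5-byte CALL rel32 (opcode e8 plus a 32-bit displacement).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Emit "LFENCE; CALL *%reg" (or JMP when is_jmp is nonzero) into buf.
 * reg is the hardware register number 0..15 (0 = RAX, 11 = R11).
 * Returns the number of bytes written.
 */
static size_t emit_lfence_indirect(uint8_t *buf, unsigned reg, int is_jmp)
{
    size_t i = 0;

    assert(reg < 16);
    buf[i++] = 0x0f;        /* LFENCE is 0f ae e8 */
    buf[i++] = 0xae;
    buf[i++] = 0xe8;
    if (reg >= 8)
        buf[i++] = 0x41;    /* REX.B selects R8..R15 */
    buf[i++] = 0xff;        /* opcode group 5 */
    /* ModRM with mod=11: /2 is CALL (0xd0 base), /4 is JMP (0xe0 base) */
    buf[i++] = (uint8_t)((is_jmp ? 0xe0 : 0xd0) + (reg & 7));
    return i;
}
```

For RAX this yields 5 bytes, which fits the slot, while R11 needs the extra REX prefix and comes out at 6 bytes, which does not.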
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.359986601@infradead.org
[cascardo: RETPOLINE_AMD was renamed to RETPOLINE_LFENCE]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

35148712

x86/alternative: Handle Jcc __x86_indirect_thunk_\reg · 1efa9bcc

Committed by Peter Zijlstra on Nov 02, 2022

stable inclusion
from stable-v5.10.133
commit b0e2dc950654162bc68cec530156251e7ad3f03a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b0e2dc950654162bc68cec530156251e7ad3f03a

--------------------------------

commit 2f0cbb2a upstream.

Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:

  Jcc __x86_indirect_thunk_\reg

For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:

  Jncc.d8	1f
  JMP		*%\reg
  NOP
1:
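
A byte-level sketch of this rewrite, under stated assumptions: the function rewrite_jcc_thunk() is hypothetical, cc is the low nibble of the second opcode byte of the original Jcc.d32 (0f 8x), and padding with single-byte NOPs (0x90) is an assumption of this sketch rather than something the message specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JCC32_LEN 6 /* 0f 8x plus a 32-bit displacement */

/*
 * Rewrite "Jcc __x86_indirect_thunk_reg" in place as
 * "Jncc.d8 1f; JMP *%reg; NOP...; 1:" within the same 6 bytes.
 * Returns the number of bytes written, which is always JCC32_LEN.
 */
static size_t rewrite_jcc_thunk(uint8_t out[JCC32_LEN], uint8_t cc, unsigned reg)
{
    size_t i = 0;

    assert(cc < 16 && reg < 16);
    out[i++] = (uint8_t)(0x70 + (cc ^ 1)); /* Jncc.d8: inverted condition */
    out[i++] = JCC32_LEN - 2;              /* skip the rest of the slot */
    if (reg >= 8)
        out[i++] = 0x41;                   /* REX.B for R8..R15 */
    out[i++] = 0xff;                       /* JMP *%reg is ff /4 */
    out[i++] = (uint8_t)(0xe0 + (reg & 7));
    while (i < JCC32_LEN)
        out[i++] = 0x90;                   /* pad with NOPs */
    return i;
}
```

Flipping the lowest bit of the condition code inverts the condition (for example JE becomes JNE), which is why the short jump can skip over the indirect JMP when the original branch would not have been taken.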
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.296470217@infradead.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

1efa9bcc

x86/insn-eval: Handle return values from the decoder · f5f8f3fc

Committed by Borislav Petkov on Nov 02, 2022

stable inclusion
from stable-v5.10.133
commit e6f8dc86a1c15b862486a61abcb54b88e8c177e3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e6f8dc86a1c15b862486a61abcb54b88e8c177e3

--------------------------------

commit 6e8c83d2 upstream.

Now that the different instruction-inspecting functions return a value,
test that and return early from callers if an error has been encountered.

While at it, do not call insn_get_modrm() when calling
insn_get_displacement(), because the latter will make sure to call
insn_get_modrm() if ModRM hasn't been parsed yet.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-6-bp@alien8.de
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

f5f8f3fc

x86/pat: Fix x86_has_pat_wp() · 866f85a9

Committed by Juergen Gross on Nov 02, 2022

stable inclusion
from stable-v5.10.132
commit 06a5dc3911a3b29acefd53470bdeccb88deb155e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=06a5dc3911a3b29acefd53470bdeccb88deb155e

--------------------------------

commit 230ec83d upstream.

x86_has_pat_wp() uses the wrong test, as it relies on the normal
PAT configuration used by the kernel. If the PAT MSR has been
set up by another entity (e.g. the Xen hypervisor), it might return false
even if the PAT configuration allows WP mappings. This is because, when
running as a Xen PV guest, the PAT MSR is set up by the hypervisor and
cannot be changed by the guest. As a result, the WP-related entry sits at
a different position when running as a Xen PV guest than in the bare
metal or fully virtualized case.

The correct way to test for WP support is:

1. Get the PTE protection bits needed to select WP mode by reading
   __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP] (depending on the PAT MSR
   setting this might return protection bits for a stronger mode, e.g.
   UC-)
2. Translate those bits back into the real cache mode selected by those
   PTE bits by reading __pte2cachemode_tbl[__pte2cm_idx(prot)]
3. Test for the cache mode to be _PAGE_CACHE_MODE_WP
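
The three steps above can be sketched with a reduced model. Everything here is hypothetical: the enum, struct pat_tables and has_pat_wp() only mimic the roles of __cachemode2pte_tbl and __pte2cachemode_tbl, and the PTE protection bits are collapsed into a plain 3-bit index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced set of cache modes for this sketch. */
enum cache_mode { CM_WB, CM_WC, CM_UC_MINUS, CM_UC, CM_WT, CM_WP, CM_NUM };

struct pat_tables {
    uint8_t mode2idx[CM_NUM]; /* cache mode -> PTE index */
    uint8_t idx2mode[8];      /* PTE index -> cache mode actually selected */
};

static bool has_pat_wp(const struct pat_tables *t)
{
    /* Step 1: which PTE index would be used to request WP? */
    uint8_t idx = t->mode2idx[CM_WP];

    assert(idx < 8);
    /* Steps 2 and 3: translate back and check that it really is WP. */
    return t->idx2mode[idx] == CM_WP;
}
```

The round trip matters because the first table may silently fall back to a stronger mode such as UC- when the PAT MSR offers no WP slot; only the reverse lookup reveals that.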

Fixes: f88a68fa ("x86/mm: Extend early_memremap() support with additional attrs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14
Link: https://lore.kernel.org/r/20220503132207.17234-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

866f85a9

KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() · 3bad6ccc

Committed by Vitaly Kuznetsov on Nov 02, 2022

stable inclusion
from stable-v5.10.132
commit eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3

--------------------------------

[ Upstream commit 8a414f94 ]

'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.
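
A minimal sketch of the idea, assuming a simplified stand-in for 'struct kvm_lapic_irq' (the field list, the MODEL_DM_REMRD value, and the helper name are illustrative, not copied from the kernel). A C designated initializer zero-initializes every member that is not named, so no field is left holding stack garbage:

```c
/* Simplified stand-in for struct kvm_lapic_irq; field set is illustrative. */
struct lapic_irq_model {
    unsigned int vector;
    unsigned int delivery_mode;
    unsigned int dest_mode;
    unsigned int trig_mode;
    unsigned int dest_id;
};

#define MODEL_DM_REMRD 0x300u /* assumed value, for illustration only */

static struct lapic_irq_model make_kick_irq(unsigned int apicid)
{
    /* Members not named here (vector, dest_mode, trig_mode) become 0. */
    struct lapic_irq_model irq = {
        .delivery_mode = MODEL_DM_REMRD,
        .dest_id = apicid,
    };

    return irq;
}
```

Assigning the fields one by one after a bare declaration would instead leave the unnamed members indeterminate, which is the pattern the commit message describes.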

Fixes: a183b638 ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220708125147.593975-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

3bad6ccc

ima: force signature verification when CONFIG_KEXEC_SIG is configured · dca07861

Committed by Coiby Xu on Nov 02, 2022

stable inclusion
from stable-v5.10.132
commit eb360267e1e972475023d06546e18365a222698c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb360267e1e972475023d06546e18365a222698c

--------------------------------

[ Upstream commit af16df54 ]

Currently, an unsigned kernel could be kexec'ed when IMA arch specific
policy is configured unless lockdown is enabled. Enforce kernel
signature verification check in the kexec_file_load syscall when IMA
arch specific policy is configured.

Fixes: 99d5cadf ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE")
Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

dca07861

27 Oct 2022 · 6 commits

x86/ftrace: Use alternative RET encoding · 3a823508

Committed by Peter Zijlstra on Oct 27, 2022

stable inclusion
from stable-v5.10.144
commit 35371fd68807f41a4072c01c166de5425a2a47e5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WL0J
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=35371fd68807f41a4072c01c166de5425a2a47e5

--------------------------------

commit 1f001e9d upstream.

Use the return thunk in ftrace trampolines, if needed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
[cascardo: use memcpy(text_gen_insn) as there is no __text_gen_insn]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3a823508

x86/ibt,ftrace: Make function-graph play nice · 53eea670

Committed by Peter Zijlstra on Oct 27, 2022

stable inclusion
from stable-v5.10.144
commit 4586df06a02049f4315c25b947c6dde2627c0d18
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WL0J
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=4586df06a02049f4315c25b947c6dde2627c0d18

--------------------------------

commit e52fc2cf upstream.

Return trampoline must not use indirect branch to return; while this
preserves the RSB, it is fundamentally incompatible with IBT. Instead
use a retpoline like ROP gadget that defeats IBT while not unbalancing
the RSB.

And since ftrace_stub is no longer a plain RET, don't use it to copy
from. Since RET is a trivial instruction, poke it directly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.347296408@infradead.org
[cascardo: remove ENDBR]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[OP: adjusted context for 5.10-stable]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

53eea670

Revert "x86/ftrace: Use alternative RET encoding" · 40a38272

Committed by Thadeu Lima de Souza Cascardo on Oct 27, 2022

stable inclusion
from stable-v5.10.144
commit 33015556a943d6cbb18c555925a54b8c0e46f521
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WL0J
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=33015556a943d6cbb18c555925a54b8c0e46f521

--------------------------------

This reverts commit 00b136bb6254e0abf6aaafe62c4da5f6c4fea4cb.

This temporarily reverts the backport of upstream commit
1f001e9d. Copying the ftrace stub was not correct: the stub contains
a relative jump to the return thunk, which does not apply in the
context it is copied to, and this broke ftrace support.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

40a38272

bpf, x86: Fix tail call count offset calculation on bpf2bpf call · 7354137a

Committed by Jakub Sitnicki on Oct 26, 2022

stable inclusion
from stable-v5.10.127
commit a51c199e4d2bbb8748c12e2ce846024dea012e57
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XDDK

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a51c199e4d2bbb8748c12e2ce846024dea012e57

--------------------------------

[ Upstream commit ff672c67 ]

On x86-64 the tail call count is passed from one BPF function to another
through %rax. Additionally, on function entry, the tail call count value
is stored on stack right after the BPF program stack, due to register
shortage.

The stored count is later loaded from the stack either when performing a tail
call, to check that we have not reached the tail call limit, or before
calling another BPF function, in order to pass it via %rax.

In the latter case, we miscalculate the offset at which the tail call count
was stored on function entry. The JIT does not take into account that the
allocated BPF program stack is always a multiple of 8 on x86, while the
actual stack depth does not have to be.

This leads to a load from an offset that belongs to the BPF stack, as shown
in the example below:

SEC("tc")
int entry(struct __sk_buff *skb)
{
	/* Have data on stack which size is not a multiple of 8 */
	volatile char arr[1] = {};
	return subprog_tail(skb);
}

int entry(struct __sk_buff * skb):
   0: (b4) w2 = 0
   1: (73) *(u8 *)(r10 -1) = r2
   2: (85) call pc+1#bpf_prog_ce2f79bb5f3e06dd_F
   3: (95) exit

int entry(struct __sk_buff * skb):
   0xffffffffa0201788:  nop    DWORD PTR [rax+rax*1+0x0]
   0xffffffffa020178d:  xor    eax,eax
   0xffffffffa020178f:  push   rbp
   0xffffffffa0201790:  mov    rbp,rsp
   0xffffffffa0201793:  sub    rsp,0x8
   0xffffffffa020179a:  push   rax
   0xffffffffa020179b:  xor    esi,esi
   0xffffffffa020179d:  mov    BYTE PTR [rbp-0x1],sil
   0xffffffffa02017a1:  mov    rax,QWORD PTR [rbp-0x9]	!!! tail call count
   0xffffffffa02017a8:  call   0xffffffffa02017d8       !!! is at rbp-0x10
   0xffffffffa02017ad:  leave
   0xffffffffa02017ae:  ret

Fix it by rounding up the BPF stack depth to a multiple of 8, when
calculating the tail call count offset on stack.
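
The arithmetic behind the fix can be sketched as follows. The helper names are hypothetical and this is not the JIT's code; it only reproduces the offsets from the disassembly above, where a stack depth of 1 makes the old calculation read from rbp-0x9 while the count actually lives at rbp-0x10.

```c
#include <stdint.h>

/* Round x up to the next multiple of 8 (valid because 8 is a power of two). */
static uint32_t round_up8(uint32_t x)
{
    return (x + 7u) & ~7u;
}

/* Offset relative to rbp computed from the raw stack depth (the bug). */
static int32_t tcc_off_buggy(uint32_t stack_depth)
{
    return -(int32_t)(stack_depth + 8);
}

/* Offset computed from the allocated stack size, a multiple of 8 (the fix). */
static int32_t tcc_off_fixed(uint32_t stack_depth)
{
    return -(int32_t)(round_up8(stack_depth) + 8);
}
```

The two functions agree whenever the stack depth is already a multiple of 8, which would explain why the bug only shows up for programs with oddly sized stack data.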

Fixes: ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-2-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

7354137a

KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak · 321358dd

Committed by Ashish Kalra on Oct 26, 2022

stable inclusion
from stable-v5.10.124
commit 401bef1f95de92c3a8c6eece46e02fa88d7285ee
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6E7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=401bef1f95de92c3a8c6eece46e02fa88d7285ee

--------------------------------

commit d22d2474 upstream.

For some SEV ioctl interfaces, the length parameter that is passed may be
less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data
that the PSP firmware returns. In this case, kmalloc will allocate memory
that is the size of the input rather than the size of the data.
Since the PSP firmware doesn't fully overwrite the allocated buffer, these
SEV ioctl interfaces may return uninitialized kernel slab memory.
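
A userspace analogy of the leak and its fix, using calloc() in place of kzalloc(). Everything here is a hypothetical model: fake_firmware_fill() stands in for firmware that writes fewer bytes than the caller asked for, and get_blob() stands in for an ioctl handler that hands the whole buffer back.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical firmware stand-in: fills at most 4 bytes of the buffer. */
static size_t fake_firmware_fill(unsigned char *buf, size_t buf_len)
{
    size_t produced = buf_len < 4 ? buf_len : 4;

    memset(buf, 0xAB, produced);
    return produced;
}

/*
 * Allocate a buffer sized by the caller-supplied length. calloc() zeroes it,
 * so bytes the firmware never writes read back as 0 instead of stale heap
 * contents. Returns NULL on allocation failure; the caller frees the buffer.
 */
static unsigned char *get_blob(size_t user_len)
{
    unsigned char *buf = calloc(1, user_len);

    if (!buf)
        return NULL;
    fake_firmware_fill(buf, user_len);
    return buf;
}
```

With malloc() in place of calloc(), the bytes past the firmware's output would be indeterminate, which is the userspace counterpart of returning uninitialized slab memory.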

Reported-by: Andy Nguyen <theflow@google.com>
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Peter Gonda <pgonda@google.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: eaf78265 ("KVM: SVM: Move SEV code to separate file")
Fixes: 2c07ded0 ("KVM: SVM: add support for SEV attestation command")
Fixes: 4cfdd47d ("KVM: SVM: Add KVM_SEV SEND_START command")
Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
Fixes: eba04b20 ("KVM: x86: Account a variety of miscellaneous allocations")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

321358dd

KVM: x86: Account a variety of miscellaneous allocations · 3ed16a7d

Committed by Sean Christopherson on Oct 26, 2022

stable inclusion
from stable-v5.10.124
commit d6be031a2f5e27f27f3648bac98d2a35874eaddc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6E7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d6be031a2f5e27f27f3648bac98d2a35874eaddc

--------------------------------

commit eba04b20 upstream.

Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
clearly associated with a single task/VM.

Note, there are several SEV allocations that aren't accounted, but
those can (hopefully) be fixed by using the local stack for memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331023025.2485960-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

3ed16a7d

19 Oct 2022 · 2 commits

add CONFIG_BLK_DEV_DUMPINFO and set it enabled in openeuler_defconfig · 6da049ba

Committed by Li Lingfeng on Oct 19, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53Q6M
CVE: NA

--------------------------------

openEuler needs to detect conflicts when opening block devices, so
enable it by default.

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6da049ba

x86/cpu: Elide KCSAN for cpu_has() and friends · 9c403aa2

Committed by Peter Zijlstra on Oct 18, 2022

stable inclusion
from stable-v5.10.122
commit 320acaf84a6469492f3355b75562f7472b91aaf2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5W6OE

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=320acaf84a6469492f3355b75562f7472b91aaf2

--------------------------------

[ Upstream commit a6a5eb26 ]

As x86 uses the <asm-generic/bitops/instrumented-*.h> headers, the
regular forms of all bitops are instrumented with explicit calls to
KASAN and KCSAN checks. As these are explicit calls, these are not
suppressed by the noinstr function attribute.

This can result in calls to those check functions in noinstr code, which
objtool warns about:

vmlinux.o: warning: objtool: enter_from_user_mode+0x24: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x28: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0x24: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0x24: call to __kcsan_check_access() leaves .noinstr.text section

Prevent this by using the arch_*() bitops, which are the underlying
bitops without explicit instrumentation.

[null: Changelog]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220502111216.290518605@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

9c403aa2

13 Oct 2022 · 1 commit

EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs · f4aa30cd

Committed by Youquan Song on Sep 01, 2022

mainline inclusion
from mainline-v6.1-rc1
commit 2738c69a
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5V3IO
CVE: NA

Intel-SIG: commit 2738c69a EDAC/i10nm: Add driver decoder for Ice Lake and
Tremont CPUs.
Backport to decode DDR errors from MCA bank registers instead of firmware.

--------------------------------

Current i10nm_edac only supports firmware decoder (ACPI DSM methods).
MCA bank registers of Ice Lake or Tremont CPUs contain the information
to decode DDR memory errors. To get better decoding performance, add
the driver decoder (decoding DDR memory errors via extracting error
information from MCA bank registers) for Ice Lake and Tremont CPUs.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20220901194310.115427-1-tony.luck@intel.com/
Signed-off-by: Youquan Song <youquan.song@intel.com>

f4aa30cd

10 Oct 2022 · 6 commits

x86/cpu: fix kabi for cpuinfo_x86.vmx_capability · f0af26c8

Committed by Jason Zeng on Oct 07, 2022

Intel inclusion
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A

-------------------------------------------------

The introduction of VMX tertiary features like IPI virtualization
changes the size of struct cpuinfo_x86. This patch
moves the tertiary features into a separate data structure.

Signed-off-by: Jason Zeng <jason.zeng@intel.com>

f0af26c8

x86/sgx: Drop 'page_index' from sgx_backing · 68203083

Committed by Sean Christopherson on Jul 08, 2022

mainline inclusion
from mainline-6.0-rc1
commit e0a5915f
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5USAM
CVE: NA

Intel-SIG: commit e0a5915f x86/sgx: Drop 'page_index' from
sgx_backing.
Backport for SGX EDMM support.

--------------------------------

Storing the 'page_index' value in the sgx_backing struct is
dead code and no longer needed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220708162124.8442-1-kristen@linux.intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>

68203083

x86/sgx: Set active memcg prior to shmem allocation · a3fd7294

Committed by Kristen Carlson Accardi on May 20, 2022

mainline inclusion
from mainline-5.19-rc1
commit 0c9782e2
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5USAM
CVE: NA

Intel-SIG: commit 0c9782e2 x86/sgx: Set active memcg prior to shmem
allocation.
Backport for SGX EDMM support.

--------------------------------

When the system runs out of enclave memory, SGX can reclaim EPC pages
by swapping to normal RAM. These backing pages are allocated via a
per-enclave shared memory area. Since SGX allows unlimited over
commit on EPC memory, the reclaimer thread can allocate a large
number of backing RAM pages in response to EPC memory pressure.

When the shared memory backing RAM allocation occurs during
the reclaimer thread context, the shared memory is charged to
the root memory control group, and the shmem usage of the enclave
is not properly accounted for, making cgroups ineffective at
limiting the amount of RAM an enclave can consume.

For example, when using a cgroup to launch a set of test
enclaves, the kernel does not properly account for 50% - 75% of
shmem page allocations on average. In the worst case, when
nearly all allocations occur during the reclaimer thread, the
kernel accounts less than a percent of the amount of shmem used
by the enclave's cgroup to the correct cgroup.

SGX stores a list of mm_structs that are associated with
an enclave. Pick one of them during reclaim and charge that
mm's memcg with the shmem allocation. The one that gets picked
is arbitrary, but this list almost always only has one mm. The
cases where there is more than one mm with different memcg's
are not worth considering.

Create a new function - sgx_encl_alloc_backing(). This function
is used whenever a new backing storage page needs to be
allocated. Previously the same function was used for page
allocation as well as retrieving a previously allocated page.
Prior to backing page allocation, if there is a mm_struct associated
with the enclave that is requesting the allocation, it is set
as the active memory control group.

[ dhansen: - fix merge conflict with ELDU fixes
           - check against actual ksgxd_tsk, not ->mm ]

Cc: stable@vger.kernel.org
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lkml.kernel.org/r/20220520174248.4918-1-kristen@linux.intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>

a3fd7294

x86/sgx: Free up EPC pages directly to support large page ranges · 1124e6cb

Committed by Reinette Chatre on May 10, 2022

mainline inclusion
from mainline-6.0-rc1
commit a0506b3b
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5USAM
CVE: NA

Intel-SIG: commit a0506b3b x86/sgx: Free up EPC pages directly to
support large page ranges.
Backport for SGX EDMM support.

--------------------------------

The page reclaimer ensures availability of EPC pages across all
enclaves. In support of this it runs independently from the
individual enclaves in order to take locks from the different
enclaves as it writes pages to swap.

When needing to load a page from swap an EPC page needs to be
available for its contents to be loaded into. Loading an existing
enclave page from swap does not reclaim EPC pages directly if
none are available, instead the reclaimer is woken when the
available EPC pages are found to be below a watermark.

When iterating over a large number of pages in an oversubscribed
environment there is a race between the reclaimer woken up and
EPC pages reclaimed fast enough for the page operations to proceed.

Ensure there are EPC pages available before attempting to load
a page that may potentially be pulled from swap into an available
EPC page.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/a0d8f037c4a075d56bf79f432438412985f7ff7a.1652137848.git.reinette.chatre@intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>

1124e6cb

x86/sgx: Support complete page removal · 4ac4e936

Committed by Reinette Chatre on May 10, 2022

mainline inclusion
from mainline-6.0-rc1
commit 9849bb27
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5USAM
CVE: NA

Intel-SIG: commit 9849bb27 x86/sgx: Support complete page removal.
Backport for SGX EDMM support.

--------------------------------

The SGX2 page removal flow was introduced in the previous patch and is
as follows:
1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM
   using the ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPES introduced in
   the previous patch.
2) Approve the page removal by running ENCLU[EACCEPT] from within
   the enclave.
3) Initiate actual page removal using the ioctl()
   SGX_IOC_ENCLAVE_REMOVE_PAGES introduced here.

Support the final step of the SGX2 page removal flow with ioctl()
SGX_IOC_ENCLAVE_REMOVE_PAGES. With this ioctl() the user specifies
a page range that should be removed. All pages in the provided
range should have the SGX_PAGE_TYPE_TRIM page type and the request
will fail with EPERM (Operation not permitted) if a page that does
not have the correct type is encountered. Page removal can fail
on any page within the provided range. Support partial success by
returning the number of pages that were successfully removed.
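
The partial-success behaviour described above can be modelled in a few lines. This is a sketch under stated assumptions, not the driver's implementation: the enum, the result struct, and remove_pages_model() are hypothetical names, and the real ioctl works on an enclave address range rather than an array.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative page types; the names only echo SGX_PAGE_TYPE_*. */
enum page_type_model { PT_REG, PT_TCS, PT_TRIM };

struct remove_result {
    size_t count; /* pages removed before the walk stopped */
    int err;      /* 0 on full success, otherwise a negative errno */
};

/* Walk the range in order; stop at the first page that is not trimmed. */
static struct remove_result
remove_pages_model(const enum page_type_model *types, size_t n)
{
    struct remove_result res = { .count = 0, .err = 0 };

    for (size_t i = 0; i < n; i++) {
        if (types[i] != PT_TRIM) {
            res.err = -EPERM; /* wrong page type encountered */
            break;
        }
        res.count++;
    }
    return res;
}
```

Reporting the count alongside the error lets a caller resume from the first page that failed instead of retrying the whole range.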

Since actual page removal will succeed even if ENCLU[EACCEPT] was not
run from within the enclave the ENCLU[EMODPR] instruction with RWX
permissions is used as a no-op mechanism to ensure ENCLU[EACCEPT] was
successfully run from within the enclave before the enclave page is
removed.

If the user omits running SGX_IOC_ENCLAVE_REMOVE_PAGES the pages will
still be removed when the enclave is unloaded.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/b75ee93e96774e38bb44a24b8e9bbfb67b08b51b.1652137848.git.reinette.chatre@intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>

4ac4e936

x86/sgx: Support modifying SGX page type · e642255f

Committed by Reinette Chatre on May 10, 2022

mainline inclusion
from mainline-6.0-rc1
commit 45d546b8
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5USAM
CVE: NA

Intel-SIG: commit 45d546b8 x86/sgx: Support modifying SGX page type.
Backport for SGX EDMM support.

--------------------------------

Every enclave contains one or more Thread Control Structures (TCS). The
TCS contains meta-data used by the hardware to save and restore thread
specific information when entering/exiting the enclave. With SGX1 an
enclave needs to be created with enough TCSs to support the largest
number of threads expecting to use the enclave and enough enclave pages
to meet all its anticipated memory demands. In SGX1 all pages remain in
the enclave until the enclave is unloaded.

SGX2 introduces a new function, ENCLS[EMODT], that is used to change
the type of an enclave page from a regular (SGX_PAGE_TYPE_REG) enclave
page to a TCS (SGX_PAGE_TYPE_TCS) page or change the type from a
regular (SGX_PAGE_TYPE_REG) or TCS (SGX_PAGE_TYPE_TCS)
page to a trimmed (SGX_PAGE_TYPE_TRIM) page (setting it up for later
removal).

With the existing support of dynamically adding regular enclave pages
to an initialized enclave and changing the page type to TCS it is
possible to dynamically increase the number of threads supported by an
enclave.

Changing the enclave page type to SGX_PAGE_TYPE_TRIM is the first step
of dynamically removing pages from an initialized enclave. The complete
page removal flow is:
1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM
   using the SGX_IOC_ENCLAVE_MODIFY_TYPES ioctl() introduced here.
2) Approve the page removal by running ENCLU[EACCEPT] from within
   the enclave.
3) Initiate actual page removal using the ioctl() introduced in the
   following patch.

Add ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPES to support changing SGX
enclave page types within an initialized enclave. With
SGX_IOC_ENCLAVE_MODIFY_TYPES the user specifies a page range and the
enclave page type to be applied to all pages in the provided range.
The ioctl() itself can return an error code based on failures
encountered by the kernel. It is also possible for SGX specific
failures to be encountered. Add a result output parameter to
communicate the SGX return code. It is possible for the enclave page
type change request to fail on any page within the provided range.
Support partial success by returning the number of pages that were
successfully changed.

After the page type is changed the page continues to be accessible
from the kernel perspective with page table entries and internal
state. The page may be moved to swap. Any access until ENCLU[EACCEPT]
will encounter a page fault with SGX flag set in error code.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Haitao Huang <haitao.huang@intel.com>
Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Link: https://lkml.kernel.org/r/babe39318c5bf16fc65fbfb38896cdee72161575.1652137848.git.reinette.chatre@intel.com
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>

e642255f
