提交 · 4e19c36f2df8f84da22c7287de86729aaf3e352b · openeuler / Kernel

05 2月, 2020 1 次提交

kvm: x86: Introduce APICv inhibit reason bits · 4e19c36f

There are several reasons in which a VM needs to deactivate APICv
e.g. disable APICv via parameter during module loading, or when
enable Hyper-V SynIC support. Additional inhibit reasons will be
introduced later on when dynamic APICv is supported,

Introduce KVM APICv inhibit reason bits along with a new variable,
apicv_inhibit_reasons, to help keep track of APICv state for each VM,

Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate
the case where APICv is disabled during KVM module load.
(e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0).
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4e19c36f

31 1月, 2020 2 次提交

x86/KVM: Clean up host's steal time structure · a6bd811f

由 Boris Ostrovsky 提交于 5年前

Now that we are mapping kvm_steal_time from the guest directly we
don't need keep a copy of it in kvm_vcpu_arch.st. The same is true
for the stime field.

This is part of CVE-2019-3016.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a6bd811f

x86/kvm: Cache gfn to pfn translation · 91724814

由 Boris Ostrovsky 提交于 5年前

__kvm_map_gfn()'s call to gfn_to_pfn_memslot() is
* relatively expensive
* in certain cases (such as when done from atomic context) cannot be called

Stashing gfn-to-pfn mapping should help with both cases.

This is part of CVE-2019-3016.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: NJoao Martins <joao.m.martins@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

91724814

28 1月, 2020 3 次提交

x86/mm: Introduce lookup_address_in_mm() · 13c72c06

由 Sean Christopherson 提交于 5年前

Add a helper, lookup_address_in_mm(), to traverse the page tables of a
given mm struct.  KVM will use the helper to retrieve the host mapping
level, e.g. 4k vs. 2mb vs. 1gb, of a compound (or DAX-backed) page
without having to resort to implementation specific metadata.  E.g. KVM
currently uses different logic for HugeTLB vs. THP, and would add a
third variant for DAX-backed files.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13c72c06

KVM: X86: Drop x86_set_memory_region() · 6a3c623b

由 Peter Xu 提交于 5年前

The helper x86_set_memory_region() is only used in vmx_set_tss_addr()
and kvm_arch_destroy_vm().  Push the lock upper in both cases.  With
that, drop x86_set_memory_region().

This prepares to allow __x86_set_memory_region() to return a HVA
mapped, because the HVA will need to be protected by the lock too even
after __x86_set_memory_region() returns.
Signed-off-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6a3c623b

kvm/svm: PKU not currently supported · a47970ed

由 John Allen 提交于 5年前

Current SVM implementation does not have support for handling PKU. Guests
running on a host with future AMD cpus that support the feature will read
garbage from the PKRU register and will hit segmentation faults on boot as
memory is getting marked as protected that should not be. Ensure that cpuid
from SVM does not advertise the feature.
Signed-off-by: NJohn Allen <john.allen@amd.com>
Cc: stable@vger.kernel.org
Fixes: 0556cbdc ("x86/pkeys: Don't check if PKRU is zero before writing it")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a47970ed

24 1月, 2020 5 次提交

KVM: x86: Move kvm_vcpu_init() invocation to common code · 987b2594

由 Sean Christopherson 提交于 5年前

Move the kvm_cpu_{un}init() calls to common x86 code as an intermediate
step to removing kvm_cpu_{un}init() altogether.

Note, VMX'x alloc_apic_access_page() and init_rmode_identity_map() are
per-VM allocations and are intentionally kept if vCPU creation fails.
They are freed by kvm_arch_destroy_vm().

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

987b2594

KVM: x86: Allocate vcpu struct in common x86 code · a9dd6f09

由 Sean Christopherson 提交于 5年前

Move allocation of VMX and SVM vcpus to common x86. Although the struct
being allocated is technically a VMX/SVM struct, it can be interpreted
directly as a 'struct kvm_vcpu' because of the pre-existing requirement
that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu
struct.

Remove the message from the build-time assertions regarding placement of
the struct, as compatibility with the arch usercopy region is no longer
the sole dependent on 'struct kvm_vcpu' being at offset zero.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a9dd6f09

x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction · 232bb01b

由 Dave Jiang 提交于 5年前

With the introduction of MOVDIR64B instruction, there is now an instruction
that can write 64 bytes of data atomically.

Quoting from Intel SDM:
"There is no atomicity guarantee provided for the 64-byte load operation
from source address, and processor implementations may use multiple
load operations to read the 64-bytes. The 64-byte direct-store issued
by MOVDIR64B guarantees 64-byte write-completion atomicity. This means
that the data arrives at the destination in a single undivided 64-byte
write transaction."

We have identified at least 3 different use cases for this instruction in
the format of func(dst, src, count):
1) Clear poison / Initialize MKTME memory
@dst is normal memory.
@src in normal memory. Does not increment. (Copy same line to all
targets)
@count (to clear/init multiple lines)
2) Submit command(s) to new devices
@dst is a special MMIO region for a device. Does not increment.
@src is normal memory. Increments.
@count usually is 1, but can be multiple.
3) Copy to iomem in big chunks
@dst is iomem and increments
@src in normal memory and increments
@count is number of chunks to copy

Add support for case #2 to support device that will accept commands via
this instruction. We provide a @count in order to submit a batch of
preprogrammed descriptors in virtually contiguous memory. This
allows the caller to submit multiple descriptors to a device with a single
submission. The special device requires the entire 64bytes descriptor to
be written atomically and will accept MOVDIR64B instruction.
Signed-off-by: NDave Jiang <dave.jiang@intel.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/157965022175.73301.10174614665472962675.stgit@djiang5-desk3.ch.intel.comSigned-off-by: NVinod Koul <vkoul@kernel.org>

232bb01b

x86/mpx: remove MPX from arch/x86 · 45fc24e8

由 Dave Hansen 提交于 5年前

From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

This removes all the remaining (dead at this point) MPX handling
code remaining in the tree.  The only remaining code is the XSAVE
support for MPX state which is currently needd for KVM to handle
VMs which might use MPX.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>

45fc24e8

mm: remove arch_bprm_mm_init() hook · 42222eae

由 Dave Hansen 提交于 5年前

From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

arch_bprm_mm_init() is used at execve() time.  The only non-stub
implementation is on x86 for MPX.  Remove the hook entirely from
all architectures and generic code.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>

42222eae

23 1月, 2020 8 次提交

platform/x86: intel_pmc_ipc: Drop intel_pmc_gcr_read() and intel_pmc_gcr_write() · a97368b3

由 Mika Westerberg 提交于 5年前

These functions are not used anywhere so drop them completely.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

a97368b3

platform/x86: intel_pmc_ipc: Make intel_pmc_ipc_raw_cmd() static · f827e530

由 Mika Westerberg 提交于 5年前

This function is not called outside of intel_pmc_ipc.c so we can make it
static instead.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

f827e530

platform/x86: intel_pmc_ipc: Make intel_pmc_ipc_simple_command() static · 3f751ba5

由 Mika Westerberg 提交于 5年前

This function is not called outside of intel_pmc_ipc.c so we can make it
static instead.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

3f751ba5

platform/x86: intel_pmc_ipc: Make intel_pmc_gcr_update() static · e1f46163

由 Mika Westerberg 提交于 5年前

This function is not called outside of intel_pmc_ipc.c so we can make it
static instead.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

e1f46163

platform/x86: intel_scu_ipc: Drop intel_scu_ipc_raw_command() · 49078988

由 Mika Westerberg 提交于 5年前

There is no user for this function so we can drop it from the driver.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

49078988

platform/x86: intel_scu_ipc: Drop intel_scu_ipc_io[read|write][8|16]() · b7380a16

由 Mika Westerberg 提交于 5年前

There are no users for these so we can remove them.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

b7380a16

platform/x86: intel_scu_ipc: Drop unused prototype intel_scu_ipc_fw_update() · a5f04a2e

由 Mika Westerberg 提交于 5年前

There is no implementation for that anymore so drop the prototype.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

a5f04a2e

platform/x86: intel_scu_ipc: Drop intel_scu_ipc_i2c_cntrl() · 74e9748b

由 Mika Westerberg 提交于 5年前

There are no existing users for this functionality so drop it from the
driver completely. This also means we don't need to keep the struct
intel_scu_ipc_pdata_t around anymore so remove that as well.
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

74e9748b

21 1月, 2020 3 次提交

KVM: x86: Add dedicated emulator helpers for querying CPUID features · 5ae78e95

由 Sean Christopherson 提交于 5年前

Add feature-specific helpers for querying guest CPUID support from the
emulator instead of having the emulator do a full CPUID and perform its
own bit tests.  The primary motivation is to eliminate the emulator's
usage of bit() so that future patches can add more extensive build-time
assertions on the usage of bit() without having to expose yet more code
to the emulator.

Note, providing a generic guest_cpuid_has() to the emulator doesn't work
due to the existing built-time assertions in guest_cpuid_has(), which
require the feature being checked to be a compile-time constant.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5ae78e95

KVM: Fix some writing mistakes · 311497e0

由 Miaohe Lin 提交于 5年前

Fix some writing mistakes in the comments.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

311497e0

KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath · 1e9e2622

由 Wanpeng Li 提交于 5年前

ICR and TSCDEADLINE MSRs write cause the main MSRs write vmexits in our
product observation, multicast IPIs are not as common as unicast IPI like
RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc.

This patch introduce a mechanism to handle certain performance-critical
WRMSRs in a very early stage of KVM VMExit handler.

This mechanism is specifically used for accelerating writes to x2APIC ICR
that attempt to send a virtual IPI with physical destination-mode, fixed
delivery-mode and single target. Which was found as one of the main causes
of VMExits for Linux workloads.

The reason this mechanism significantly reduce the latency of such virtual
IPIs is by sending the physical IPI to the target vCPU in a very early stage
of KVM VMExit handler, before host interrupts are enabled and before expensive
operations such as reacquiring KVM’s SRCU lock.
Latency is reduced even more when KVM is able to use APICv posted-interrupt
mechanism (which allows to deliver the virtual IPI directly to target vCPU
without the need to kick it to host).

Testing on Xeon Skylake server:

The virtual IPI latency from sender send to receiver receive reduces
more than 200+ cpu cycles.
Reviewed-by: NLiran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1e9e2622

20 1月, 2020 2 次提交

efi/x86: Limit EFI old memory map to SGI UV machines · 1f299fad

由 Ard Biesheuvel 提交于 5年前

We carry a quirk in the x86 EFI code to switch back to an older
method of mapping the EFI runtime services memory regions, because
it was deemed risky at the time to implement a new method without
providing a fallback to the old method in case problems arose.

Such problems did arise, but they appear to be limited to SGI UV1
machines, and so these are the only ones for which the fallback gets
enabled automatically (via a DMI quirk). The fallback can be enabled
manually as well, by passing efi=old_map, but there is very little
evidence that suggests that this is something that is being relied
upon in the field.

Given that UV1 support is not enabled by default by the distros
(Ubuntu, Fedora), there is no point in carrying this fallback code
all the time if there are no other users. So let's move it into the
UV support code, and document that efi=old_map now requires this
support code to be enabled.

Note that efi=old_map has been used in the past on other SGI UV
machines to work around kernel regressions in production, so we
keep the option to enable it by hand, but only if the kernel was
built with UV support.
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-8-ardb@kernel.org

1f299fad

efi/libstub/x86: Use const attribute for efi_is_64bit() · 796eb8d2

由 Ard Biesheuvel 提交于 5年前

Reshuffle the x86 stub code a bit so that we can tag the efi_is_64bit()
function with the 'const' attribute, which permits the compiler to
optimize away any redundant calls. Since we have two different entry
points for 32 and 64 bit firmware in the startup code, this also
simplifies the C code since we'll enter it with the efi_is64 variable
already set.
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-2-ardb@kernel.org

796eb8d2

17 1月, 2020 1 次提交

x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType · 89a76171

由 Yazen Ghannam 提交于 5年前

Add support for a new version of the Load Store unit bank type as
indicated by its McaType value, which will be present in future SMCA
systems.

Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is
logically the same to the user.

Also, add the new error descriptions to edac_mce_amd.
Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com

89a76171

14 1月, 2020 9 次提交

x86/vdso: Add time napespace page · 550a77a7

由 Dmitry Safonov 提交于 5年前

To support time namespaces in the VDSO with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide VDSO data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

Allocate the time namespace page among VVAR pages and place vdso_data on
it. Provide __arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.
Co-developed-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com

550a77a7

x86/vdso: Provide vdso_data offset on vvar_page · 64b302ab

由 Dmitry Safonov 提交于 5年前

VDSO support for time namespaces needs to set up a page with the same
layout as VVAR. That timens page will be placed on position of VVAR page
inside namespace. That page has vdso_data->seq set to 1 to enforce
the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce
the time namespace handling path.

To prepare the time namespace page the kernel needs to know the vdso_data
offset.  Provide arch_get_vdso_data() helper for locating vdso_data on VVAR
page.
Co-developed-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com

64b302ab

x86/vdso: Remove unused VDSO_HAS_32BIT_FALLBACK · 0b5c1233

由 Vincenzo Frascino 提交于 5年前

VDSO_HAS_32BIT_FALLBACK has been removed from the core since
the architectures that support the generic vDSO library have
been converted to support the 32 bit fallbacks.

Remove unused VDSO_HAS_32BIT_FALLBACK from x86 vdso.
Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-9-vincenzo.frascino@arm.com

0b5c1233

perf/x86: Provide stubs of KVM helpers for non-Intel CPUs · 616c59b5

由 Sean Christopherson 提交于 5年前

Provide stubs for perf_guest_get_msrs() and intel_pt_handle_vmx() when
building without support for Intel CPUs, i.e. CPU_SUP_INTEL=n. Lack of
stubs is not currently a problem as the only user, KVM_INTEL, takes a
dependency on CPU_SUP_INTEL=y. Provide the stubs for all CPUs so that
KVM_INTEL can be built for any CPU with compatible hardware support,
e.g. Centuar and Zhaoxin CPUs.

Note, the existing stub for perf_guest_get_msrs() is essentially dead
code as KVM selects CONFIG_PERF_EVENTS, i.e. the only user guarantees
the full implementation is built.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-19-sean.j.christopherson@intel.com

616c59b5

KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits · b39033f5

由 Sean Christopherson 提交于 5年前

Define the VMCS execution control flags (consumed by KVM) using their
associated VMX_FEATURE_* to provide a strong hint that new VMX features
are expected to be added to VMX_FEATURE and considered for reporting via
/proc/cpuinfo.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-18-sean.j.christopherson@intel.com

b39033f5

x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured · 85c17291

由 Sean Christopherson 提交于 5年前

Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether
IA32_FEAT_CTL has been initialized. This will allow KVM, and any future
subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures
to query platform support, e.g. allows a future patch to remove KVM's
manual IA32_FEAT_CTL MSR checks.

Various features (on platforms that support IA32_FEAT_CTL) are dependent
on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The
MSR is always configured during boot, but only if the CPU vendor is
recognized by the kernel. Because CPUID doesn't incorporate the current
IA32_FEAT_CTL value in its reporting of relevant features, it's possible
for a feature to be reported as supported in cpufeatures but not truly
enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize
the CPU.

As a result, without the flag, KVM would see VMX as supported even if
IA32_FEAT_CTL hasn't been initialized, and so would need to manually
read the MSR and check the various enabling bits to avoid taking an
unexpected #GP on VMXON.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-14-sean.j.christopherson@intel.com

85c17291

x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs · b47ce1fe

由 Sean Christopherson 提交于 5年前

Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill
the capabilities during IA32_FEAT_CTL MSR initialization.

Make the VMX capabilities dependent on IA32_FEAT_CTL and
X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't
possibly support VMX, or when /proc/cpuinfo is not available.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com

b47ce1fe

x86/vmx: Introduce VMX_FEATURES_* · 15934878

由 Sean Christopherson 提交于 5年前

Add a VMX-specific variant of X86_FEATURE_* flags, which will eventually
supplant the synthetic VMX flags defined in cpufeatures word 8.  Use the
Intel-defined layouts for the major VMX execution controls so that their
word entries can be directly populated from their respective MSRs, and
so that the VMX_FEATURE_* flags can be used to define the existing bit
definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE
flag when adding support for a new hardware feature.

The majority of Intel's (and compatible CPU's) VMX capabilities are
enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't
naturally provide any insight into the virtualization capabilities of
VMX enabled CPUs.  Commit

  e38e05a8 ("x86: extended "flags" to show virtualization HW feature
		 in /proc/cpuinfo")

attempted to address the issue by synthesizing select VMX features into
a Linux-defined word in cpufeatures.

Lack of reporting of VMX capabilities via /proc/cpuinfo is problematic
because there is no sane way for a user to query the capabilities of
their platform, e.g. when trying to find a platform to test a feature or
debug an issue that has a hardware dependency.  Lack of reporting is
especially problematic when the user isn't familiar with VMX, e.g. the
format of the MSRs is non-standard, existence of some MSRs is reported
by bits in other MSRs, several "features" from KVM's point of view are
enumerated as 3+ distinct features by hardware, etc...

The synthetic cpufeatures approach has several flaws:

  - The set of synthesized VMX flags has become extremely stale with
    respect to the full set of VMX features, e.g. only one new flag
    (EPT A/D) has been added in the the decade since the introduction of
    the synthetic VMX features.  Failure to keep the VMX flags up to
    date is likely due to the lack of a mechanism that forces developers
    to consider whether or not a new feature is worth reporting.

  - The synthetic flags may incorrectly be misinterpreted as affecting
    kernel behavior, i.e. KVM, the kernel's sole consumer of VMX,
    completely ignores the synthetic flags.

  - New CPU vendors that support VMX have duplicated the hideous code
    that propagates VMX features from MSRs to cpufeatures.  Bringing the
    synthetic VMX flags up to date would exacerbate the copy+paste
    trainwreck.

Define separate VMX_FEATURE flags to set the stage for enumerating VMX
capabilities outside of the cpu_has() framework, and for adding
functional usage of VMX_FEATURE_* to help ensure the features reported
via /proc/cpuinfo is up to date with respect to kernel recognition of
VMX capabilities.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-10-sean.j.christopherson@intel.com

15934878

x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR · 32ad73db

由 Sean Christopherson 提交于 5年前

As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation.  Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

  - Relocate the bit defines so that they immediately follow the MSR
    define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
  - Add whitespace around the block of feature control defines to make
    it clear they're all related.
  - Use BIT() instead of manually encoding the bit shift.
  - Use "VMX" instead of "VMXON" to match the SDM.
  - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
    be consistent with the kernel's verbiage used for all other feature
    control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
    likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
    the (literal) one-off usage of _ON, the SDM is simply "wrong".
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com

32ad73db

13 1月, 2020 1 次提交

x86/mce: Take action on UCNA/Deferred errors again · 8438b84a

由 Jan H. Schönherr 提交于 5年前

Commit

  fa92c586 ("x86, mce: Support memory error recovery for both UCNA
		and Deferred error in machine_check_poll")

added handling of UCNA and Deferred errors by adding them to the ring
for SRAO errors.

Later, commit

  fd4cf79f ("x86/mce: Remove the MCE ring for Action Optional errors")

switched storage from the SRAO ring to the unified pool that is still
in use today. In order to only act on the intended errors, a filter
for MCE_AO_SEVERITY is used -- effectively removing handling of
UCNA/Deferred errors again.

Extend the severity filter to include UCNA/Deferred errors again.
Also, generalize the naming of the notifier from SRAO to UC to capture
the extended scope.

Note, that this change may cause a message like the following to appear,
as the same address may be reported as SRAO and as UCNA:

 Memory failure: 0x5fe3284: already hardware poisoned

Technically, this is a return to previous behavior.
Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NTony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de

8438b84a

11 1月, 2020 5 次提交

x86/nmi: Remove irq_work from the long duration NMI handler · 248ed510

由 Changbin Du 提交于 5年前

First, printk() is NMI-context safe now since the safe printk() has been
implemented and it already has an irq_work to make NMI-context safe.

Second, this NMI irq_work actually does not work if a NMI handler causes
panic by watchdog timeout. It has no chance to run in such case, while
the safe printk() will flush its per-cpu buffers before panicking.

While at it, repurpose the irq_work callback into a function which
concentrates the NMI duration checking and makes the code easier to
follow.

 [ bp: Massage. ]
Signed-off-by: NChangbin Du <changbin.du@gmail.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200111125427.15662-1-changbin.du@gmail.com

248ed510

efi: Allow disabling PCI busmastering on bridges during boot · 4444f854

由 Matthew Garrett 提交于 5年前

Add an option to disable the busmaster bit in the control register on
all PCI bridges before calling ExitBootServices() and passing control
to the runtime kernel. System firmware may configure the IOMMU to prevent
malicious PCI devices from being able to attack the OS via DMA. However,
since firmware can't guarantee that the OS is IOMMU-aware, it will tear
down IOMMU configuration when ExitBootServices() is called. This leaves
a window between where a hostile device could still cause damage before
Linux configures the IOMMU again.

If CONFIG_EFI_DISABLE_PCI_DMA is enabled or "efi=disable_early_pci_dma"
is passed on the command line, the EFI stub will clear the busmaster bit
on all PCI bridges before ExitBootServices() is called. This will
prevent any malicious PCI devices from being able to perform DMA until
the kernel reenables busmastering after configuring the IOMMU.

This option may cause failures with some poorly behaved hardware and
should not be enabled without testing. The kernel commandline options
"efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be
used to override the default. Note that PCI devices downstream from PCI
bridges are disconnected from their drivers first, using the UEFI
driver model API, so that DMA can be disabled safely at the bridge
level.

[ardb: disconnect PCI I/O handles first, as suggested by Arvind]
Co-developed-by: NMatthew Garrett <mjg59@google.com>
Signed-off-by: NMatthew Garrett <mjg59@google.com>
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <matthewgarrett@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-18-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4444f854

efi/x86: Allow translating 64-bit arguments for mixed mode calls · ea7d87f9

由 Arvind Sankar 提交于 5年前

Introduce the ability to define macros to perform argument translation
for the calls that need it, and define them for the boot services that
we currently use.

When calling 32-bit firmware methods in mixed mode, all output
parameters that are 32-bit according to the firmware, but 64-bit in the
kernel (ie OUT UINTN * or OUT VOID **) must be initialized in the
kernel, or the upper 32 bits may contain garbage. Define macros that
zero out the upper 32 bits of the output before invoking the firmware
method.

When a 32-bit EFI call takes 64-bit arguments, the mixed-mode call must
push the two 32-bit halves as separate arguments onto the stack. This
can be achieved by splitting the argument into its two halves when
calling the assembler thunk. Define a macro to do this for the
free_pages boot service.
Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-17-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

ea7d87f9

efi/x86: Check number of arguments to variadic functions · 14b864f4

由 Arvind Sankar 提交于 5年前

On x86 we need to thunk through assembler stubs to call the EFI services
for mixed mode, and for runtime services in 64-bit mode. The assembler
stubs have limits on how many arguments it handles. Introduce a few
macros to check that we do not try to pass too many arguments to the
stubs.
Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-16-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

14b864f4

efi/x86: Simplify mixed mode call wrapper · ea5e1919

由 Ard Biesheuvel 提交于 5年前

Calling 32-bit EFI runtime services from a 64-bit OS involves
switching back to the flat mapping with a stack carved out of
memory that is 32-bit addressable.

There is no need to actually execute the 64-bit part of this
routine from the flat mapping as well, as long as the entry
and return address fit in 32 bits. There is also no need to
preserve part of the calling context in global variables: we
can simply push the old stack pointer value to the new stack,
and keep the return address from the code32 section in EBX.

While at it, move the conditional check whether to invoke
the mixed mode version of SetVirtualAddressMap() into the
64-bit implementation of the wrapper routine.
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-11-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

ea5e1919

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功