Commits · 0a67361dcdaa29dca1e77ebac919c62e93a8b3bc · openeuler / Kernel

24 Feb, 2020 14 commits

efi/x86: Remove runtime table address from kexec EFI setup data · 0a67361d

Authored by Ard Biesheuvel on Jan 20, 2020

Since commit 33b85447 ("efi/x86: Drop two near identical versions
of efi_runtime_init()"), we no longer map the EFI runtime services table
before calling SetVirtualAddressMap(), which means we don't need the 1:1
mapped physical address of this table, and so there is no point in passing
the address via EFI setup data on kexec boot.

Note that the kexec tools will still look for this address in sysfs, so
we still need to provide it.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

0a67361d

efi: Clean up config_parse_tables() · 06c0bd93

Authored by Ard Biesheuvel on Jan 22, 2020

config_parse_tables() is a jumble of pointer arithmetic, due to the
fact that on x86, we may be dealing with firmware whose native word
size differs from the kernel's.

This is not a concern on other architectures, and doesn't quite
justify the state of the code, so let's clean it up by adding a
non-x86 code path, constifying statically allocated tables and
replacing preprocessor conditionals with IS_ENABLED() checks.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

06c0bd93

efi: Make efi_config_init() x86 only · 3a0701dc

Authored by Ard Biesheuvel on Jan 20, 2020

The efi_config_init() routine is no longer shared with ia64 so let's
move it into the x86 arch code before making further x86 specific
changes to it.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

3a0701dc

efi: Merge EFI system table revision and vendor checks · 14fb4209

Authored by Ard Biesheuvel on Jan 20, 2020

We have three different versions of the code that checks the EFI system
table revision and copies the firmware vendor string, and they are
mostly equivalent, with the exception of the use of early_memremap_ro
vs. __va() and the lowest major revision to warn about. Let's move this
into common code and factor out the commonalities.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

14fb4209
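The revision check mentioned in commit 14fb4209 can be illustrated with a small user-space sketch. It relies only on the UEFI convention that a table revision is encoded as (major << 16) | minor; the function names and the min_major parameter are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* UEFI encodes a table revision as (major << 16) | minor. */
static inline uint16_t demo_efi_major(uint32_t revision)
{
    return (uint16_t)(revision >> 16);
}

static inline uint16_t demo_efi_minor(uint32_t revision)
{
    return (uint16_t)(revision & 0xffff);
}

/*
 * Hypothetical helper: returns true when the caller should warn because
 * the major revision is below the lowest one it is willing to accept.
 */
bool demo_efi_revision_too_old(uint32_t revision, uint16_t min_major)
{
    return demo_efi_major(revision) < min_major;
}
```

Passing the threshold as a parameter is one way to share a single check between callers that differ only in the lowest major revision they warn about.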

efi: Move mem_attr_table out of struct efi · a17e809e

Authored by Ard Biesheuvel on Jan 22, 2020

The memory attributes table is only used at init time by the core EFI
code, so there is no need to carry its address in struct efi that is
shared with the world. So move it out, and make it __ro_after_init as
well, considering that the value is set during early boot.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

a17e809e

efi: Move UGA and PROP table handling to x86 code · fd506e0c

Authored by Ard Biesheuvel on Jan 19, 2020

The UGA table is x86 specific (its handling was introduced when the
EFI support code was modified to accommodate IA32), so there is no
need to handle it in generic code.

The EFI properties table is not strictly x86 specific, but it was
deprecated almost immediately after having been introduced, due to
implementation difficulties. Only x86 takes it into account today,
and this is not going to change, so make this table x86 only as well.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

fd506e0c

efi/ia64: Move HCDP and MPS table handling into IA64 arch code · 120540f2

Authored by Ard Biesheuvel on Jan 19, 2020

The HCDP and MPS tables are Itanium specific EFI config tables, so
move their handling to ia64 arch code.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

120540f2

efi: Drop handling of 'boot_info' configuration table · 50d53c58

Authored by Ard Biesheuvel on Jan 19, 2020

Some plumbing exists to handle a UEFI configuration table of type
BOOT_INFO but since we never match it to a GUID anywhere, we never
actually register such a table, or access it, for that matter. So
simply drop all mentions of it.

Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

50d53c58

efi/x86: Replace #ifdefs with IS_ENABLED() checks · a570b062

Authored by Ard Biesheuvel on Feb 02, 2020

When possible, IS_ENABLED() conditionals are preferred over #ifdefs,
given that the latter hide the code from the compiler entirely, which
reduces build test coverage when the option is not enabled.

So replace an instance in the x86 efi startup code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

a570b062
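The trade-off described in commit a570b062 can be shown with a user-space sketch. The macro chain below is a simplified stand-in modeled on the kernel's IS_ENABLED() and handles only the "defined to 1" case; CONFIG_DEMO_ON, CONFIG_DEMO_OFF and the function names are hypothetical.

```c
/* Simplified stand-in for IS_ENABLED(): 1 if the symbol is defined to 1, else 0. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED_3(arg1_or_junk) DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED_2(val) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_2(x)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED(option)

/* Hypothetical config symbols: one enabled, the other deliberately left undefined. */
#define CONFIG_DEMO_ON 1

static int demo_feature_path(void)
{
    return 42;
}

/* The guarded call is always parsed and type-checked by the compiler. */
int demo_on(void)
{
    if (DEMO_IS_ENABLED(CONFIG_DEMO_ON))
        return demo_feature_path();
    return 0;
}

/* Same code with the option disabled: still compiled, then dropped as dead code. */
int demo_off(void)
{
    if (DEMO_IS_ENABLED(CONFIG_DEMO_OFF))
        return demo_feature_path();
    return 0;
}
```

With an #ifdef, the body of the disabled branch would never reach the compiler, so a typo inside it would go unnoticed until someone enabled the option.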

efi/x86: Reindent struct initializer for legibility · 14b60cc8

Authored by Ard Biesheuvel on Feb 02, 2020

Reindent the efi_memory_map_data initializer so that all the = signs
are aligned vertically, making the resulting code much easier to read.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

14b60cc8

efi/libstub: Make the LoadFile EFI protocol accessible · 2931d526

Authored by Ard Biesheuvel on Feb 10, 2020

Add the protocol definitions, GUIDs and mixed mode glue so that
the EFI loadfile protocol can be used from the stub. This will
be used in a future patch to load the initrd.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

2931d526

efi/libstub: Expose LocateDevicePath boot service · abd26868

Authored by Ard Biesheuvel on Feb 10, 2020

We will be adding support for loading the initrd from a GUIDed
device path in a subsequent patch, so update the prototype of
the LocateDevicePath() boot service to make it callable from
our code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

abd26868

efi/libstub/x86: Permit cmdline data to be allocated above 4 GB · 1e45bf73

Authored by Ard Biesheuvel on Feb 10, 2020

We now support cmdline data that is located in memory that is not
32-bit addressable, so relax the allocation limit on systems where
this feature is enabled.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

1e45bf73

efi/libstub/x86: Incorporate eboot.c into libstub · c2d0b470

Authored by Ard Biesheuvel on Feb 10, 2020

Most of the EFI stub source files of all architectures reside under
drivers/firmware/efi/libstub, where they share a Makefile with special
CFLAGS and an include file with declarations that are only relevant
for stub code.

Currently, we carry a lot of stub specific stuff in linux/efi.h only
because eboot.c in arch/x86 needs them as well. So let's move eboot.c
into libstub/, and move the contents of eboot.h that we still care
about into efistub.h.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

c2d0b470

23 Feb, 2020 10 commits

efi/libstub/x86: Avoid overflowing code32_start on PE entry · 04a7d0e1

Authored by Ard Biesheuvel on Feb 10, 2020

When using the native PE entry point (as opposed to the EFI handover
protocol entry point that is used more widely), we set code32_start,
which is a 32-bit wide field, to the effective symbol address of
startup_32, which could overflow given that the EFI loader may have
located the running image anywhere in memory, and we haven't reached
the point yet where we relocate ourselves.

Since we relocate ourselves if code32_start != pref_address, this
isn't likely to lead to problems in practice, given how unlikely
it is that the truncated effective address of startup_32 happens
to equal pref_address. But it is better to defer the assignment
of code32_start to after the relocation, when it is guaranteed to
fit.

While at it, move the call to efi_relocate_kernel() to an earlier
stage so it is more likely that our preferred offset in memory has
not been occupied by other memory allocations done in the meantime.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

04a7d0e1
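The truncation risk described in commit 04a7d0e1 comes down to storing a 64-bit address in a 32-bit field. The following sketch shows the effect in plain C; the helper names are hypothetical, and the addresses in the usage note are made-up values rather than anything read from a real boot.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 64-bit address survives being stored in a 32-bit field unchanged. */
bool demo_fits_in_u32(uint64_t addr)
{
    return addr == (uint64_t)(uint32_t)addr;
}

/* The value a 32-bit field such as code32_start would actually hold. */
uint32_t demo_truncate_to_u32(uint64_t addr)
{
    return (uint32_t)addr;
}
```

For example, an image loaded at 0x101000000 truncates to 0x1000000. If the preferred address happened to be 0x1000000 as well, a "code32_start != pref_address" comparison would wrongly conclude that no relocation is needed, which is the unlikely coincidence the commit message refers to.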

efi/libstub/x86: Remove pointless zeroing of apm_bios_info · e6d832ea

Authored by Ard Biesheuvel on Feb 10, 2020

We have some code in the EFI stub entry point that takes the address
of the apm_bios_info struct in the newly allocated and zeroed out
boot_params structure, only to zero it out again. This is pointless
so remove it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

e6d832ea

efi/x86: Mark setup_graphics static · f32ea1cd

Authored by Arvind Sankar on Jan 30, 2020

This function is only called from efi_main in the same source file.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200130222004.1932152-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

f32ea1cd

x86/boot: Micro-optimize GDT loading instructions · 8a3abe30

Authored by Arvind Sankar on Feb 02, 2020

Rearrange the instructions a bit to use a 32-bit displacement once
instead of two or three times. This saves 8 bytes of machine code.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-8-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

8a3abe30

x86/boot: GDT limit value should be size - 1 · b75e2b07

Authored by Arvind Sankar on Feb 02, 2020

The limit value for the GDTR should be such that adding it to the base
address gives the address of the last byte of the GDT, i.e. it should be
one less than the size, not the size.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-7-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

b75e2b07
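The "size - 1" rule from commit b75e2b07 can be sketched in C. The structure mirrors the pseudo-descriptor that LGDT consumes in 64-bit mode (a 16-bit limit followed by the base); the struct name, the helper and the three-entry table are hypothetical illustrations, not the kernel's boot GDT.

```c
#include <stddef.h>
#include <stdint.h>

/* Pseudo-descriptor as loaded by LGDT: 16-bit limit, then the base address. */
struct demo_gdt_ptr {
    uint16_t limit;
    uint64_t base;
} __attribute__((packed));

/* Illustrative table: null descriptor, flat code segment, flat data segment. */
const uint64_t demo_gdt[3] = {
    0x0000000000000000ULL,
    0x00cf9a000000ffffULL,
    0x00cf92000000ffffULL,
};

/* base + limit must address the LAST byte of the table, hence size - 1. */
struct demo_gdt_ptr demo_make_gdt_ptr(const void *gdt, size_t size_bytes)
{
    struct demo_gdt_ptr p;

    p.limit = (uint16_t)(size_bytes - 1);
    p.base = (uint64_t)(uintptr_t)gdt;
    return p;
}
```

For the 24-byte table above the limit is 23, so base + limit points at the final byte of the third descriptor rather than one byte past it.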

efi/x86: Remove GDT setup from efi_main · ef5a7b5e

Authored by Arvind Sankar on Feb 02, 2020

The 64-bit kernel will already load a GDT in startup_64, which is the
next function to execute after return from efi_main.

Add GDT setup code to the 32-bit kernel's startup_32 as well. Doing it
in the head code has the advantage that we can avoid potentially
corrupting the GDT during copy/decompression. This also removes
dependence on having a specific GDT layout set up by the bootloader.

Both startup_32 and startup_64 now clear interrupts on entry, so we can
remove that from efi_main as well.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-6-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

ef5a7b5e

x86/boot: Clear direction and interrupt flags in startup_64 · cae0e431

Authored by Arvind Sankar on Feb 02, 2020

startup_32 already clears these flags on entry; do it in startup_64 as
well for consistency.

The direction flag in particular is not specified to be cleared in the
boot protocol documentation, and we currently call into C code
(paging_prepare) without explicitly clearing it.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-5-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

cae0e431

x86/boot: Reload GDTR after copying to the end of the buffer · 32d00913

Authored by Arvind Sankar on Feb 02, 2020

The GDT may get overwritten during the copy or during extract_kernel,
which will cause problems if any segment register is touched before the
GDTR is reloaded by the decompressed kernel. For safety, update the GDTR
to point to the GDT within the copied kernel.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-4-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

32d00913

efi/x86: Don't depend on firmware GDT layout · 90ff2262

Authored by Arvind Sankar on Feb 02, 2020

When booting in mixed mode, the firmware's GDT is still installed at
handover entry in efi32_stub_entry. We save the GDTR for later use in
__efi64_thunk but we are assuming that descriptor 2 (__KERNEL_CS) is a
valid 32-bit code segment descriptor and that descriptor 3
(__KERNEL_DS/__BOOT_DS) is a valid data segment descriptor.

This happens to be true for OVMF (it actually uses descriptor 1 for data
segments, but descriptor 3 is also set up as data), but we shouldn't
depend on this being the case.

Fix this by saving the code and data selectors in addition to the GDTR
in efi32_stub_entry, and restoring them in __efi64_thunk before calling
the firmware. The UEFI specification guarantees that selectors will be
flat, so using the DS selector for all the segment registers should be
enough.

We also need to install our own GDT before initializing segment
registers in startup_32, so move the GDT load up to the beginning of the
function.

[ardb: mention mixed mode in the commit log]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-3-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

90ff2262

x86/boot: Remove KEEP_SEGMENTS support · 67a6af7a

Authored by Arvind Sankar on Feb 02, 2020

Commit a24e7851 ("i386: paravirt boot sequence") added this flag for
use by paravirtualized environments such as Xen. However, Xen never made
use of this flag [1], and it was only ever used by lguest [2].

Commit ecda85e7 ("x86/lguest: Remove lguest support") removed
lguest, so KEEP_SEGMENTS has lost its last user.

[1] https://lore.kernel.org/lkml/4D4B097C.5050405@goop.org
[2] https://www.mail-archive.com/lguest@lists.ozlabs.org/msg00469.html

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-2-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

67a6af7a

08 Feb, 2020 2 commits

fs_parse: fold fs_parameter_desc/fs_parameter_spec · d7167b14

Authored by Al Viro on Sep 07, 2019

The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d7167b14

fs_parser: remove fs_parameter_description name field · 96cafb9c

Authored by Eric Sandeen on Dec 06, 2019

Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

96cafb9c

07 Feb, 2020 1 commit

x86/apic: Mask IOAPIC entries when disabling the local APIC · 0f378d73

Authored by Tony W Wang-oc on Jan 15, 2020

When a system suspends, the local APIC is disabled in the suspend sequence,
but the IOAPIC is left in the current state. This means unmasked interrupt
lines stay unmasked. This is usually the case for IOAPIC pin 9 to which the
ACPI interrupt is connected.

That means that in suspended state the IOAPIC can respond to an external
interrupt, e.g. the wakeup via keyboard/RTC/ACPI, but the interrupt message
cannot be handled by the disabled local APIC. As a consequence the Remote
IRR bit is set, but the local APIC does not send an EOI to acknowledge
it. This causes the affected interrupt line to become stale and the stale
Remote IRR bit will cause a hang when __synchronize_hardirq() is invoked
for that interrupt line.

To prevent this, mask all IOAPIC entries before disabling the local
APIC. The resume code already has the unmask operation inside.

[ tglx: Massaged changelog ]
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579076539-7267-1-git-send-email-TonyWWang-oc@zhaoxin.com

0f378d73
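The "mask all IOAPIC entries" step from commit 0f378d73 can be pictured as setting the mask bit in each redirection table entry. The sketch below operates on a plain array standing in for the redirection table; it assumes the usual IOAPIC layout in which bit 16 of an entry is the interrupt mask, and the macro and function names are hypothetical rather than the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 16 of an IOAPIC redirection table entry is the interrupt mask bit. */
#define DEMO_IOAPIC_REDIR_MASKED (1ULL << 16)

/* Set the mask bit in every entry; all other bits are left untouched. */
void demo_mask_ioapic_entries(uint64_t *entries, size_t count)
{
    for (size_t i = 0; i < count; i++)
        entries[i] |= DEMO_IOAPIC_REDIR_MASKED;
}
```

Real code would write each entry back through the IOAPIC's register window; the array here only models the bit manipulation.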

05 Feb, 2020 13 commits

KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration · a8be1ad0

Authored by Miaohe Lin on Feb 05, 2020

The function vmx_decache_cr0_guest_bits() is only called below its
implementation, so the forward declaration is meaningless and should be
removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
a8be1ad0

KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit · d76c7fbc

Authored by Sean Christopherson on Jan 28, 2020

Re-add code to mark CR4.UMIP as reserved if UMIP is not supported by the
host. The UMIP handling was unintentionally dropped during a recent
refactoring.

Not flagging CR4.UMIP allows the guest to set its CR4.UMIP regardless of
host support or userspace desires. On CPUs with UMIP support, including
emulated UMIP, this allows the guest to enable UMIP against the wishes
of the userspace VMM. On CPUs without any form of UMIP, this results in
a failed VM-Enter due to invalid guest state.

Fixes: 345599f9 ("KVM: x86: Add macro to ensure reserved cr4 bits checks stay in sync")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d76c7fbc
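The idea behind commit d76c7fbc, treating CR4.UMIP as reserved whenever the feature is not available, can be sketched as a mask computation. CR4.UMIP is bit 11 of CR4; the helper names and the base_reserved parameter are hypothetical and do not reproduce KVM's actual macro.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_X86_CR4_UMIP (1ULL << 11) /* CR4 bit 11: User-Mode Instruction Prevention */

/* Add UMIP to the reserved mask when the feature is not supported. */
uint64_t demo_cr4_reserved_bits(uint64_t base_reserved, bool umip_supported)
{
    uint64_t reserved = base_reserved;

    if (!umip_supported)
        reserved |= DEMO_X86_CR4_UMIP;
    return reserved;
}

/* A CR4 value is acceptable only if it sets no reserved bit. */
bool demo_cr4_valid(uint64_t cr4, uint64_t reserved)
{
    return (cr4 & reserved) == 0;
}
```

If the UMIP term is dropped from the mask, a guest write that sets bit 11 passes the check even when the feature is absent, which is the regression the commit describes.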

x86: vmxfeatures: rename features for consistency with KVM and manual · bcfcff64

Authored by Paolo Bonzini on Feb 05, 2020

Three of the feature bits in vmxfeatures.h have names that are different
from the Intel SDM.  The names have been adjusted recently in KVM but they
were using the old name in the tip tree's x86/cpu branch.  Adjust for
consistency.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bcfcff64

KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses · df7e8818

Authored by Paolo Bonzini on Feb 05, 2020

Userspace that does not know about the AMD_IBRS bit might still
allow the guest to protect itself with MSR_IA32_SPEC_CTRL using
the Intel SPEC_CTRL bit.  However, svm.c disallows this and will
cause a #GP in the guest when writing to the MSR.  Fix this by
loosening the test and allowing the Intel CPUID bit, and in fact
allow the AMD_STIBP bit as well since it allows writing to
MSR_IA32_SPEC_CTRL too.

Reported-by: Zhiyi Guo <zhguo@redhat.com>
Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

df7e8818

KVM: x86: Fix perfctr WRMSR for running counters · 4400cf54

Authored by Eric Hankland on Jan 27, 2020

Correct the logic in intel_pmu_set_msr() for fixed and general purpose
counters. This was recently changed to set pmc->counter without taking
into account the value of pmc_read_counter(), which will be incorrect if
the counter is currently running and non-zero; this changes back to the
old logic, which accounted for the value of currently running counters.

Signed-off-by: Eric Hankland <ehankland@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4400cf54
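Why a plain assignment goes wrong for a running counter (commit 4400cf54) can be seen in a simplified model. Here the guest-visible value is a software base plus whatever the live event has accumulated; the struct, its fields and the function names are hypothetical and only loosely follow the commit's description.

```c
#include <stdint.h>

/* Simplified model of a virtual performance counter. */
struct demo_pmc {
    uint64_t counter; /* software-maintained base value */
    uint64_t running; /* count accumulated by the live event so far */
};

/* Value the guest observes: base plus the running event's contribution. */
uint64_t demo_pmc_read(const struct demo_pmc *pmc)
{
    return pmc->counter + pmc->running;
}

/* Adjust the base by the difference so a subsequent read returns 'data'. */
void demo_pmc_write(struct demo_pmc *pmc, uint64_t data)
{
    pmc->counter += data - demo_pmc_read(pmc);
}
```

With counter = 100 and running = 7, writing 1000 through demo_pmc_write() makes the next read return 1000, whereas assigning counter = 1000 directly would make it return 1007.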

x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests · a8350231

Authored by Vitaly Kuznetsov on Feb 05, 2020

Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
controls on for their guests and nested_vmx_check_controls() checks for
that. This is, however, not the case for the controls which are supported
on the host but are missing in enlightened VMCS and when eVMCS is in use.

It would certainly be possible to add these missing checks to
nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
non-eVMCS guests by doing fewer unrelated checks. Create a separate
nested_evmcs_check_controls() for this purpose.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a8350231

x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() · 31de3d25

Authored by Vitaly Kuznetsov on Feb 05, 2020

With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
with default (matching CPU model) values and in case eVMCS is also enabled,
fails.

It would be possible to drop VMX feature filtering completely and make
this a guest's responsibility: if it decides to use eVMCS it should know
which fields are available and which are not. Hyper-V mostly complies with
this; however, there are some problematic controls:
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL

which Hyper-V enables. As there are no corresponding fields in eVMCS, we
can't handle this properly in KVM. This is a Hyper-V issue.

Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
and do the bare minimum (only clear controls which are known to cause issues).
This allows userspace to keep setting controls it wants and at the same
time hides them from the guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

31de3d25

kvm: mmu: Separate generating and setting mmio ptes · 8f79b064

Authored by Ben Gardon on Feb 03, 2020

Separate the functions for generating MMIO page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU synchronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8f79b064
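The split described in commit 8f79b064, one function that computes an entry and another that installs it, is what makes a compare-and-exchange based update possible. The sketch below uses C11 atomics in user space; the bit layout, the marker bit and all names are hypothetical and do not match KVM's real MMIO SPTE encoding.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MMIO_MARK (1ULL << 62) /* hypothetical "this is an MMIO entry" bit */

/* Pure function: computes the entry value and touches no shared state. */
uint64_t demo_make_mmio_pte(uint64_t gfn, unsigned int access)
{
    return DEMO_MMIO_MARK | (gfn << 12) | (access & 0x7u);
}

/*
 * Install the entry only if the slot still holds 'expected'.
 * Returns false if another writer changed the slot first, in which case
 * the caller can recompute and retry without having held a lock.
 */
bool demo_set_pte(_Atomic uint64_t *slot, uint64_t expected, uint64_t new_pte)
{
    return atomic_compare_exchange_strong(slot, &expected, new_pte);
}
```

Because the generating step has no side effects, a failed exchange costs nothing beyond recomputing the value, which is harder to arrange when generation and insertion are fused in one function.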

kvm: mmu: Replace unsigned with unsigned int for PTE access · 0a2b64c5

Authored by Ben Gardon on Feb 03, 2020

There are several functions which pass an access permission mask for
SPTEs as an unsigned. This works, but checkpatch complains about it.
Switch the occurrences of unsigned to unsigned int to satisfy checkpatch.

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0a2b64c5

KVM: nVMX: Remove stale comment from nested_vmx_load_cr3() · ea79a750

Authored by Sean Christopherson on Feb 04, 2020

The blurb pertaining to the return value of nested_vmx_load_cr3() no
longer matches reality. Remove it entirely, as the behavior it is
attempting to document is quite obvious when reading the actual code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ea79a750

x86/kvm: do not setup pv tlb flush when not paravirtualized · 64b38bd1

Authored by Thadeu Lima de Souza Cascardo on Jan 31, 2020

kvm_setup_pv_tlb_flush will waste memory and print a misleading message
when KVM paravirtualization is not available.

The Intel SDM says that when CPUID is executed with EAX higher than the
maximum supported value for basic or extended functions, the data for the
highest supported basic function is returned.

So, in some systems, kvm_arch_para_features will return bogus data,
causing kvm_setup_pv_tlb_flush to detect support for pv tlb flush.

Testing for kvm_para_available will work as it checks for the hypervisor
signature.

Likewise, when the "nopv" command line parameter is used, it should not
continue either, as kvm_guest_init will not be called in that case.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

64b38bd1
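The reasoning in commit 64b38bd1 is that feature bits read from a hypervisor CPUID leaf are only trustworthy once the hypervisor signature has been confirmed. The sketch below checks three register values against the 12-byte KVM signature "KVMKVMKVM\0\0\0"; it takes the registers as parameters instead of executing CPUID, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare the EBX/ECX/EDX values returned by the hypervisor signature leaf
 * against the KVM signature. Only if this matches should feature bits from
 * the KVM feature leaf be believed.
 */
bool demo_is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[12];

    memcpy(sig + 0, &ebx, sizeof(ebx));
    memcpy(sig + 4, &ecx, sizeof(ecx));
    memcpy(sig + 8, &edx, sizeof(edx));
    return memcmp(sig, "KVMKVMKVM\0\0\0", sizeof(sig)) == 0;
}
```

On a machine without KVM paravirtualization the out-of-range leaf returns data for some other leaf, so the signature comparison fails even though individual bits in that data might look like valid feature flags.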

KVM: x86: Take a u64 when checking for a valid dr7 value · 9b5e8532

Authored by Sean Christopherson on Jan 24, 2020

Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build
warning on i386 due to right-shifting a 32-bit value by 32 when checking
for bits being set in dr7[63:32].

Alternatively, the warning could be resolved by rewriting the check to
use an i386-friendly method, but taking a u64 fixes another oddity on
32-bit KVM. Because KVM implements natural width VMCS fields as u64s to
avoid layout issues between 32-bit and 64-bit, a devious guest can stuff
vmcs12->guest_dr7 with a 64-bit value even when both the guest and host
are 32-bit kernels. KVM eventually drops vmcs12->guest_dr7[63:32] when
propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely
on that behavior for correctness.

Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9b5e8532
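The type change in commit 9b5e8532 matters because shifting a 32-bit value by 32 is not valid C, while the same shift on a 64-bit value is well defined. A minimal sketch of such a check, using a hypothetical function name and uint64_t in place of the kernel's u64:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * DR7 bits [63:32] are reserved. Taking a 64-bit parameter keeps the
 * shift well defined even when the code is built for a 32-bit target,
 * where unsigned long would be only 32 bits wide.
 */
bool demo_dr7_valid(uint64_t data)
{
    return !(data >> 32);
}
```

A 64-bit parameter also means an oversized value supplied by a guest is rejected by the check itself rather than being silently truncated on the way in.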

KVM: x86: use raw clock values consistently · 8171cd68

Authored by Paolo Bonzini on Jan 22, 2020

Commit 53fafdbb ("KVM: x86: switch KVMCLOCK base to monotonic raw
clock") changed kvmclock to use tkr_raw instead of tkr_mono. However,
the default kvmclock_offset for the VM was still based on the monotonic
clock and, if the raw clock drifted enough from the monotonic clock,
this could cause a negative system_time to be written to the guest's
struct pvclock. RHEL5 does not like it and (if it boots fast enough to
observe a negative time value) it hangs.

There is another thing to be careful about: getboottime64 returns the
host boot time with tkr_mono frequency, and subtracting the tkr_raw-based
kvmclock value will cause the wallclock to be off if tkr_raw drifts
from tkr_mono. To avoid this, compute the wallclock delta from the
current time instead of being clever and using getboottime64.

Fixes: 53fafdbb ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8171cd68
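The negative system_time described in commit 8171cd68 follows from mixing two clock bases. The sketch below is a deliberately simplified arithmetic model, not the kvmclock implementation: the offset is taken as the negated clock reading at VM creation, and the guest time is the current raw reading plus that offset. All names and the nanosecond values in the usage note are hypothetical.

```c
#include <stdint.h>

/* Offset chosen at VM creation so that guest time starts at zero. */
int64_t demo_kvmclock_offset(int64_t base_now_ns)
{
    return -base_now_ns;
}

/* Guest-visible time: current raw clock reading plus the stored offset. */
int64_t demo_system_time(int64_t raw_now_ns, int64_t offset_ns)
{
    return raw_now_ns + offset_ns;
}
```

If the monotonic clock reads 1000000 ns and the raw clock has drifted to 999000 ns, an offset derived from the monotonic clock yields a guest time of -1000 ns, while an offset derived from the raw clock yields 0.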

openeuler / Kernel: last synced successfully almost 2 years ago