提交 · 9446e6fce0ab9dfd44b96f630b4e3a0a0ab879fd · openeuler / Kernel

13 2月, 2020 7 次提交

KVM: x86: fix WARN_ON check of an unsigned less than zero · 9446e6fc

由 Paolo Bonzini 提交于 2月 12, 2020

The check cpu->hv_clock.system_time < 0 is redundant since system_time
is a u64 and hence can never be less than zero.  But what was actually
meant is to check that the result is positive, since kernel_ns and
v->kvm->arch.kvmclock_offset are both s64.
Reported-by: NColin King <colin.king@canonical.com>
Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com>
Addresses-Coverity: ("Macro compares unsigned to 0")
Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9446e6fc

KVM: x86/mmu: Fix struct guest_walker arrays for 5-level paging · f6ab0107

由 Sean Christopherson 提交于 2月 07, 2020

Define PT_MAX_FULL_LEVELS as PT64_ROOT_MAX_LEVEL, i.e. 5, to fix shadow
paging for 5-level guest page tables. PT_MAX_FULL_LEVELS is used to
size the arrays that track guest pages table information, i.e. using a
"max levels" of 4 causes KVM to access garbage beyond the end of an
array when querying state for level 5 entries. E.g. FNAME(gpte_changed)
will read garbage and most likely return %true for a level 5 entry,
soft-hanging the guest because FNAME(fetch) will restart the guest
instead of creating SPTEs because it thinks the guest PTE has changed.

Note, KVM doesn't yet support 5-level nested EPT, so PT_MAX_FULL_LEVELS
gets to stay "4" for the PTTYPE_EPT case.

Fixes: 855feb67 ("KVM: MMU: Add 5 level EPT & Shadow page table support.")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f6ab0107

KVM: nVMX: Use correct root level for nested EPT shadow page tables · 148d735e

由 Sean Christopherson 提交于 2月 07, 2020

Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU
currently also hardcodes the page walk level for nested EPT to be 4
levels. The L2 guest is all but guaranteed to soft hang on its first
instruction when L1 is using EPT, as KVM will construct 4-level page
tables and then tell hardware to use 5-level page tables.

148d735e

KVM: nVMX: Fix some comment typos and coding style · ffdbd50d

由 Miaohe Lin 提交于 2月 07, 2020

Fix some typos in the comments. Also fix coding style.
[Sean Christopherson rewrites the comment of write_fault_to_shadow_pgtable
field in struct kvm_vcpu_arch.]
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ffdbd50d

KVM: x86/mmu: Avoid retpoline on ->page_fault() with TDP · 7a02674d

由 Sean Christopherson 提交于 2月 06, 2020

Wrap calls to ->page_fault() with a small shim to directly invoke the
TDP fault handler when the kernel is using retpolines and TDP is being
used.  Single out the TDP fault handler and annotate the TDP path as
likely to coerce the compiler into preferring it over the indirect
function call.

Rename tdp_page_fault() to kvm_tdp_page_fault(), as it's exposed outside
of mmu.c to allow inlining the shim.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a02674d

KVM: apic: reuse smp_wmb() in kvm_make_request() · 331ca0f8

由 Miaohe Lin 提交于 2月 07, 2020

kvm_make_request() provides smp_wmb() so pending_events changes are
guaranteed to be visible.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

331ca0f8

KVM: x86: remove duplicated KVM_REQ_EVENT request · 20796447

由 Miaohe Lin 提交于 2月 07, 2020

The KVM_REQ_EVENT request is already made in kvm_set_rflags(). We should
not make it again.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

20796447

12 2月, 2020 4 次提交

KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTS · a06230b6

由 Oliver Upton 提交于 2月 07, 2020

KVM allows the deferral of exception payloads when a vCPU is in guest
mode to allow the L1 hypervisor to intercept certain events (#PF, #DB)
before register state has been modified. However, this behavior is
incompatible with the KVM_{GET,SET}_VCPU_EVENTS ABI, as userspace
expects register state to have been immediately modified. Userspace may
opt-in for the payload deferral behavior with the
KVM_CAP_EXCEPTION_PAYLOAD per-VM capability. As such,
kvm_multiple_exception() will immediately manipulate guest registers if
the capability hasn't been requested.

Since the deferral is only necessary if a userspace ioctl were to be
serviced at the same as a payload bearing exception is recognized, this
behavior can be relaxed. Instead, opportunistically defer the payload
from kvm_multiple_exception() and deliver the payload before completing
a KVM_GET_VCPU_EVENTS ioctl.
Signed-off-by: NOliver Upton <oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a06230b6

KVM: nVMX: Handle pending #DB when injecting INIT VM-exit · 684c0422

由 Oliver Upton 提交于 2月 07, 2020

SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will
be populated if a VM-exit caused by an INIT signal takes priority over a
debug-trap. Emulate this behavior when synthesizing an INIT signal
VM-exit into L1.

Fixes: 4b9852f4 ("KVM: x86: Fix INIT signal handling in various CPU states")
Signed-off-by: NOliver Upton <oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

684c0422

KVM: x86: Mask off reserved bit from #DB exception payload · 307f1cfa

由 Oliver Upton 提交于 2月 07, 2020

KVM defines the #DB payload as compatible with the 'pending debug
exceptions' field under VMX, not DR6. Mask off bit 12 when applying the
payload to DR6, as it is reserved on DR6 but not the 'pending debug
exceptions' field.

Fixes: f10c729f ("kvm: vmx: Defer setting of DR6 until #DB delivery")
Signed-off-by: NOliver Upton <oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

307f1cfa

KVM: x86: do not reset microcode version on INIT or RESET · bab0c318

由 Paolo Bonzini 提交于 2月 11, 2020

Do not initialize the microcode version at RESET or INIT, only on vCPU
creation.   Microcode updates are not lost during INIT, and exact
behavior across a warm RESET is not specified by the architecture.

Since we do not support a microcode update directly from the hypervisor,
but only as a result of userspace setting the microcode version MSR,
it's simpler for userspace if we do nothing in KVM and let userspace
emulate behavior for RESET as it sees fit.

Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in
the list of emulated MSRs.
Reported-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bab0c318

08 2月, 2020 5 次提交

powerpc: Fix CONFIG_TRACE_IRQFLAGS with CONFIG_VMAP_STACK · d4bf9053

由 Christophe Leroy 提交于 2月 07, 2020

When CONFIG_PROVE_LOCKING is selected together with (now default)
CONFIG_VMAP_STACK, kernel enter deadlock during boot.

At the point of checking whether interrupts are enabled or not, the
value of MSR saved on stack is read using the physical address of the
stack. But at this point, when using VMAP stack the DATA MMU
translation has already been re-enabled, leading to deadlock.

Don't use the physical address of the stack when
CONFIG_VMAP_STACK is set.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Fixes: 02847487 ("powerpc/32: prepare for CONFIG_VMAP_STACK")
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/daeacdc0dec0416d1c587cc9f9e7191ad3068dc0.1581095957.git.christophe.leroy@c-s.fr

d4bf9053

powerpc/futex: Fix incorrect user access blocking · 9dc086f1

由 Michael Ellerman 提交于 2月 07, 2020

The early versions of our kernel user access prevention (KUAP) were
written by Russell and Christophe, and didn't have separate
read/write access.

At some point I picked up the series and added the read/write access,
but I failed to update the usages in futex.h to correctly allow read
and write.

However we didn't notice because of another bug which was causing the
low-level code to always enable read and write. That bug was fixed
recently in commit 1d8f739b ("powerpc/kuap: Fix set direction in
allow/prevent_user_access()").

futex_atomic_cmpxchg_inatomic() is passed the user address as %3 and
does:

  1:     lwarx   %1,  0, %3
         cmpw    0,  %1, %4
         bne-    3f
  2:     stwcx.  %5,  0, %3

Which clearly loads and stores from/to %3. The logic in
arch_futex_atomic_op_inuser() is similar, so fix both of them to use
allow_read_write_user().

Without this fix, and with PPC_KUAP_DEBUG=y, we see eg:

  Bug: Read fault blocked by AMR!
  WARNING: CPU: 94 PID: 149215 at arch/powerpc/include/asm/book3s/64/kup-radix.h:126 __do_page_fault+0x600/0xf30
  CPU: 94 PID: 149215 Comm: futex_requeue_p Tainted: G        W         5.5.0-rc7-gcc9x-g4c25df56 #1
  ...
  NIP [c000000000070680] __do_page_fault+0x600/0xf30
  LR [c00000000007067c] __do_page_fault+0x5fc/0xf30
  Call Trace:
  [c00020138e5637e0] [c00000000007067c] __do_page_fault+0x5fc/0xf30 (unreliable)
  [c00020138e5638c0] [c00000000000ada8] handle_page_fault+0x10/0x30
  --- interrupt: 301 at cmpxchg_futex_value_locked+0x68/0xd0
      LR = futex_lock_pi_atomic+0xe0/0x1f0
  [c00020138e563bc0] [c000000000217b50] futex_lock_pi_atomic+0x80/0x1f0 (unreliable)
  [c00020138e563c30] [c00000000021b668] futex_requeue+0x438/0xb60
  [c00020138e563d60] [c00000000021c6cc] do_futex+0x1ec/0x2b0
  [c00020138e563d90] [c00000000021c8b8] sys_futex+0x128/0x200
  [c00020138e563e20] [c00000000000b7ac] system_call+0x5c/0x68

Fixes: de78a9c4 ("powerpc: Add a framework for Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: syzbot+e808452bad7c375cbee6@syzkaller-ppc64.appspotmail.com
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200207122145.11928-1-mpe@ellerman.id.au

9dc086f1

irqchip/gic-v3-its: Rename VPENDBASER/VPROPBASER accessors · 5186a6cc

由 Zenghui Yu 提交于 2月 06, 2020

V{PEND,PROP}BASER registers are actually located in VLPI_base frame
of the *redistributor*. Rename their accessors to reflect this fact.

No functional changes.
Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-7-yuzenghui@huawei.com

5186a6cc

fs_parse: fold fs_parameter_desc/fs_parameter_spec · d7167b14

由 Al Viro 提交于 9月 07, 2019

The former contains nothing but a pointer to an array of the latter...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d7167b14

fs_parser: remove fs_parameter_description name field · 96cafb9c

由 Eric Sandeen 提交于 12月 06, 2019

Unused now.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

96cafb9c

07 2月, 2020 1 次提交

x86/apic: Mask IOAPIC entries when disabling the local APIC · 0f378d73

由 Tony W Wang-oc 提交于 1月 15, 2020

When a system suspends, the local APIC is disabled in the suspend sequence,
but the IOAPIC is left in the current state. This means unmasked interrupt
lines stay unmasked. This is usually the case for IOAPIC pin 9 to which the
ACPI interrupt is connected.

That means that in suspended state the IOAPIC can respond to an external
interrupt, e.g. the wakeup via keyboard/RTC/ACPI, but the interrupt message
cannot be handled by the disabled local APIC. As a consequence the Remote
IRR bit is set, but the local APIC does not send an EOI to acknowledge
it. This causes the affected interrupt line to become stale and the stale
Remote IRR bit will cause a hang when __synchronize_hardirq() is invoked
for that interrupt line.

To prevent this, mask all IOAPIC entries before disabling the local
APIC. The resume code already has the unmask operation inside.

[ tglx: Massaged changelog ]
Signed-off-by: NTony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579076539-7267-1-git-send-email-TonyWWang-oc@zhaoxin.com

0f378d73

06 2月, 2020 1 次提交

virtio-blk: remove VIRTIO_BLK_F_SCSI support · 782e067d

由 Christoph Hellwig 提交于 12月 12, 2019

Since the need for a special flag to support SCSI passthrough on a
block device was added in May 2017 the SCSI passthrough support in
virtio-blk has been disabled.  It has always been a bad idea
(just ask the original author..) and we have virtio-scsi for proper
passthrough.  The feature also never made it into the virtio 1.0
or later specifications.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

782e067d

05 2月, 2020 22 次提交

KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration · a8be1ad0

由 Miaohe Lin 提交于 2月 05, 2020

The function vmx_decache_cr0_guest_bits() is only called below its
implementation. So this is meaningless and should be removed.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a8be1ad0

KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit · d76c7fbc

由 Sean Christopherson 提交于 1月 28, 2020

Re-add code to mark CR4.UMIP as reserved if UMIP is not supported by the
host. The UMIP handling was unintentionally dropped during a recent
refactoring.

Not flagging CR4.UMIP allows the guest to set its CR4.UMIP regardless of
host support or userspace desires. On CPUs with UMIP support, including
emulated UMIP, this allows the guest to enable UMIP against the wishes
of the userspace VMM. On CPUs without any form of UMIP, this results in
a failed VM-Enter due to invalid guest state.

Fixes: 345599f9 ("KVM: x86: Add macro to ensure reserved cr4 bits checks stay in sync")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d76c7fbc

x86: vmxfeatures: rename features for consistency with KVM and manual · bcfcff64

由 Paolo Bonzini 提交于 2月 05, 2020

Three of the feature bits in vmxfeatures.h have names that are different
from the Intel SDM.  The names have been adjusted recently in KVM but they
were using the old name in the tip tree's x86/cpu branch.  Adjust for
consistency.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bcfcff64

KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses · df7e8818

由 Paolo Bonzini 提交于 2月 05, 2020

Userspace that does not know about the AMD_IBRS bit might still
allow the guest to protect itself with MSR_IA32_SPEC_CTRL using
the Intel SPEC_CTRL bit.  However, svm.c disallows this and will
cause a #GP in the guest when writing to the MSR.  Fix this by
loosening the test and allowing the Intel CPUID bit, and in fact
allow the AMD_STIBP bit as well since it allows writing to
MSR_IA32_SPEC_CTRL too.
Reported-by: NZhiyi Guo <zhguo@redhat.com>
Analyzed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Analyzed-by: NLaszlo Ersek <lersek@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

df7e8818

KVM: x86: Fix perfctr WRMSR for running counters · 4400cf54

由 Eric Hankland 提交于 1月 27, 2020

Correct the logic in intel_pmu_set_msr() for fixed and general purpose
counters. This was recently changed to set pmc->counter without taking
in to account the value of pmc_read_counter() which will be incorrect if
the counter is currently running and non-zero; this changes back to the
old logic which accounted for the value of currently running counters.
Signed-off-by: NEric Hankland <ehankland@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4400cf54

x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests · a8350231

由 Vitaly Kuznetsov 提交于 2月 05, 2020

Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
controls on for its guests and nested_vmx_check_controls() checks for
that. This is, however, not the case for the controls which are supported
on the host but are missing in enlightened VMCS and when eVMCS is in use.

It would certainly be possible to add these missing checks to
nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
non-eVMCS guests by doing less unrelated checks. Create a separate
nested_evmcs_check_controls() for this purpose.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a8350231

x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() · 31de3d25

由 Vitaly Kuznetsov 提交于 2月 05, 2020

With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
with default (matching CPU model) values and in case eVMCS is also enabled,
fails.

It would be possible to drop VMX feature filtering completely and make
this a guest's responsibility: if it decides to use eVMCS it should know
which fields are available and which are not. Hyper-V mostly complies to
this, however, there are some problematic controls:
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL

which Hyper-V enables. As there are no corresponding fields in eVMCS, we
can't handle this properly in KVM. This is a Hyper-V issue.

Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
and do the bare minimum (only clear controls which are known to cause issues).
This allows userspace to keep setting controls it wants and at the same
time hides them from the guest.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

31de3d25

kvm: mmu: Separate generating and setting mmio ptes · 8f79b064

由 Ben Gardon 提交于 2月 03, 2020

Separate the functions for generating MMIO page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU sychronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.
Signed-off-by: NBen Gardon <bgardon@google.com>
Reviewed-by: NOliver Upton <oupton@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8f79b064

kvm: mmu: Replace unsigned with unsigned int for PTE access · 0a2b64c5

由 Ben Gardon 提交于 2月 03, 2020

There are several functions which pass an access permission mask for
SPTEs as an unsigned. This works, but checkpatch complains about it.
Switch the occurrences of unsigned to unsigned int to satisfy checkpatch.

No functional change expected.

Tested by running kvm-unit-tests on an Intel Haswell machine. This
commit introduced no new failures.
Signed-off-by: NBen Gardon <bgardon@google.com>
Reviewed-by: NOliver Upton <oupton@google.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0a2b64c5

KVM: nVMX: Remove stale comment from nested_vmx_load_cr3() · ea79a750

由 Sean Christopherson 提交于 2月 04, 2020

The blurb pertaining to the return value of nested_vmx_load_cr3() no
longer matches reality, remove it entirely as the behavior it is
attempting to document is quite obvious when reading the actual code.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ea79a750

KVM: MIPS: Fold comparecount_func() into comparecount_wakeup() · 879a3763

由 Sean Christopherson 提交于 2月 03, 2020

Fold kvm_mips_comparecount_func() into kvm_mips_comparecount_wakeup() to
eliminate the nondescript function name as well as its unnecessary cast
of a vcpu to "unsigned long" and back to a vcpu.  Presumably func() was
used as a callback at some point during pre-upstream development, as
wakeup() is the only user of func() and has been the only user since
both with introduced by commit 669e846e ("KVM/MIPS32: MIPS arch
specific APIs for KVM").

Cc: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

879a3763

KVM: MIPS: Fix a build error due to referencing not-yet-defined function · 09df6307

由 Sean Christopherson 提交于 2月 03, 2020

Hoist kvm_mips_comparecount_wakeup() above its only user,
kvm_arch_vcpu_create() to fix a compilation error due to referencing an
undefined function.

Fixes: d11dfed5 ("KVM: MIPS: Move all vcpu init code into kvm_arch_vcpu_create()")
Reported-by: Nkbuild test robot <lkp@intel.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09df6307

x86/kvm: do not setup pv tlb flush when not paravirtualized · 64b38bd1

由 Thadeu Lima de Souza Cascardo 提交于 1月 31, 2020

kvm_setup_pv_tlb_flush will waste memory and print a misguiding message
when KVM paravirtualization is not available.

Intel SDM says that the when cpuid is used with EAX higher than the
maximum supported value for basic of extended function, the data for the
highest supported basic function will be returned.

So, in some systems, kvm_arch_para_features will return bogus data,
causing kvm_setup_pv_tlb_flush to detect support for pv tlb flush.

Testing for kvm_para_available will work as it checks for the hypervisor
signature.

Besides, when the "nopv" command line parameter is used, it should not
continue as well, as kvm_guest_init will no be called in that case.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

64b38bd1

KVM: x86: Take a u64 when checking for a valid dr7 value · 9b5e8532

由 Sean Christopherson 提交于 1月 24, 2020

Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build
warning on i386 due to right-shifting a 32-bit value by 32 when checking
for bits being set in dr7[63:32].

Alternatively, the warning could be resolved by rewriting the check to
use an i386-friendly method, but taking a u64 fixes another oddity on
32-bit KVM. Beause KVM implements natural width VMCS fields as u64s to
avoid layout issues between 32-bit and 64-bit, a devious guest can stuff
vmcs12->guest_dr7 with a 64-bit value even when both the guest and host
are 32-bit kernels. KVM eventually drops vmcs12->guest_dr7[63:32] when
propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely
on that behavior for correctness.

Cc: Jim Mattson <jmattson@google.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests")
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9b5e8532

KVM: x86: use raw clock values consistently · 8171cd68

由 Paolo Bonzini 提交于 1月 22, 2020

Commit 53fafdbb ("KVM: x86: switch KVMCLOCK base to monotonic raw
clock") changed kvmclock to use tkr_raw instead of tkr_mono. However,
the default kvmclock_offset for the VM was still based on the monotonic
clock and, if the raw clock drifted enough from the monotonic clock,
this could cause a negative system_time to be written to the guest's
struct pvclock. RHEL5 does not like it and (if it boots fast enough to
observe a negative time value) it hangs.

There is another thing to be careful about: getboottime64 returns the
host boot time with tkr_mono frequency, and subtracting the tkr_raw-based
kvmclock value will cause the wallclock to be off if tkr_raw drifts
from tkr_mono. To avoid this, compute the wallclock delta from the
current time instead of being clever and using getboottime64.

Fixes: 53fafdbb ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8171cd68

KVM: x86: reorganize pvclock_gtod_data members · 917f9475

由 Paolo Bonzini 提交于 1月 22, 2020

We will need a copy of tk->offs_boot in the next patch.  Store it and
cleanup the struct: instead of storing tk->tkr_xxx.base with the tk->offs_boot
included, store the raw value in struct pvclock_clock and sum it in
do_monotonic_raw and do_realtime.   tk->tkr_xxx.xtime_nsec also moves
to struct pvclock_clock.

While at it, fix a (usually harmless) typo in do_monotonic_raw, which
was using gtod->clock.shift instead of gtod->raw_clock.shift.

Fixes: 53fafdbb ("KVM: x86: switch KVMCLOCK base to monotonic raw clock")
Cc: stable@vger.kernel.org
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

917f9475

KVM: nVMX: delete meaningless nested_vmx_run() declaration · 33aabd02

由 Miaohe Lin 提交于 1月 23, 2020

The function nested_vmx_run() declaration is below its implementation. So
this is meaningless and should be removed.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

33aabd02

KVM: SVM: allow AVIC without split irqchip · e8ef2a19

由 Paolo Bonzini 提交于 1月 22, 2020

SVM is now able to disable AVIC dynamically whenever the in-kernel PIT sets
up an ack notifier, so we can enable it even if in-kernel IOAPIC/PIC/PIT
are in use.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e8ef2a19

kvm: ioapic: Lazy update IOAPIC EOI · f458d039

由 Suravee Suthikulpanit 提交于 11月 14, 2019

In-kernel IOAPIC does not receive EOI with AMD SVM AVIC
since the processor accelerate write to APIC EOI register and
does not trap if the interrupt is edge-triggered.

Workaround this by lazy check for pending APIC EOI at the time when
setting new IOPIC irq, and update IOAPIC EOI if no pending APIC EOI.
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f458d039

kvm: ioapic: Refactor kvm_ioapic_update_eoi() · 1ec2405c

由 Suravee Suthikulpanit 提交于 11月 14, 2019

Refactor code for handling IOAPIC EOI for subsequent patch.
There is no functional change.
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ec2405c

kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode. · e2ed4078

由 Suravee Suthikulpanit 提交于 11月 14, 2019

AMD SVM AVIC accelerates EOI write and does not trap. This causes
in-kernel PIT re-injection mode to fail since it relies on irq-ack
notifier mechanism. So, APICv is activated only when in-kernel PIT
is in discard mode e.g. w/ qemu option:

  -global kvm-pit.lost_tick_policy=discard

Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this
reason.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e2ed4078

svm: Temporarily deactivate AVIC during ExtINT handling · f3515dc3

由 Suravee Suthikulpanit 提交于 11月 14, 2019

AMD AVIC does not support ExtINT. Therefore, AVIC must be temporary
deactivated and fall back to using legacy interrupt injection via vINTR
and interrupt window.

Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason.
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f3515dc3

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功