提交 · 4407a797e9412afba2f7815fb19395d4b32dca4e · openeuler / Kernel

28 9月, 2020 25 次提交

KVM: SVM: Enable INVPCID feature on AMD · 4407a797

由 Babu Moger 提交于 9月 11, 2020

The following intercept bit has been added to support VMEXIT
for INVPCID instruction:
Code    Name            Cause
A2h     VMEXIT_INVPCID  INVPCID instruction

The following bit has been added to the VMCB layout control area
to control intercept of INVPCID:
Byte Offset     Bit(s)    Function
14h             2         intercept INVPCID

Enable the interceptions when the the guest is running with shadow
page table enabled and handle the tlbflush based on the invpcid
instruction type.

For the guests with nested page table (NPT) support, the INVPCID
feature works as running it natively. KVM does not need to do any
special handling in this case.

AMD documentation for INVPCID feature is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985255929.11252.17346684135277453258.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4407a797

KVM: X86: Move handling of INVPCID types to x86 · 9715092f

由 Babu Moger 提交于 9月 11, 2020

INVPCID instruction handling is mostly same across both VMX and
SVM. So, move the code to common x86.c.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9715092f

KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c · 3f3393b3

由 Babu Moger 提交于 9月 11, 2020

Handling of kvm_read/write_guest_virt*() errors can be moved to common
code. The same code can be used by both VMX and SVM.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3f3393b3

KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept · 830bd71f

由 Babu Moger 提交于 9月 11, 2020

Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead
call generic svm_set_intercept, svm_clr_intercept an dsvm_is_intercep
tfor all cr intercepts.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

830bd71f

KVM: SVM: Add new intercept word in vmcb_control_area · 4c44e8d6

由 Babu Moger 提交于 9月 11, 2020

The new intercept bits have been added in vmcb control area to support
few more interceptions. Here are the some of them.
 - INTERCEPT_INVLPGB,
 - INTERCEPT_INVLPGB_ILLEGAL,
 - INTERCEPT_INVPCID,
 - INTERCEPT_MCOMMIT,
 - INTERCEPT_TLBSYNC,

Add a new intercept word in vmcb_control_area to support these instructions.
Also update kvm_nested_vmrun trace function to support the new addition.

AMD documentation for these instructions is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593
Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4c44e8d6

KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors · c62e2e94

由 Babu Moger 提交于 9月 11, 2020

Convert all the intercepts to one array of 32 bit vectors in
vmcb_control_area. This makes it easy for future intercept vector
additions. Also update trace functions.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c62e2e94

KVM: SVM: Modify intercept_exceptions to generic intercepts · 9780d51d

由 Babu Moger 提交于 9月 11, 2020

Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to
set/clear/test the intercept_exceptions bits.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9780d51d

KVM: SVM: Change intercept_dr to generic intercepts · 30abaa88

由 Babu Moger 提交于 9月 11, 2020

Modify intercept_dr to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
to set/clear/test the intercept_dr bits.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

30abaa88

KVM: SVM: Change intercept_cr to generic intercepts · 03bfeeb9

由 Babu Moger 提交于 9月 11, 2020

Change intercept_cr to generic intercepts in vmcb_control_area.
Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
where applicable.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu>
[Change constant names. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

03bfeeb9

KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) · c45ad722

由 Babu Moger 提交于 9月 11, 2020

This is in preparation for the future intercept vector additions.

Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
using kernel APIs __set_bit, __clear_bit and test_bit espectively.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c45ad722

KVM: nSVM: Remove unused field · a90c1ed9

由 Babu Moger 提交于 9月 11, 2020

host_intercept_exceptions is not used anywhere. Clean it up.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985252277.11252.8819848322175521354.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a90c1ed9

KVM: SVM: refactor exit labels in svm_create_vcpu · 8d22b90e

由 Maxim Levitsky 提交于 8月 27, 2020

Kernel coding style suggests not to use labels like error1,error2
Suggested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8d22b90e

KVM: SVM: use __GFP_ZERO instead of clear_page · 0681de1b

由 Maxim Levitsky 提交于 8月 27, 2020

Another small refactoring.
Suggested-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0681de1b

KVM: SVM: refactor msr permission bitmap allocation · f4c847a9

由 Maxim Levitsky 提交于 8月 27, 2020

Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, that also allocates
the msr bitmap and add svm_vcpu_free_msrpm to free it.

This will be used later to move the nested msr permission bitmap allocation
to nested.c
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-4-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f4c847a9

KVM: nSVM: rename nested vmcb to vmcb12 · 0dd16b5b

由 Maxim Levitsky 提交于 8月 27, 2020

This is to be more consistient with VMX, and to support
upcoming addition of vmcb02

Hopefully no functional changes.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-3-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0dd16b5b

KVM: SVM: rename a variable in the svm_create_vcpu · 1feaba14

由 Maxim Levitsky 提交于 8月 27, 2020

The 'page' is to hold the vcpu's vmcb so name it as such to
avoid confusion.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <20200827171145.374620-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1feaba14

KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns · 010fd37f

由 Wanpeng Li 提交于 9月 10, 2020

All the checks in lapic_timer_int_injected(), __kvm_wait_lapic_expire(), and
these function calls waste cpu cycles when the timer mode is not tscdeadline.
We can observe ~1.3% world switch time overhead by kvm-unit-tests/vmexit.flat
vmcall testing on AMD server. This patch reduces the world switch latency
caused by timer_advance_ns feature when the timer mode is not tscdeadline by
simpling move the check against apic->lapic_timer.expired_tscdeadline much
earlier.
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-7-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

010fd37f

KVM: LAPIC: Narrow down the kick target vCPU · 68ca7663

由 Wanpeng Li 提交于 9月 10, 2020

The kick after setting KVM_REQ_PENDING_TIMER is used to handle the timer
fires on a different pCPU which vCPU is running on. This kick costs about
1000 clock cycles and we don't need this when injecting already-expired
timer or when using the VMX preemption timer because
kvm_lapic_expired_hv_timer() is called from the target vCPU.
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-6-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

68ca7663

KVM: LAPIC: Guarantee the timer is in tsc-deadline mode when setting · 27503833

由 Wanpeng Li 提交于 9月 10, 2020

Check apic_lvtt_tscdeadline() mode directly instead of apic_lvtt_oneshot()
and apic_lvtt_period() to guarantee the timer is in tsc-deadline mode when
wrmsr MSR_IA32_TSCDEADLINE.
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-3-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27503833

KVM: LAPIC: Return 0 when getting the tscdeadline timer if the lapic is hw disabled · a970e9b2

由 Wanpeng Li 提交于 9月 10, 2020

Return 0 when getting the tscdeadline timer if the lapic is hw disabled.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a970e9b2

KVM: LAPIC: Fix updating DFR missing apic map recalculation · ae6f2496

由 Wanpeng Li 提交于 8月 19, 2020

There is missing apic map recalculation after updating DFR, if it is
INIT RESET, in x2apic mode, local apic is software enabled before.
This patch fix it by introducing the function kvm_apic_set_dfr() to
be called in INIT RESET handling path.
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1597827327-25055-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae6f2496

kvm/eventfd: move wildcard calculation outside loop · 2fc4f15d

由 Yi Li 提交于 9月 11, 2020

There is no need to calculate wildcard in each iteration
since wildcard is not changed.
Signed-off-by: NYi Li <yili@winhong.com>
Message-Id: <20200911055652.3041762-1-yili@winhong.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2fc4f15d

KVM: nVMX: Simplify the initialization of nested_vmx_msrs · b9757a4b

由 Chenyi Qiang 提交于 8月 28, 2020

The nested VMX controls MSRs can be initialized by the global capability
values stored in vmcs_config.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200828085622.8365-6-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b9757a4b

KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled · efc83133

由 Chenyi Qiang 提交于 8月 28, 2020

KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they are not exposed by the
system ioctl KVM_GET_MSR.  Add them to the setup of nested VMX controls MSR.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20200828085622.8365-2-chenyi.qiang@intel.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

efc83133

KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state() · d5cd6f34

由 Vitaly Kuznetsov 提交于 9月 14, 2020

The save and ctl pointers are passed uninitialized to kfree() when
svm_set_nested_state() follows the 'goto out_set_gif' path. While the
issue could've been fixed by initializing these on-stack varialbles to
NULL, it seems preferable to eliminate 'out_set_gif' label completely as
it is not actually a failure path and duplicating a single svm_set_gif()
call doesn't look too bad.

 [ bp: Drop obscure Addresses-Coverity: tag. ]

Fixes: 6ccbd29a ("KVM: SVM: nested: Don't allocate VMCB structures on stack")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Reported-by: NJoerg Roedel <jroedel@suse.de>
Reported-by: NColin King <colin.king@canonical.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Acked-by: NJoerg Roedel <jroedel@suse.de>
Tested-by: NTom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20200914133725.650221-1-vkuznets@redhat.comSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d5cd6f34

22 9月, 2020 5 次提交

Merge tag 'kvm-ppc-next-5.10-1' of... · 2e3df760

由 Paolo Bonzini 提交于 9月 22, 2020

Merge tag 'kvm-ppc-next-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 5.10

- Fix for running nested guests with in-kernel IRQ chip
- Fix race condition causing occasional host hard lockup
- Minor cleanups and bugfixes

2e3df760

Merge branch 'x86-seves-for-paolo' of... · bf3c0e5e

由 Paolo Bonzini 提交于 9月 22, 2020

Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD

bf3c0e5e

KVM: PPC: Book3S: Fix symbol undeclared warnings · cf59eb13

由 Wang Wensheng 提交于 9月 21, 2020

Build the kernel with `C=2`:
arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol
'kvmhv_alloc_nested' was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol
'kvmppc_radix_set_pte_at' was not declared. Should it be static?
arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol
'kvmhv_p9_guest_entry' was not declared. Should it be static?
arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc'
was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol
'iommu_tce_kill_rm' was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol
'kvmppc_tce_iommu_do_map' was not declared. Should it be static?
arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr'
was not declared. Should it be static?

Those symbols are used only in the files that define them so make them
static to fix the warnings.
Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

cf59eb13

KVM: PPC: Book3S: Remove redundant initialization of variable ret · eb173559

由 Jing Xiangfeng 提交于 9月 19, 2020

The variable ret is being initialized with '-ENOMEM' that is meaningless.
So remove it.
Signed-off-by: NJing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: NFabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

eb173559

KVM: PPC: Book3S HV: XIVE: Convert to DEFINE_SHOW_ATTRIBUTE · 45170766

由 Qinglang Miao 提交于 9月 19, 2020

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: NQinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: NCédric Le Goater <clg@kaod.org>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

45170766

21 9月, 2020 3 次提交

Merge tag 'kvm-s390-master-5.9-1' of... · 32251b07

由 Paolo Bonzini 提交于 9月 20, 2020

Merge tag 'kvm-s390-master-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: add documentation for KVM_CAP_S390_DIAG318

diag318 code was merged in 5.9-rc1, let us add some
missing documentation

32251b07

Merge tag 'kvmarm-fixes-5.9-2' of... · b73815a1

由 Paolo Bonzini 提交于 9月 20, 2020

Merge tag 'kvmarm-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 5.9, take #2

- Fix handling of S1 Page Table Walk permission fault at S2
  on instruction fetch
- Cleanup kvm_vcpu_dabt_iswrite()

b73815a1

Revert "KVM: Check the allocation of pv cpu mask" · 7d1f8691

由 Vitaly Kuznetsov 提交于 9月 20, 2020

The commit 0f990222 ("KVM: Check the allocation of pv cpu mask") we
have in 5.9-rc5 has two issue:
1) Compilation fails for !CONFIG_SMP, see:
   https://bugzilla.kernel.org/show_bug.cgi?id=209285

2) This commit completely disables PV TLB flush, see
   https://lore.kernel.org/kvm/87y2lrnnyf.fsf@vitty.brq.redhat.com/

The allocation problem is likely a theoretical one, if we don't
have memory that early in boot process we're likely doomed anyway.
Let's solve it properly later.

This reverts commit 0f990222.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7d1f8691

19 9月, 2020 2 次提交

KVM: arm64: Remove S1PTW check from kvm_vcpu_dabt_iswrite() · 620cf45f

由 Marc Zyngier 提交于 9月 15, 2020

Now that kvm_vcpu_trap_is_write_fault() checks for S1PTW, there
is no need for kvm_vcpu_dabt_iswrite() to do the same thing, as
we already check for this condition on all existing paths.

Drop the check and add a comment instead.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NWill Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200915104218.1284701-3-maz@kernel.org

620cf45f

KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch · c4ad98e4

由 Marc Zyngier 提交于 9月 15, 2020

KVM currently assumes that an instruction abort can never be a write.
This is in general true, except when the abort is triggered by
a S1PTW on instruction fetch that tries to update the S1 page tables
(to set AF, for example).

This can happen if the page tables have been paged out and brought
back in without seeing a direct write to them (they are thus marked
read only), and the fault handling code will make the PT executable(!)
instead of writable. The guest gets stuck forever.

In these conditions, the permission fault must be considered as
a write so that the Stage-1 update can take place. This is essentially
the I-side equivalent of the problem fixed by 60e21a0e ("arm64: KVM:
Take S1 walks into account when determining S2 write faults").

Update kvm_is_write_fault() to return true on IABT+S1PTW, and introduce
kvm_vcpu_trap_is_exec_fault() that only return true when no faulting
on a S1 fault. Additionally, kvm_vcpu_dabt_iss1tw() is renamed to
kvm_vcpu_abt_iss1tw(), as the above makes it plain that it isn't
specific to data abort.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NWill Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200915104218.1284701-2-maz@kernel.org

c4ad98e4

17 9月, 2020 3 次提交

KVM: PPC: Book3S HV: Set LPCR[HDICE] before writing HDEC · 35dfb43c

由 Paul Mackerras 提交于 9月 03, 2020

POWER8 and POWER9 machines have a hardware deviation where generation
of a hypervisor decrementer exception is suppressed if the HDICE bit
in the LPCR register is 0 at the time when the HDEC register
decrements from 0 to -1.  When entering a guest, KVM first writes the
HDEC register with the time until it wants the CPU to exit the guest,
and then writes the LPCR with the guest value, which includes
HDICE = 1.  If HDEC decrements from 0 to -1 during the interval
between those two events, it is possible that we can enter the guest
with HDEC already negative but no HDEC exception pending, meaning that
no HDEC interrupt will occur while the CPU is in the guest, or at
least not until HDEC wraps around.  Thus it is possible for the CPU to
keep executing in the guest for a long time; up to about 4 seconds on
POWER8, or about 4.46 years on POWER9 (except that the host kernel
hard lockup detector will fire first).

To fix this, we set the LPCR[HDICE] bit before writing HDEC on guest
entry.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

35dfb43c

KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest · 05e6295d

由 Fabiano Rosas 提交于 9月 11, 2020

The current nested KVM code does not support HPT guests. This is
informed/enforced in some ways:

- Hosts < P9 will not be able to enable the nested HV feature;

- The nested hypervisor MMU capabilities will not contain
  KVM_CAP_PPC_MMU_HASH_V3;

- QEMU reflects the MMU capabilities in the
  'ibm,arch-vec-5-platform-support' device-tree property;

- The nested guest, at 'prom_parse_mmu_model' ignores the
  'disable_radix' kernel command line option if HPT is not supported;

- The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.

There is, however, still a way to start a HPT guest by using
max-compat-cpu=power8 at the QEMU machine options. This leads to the
guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
ioctl.

With the guest set to hash, the nested hypervisor goes through the
entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
crashes when it tries to execute an hypervisor-privileged (mtspr
HDEC) instruction at __kvmppc_vcore_entry:

root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...

<snip>
[  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
[  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
[  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
[  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
[  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
               GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
               GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
               GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
               GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
               GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
               GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
               GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
               GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
[  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
[  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
[  538.544173] Call Trace:
[  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
[  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
[  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
[  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
[  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
[  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
[  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
[  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
[  538.544787] Instruction dump:
[  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
[  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
[  538.544953] ---[ end trace 74423e2b948c2e0c ]---

This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
the nested hypervisor, causing QEMU to abort.
Reported-by: NSatheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

05e6295d

KVM: PPC: Don't return -ENOTSUPP to userspace in ioctls · 4e1b2ab7

由 Greg Kurz 提交于 9月 11, 2020

ENOTSUPP is a linux only thingy, the value of which is unknown to
userspace, not to be confused with ENOTSUP which linux maps to
EOPNOTSUPP, as permitted by POSIX [1]:

[EOPNOTSUPP]
Operation not supported on socket. The type of socket (address family
or protocol) does not support the requested operation. A conforming
implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP].

Return -EOPNOTSUPP instead of -ENOTSUPP for the following ioctls:
- KVM_GET_FPU for Book3s and BookE
- KVM_SET_FPU for Book3s and BookE
- KVM_GET_DIRTY_LOG for BookE

This doesn't affect QEMU which doesn't call the KVM_GET_FPU and
KVM_SET_FPU ioctls on POWER anyway since they are not supported,
and _buggily_ ignores anything but -EPERM for KVM_GET_DIRTY_LOG.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.htmlSigned-off-by: NGreg Kurz <groug@kaod.org>
Acked-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

4e1b2ab7

14 9月, 2020 1 次提交

docs: kvm: add documentation for KVM_CAP_S390_DIAG318 · f20d4e92

由 Collin Walling 提交于 6月 25, 2020

Documentation for the s390 DIAGNOSE 0x318 instruction handling.
Signed-off-by: NCollin Walling <walling@linux.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/kvm/20200625150724.10021-2-walling@linux.ibm.com/
Message-Id: <20200625150724.10021-2-walling@linux.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

f20d4e92

13 9月, 2020 1 次提交

KVM: emulator: more strict rsm checks. · 37f66bbe

由 Maxim Levitsky 提交于 8月 27, 2020

Don't ignore return values in rsm_load_state_64/32 to avoid
loading invalid state from SMM state area if it was tampered with
by the guest.

This is primarly intended to avoid letting guest set bits in EFER
(like EFER.SVME when nesting is disabled) by manipulating SMM save area.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

37f66bbe

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功