Commits · 760849b1476c94da4cca5d3a5f0a1f64ffc92ba4 · openeuler / Kernel

11 Nov, 2021 7 commits

KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES · 760849b1

Committed by Paul Durrant on Nov 05, 2021

Currently when kvm_update_cpuid_runtime() runs, it assumes that the
KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
however, if Hyper-V support is enabled. In this case the KVM leaves will
be offset.

This patch introduces a new 'kvm_cpuid_base' field into struct
kvm_vcpu_arch to track the location of the KVM leaves and function
kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
target the correct leaf.

NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is introduced
      into processor.h to avoid having duplicate code for the iteration
      over possible hypervisor base leaves.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
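
The signature scan described in this commit can be sketched in plain C as follows. This is a userspace illustration, not the kernel code: `struct cpuid_regs`, `cpuid_lookup_fn`, `find_kvm_cpuid_base()` and `demo_lookup()` are hypothetical names, and the candidate range (0x40000000 up to 0x40010000 in steps of 0x100) is an assumption about what the new iteration macro covers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signature of the KVM CPUID leaves, as quoted in the commit message. */
#define KVM_SIG "KVMKVMKVM\0\0\0"

/* Hypothetical stand-in for one CPUID result; EBX, ECX, EDX carry the signature. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical lookup: fills *out for `leaf`, returns 0 if the leaf is absent. */
typedef int (*cpuid_lookup_fn)(uint32_t leaf, struct cpuid_regs *out);

/*
 * Scan the candidate hypervisor base leaves and return the base whose
 * signature matches the KVM one, or 0 if none does.
 */
uint32_t find_kvm_cpuid_base(cpuid_lookup_fn lookup)
{
    for (uint32_t base = 0x40000000u; base < 0x40010000u; base += 0x100u) {
        struct cpuid_regs r;
        uint32_t sig[3];

        if (!lookup(base, &r))
            continue;
        sig[0] = r.ebx;
        sig[1] = r.ecx;
        sig[2] = r.edx;
        if (memcmp(sig, KVM_SIG, sizeof(sig)) == 0)
            return base;
    }
    return 0;
}

/* Example guest: a Hyper-V signature at 0x40000000, KVM leaves at 0x40000100. */
int demo_lookup(uint32_t leaf, struct cpuid_regs *out)
{
    const char *sig;
    uint32_t regs[3];

    if (leaf == 0x40000000u)
        sig = "Microsoft Hv";
    else if (leaf == 0x40000100u)
        sig = KVM_SIG;
    else
        return 0;

    memcpy(regs, sig, sizeof(regs));
    out->eax = 0;
    out->ebx = regs[0];
    out->ecx = regs[1];
    out->edx = regs[2];
    return 1;
}
```

With the example guest above, the scan skips the Hyper-V leaf and reports 0x40000100 as the KVM base, so the features leaf would be looked up relative to that base rather than at the fixed 0x40000001.
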

760849b1

KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows · 8b44b174

Committed by Sean Christopherson on Nov 05, 2021

Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
only difference between the two ioctls() is the format of the userspace
struct.  A future fix will add yet more code to the core logic.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105095101.5384-2-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8b44b174

kvm: mmu: Use fast PF path for access tracking of huge pages when possible · 10c30de0

Committed by Junaid Shahid on Nov 03, 2021

The fast page fault path bails out on write faults to huge pages in
order to accommodate dirty logging. This change adds a check to do that
only when dirty logging is actually enabled, so that access tracking for
huge pages can still use the fast path for write faults in the common
case.

Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211104003359.2201967-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

10c30de0

KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator · c435d4b7

Committed by Sean Christopherson on Nov 03, 2021

Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with
rcu_dereference().  Shadow pages in the TDP MMU, and thus their SPTEs,
are protected by rcu.

This fixes a Sparse warning at tdp_mmu.c:900:51:
  warning: incorrect type in argument 1 (different address spaces)
  expected unsigned long long [usertype] *sptep
  got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep

Fixes: 7158bee4 ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211103161833.3769487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c435d4b7

KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc

Committed by Maxim Levitsky on Nov 08, 2021

KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
standard kvm's inject_pending_event, and not via APICv/AVIC.

Since this is a debug feature, just inhibit APICv/AVIC while
KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.

Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cae72dcc

kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool · e6cd31f1

Committed by Jim Mattson on Nov 05, 2021

These function names sound like predicates, and they have siblings,
*is_valid_msr(), which _are_ predicates. Moreover, there are comments
that essentially warn that these functions behave unexpectedly.

Flip the polarity of the return values, so that they become
predicates, and convert the boolean result to a success/failure code
at the outer call site.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
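
The shape of this change can be illustrated with a small sketch. The names `is_valid_rdpmc_idx()`, `check_rdpmc()` and `NUM_COUNTERS` are hypothetical and only stand in for the real helpers: the inner function is a true predicate, and only the outermost caller turns the boolean into a success/failure code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_COUNTERS 4u /* hypothetical number of PMU counters */

/* Predicate: true when idx names a valid counter. */
bool is_valid_rdpmc_idx(uint32_t idx)
{
    return idx < NUM_COUNTERS;
}

/* Outer call site: convert the boolean into 0 on success, 1 on failure. */
int check_rdpmc(uint32_t idx)
{
    return is_valid_rdpmc_idx(idx) ? 0 : 1;
}
```

Keeping the conversion at a single call site means every other user of the predicate can read it the way its name suggests.
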

e6cd31f1

KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb

Committed by David Woodhouse on Nov 02, 2021

In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.

Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.

Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.

In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.

Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have. We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.

Cc: stable@vger.kernel.org
Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
[I didn't entirely agree with David's assessment of the
usefulness of the gfn_to_pfn cache, and integrated the outcome
of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
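
The atomic exchange of the preempted field can be sketched with C11 atomics. This is only an analogy: the kernel version operates on a userspace address with exception handling, which is not reproduced here, and `struct demo_steal_time`, the `DEMO_*` flag values and `demo_record_steal_time()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_VCPU_PREEMPTED 0x01u /* hypothetical flag values */
#define DEMO_VCPU_FLUSH_TLB 0x02u

struct demo_steal_time {
    _Atomic uint8_t preempted;
};

/*
 * Atomically read and clear the field in one step, so a flush request
 * set concurrently by another CPU cannot be lost between the read and
 * the write. Returns 1 if a TLB flush was requested, 0 otherwise.
 */
int demo_record_steal_time(struct demo_steal_time *st)
{
    uint8_t old = atomic_exchange(&st->preempted, 0);

    return (old & DEMO_VCPU_FLUSH_TLB) != 0;
}
```

A plain load followed by a store would leave a window in which a newly set flag is overwritten; the single exchange closes that window.
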

7e2175eb

02 Nov, 2021 1 commit

Merge tag 'kvm-riscv-5.16-2' of https://github.com/kvm-riscv/linux into HEAD · 52cf891d

Committed by Paolo Bonzini on Nov 02, 2021

Minor cocci warning fixes:
1) Bool return warning fix
2) Unneeded semicolon warning fix

52cf891d

01 Nov, 2021 2 commits

RISC-V: KVM: fix boolreturn.cocci warnings · bbd5ba8d

Committed by Bixuan Cui on Oct 27, 2021

Fix boolreturn.cocci warnings:
./arch/riscv/kvm/mmu.c:603:9-10: WARNING: return of 0/1 in function
'kvm_age_gfn' with return type bool
./arch/riscv/kvm/mmu.c:582:9-10: WARNING: return of 0/1 in function
'kvm_set_spte_gfn' with return type bool
./arch/riscv/kvm/mmu.c:621:9-10: WARNING: return of 0/1 in function
'kvm_test_age_gfn' with return type bool
./arch/riscv/kvm/mmu.c:568:9-10: WARNING: return of 0/1 in function
'kvm_unmap_gfn_range' with return type bool

Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

bbd5ba8d

RISC-V: KVM: remove unneeded semicolon · 7b161d9c

Committed by ran jianping on Oct 21, 2021

 Eliminate the following coccinelle check warning:
 ./arch/riscv/kvm/vcpu_sbi.c:169:2-3: Unneeded semicolon
 ./arch/riscv/kvm/vcpu_exit.c:397:2-3: Unneeded semicolon
 ./arch/riscv/kvm/vcpu_exit.c:687:2-3: Unneeded semicolon
 ./arch/riscv/kvm/vcpu_exit.c:645:2-3: Unneeded semicolon
 ./arch/riscv/kvm/vcpu.c:247:2-3: Unneeded semicolon
 ./arch/riscv/kvm/vcpu.c:284:2-3: Unneeded semicolon
 ./arch/riscv/kvm/vcpu_timer.c:123:2-3: Unneeded semicolon
 ./arch/riscv/kvm/vcpu_timer.c:170:2-3: Unneeded semicolon

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ran jianping <ran.jianping@zte.com.cn>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

7b161d9c

31 Oct, 2021 4 commits

Merge tag 'kvm-s390-next-5.16-1' of... · 9c6eb531

Committed by Paolo Bonzini on Oct 31, 2021

Merge tag 'kvm-s390-next-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and Features for 5.16

- SIGP Fixes
- initial preparations for lazy destroy of secure VMs
- storage key improvements/fixes
- Log the guest CPNC

9c6eb531

RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions · 7c8de080

Committed by Anup Patel on Oct 26, 2021

The parameter passed to HFENCE.GVMA instruction in rs1 register
is guest physical address right shifted by 2 (i.e. divided by 4).

Unfortunately, we overlooked the semantics of rs1 registers for
HFENCE.GVMA instruction and never right shifted guest physical
address by 2. This issue did not manifest for hypervisors till
now because:
  1) Currently, only __kvm_riscv_hfence_gvma_all() and SBI
     HFENCE calls are used to invalidate TLB.
  2) All H-extension implementations (such as QEMU, Spike,
     Rocket Core FPGA, etc) that we tried till now were
     conservatively flushing everything upon any HFENCE.GVMA
     instruction.

This patch fixes GPA passed to __kvm_riscv_hfence_gvma_vmid_gpa()
and __kvm_riscv_hfence_gvma_gpa() functions.

Fixes: fd7bb4a2 ("RISC-V: KVM: Implement VMID allocator")
Reported-by: Ian Huang <ihuang@ventanamicro.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Message-Id: <20211026170136.2147619-4-anup.patel@wdc.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
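
As a quick arithmetic illustration of the fix, the value that goes into rs1 is the guest physical address shifted right by 2. The helper name `hfence_gvma_rs1()` is hypothetical; the real functions issue the instruction itself, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Operand for HFENCE.GVMA rs1: the guest physical address divided by 4. */
uint64_t hfence_gvma_rs1(uint64_t gpa)
{
    return gpa >> 2;
}
```

For example, a GPA of 0x80001000 yields an rs1 operand of 0x20000400. Passing the unshifted address would name a different page on any implementation that honours the operand instead of flushing everything.
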

7c8de080

RISC-V: KVM: Factor-out FP virtualization into separate sources · 0a86512d

Committed by Anup Patel on Oct 26, 2021

The timer and SBI virtualization is already in separate sources.
In future, we will have vector and AIA virtualization also added
as separate sources.

To align with above described modularity, we factor-out FP
virtualization into separate sources.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Message-Id: <20211026170136.2147619-3-anup.patel@wdc.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0a86512d

Merge tag 'kvmarm-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 4e338684

Committed by Paolo Bonzini on Oct 31, 2021

KVM/arm64 updates for Linux 5.16

- More progress on the protected VM front, now with the full
  fixed feature set as well as the limitation of some hypercalls
  after initialisation.

- Cleanup of the RAZ/WI sysreg handling, which was pointlessly
  complicated

- Fixes for the vgic placement in the IPA space, together with a
  bunch of selftests

- More memcg accounting of the memory allocated on behalf of a guest

- Timer and vgic selftests

- Workarounds for the Apple M1 broken vgic implementation

- KConfig cleanups

- New kvmarm.mode=none option, for those who really dislike us

4e338684

27 Oct, 2021 3 commits

KVM: s390: add debug statement for diag 318 CPNC data · 3fd8417f

Committed by Collin Walling on Oct 26, 2021

The diag 318 data contains values that denote information regarding the
guest's environment. Currently, it is unnecessarily difficult to observe
this value (either manually-inserted debug statements, gdb stepping, mem
dumping etc). It's useful to observe this information to obtain an
at-a-glance view of the guest's environment, so let's add a simple VCPU
event that prints the CPNC to the s390dbf logs.

Signed-off-by: Collin Walling <walling@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20211027025451.290124-1-walling@linux.ibm.com
[borntraeger@de.ibm.com]: change debug level to 3
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

3fd8417f

KVM: s390: pv: properly handle page flags for protected guests · 380d97bd

Committed by Claudio Imbrenda on Sep 20, 2021

Introduce variants of the convert and destroy page functions that also
clear the PG_arch_1 bit used to mark them as secure pages.

The PG_arch_1 flag is always allowed to overindicate; using the new
functions introduced here reduces the extent of overindication
and thus improves performance.

These new functions can only be called on pages for which a reference
is already being held.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210920132502.36111-7-imbrenda@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

380d97bd

KVM: s390: Fix handle_sske page fault handling · 85f517b2

Committed by Janis Schoetterl-Glausch on Oct 22, 2021

If handle_sske cannot set the storage key, because there is no
page table entry or no present large page entry, it calls
fixup_user_fault.
However, currently, if the call succeeds, handle_sske returns
-EAGAIN, without having set the storage key.
Instead, retry by continue'ing the loop without incrementing the
address.
The same issue in handle_pfmf was fixed by
a11bdb1a ("KVM: s390: Fix pfmf and conditional skey emulation").

Fixes: bd096f64 ("KVM: s390: Add skey emulation fault handling")
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20211022152648.26536-1-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
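
The retry pattern this fix describes can be sketched as a loop that repeats the same address after a successful fault fixup instead of bailing out. Everything below is a hypothetical userspace model: `struct demo_mem`, `demo_set_key()`, `demo_fixup_fault()` and `demo_set_keys()` only stand in for the storage-key helper and fixup_user_fault().

```c
#include <assert.h>
#include <errno.h>

#define DEMO_PAGE_SIZE 4096ul

/* Hypothetical backing store: a page accepts a key only once it is present. */
struct demo_mem {
    int present[4];
    unsigned char key[4];
};

/* Set the key for the page at addr, or fail with -EFAULT if it is absent. */
int demo_set_key(struct demo_mem *m, unsigned long addr, unsigned char key)
{
    unsigned long i = addr / DEMO_PAGE_SIZE;

    if (!m->present[i])
        return -EFAULT;
    m->key[i] = key;
    return 0;
}

/* Stand-in for the fault fixup: make the page present so a retry can succeed. */
int demo_fixup_fault(struct demo_mem *m, unsigned long addr)
{
    m->present[addr / DEMO_PAGE_SIZE] = 1;
    return 0;
}

/* Set the key on every page in [start, end). */
int demo_set_keys(struct demo_mem *m, unsigned long start, unsigned long end,
                  unsigned char key)
{
    while (start != end) {
        int rc = demo_set_key(m, start, key);

        if (rc == -EFAULT) {
            rc = demo_fixup_fault(m, start);
            if (!rc)
                continue; /* retry the same address; do not advance */
        }
        if (rc < 0)
            return rc;
        start += DEMO_PAGE_SIZE;
    }
    return 0;
}
```

The important detail is that `continue` is reached before the address is incremented, so the page whose fault was just resolved gets its key on the next iteration.
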

85f517b2

25 Oct, 2021 20 commits

Merge branch 'kvm-pvclock-raw-spinlock' into HEAD · e59f3e5d

Committed by Paolo Bonzini on Oct 25, 2021

pvclock_gtod_sync_lock is completely gone in Linux 5.16.  Include this
fix into the kvm/next history to record that the syzkaller report is
not valid there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e59f3e5d

KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d

Committed by David Woodhouse on Oct 23, 2021

On the preemption path when updating a Xen guest's runstate times, this
lock is taken inside the scheduler rq->lock, which is a raw spinlock.
This was shown in a lockdep warning:

[   89.138354] =============================
[   89.138356] [ BUG: Invalid wait context ]
[   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
[   89.138360] -----------------------------
[   89.138361] xen_shinfo_test/2575 is trying to lock:
[   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
[   89.138442] other info that might help us debug this:
[   89.138444] context-{5:5}
[   89.138445] 4 locks held by xen_shinfo_test/2575:
[   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
[   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
[   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
[   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
...
[   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
[   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
[   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
[   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
[   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
[   89.138900]  __schedule+0x5de/0xbd0

Cc: stable@vger.kernel.org
Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8228c77d

KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol · 0d7d8449

Committed by David Edmondson on Sep 20, 2021

When passing the failing address and size out to user space, SGX must
ensure not to trample on the earlier fields of the emulation_failure
sub-union of struct kvm_run.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-5-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0d7d8449

KVM: x86: On emulation failure, convey the exit reason, etc. to userspace · e615e355

Committed by David Edmondson on Sep 20, 2021

Should instruction emulation fail, include the VM exit reason, etc. in
the emulation_failure data passed to userspace, in order that the VMM
can report it as a debugging aid when describing the failure.

Suggested-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e615e355

KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info · 0a62a031

Committed by David Edmondson on Sep 20, 2021

Extend the get_exit_info static call to provide the reason for the VM
exit. Modify relevant trace points to use this rather than extracting
the reason in the caller.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0a62a031

KVM: x86: Clarify the kvm_run.emulation_failure structure layout · a9d496d8

Committed by David Edmondson on Sep 20, 2021

Until more flags for kvm_run.emulation_failure are defined, it
is undetermined whether new payload elements corresponding to those
flags will be additive or alternative. As a hint to userspace that an
alternative is possible, wrap the current payload elements in a union.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-2-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
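
To picture what "wrap the current payload elements in a union" means, here is an illustrative layout. It is a sketch only: the struct name `demo_emulation_failure` and every field name and size are assumptions made for the example, not the UAPI definition from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout; names and sizes are assumptions for this sketch. */
struct demo_emulation_failure {
    uint32_t suberror;
    uint32_t ndata;
    uint64_t flags;
    union {
        /* Payload that is valid when the matching flag is set. */
        struct {
            uint8_t insn_size;
            uint8_t insn_bytes[15];
        };
        /* A payload for a future flag could be added here as an alternative. */
    };
};
```

Because the payload sits in a union, a later flag can either extend the data after it or reuse the same bytes, and userspace is warned not to assume the current fields are always meaningful. This sketch relies on C11 anonymous structs and unions.
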

a9d496d8

KVM: s390: Add a routine for setting userspace CPU state · 67cf68b6

Committed by Eric Farman on Oct 08, 2021

This capability exists, but we don't record anything when userspace
enables it. Let's refactor that code so that a note can be made in
the debug logs that it was enabled.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20211008203112.1979843-7-farman@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

67cf68b6

KVM: s390: Simplify SIGP Set Arch handling · 8eeba194

Committed by Eric Farman on Oct 08, 2021

The Principles of Operation describes the various reasons that
each individual SIGP order might be rejected, and the status
bits that are set for each condition.

For example, for the Set Architecture order, it states:

  "If it is not true that all other CPUs in the configuration
   are in the stopped or check-stop state, ...
   bit 54 (incorrect state) ... is set to one."

However, it also states:

  "... if the CZAM facility is installed, ...
   bit 55 (invalid parameter) ... is set to one."

Since the Configuration-z/Architecture-Architectural Mode (CZAM)
facility is unconditionally presented, there is no need to examine
each VCPU to determine if it is started/stopped. It can simply be
rejected outright with the Invalid Parameter bit.

Fixes: b697e435 ("KVM: s390: Support Configuration z/Architecture Mode")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20211008203112.1979843-2-farman@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
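
The two status bits quoted above can be turned into masks with a short sketch. It assumes the IBM convention in which bit 0 is the most significant bit of a 64-bit register; the `S390_BIT()` macro, the `DEMO_*` names and `demo_sigp_set_arch_status()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed numbering: bit n counts from the most significant end of 64 bits. */
#define S390_BIT(n) (UINT64_C(1) << (63 - (n)))

#define DEMO_STATUS_INCORRECT_STATE   S390_BIT(54)
#define DEMO_STATUS_INVALID_PARAMETER S390_BIT(55)

/*
 * With CZAM always presented, the Set Architecture order is rejected
 * with "invalid parameter" without examining any other vCPU.
 */
uint64_t demo_sigp_set_arch_status(void)
{
    return DEMO_STATUS_INVALID_PARAMETER;
}
```

Under that numbering, bit 55 corresponds to the mask 0x100 and bit 54 to 0x200, so the simplified handler never needs the per-vCPU state check that the incorrect-state case would require.
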

8eeba194

KVM: s390: pv: avoid stalls when making pages secure · f0a1a061

Committed by Claudio Imbrenda on Sep 20, 2021

Improve make_secure_pte to avoid stalls when the system is heavily
overcommitted. This was especially problematic in kvm_s390_pv_unpack,
because of the loop over all pages that needed unpacking.

Due to the locks being held, it was not possible to simply replace
uv_call with uv_call_sched. A more complex approach was
needed, in which uv_call is replaced with __uv_call, which does not
loop. When the UVC needs to be executed again, -EAGAIN is returned, and
the caller (or its caller) will try again.

When -EAGAIN is returned, the path is the same as when the page is in
writeback (and the writeback check is also performed, which is
harmless).

Fixes: 214d9bbc ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210920132502.36111-5-imbrenda@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

f0a1a061

KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm · 1e2aa46d

Committed by Claudio Imbrenda on Sep 20, 2021

When the system is heavily overcommitted, kvm_s390_pv_init_vm might
generate stall notifications.

Fix this by using uv_call_sched instead of just uv_call. This is ok because
we are not holding spinlocks.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: 214d9bbc ("s390/mm: provide memory management functions for protected KVM guests")
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20210920132502.36111-4-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

1e2aa46d

KVM: s390: pv: avoid double free of sida page · d4074324

Committed by Claudio Imbrenda on Sep 20, 2021

If kvm_s390_pv_destroy_cpu is called more than once, we risk calling
free_page on a random page, since the sidad field is aliased with the
gbea, which is not guaranteed to be zero.

This can happen, for example, if userspace calls the KVM_PV_DISABLE
IOCTL, and it fails, and then userspace calls the same IOCTL again.
This scenario is only possible if KVM has some serious bug or if the
hardware is broken.

The solution is to simply return successfully immediately if the vCPU
was already non-secure.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: 19e12277 ("KVM: S390: protvirt: Introduce instruction data area bounce buffer")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20210920132502.36111-3-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
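
The early-return guard can be modelled in a few lines of userspace C. `struct demo_vcpu` and `demo_pv_destroy_cpu()` are hypothetical, and `free()` merely stands in for free_page(); the point is that a second call must not touch the pointer field at all, because by then it may hold unrelated data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_vcpu {
    bool secure; /* hypothetical "vCPU is protected" flag */
    void *sida;  /* hypothetical bounce-buffer page */
};

int demo_pv_destroy_cpu(struct demo_vcpu *v)
{
    if (!v->secure)
        return 0; /* already non-secure: nothing to free, report success */

    free(v->sida);
    v->sida = NULL;
    v->secure = false;
    return 0;
}
```

In the usage below, the pointer field is deliberately overwritten with a bogus value between the two calls to mimic the aliasing described above; the second call returns before looking at it.
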

d4074324

KVM: s390: pv: add macros for UVC CC values · 57c5df13

Committed by Claudio Imbrenda on Sep 20, 2021

Add macros to describe the 4 possible CC values returned by the UVC
instruction.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20210920132502.36111-2-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
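
A sketch of how named condition codes make the non-looping call pattern from this series readable. The macro names, their meanings (0 success, 1 error, 2 busy, 3 partial completion) and the error code returned for the failure case are assumptions for illustration, as is the helper `demo_uvc_result()`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the four condition codes. */
#define DEMO_UVC_CC_OK      0
#define DEMO_UVC_CC_ERROR   1
#define DEMO_UVC_CC_BUSY    2
#define DEMO_UVC_CC_PARTIAL 3

/* Map the condition code of one non-looping call to a caller-visible result. */
int demo_uvc_result(int cc)
{
    switch (cc) {
    case DEMO_UVC_CC_OK:
        return 0;
    case DEMO_UVC_CC_BUSY:
    case DEMO_UVC_CC_PARTIAL:
        return -EAGAIN; /* the call must be issued again; let the caller retry */
    default:
        return -EIO;    /* illustrative error code */
    }
}
```

Returning -EAGAIN instead of spinning lets a caller that holds locks drop them, reschedule, and try again, which is the behaviour the stall fixes in this series rely on.
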

57c5df13

s390/mm: optimize reset_guest_reference_bit() · 14ea40e2

Committed by David Hildenbrand on Sep 09, 2021

We already optimize get_guest_storage_key() to assume that if we don't have
a PTE table and don't have a huge page mapped that the storage key is 0.

Similarly, optimize reset_guest_reference_bit() to simply do nothing if
there is no PTE table and no huge page mapped.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-10-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

14ea40e2

s390/mm: optimize set_guest_storage_key() · 7cb70266

Committed by David Hildenbrand on Sep 09, 2021

We already optimize get_guest_storage_key() to assume that if we don't have
a PTE table and don't have a huge page mapped that the storage key is 0.

Similarly, optimize set_guest_storage_key() to simply do nothing in case
the key to set is 0.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-9-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

7cb70266

s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present · 8318c404

Committed by David Hildenbrand on Sep 09, 2021

pte_map_lock() is sufficient.

Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-8-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

8318c404

s390/uv: fully validate the VMA before calling follow_page() · 46c22ffd

Committed by David Hildenbrand on Sep 09, 2021

We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2 ("mm: mmap: zap pages
with read mmap_sem in munmap").

find_vma() does not check if the address is >= the VMA start address;
use vma_lookup() instead.

Fixes: 214d9bbc ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20210909162248.14969-6-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
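
The difference between the two lookups can be shown with a toy model over a sorted array of address ranges. `struct demo_vma`, `demo_find_vma()` and `demo_vma_lookup()` are hypothetical userspace stand-ins that only mimic the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

struct demo_vma {
    unsigned long start; /* inclusive */
    unsigned long end;   /* exclusive */
};

/* Like find_vma(): first area that ends above addr; addr may lie below its start. */
const struct demo_vma *demo_find_vma(const struct demo_vma *v, size_t n,
                                     unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr < v[i].end)
            return &v[i];
    return NULL;
}

/* Like vma_lookup(): additionally require that addr lies inside the area. */
const struct demo_vma *demo_vma_lookup(const struct demo_vma *v, size_t n,
                                       unsigned long addr)
{
    const struct demo_vma *vma = demo_find_vma(v, n, addr);

    if (vma && addr < vma->start)
        return NULL;
    return vma;
}
```

For an address that falls in the gap between two areas, the first helper still returns the following area, while the second returns NULL. A caller that forgets the extra start check would go on to walk page tables for an address no area covers.
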

46c22ffd

s390/mm: fix VMA and page table handling code in storage key handling functions · 949f5c12

Committed by David Hildenbrand on Sep 09, 2021

There are multiple things broken about our storage key handling
functions:

1. We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2 ("mm: mmap: zap pages
with read mmap_sem in munmap"). gfn_to_hva() will only translate using
KVM memory regions, but won't validate the VMA.

2. We should not allocate page tables outside of VMA boundaries: if
evil user space decides to map hugetlbfs to these ranges, bad things
will happen because we suddenly have PTE or PMD page tables where we
shouldn't have them.

3. We don't handle large PUDs that might suddenly appear inside our page
table hierarchy.

Don't manually allocate page tables, properly validate that we have a VMA and
bail out on pud_large().

All callers of page table handling functions, except
get_guest_storage_key(), call fixup_user_fault() in case they
receive an -EFAULT and retry; this will allocate the necessary page tables
if required.

To keep get_guest_storage_key() working as expected and not requiring
kvm_s390_get_skeys() to call fixup_user_fault() distinguish between
"there is simply no page table or huge page yet and the key is assumed
to be 0" and "this is a fault to be reported".

Although commit 637ff9ef ("s390/mm: Add huge pmd storage key handling")
introduced most of the affected code, it was actually already broken
before when using get_locked_pte() without any VMA checks.

Note: Ever since commit 637ff9ef ("s390/mm: Add huge pmd storage key
handling") we can no longer set a guest storage key (for example from
QEMU during VM live migration) without actually resolving a fault.
Although we would have created most page tables, we would choke on the
!pmd_present(), requiring a call to fixup_user_fault(). I would
have thought that this is problematic in combination with postcopy live
migration ... but nobody noticed and this patch doesn't change the
situation. So maybe it's just fine.

Fixes: 9fcf93b5 ("KVM: S390: Create helper function get_guest_storage_key")
Fixes: 24d5dd02 ("s390/kvm: Provide function for setting the guest storage key")
Fixes: a7e19ab5 ("KVM: s390: handle missing storage-key facility")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-5-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

949f5c12

s390/mm: validate VMA in PGSTE manipulation functions · fe3d1002

Committed by David Hildenbrand on Sep 09, 2021

Further, we should not allocate page tables outside of VMA boundaries: if
evil user space decides to map hugetlbfs to these ranges, bad things will
happen because we suddenly have PTE or PMD page tables where we
shouldn't have them.

Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
calling get_locked_pte().

Fixes: 2d42f947 ("s390/kvm: Add PGSTE manipulation functions")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

fe3d1002

s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap() · b159f94c

Committed by David Hildenbrand on Sep 09, 2021

... otherwise we will try unlocking a spinlock that was never locked via a
garbage pointer.

At the time we reach this code path, we usually successfully looked up
a PGSTE already; however, evil user space could have manipulated the VMA
layout in the meantime and triggered removal of the page table.

Fixes: 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-3-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

b159f94c

s390/gmap: validate VMA in __gmap_zap() · 2d8fb8f3

Committed by David Hildenbrand on Sep 09, 2021

We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2 ("mm: mmap: zap pages
with read mmap_sem in munmap"). The mere presence in our guest_to_host
radix tree does not imply that there is a VMA.

Further, we should not allocate page tables (via get_locked_pte()) outside
of VMA boundaries: if evil user space decides to map hugetlbfs to these
ranges, bad things will happen because we suddenly have PTE or PMD page
tables where we shouldn't have them.

Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
calling get_locked_pte().

Note that gmap_discard() is different:
zap_page_range()->unmap_single_vma() makes sure to stay within VMA
boundaries.

Fixes: b31288fa ("s390/kvm: support collaborative memory management")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-2-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

2d8fb8f3

23 Oct, 2021 3 commits

KVM: selftests: Fix nested SVM tests when built with clang · ed290e1c

Committed by Jim Mattson on Sep 29, 2021

Though gcc conveniently compiles a simple memset to "rep stos," clang
prefers to call the libc version of memset. If a test is dynamically
linked, the libc memset isn't available in L1 (nor is the PLT or the
GOT, for that matter). Even if the test is statically linked, the libc
memset may choose to use some CPU features, like AVX, which may not be
enabled in L1. Note that __builtin_memset doesn't solve the problem,
because (a) the compiler is free to call memset anyway, and (b)
__builtin_memset may also choose to use features like AVX, which may
not be available in L1.

To avoid a myriad of problems, use an explicit "rep stos" to clear the
VMCB in generic_svm_setup(), which is called both from L0 and L1.

Reported-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Fixes: 20ba262f ("selftests: KVM: AMD Nested test infrastructure")
Message-Id: <20210930003649.4026553-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
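
An explicit "rep stos" clear can be written with GCC/Clang extended inline assembly roughly as below. This is a sketch under assumptions: the helper name `demo_clear()` is hypothetical, the byte-wide `rep stosb` form is chosen for simplicity and may differ from the operand size used in the actual patch, and a plain byte loop is provided so the example also builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Zero n bytes at dst without emitting a call to the libc memset. */
void demo_clear(void *dst, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    /*
     * rep stosb stores AL to the destination pointer, count times.
     * Both the pointer and the count registers are modified, so they
     * are declared as read-write operands.
     */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(0)
                     : "memory");
#else
    /* Portable fallback for non-x86 builds of this sketch. */
    volatile unsigned char *p = dst;

    while (n--)
        *p++ = 0;
#endif
}
```

Because the instruction is spelled out, the compiler cannot substitute a library call or a vectorized routine, which is the property the commit needs for code that also runs in L1.
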

ed290e1c

kvm: x86: Remove stale declaration of kvm_no_apic_vcpu · dfd3c713

Committed by Jim Mattson on Oct 21, 2021

This variable was renamed to kvm_has_noapic_vcpu in commit
6e4e3b4d ("KVM: Stop using deprecated jump label APIs").

Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20211021185449.3471763-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dfd3c713

KVM: VMX: Unregister posted interrupt wakeup handler on hardware unsetup · ec5a4919

Committed by Sean Christopherson on Oct 08, 2021

Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.

Fixes: bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009001107.3936588-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ec5a4919
