- 26 January 2021, 9 commits
-
Committed by Paolo Bonzini
VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed.

Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Revert the dirty/available tracking of GPRs now that KVM copies the GPRs to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per commit de3cd117 ("KVM: x86: Omit caching logic for always-available GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.

This reverts commit 1c04d8c9.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB: the GPRs' dirty bits are set from time zero and never cleared, i.e. will always be seen as dirty. The obvious alternative would be to clear the dirty bits when appropriate, but removing the dirty checks is desirable as it allows reverting GPR dirty+available tracking, which adds overhead to all flavors of x86 VMs.

Note, unconditionally writing the GPRs in the GHCB is tacitly allowed by the GHCB spec, which allows the hypervisor (or guest) to provide unnecessary info; it's the guest's responsibility to consume only what it needs (the hypervisor is untrusted after all). The guest and hypervisor can supply additional state if desired but must not rely on that additional state being provided.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
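As a rough illustration of the unconditional sync (a minimal sketch; the accessor names follow the kernel's ghcb_set_*() helpers, but the function name and exact body are assumptions, not the patch itself):

  static void sync_gprs_to_ghcb(struct vcpu_svm *svm)
  {
      struct ghcb *ghcb = svm->ghcb;
      struct kvm_vcpu *vcpu = &svm->vcpu;

      /* Copy the GPRs unconditionally; no per-register dirty checks. */
      ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
      ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
      ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
      ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
  }

Writing a few extra registers is cheap compared to tracking dirtiness for every GPR on every VM flavor, which is the trade-off the revert relies on.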
-
Committed by Maxim Levitsky
Even when we are outside the nested guest, some vmcs02 fields may not be in sync vs. vmcs12. This is intentional, even across nested VM-exit, because the sync can be delayed until the nested hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those rarely accessed fields.

However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to be able to restore it. To fix that, call copy_vmcs02_to_vmcs12_rare() before the vmcs12 contents are copied to userspace.

Fixes: 7952d769 ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Lorenzo Brescia
On VMX, if we exit and then re-enter immediately without leaving the vmx_vcpu_run() function, the kvm_entry event is not logged. That means we will see one (or more) kvm_exit, without its (their) corresponding kvm_entry, as shown here:

  CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
  CPU-1979 [002] 89.871218: kvm_exit: reason MSR_WRITE
  CPU-1979 [002] 89.871259: kvm_exit: reason MSR_WRITE

It also seems possible for a kvm_entry event to be logged, but then we leave vmx_vcpu_run() right away (if vmx->emulation_required is true). In this case, we will have a spurious kvm_entry event in the trace.

Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run() (where trace_kvm_exit() already is). A trace obtained with this patch applied looks like this:

  CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
  CPU-14295 [000] 8388.395392: kvm_exit: reason MSR_WRITE
  CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
  CPU-14295 [000] 8388.395503: kvm_exit: reason EXTERNAL_INTERRUPT

Of course, not calling trace_kvm_entry() in common x86 code any longer means that we need to adjust the SVM side of things too.

Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jay Zhou
The injection process of an SMI has two steps:

  Step 1 (Qemu): cpu->interrupt_request &= ~CPU_INTERRUPT_SMI; then
                 kvm_vcpu_ioctl(cpu, KVM_SMI)
         (KVM):  kvm_vcpu_ioctl_smi() runs and does
                 kvm_make_request(KVM_REQ_SMI, vcpu);

  Step 2 (Qemu): kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
         (KVM):  if kvm_check_request(KVM_REQ_SMI, vcpu) is true,
                 process_smi() marks vcpu->arch.smi_pending = true;

vcpu->arch.smi_pending is only set in step 2. Unfortunately, if the vcpu is paused between step 1 and step 2, kvm_run->immediate_exit will be set and the vcpu has to exit to Qemu immediately during step 2, before vcpu->arch.smi_pending is marked true. During VM migration, Qemu then fetches the SMI pending status from KVM with the KVM_GET_VCPU_EVENTS ioctl at downtime, and the pending SMI is lost.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com>
Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
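A minimal sketch of one way to close this window, assuming the still-queued KVM_REQ_SMI is folded into the state reported to userspace (the wrapper name below is hypothetical and the upstream patch may take a different shape):

  static void get_vcpu_events(struct kvm_vcpu *vcpu,
                              struct kvm_vcpu_events *events)
  {
      /*
       * If an SMI was requested but KVM_RUN has not processed it yet
       * (e.g. the vcpu was paused right after KVM_SMI), fold it in here
       * so migration does not lose the pending SMI.
       */
      if (kvm_check_request(KVM_REQ_SMI, vcpu))
          process_smi(vcpu);

      events->smi.pending = vcpu->arch.smi_pending;
  }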
-
Committed by Like Xu
The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as 0x0300 in the intel_perfmon_event_map[]. Correct its usage.

Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Like Xu
We know the vPMU will not work properly when (1) the guest bit_width(s) of the [gp|fixed] counters are greater than the host ones, or (2) the guest's requested architectural events exceed the range supported by the host. In those cases we can set up a smaller left shift value and refresh the guest cpuid entry, thus fixing the following UBSAN shift-out-of-bounds warning:

  shift exponent 197 is too large for 64-bit type 'long long unsigned int'

  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x107/0x163 lib/dump_stack.c:120
   ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
   __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
   intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
   kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
   kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
   kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
   kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
   vfs_ioctl fs/ioctl.c:48 [inline]
   __do_sys_ioctl fs/ioctl.c:753 [inline]
   __se_sys_ioctl fs/ioctl.c:739 [inline]
   __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Add compile-time asserts in rsvd_bits() to guard against KVM passing in garbage hardcoded values, and cap the upper bound at '63' for dynamic values to prevent generating a mask that would overflow a u64.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210113204515.3473079-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
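A sketch of what such a helper can look like with the described checks (an approximation of rsvd_bits(), not a verbatim copy of the patch):

  static __always_inline u64 rsvd_bits(int s, int e)
  {
      /* Reject obviously bogus hardcoded ranges at compile time. */
      BUILD_BUG_ON(__builtin_constant_p(e) &&
                   __builtin_constant_p(s) && e < s);

      if (__builtin_constant_p(e))
          BUILD_BUG_ON(e > 63);
      else
          e &= 63;    /* cap dynamic values so the mask fits in a u64 */

      if (e < s)
          return 0;

      return ((2ULL << (e - s)) - 1) << s;
  }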
-
- 21 January 2021, 2 commits
-
Committed by Marc Zyngier
When running on v8.0 HW, make sure we don't try to advertise events in the 0x4000-0x403f range.

Cc: stable@vger.kernel.org
Fixes: 88865bec ("KVM: arm64: Mask out filtered events in PCMEID{0,1}_EL1")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210121105636.1478491-1-maz@kernel.org
-
Committed by Steven Price
KASAN in HW_TAGS mode will store MTE tags in the top byte of the pointer. When computing the offset for TPIDR_EL2 we don't want anything in the top byte, so remove the tag to ensure the computation is correct no matter what the tag is.

Fixes: 94ab5b61 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Signed-off-by: Steven Price <steven.price@arm.com>
[maz: added comment]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210108161254.53674-1-steven.price@arm.com
-
- 14 January 2021, 4 commits
-
Committed by Alexandru Elisei
The reg_to_encoding() macro is a wrapper over sys_reg() and conveniently takes a sys_reg_desc or a sys_reg_params argument and returns the 32-bit register encoding. Use it instead of calling sys_reg() directly.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210106144218.110665-1-alexandru.elisei@arm.com
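For context, the wrapper is roughly of this shape (a sketch based on the description above; the authoritative definition lives in arch/arm64/kvm/sys_regs.c):

  /* Build the 32-bit encoding from the Op0/Op1/CRn/CRm/Op2 fields. */
  #define reg_to_encoding(x)                                          \
      sys_reg((x)->Op0, (x)->Op1, (x)->CRn, (x)->CRm, (x)->Op2)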
-
Committed by David Brazdil
The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET should not return, as dictated by the PSCI spec. However, there is firmware out there which breaks this assumption, leading to a hyp panic. Make KVM more robust to broken firmware by allowing these to return.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201229160059.64135-1-dbrazdil@google.com
-
Committed by Marc Zyngier
Now that all PMU registers are gated behind a .visibility callback, remove the other checks against an absent PMU.

Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Committed by Marc Zyngier
It appears that while we are now able to properly hide PMU registers from the guest when a PMU isn't available (either because none has been configured, the host doesn't have the PMU support compiled in, or the HW doesn't have one at all), we are still exposing more than we should to userspace.

Introduce a visibility callback gating all the PMU registers, which covers both userspace and guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
-
- 09 January 2021, 1 commit
-
Committed by Vineet Gupta
HSDK has hardware floating point and the common use case is with glibc+hf, so enable that as default.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
- 08 January 2021, 19 commits
-
Committed by Arnd Bergmann
dtc points out that the interrupts for some devices are not parsable:

  picoxcell-pc3x2.dtsi:45.19-49.5: Warning (interrupts_property): /paxi/gem@30000: Missing interrupt-parent
  picoxcell-pc3x2.dtsi:51.21-55.5: Warning (interrupts_property): /paxi/dmac@40000: Missing interrupt-parent
  picoxcell-pc3x2.dtsi:57.21-61.5: Warning (interrupts_property): /paxi/dmac@50000: Missing interrupt-parent
  picoxcell-pc3x2.dtsi:233.21-237.5: Warning (interrupts_property): /rwid-axi/axi2pico@c0000000: Missing interrupt-parent

There are two VIC instances, so it's not clear which one needs to be used. I found the BSP sources that reference VIC0, so use that:

  https://github.com/r1mikey/meta-picoxcell/blob/master/recipes-kernel/linux/linux-picochip-3.0/0001-picoxcell-support-for-Picochip-picoXcell-SoC.patch

Acked-by: Jamie Iles <jamie@jamieiles.com>
Link: https://lore.kernel.org/r/20201230152010.3914962-1-arnd@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-
Committed by Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Fenghua Yu
Shakeel Butt reported in [1] that a user can request a task to be moved to a resource group even if the task is already in the group. Doing the move in that case just wastes time, and it can be costly because it may send an IPI to a different CPU.

Add a sanity check to ensure that the move operation only happens when the task is not already in the resource group.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/

Fixes: e02737d5 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com
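A minimal sketch of such a check, assuming the task-move path compares the task's current closid/rmid with the target group's (field names follow the resctrl code from memory and the control/monitor group distinction is simplified):

  static bool task_in_rdtgroup(struct task_struct *tsk, struct rdtgroup *rdtgrp)
  {
      /* If both IDs already match, the "move" would be a no-op. */
      return tsk->closid == rdtgrp->closid &&
             tsk->rmid == rdtgrp->mon.rmid;
  }

The move path can then return early when this helper says the task is already where the user asked it to go, skipping the IPI entirely.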
-
Committed by Fenghua Yu
Currently, when moving a task to a resource group the PQR_ASSOC MSR is updated with the new closid and rmid in an added task callback. If the task is running, the work is run as soon as possible. If the task is not running, the work is executed later in the kernel exit path when the kernel returns to the task again.

Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task is running is the right thing to do. Queueing work for a task that is not running is unnecessary (the PQR_ASSOC MSR is already updated when the task is scheduled in) and causing system resource waste with the way in which it is implemented: work to update the PQR_ASSOC register is queued every time the user writes a task id to the "tasks" file, even if the task already belongs to the resource group.

This could result in multiple pending work items associated with a single task even if they are all identical and even though only a single update with most recent values is needed. Specifically, even if a task is moved between different resource groups while it is sleeping then it is only the last move that is relevant but yet a work item is queued during each move. This unnecessary queueing of work items could result in significant system resource waste, especially on tasks sleeping for a long time. For example, as demonstrated by Shakeel Butt in [1] writing the same task id to the "tasks" file can quickly consume significant memory. The same problem (wasted system resources) occurs when moving a task between different resource groups.

As pointed out by Valentin Schneider in [2] there is an additional issue with the way in which the queueing of work is done in that the task_struct update is currently done after the work is queued, resulting in a race with the register update possibly done before the data needed by the update is available.

To solve these issues, update the PQR_ASSOC MSR in a synchronous way right after the new closid and rmid are ready during the task movement, only if the task is running. If a moved task is not running nothing is done since the PQR_ASSOC MSR will be updated next time the task is scheduled. This is the same way used to update the register when tasks are moved as part of resource group removal.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com

[ bp: Massage commit message and drop the two update_task_closid_rmid() variants. ]

Fixes: e02737d5 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
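A rough sketch of the synchronous approach (helper names are assumptions; the real resctrl code is more involved):

  static void _update_task_closid_rmid(void *task)
  {
      /* Reload PQR_ASSOC only if the task is current on this CPU. */
      if (task == current)
          resctrl_sched_in();
  }

  static void update_task_closid_rmid(struct task_struct *t)
  {
      /*
       * If the task is running on another CPU, poke that CPU with an IPI;
       * otherwise the MSR will be loaded when the task is next scheduled
       * in, so there is nothing to queue.
       */
      if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
          smp_call_function_single(task_cpu(t),
                                   _update_task_closid_rmid, t, 1);
      else
          _update_task_closid_rmid(t);
  }

Because the update happens inline in the move path, the task_struct fields are written before the register update runs, which also removes the ordering race described above.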
-
Committed by Tom Lendacky
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence, where the guest vCPU register state is updated and then the vCPU is VMRUN to begin execution of the AP. For an SEV-ES guest, this won't work because the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest register state, so KVM must track an AP/vCPU boot. Should the guest want to park the AP, it must use the AP Reset Hold exit event in place of, for example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence): Execute the AP (vCPU) as it was initialized and measured by the SEV-ES support. It is up to the guest to transfer control of the AP to the proper location.

Subsequent AP boot: KVM will expect to receive an AP Reset Hold exit event indicating that the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to awaken it. When the AP Reset Hold exit event is received, KVM will place the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI sequence, KVM will make the vCPU runnable. It is again up to the guest to then transfer control of the AP to the proper location.

To differentiate between an actual HLT and an AP Reset Hold, a new MP state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is placed in upon receiving the AP Reset Hold exit event. Additionally, to communicate the AP Reset Hold exit event up to userspace (if needed), a new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds a new function that, for non SEV-ES guests, invokes the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests, implements the logic above.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
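A simplified sketch of the SVM-side hook described above; it captures only the dispatch shape, and the function body for the SEV-ES case is an assumption rather than the exact patch:

  static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
  {
      /* Non-SEV-ES guests: register state is visible, do a normal SIPI. */
      if (!sev_es_guest(vcpu->kvm)) {
          kvm_vcpu_deliver_sipi_vector(vcpu, vector);
          return;
      }

      /*
       * SEV-ES guests: state is encrypted, so only un-park a vCPU that is
       * sitting in AP Reset Hold; the guest itself transfers control of
       * the AP to the right location.
       */
      if (vcpu->arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD)
          vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
  }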
-
Committed by Maxim Levitsky
It is possible to exit nested guest mode, entered by svm_set_nested_state, prior to the first VM entry to it (e.g. due to a pending event), if the nested run was not pending during the migration. In this case we must not switch to the nested MSR permission bitmap. Also add a warning to catch similar cases in the future.

Fixes: a7d5c7ce ("KVM: nSVM: delay MSR permission processing to first nested VM run")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky
We overwrite most of the vmcb fields while doing so, so we must mark it as dirty.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky
The code to store it on migration exists, but no code was restoring it.

One of the side effects of fixing this is that L1->L2 injected events are no longer lost when migration happens with a nested run pending.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any pages with a non-zero root_count and tdp_mmu_roots should only contain pages with a positive root_count, unless a thread holds the MMU lock and is in the process of modifying the list. Various functions expect these invariants to be maintained, but they are not explicitly documented. Add to the comments on both fields to document the above invariants.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
Many TDP MMU functions which need to perform some action on all TDP MMU roots hold a reference on that root so that they can safely drop the MMU lock in order to yield to other threads. However, when releasing the reference on the root, there is a bug: the root will not be freed even if its reference count (root_count) is reduced to 0.

To simplify acquiring and releasing references on TDP MMU root pages, and to ensure that these roots are properly freed, move the get/put operations into another TDP MMU root iterator macro. Moving the get/put operations into an iterator macro also helps simplify control flow when a root does need to be freed. Note that using the list_for_each_entry_safe macro would not have been appropriate in this situation because it could keep a pointer to the next root across an MMU lock release + reacquire, during which time that root could be freed.

Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 063afacd ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-1-bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
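The shape such an iterator can take, as a sketch (the macro and helper names below are assumptions based on the description above):

  /*
   * tdp_mmu_next_root() is assumed to take a reference on the root that
   * follows @prev_root in the roots list (or the first root if @prev_root
   * is NULL), drop the reference on @prev_root, and free @prev_root if
   * that reference was the last one.
   */
  #define for_each_tdp_mmu_root_yield_safe(_kvm, _root)           \
      for (_root = tdp_mmu_next_root(_kvm, NULL);                 \
           _root;                                                 \
           _root = tdp_mmu_next_root(_kvm, _root))

Because the reference handoff happens inside the helper, each loop body can drop and reacquire the MMU lock while holding a reference on the current root, and the previous root gets freed exactly when its count reaches zero.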
-
Committed by Stephen Zhang
Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
Message-Id: <1608277897-1932-1-git-send-email-stephenzhangzsd@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Since we know that e >= s, we can reassociate the left shift, changing the shifted number from 1 to 2 in exchange for decreasing the right hand side by 1.

Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
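Concretely, for a mask covering bits s..e with e >= s, the two forms below compute the same value, but only the second keeps the shift count at or below 63 when e - s + 1 == 64 (an illustration of the identity, not the exact patched code):

  /* e >= s; both expressions build the mask for bits s..e. */
  u64 before = ((1ULL << (e - s + 1)) - 1) << s;  /* undefined when e - s + 1 == 64 */
  u64 after  = ((2ULL << (e - s)) - 1) << s;      /* shift count <= 63, always defined */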
-
Committed by Uros Bizjak
Commit 16809ecd moved the __svm_vcpu_run prototype to svm.h, but forgot to remove the original from svm.c.

Fixes: 16809ecd ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20201220200339.65115-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Nathan Chancellor
When using LLVM's integrated assembler (LLVM_IAS=1) while building x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build error occurs:

  $ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o
  arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction
          asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
  arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex'
  #define __ex(x) __kvm_handle_fault_on_reboot(x)
  ./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot'
          "666: \n\t" \
  <inline asm>:2:2: note: instantiated into assembly here
          vmsave
  1 error generated.

This happens because LLVM currently does not support calling vmsave without the fixed register operand (%rax for 64-bit and %eax for 32-bit). This will be fixed in LLVM 12 but the kernel currently supports LLVM 10.0.1 and newer so this needs to be handled. Add the proper register using the _ASM_AX macro, which matches the vmsave call in vmenter.S.

Fixes: 86137773 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
Link: https://reviews.llvm.org/D93524
Link: https://github.com/ClangBuiltLinux/linux/issues/1216
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
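The fix described above plausibly takes this shape (a sketch; the exact formatting of the applied patch may differ):

  /* Spell out the fixed register so LLVM's integrated assembler accepts it. */
  asm volatile(__ex("vmsave %%" _ASM_AX)
               : : "a" (__sme_page_pa(sd->save_area)) : "memory");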
-
Committed by Sean Christopherson
Check only the terminal leaf for a "!PRESENT || MMIO" SPTE when looking for reserved bits on valid, non-MMIO SPTEs. The get_walk() helpers terminate their walks if a not-present or MMIO SPTE is encountered, i.e. the non-terminal SPTEs have already been verified to be regular SPTEs. This eliminates an extra check-and-branch in a relatively hot loop.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Bump the size of the sptes array by one and use the raw level of the SPTE to index into the sptes array. Using the SPTE level directly improves readability by eliminating the need to reason out why the level is being adjusted when indexing the array. The array is on the stack and is not explicitly initialized; bumping its size is nothing more than a superficial adjustment to the stack frame.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-4-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Get the so called "root" level from the low level shadow page table walkers instead of manually attempting to calculate it higher up the stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging, the starting level of the walk, from the caller's perspective, is not the CR3 root but rather the PDPTR "root". Checking for reserved bits from the CR3 root causes get_mmio_spte() to consume uninitialized stack data due to indexing into sptes[] for a level that was not filled by get_walk(). This can result in false positives and/or negatives depending on what garbage happens to be on the stack.

Opportunistically nuke a few extra newlines.

Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Reported-by: Richard Herbert <rherbert@sympatico.ca>
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Return -1 from the get_walk() helpers if the shadow walk doesn't fill at least one spte, which can theoretically happen if the walk hits a not-present PDPTR. Returning the root level in such a case will cause get_mmio_spte() to return garbage (uninitialized stack data). In practice, such a scenario should be impossible as KVM shouldn't get a reserved-bit page fault with a not-present PDPTR.

Note, using mmu->root_level in get_walk() is wrong for other reasons, too, but that's now a moot point.

Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
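A sketch of the resulting caller-side pattern (the wrapper function here is hypothetical and the surrounding logic of the real get_mmio_spte() is omitted):

  static bool mmio_spte_walk_ok(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
  {
      u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
      int root_level, leaf;

      leaf = get_walk(vcpu, addr, sptes, &root_level);
      if (unlikely(leaf < 0)) {
          /* Nothing was filled in; don't read uninitialized slots. */
          *sptep = 0ull;
          return false;
      }

      *sptep = sptes[leaf];
      return true;
  }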
-
Committed by Vineet Gupta
Linux 5.11-rcX was failing to boot on the ARC HSDK board. It turns out we have a couple of issues, this being the first one, and I'm to blame as I didn't pay attention during review.

TIF_NOTIFY_SIGNAL support requires checking multiple TIF_* bits in the kernel return code path. The old code only needed to check a single bit, so BBIT0 <TIF_SIGPENDING> worked. The new code needs to check multiple bits, so it must use an AND <bit-mask> instruction and thus the bit mask variant _TIF_SIGPENDING.

Cc: Jens Axboe <axboe@kernel.dk>
Fixes: 53855e12 ("arc: add support for TIF_NOTIFY_SIGNAL")
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/34
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
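As a C-level illustration of the difference (the bit positions below are made up for the example; the actual fix is in ARC assembly):

  #define TIF_SIGPENDING      2   /* illustrative values only */
  #define TIF_NOTIFY_SIGNAL   6
  #define _TIF_SIGPENDING     (1U << TIF_SIGPENDING)
  #define _TIF_NOTIFY_SIGNAL  (1U << TIF_NOTIFY_SIGNAL)

  static inline int signal_work_pending(unsigned int thread_flags)
  {
      /*
       * Old: a single-bit test (BBIT0 on the TIF_SIGPENDING position) was
       * enough. New: AND against a mask so every bit that requires work
       * on kernel return is observed.
       */
      return thread_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL);
  }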
-
- 07 January 2021, 1 commit
-
Committed by Catalin Marinas
For consistency with __uaccess_{disable,enable}_hw_pan(), move the PSTATE.TCO setting into dedicated __uaccess_{disable,enable}_tco() functions.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
-
- 06 January 2021, 4 commits
-
Committed by Ying-Tsun Huang
In mtrr_type_lookup(), if the input memory address region is not in the MTRR, over 4GB, and not over the top of memory, a write-back attribute is returned. These condition checks are for ensuring the input memory address region is actually mapped to the physical memory.

However, if the end address is just aligned with the top of memory, the condition check treats the address as over the top of memory, and the write-back attribute is not returned.

This hits a real use case with NVDIMM: the nd_pmem module tries to map NVDIMMs as cacheable memories when NVDIMMs are connected. If a NVDIMM is the last of the DIMMs, the performance of this NVDIMM becomes very low since it is aligned with the top of memory and its memory type is uncached-minus.

Move the input end address change to inclusive up into mtrr_type_lookup(), before checking for the top of memory in either mtrr_type_lookup_{variable,fixed}() helpers.

[ bp: Massage commit message. ]

Fixes: 0cc705f5 ("x86/mm/mtrr: Clean up mtrr_type_lookup()")
Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com
-
Committed by Nathan Chancellor
Commit eff8728f ("vmlinux.lds.h: Add PGO and AutoFDO input sections") added ".text.unlikely.*" and ".text.hot.*" due to an LLVM change [1]. After another LLVM change [2], these sections are seen in some PowerPC builds, where there is an orphan section warning and then a build failure:

  $ make -skj"$(nproc)" \
         ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \
         distclean powernv_defconfig zImage.epapr
  ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed in '.text.unlikely.'
  ...
  ld.lld: warning: address (0xc000000000009314) of section .text is not a multiple of alignment (256)
  ...
  ERROR: start_text address is c000000000009400, should be c000000000008000
  ERROR: try to enable LD_HEAD_STUB_CATCH config option
  ERROR: see comments in arch/powerpc/tools/head_check.sh
  ...

Explicitly handle these sections like in the main linker script so there is no more build failure.

[1]: https://reviews.llvm.org/D79600
[2]: https://reviews.llvm.org/D92493

Fixes: 83a092cf ("powerpc: Link warning for orphan sections")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1218
Link: https://lore.kernel.org/r/20210104205952.1399409-1-natechancellor@gmail.com
-
Committed by Randy Dunlap
fs/dax.c uses copy_user_page() but ARC does not provide that interface, resulting in a build error. Provide copy_user_page() in <asm/page.h>.

  ../fs/dax.c: In function 'copy_cow_page_dax':
  ../fs/dax.c:702:2: error: implicit declaration of function 'copy_user_page'; did you mean 'copy_to_user_page'? [-Werror=implicit-function-declaration]

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Dan Williams <dan.j.williams@intel.com>
#Acked-by: Vineet Gupta <vgupta@synopsys.com> # v1
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
#Reviewed-by: Ira Weiny <ira.weiny@intel.com> # v2
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
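A plausible shape of the added definition (a sketch; many architectures define the helper exactly this way, but the actual ARC patch may differ):

  /* In arch/arc/include/asm/page.h: reuse copy_page() for user pages. */
  #define copy_user_page(to, from, vaddr, page)   copy_page(to, from)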
-
Committed by Peter Gonda
The IN and OUT instructions with port address as an immediate operand only use an 8-bit immediate (imm8). The current VC handler uses the entire 32-bit immediate value but these instructions only set the first bytes. Cast the operand to an u8 for that.

[ bp: Massage commit message. ]

Fixes: 25189d08 ("x86/sev-es: Add support for handling IOIO exceptions")
Signed-off-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Link: https://lkml.kernel.org/r/20210105163311.221490-1-pgonda@google.com
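Illustratively, the exit-info construction changes along these lines (the helper name is a placeholder, not the exact code in the VC handler):

  static u64 ioio_imm_to_exitinfo(struct insn *insn)
  {
      /* Only the low byte of the decoded immediate is the port number. */
      return (u64)(u8)insn->immediate.value << 16;
  }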
-