1. 28 October 2021, 1 commit
    • KVM: x86: Take srcu lock in post_kvm_run_save() · f3d1436d
      David Woodhouse authored
      The Xen interrupt injection for event channels relies on accessing the
      guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
      gfn_to_hva_cache.
      
      This requires the srcu lock to be held, which is mostly the case except
      for this code path:
      
      [   11.822877] WARNING: suspicious RCU usage
      [   11.822965] -----------------------------
      [   11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
      [   11.823131]
      [   11.823131] other info that might help us debug this:
      [   11.823131]
      [   11.823196]
      [   11.823196] rcu_scheduler_active = 2, debug_locks = 1
      [   11.823253] 1 lock held by dom:0/90:
      [   11.823292]  #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
      [   11.823379]
      [   11.823379] stack backtrace:
      [   11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
      [   11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.823612] Call Trace:
      [   11.823645]  dump_stack+0x7a/0xa5
      [   11.823681]  lockdep_rcu_suspicious+0xc5/0x100
      [   11.823726]  __kvm_xen_has_interrupt+0x179/0x190
      [   11.823773]  kvm_cpu_has_extint+0x6d/0x90
      [   11.823813]  kvm_cpu_accept_dm_intr+0xd/0x40
      [   11.823853]  kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
                    < post_kvm_run_save() inlined here >
      [   11.823906]  kvm_arch_vcpu_ioctl_run+0x135/0x6a0
      [   11.823947]  kvm_vcpu_ioctl+0x263/0x680
      
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f3d1436d
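
      A hedged sketch of the change described in the entry above, assuming the
      shape of post_kvm_run_save() in arch/x86/kvm/x86.c (simplified, not the
      exact upstream diff):

        static void post_kvm_run_save(struct kvm_vcpu *vcpu)
        {
                struct kvm_run *kvm_run = vcpu->run;
                int idx;

                /*
                 * Hold kvm->srcu for reading across the readiness check so
                 * that __kvm_xen_has_interrupt(), reached via
                 * kvm_cpu_has_extint(), can safely dereference the memslots
                 * behind its gfn_to_hva_cache.
                 */
                idx = srcu_read_lock(&vcpu->kvm->srcu);
                kvm_run->ready_for_interrupt_injection =
                        kvm_vcpu_ready_for_interrupt_injection(vcpu);
                srcu_read_unlock(&vcpu->kvm->srcu, idx);
        }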
  2. 27 October 2021, 1 commit
  3. 25 October 2021, 2 commits
    • KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() · 0985dba8
      David Woodhouse authored
      In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
      making a final check whether the vCPU should be woken from HLT by any
      incoming interrupt.
      
      This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
      really shouldn't be sleeping when the task state has already been set.
      I think it's actually harmless as it would just manifest itself as a
      spurious wakeup, but it's causing a debug warning:
      
      [  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
      
      Fix the warning by turning it into an *explicit* spurious wakeup. When
      invoked with !task_is_running(current) (and we might as well add
      in_atomic() there while we're at it), just return 1 to indicate that
      an IRQ is pending, which will cause a wakeup and then something will
      call it again in a context that *can* sleep so it can fault the page
      back in.
      
      Cc: stable@vger.kernel.org
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

      Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0985dba8
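
      A hedged sketch of the early-out described in the entry above (the helper
      name and surrounding structure are assumptions; the real logic lives in
      __kvm_xen_has_interrupt() in arch/x86/kvm/xen.c):

        /* Hypothetical helper: may the slow path fault a page in and sleep? */
        static bool xen_slow_path_may_sleep(void)
        {
                /*
                 * kvm_vcpu_block() sets TASK_INTERRUPTIBLE before its final
                 * interrupt check, and the sched-out path runs in atomic
                 * context, so anything that might fault is off limits there.
                 */
                return !in_atomic() && task_is_running(current);
        }

        /*
         * Used roughly like this when the gfn_to_hva_cache needs
         * revalidation: if sleeping is not allowed, return 1 ("IRQ pending").
         * The resulting spurious wakeup is harmless, and a later call from a
         * sleepable context will fault the vcpu_info page back in.
         */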
    • KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d
      David Woodhouse authored
      On the preemption path when updating a Xen guest's runstate times, this
      lock is taken inside the scheduler rq->lock, which is a raw spinlock.
      This was shown in a lockdep warning:
      
      [   89.138354] =============================
      [   89.138356] [ BUG: Invalid wait context ]
      [   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
      [   89.138360] -----------------------------
      [   89.138361] xen_shinfo_test/2575 is trying to lock:
      [   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138442] other info that might help us debug this:
      [   89.138444] context-{5:5}
      [   89.138445] 4 locks held by xen_shinfo_test/2575:
      [   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
      [   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
      [   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
      [   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
      ...
      [   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
      [   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
      [   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
      [   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
      [   89.138900]  __schedule+0x5de/0xbd0
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8228c77d
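
      Raw spinlocks keep spinning even on PREEMPT_RT, so they may legitimately
      be taken under rq->lock. A hedged sketch of the conversion described in
      the entry above (abbreviated; the *_sketch function is illustrative only):

        /* In struct kvm_arch:
         *
         *      -  spinlock_t      pvclock_gtod_sync_lock;
         *      +  raw_spinlock_t  pvclock_gtod_sync_lock;
         *
         * with every locking site switched to the raw_ variants, e.g.:
         */
        static u64 get_kvmclock_ns_sketch(struct kvm *kvm)
        {
                struct kvm_arch *ka = &kvm->arch;
                unsigned long flags;
                u64 ns;

                raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
                ns = ka->master_kernel_ns;  /* stand-in for the real snapshot logic */
                raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);

                return ns;
        }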
  4. 22 October 2021, 9 commits
  5. 21 October 2021, 3 commits
  6. 19 October 2021, 6 commits
    • KVM: SEV-ES: reduce ghcb_sa_len to 32 bits · 9f1ee7b1
      Paolo Bonzini authored
      The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT),
      so there is no need for it to be a u64.  This fixes a build error on 32-bit
      systems:
      
      i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io':
      sev.c:(.text+0x110f): undefined reference to `__udivdi3'
      
      Cc: stable@vger.kernel.org
      Fixes: 019057bd ("KVM: SEV-ES: fix length of string I/O")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9f1ee7b1
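
      The undefined reference comes from gcc lowering a 64-bit division on
      i686 to the libgcc helper __udivdi3, which the kernel does not link
      against; narrowing the length to 32 bits avoids the 64-bit division
      entirely. A small illustration of the principle (hypothetical functions,
      not the actual sev.c code):

        unsigned int count_from_u64(u64 len, unsigned int size)
        {
                /* 64-by-32 division: on i686 this becomes a call to
                 * __udivdi3, hence the link error. */
                return len / size;
        }

        unsigned int count_from_u32(u32 len, unsigned int size)
        {
                /* Plain 32-bit division: a single div instruction. */
                return len / size;
        }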
    • KVM: VMX: Remove redundant handling of bus lock vmexit · d61863c6
      Hao Xiang authored
      Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK
      VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit
      could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK.
      
      Remove the redundant handling of bus lock VM-Exits: unconditionally set
      exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with
      KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit().
      Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com>
      Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d61863c6
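
      A hedged sketch of the restructuring described in the entry above
      (simplified from arch/x86/kvm/vmx/vmx.c):

        static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu)
        {
                /*
                 * Hardware may or may not have set the flag for this exit,
                 * so set it unconditionally here; vmx_handle_exit() is then
                 * the single place that turns it into KVM_RUN_X86_BUS_LOCK
                 * for userspace.
                 */
                to_vmx(vcpu)->exit_reason.bus_lock_detected = true;
                return 1;
        }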
    • KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload · 9139a7a6
      Sean Christopherson authored
      WARN if the static keys used to track if any vCPU has disabled its APIC
      are left elevated at module exit.  Unlike the underflow case, nothing in
      the static key infrastructure will complain if a key is left elevated,
      and because an elevated key only affects performance, nothing in KVM will
      fail if either key is improperly incremented.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211013003554.47705-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9139a7a6
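
      A hedged sketch of the unload-time check described in the entry above
      (approximating kvm_lapic_exit() in arch/x86/kvm/lapic.c):

        void kvm_lapic_exit(void)
        {
                /*
                 * Flush any deferred decrements first, then complain if a key
                 * was left elevated: unlike the underflow case, an over-count
                 * would otherwise go unnoticed since it only costs speed.
                 */
                static_key_deferred_flush(&apic_hw_disabled);
                WARN_ON(static_branch_unlikely(&apic_hw_disabled.key));
                static_key_deferred_flush(&apic_sw_disabled);
                WARN_ON(static_branch_unlikely(&apic_sw_disabled.key));
        }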
    • Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET" · f7d8a19f
      Sean Christopherson authored
      Revert a change to open code bits of kvm_lapic_set_base() when emulating
      APIC RESET to fix an apic_hw_disabled underflow bug due to arch.apic_base
      and apic_hw_disabled being unsynchronized when the APIC is created.  If
      kvm_arch_vcpu_create() fails after creating the APIC, kvm_free_lapic()
      will see the initialized-to-zero vcpu->arch.apic_base and decrement
      apic_hw_disabled without KVM ever having incremented apic_hw_disabled.
      
      Using kvm_lapic_set_base() in kvm_lapic_reset() is also desirable for a
      potential future where KVM supports RESET outside of vCPU creation, in
      which case all the side effects of kvm_lapic_set_base() are needed, e.g.
      to handle the transition from x2APIC => xAPIC.
      
      Alternatively, KVM could temporarily increment apic_hw_disabled (and call
      kvm_lapic_set_base() at RESET), but that's a waste of cycles and would
      impact the performance of other vCPUs and VMs.  The other subtle side
      effect is that updating the xAPIC ID needs to be done at RESET regardless
      of whether the APIC was previously enabled, i.e. kvm_lapic_reset() needs
      an explicit call to kvm_apic_set_xapic_id() regardless of whether or not
      kvm_lapic_set_base() also performs the update.  That makes stuffing the
      enable bit at vCPU creation slightly more palatable, as doing so affects
      only the apic_hw_disabled key.
      
      Opportunistically tweak the comment to explicitly call out the connection
      between vcpu->arch.apic_base and apic_hw_disabled, and add a comment to
      call out the need to always do kvm_apic_set_xapic_id() at RESET.
      
      Underflow scenario:
      
        kvm_vm_ioctl() {
          kvm_vm_ioctl_create_vcpu() {
            kvm_arch_vcpu_create() {
              if (something_went_wrong)
                goto fail_free_lapic;
              /* vcpu->arch.apic_base is initialized when something_went_wrong is false. */
              kvm_vcpu_reset() {
                kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) {
                  vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
                }
              }
              return 0;
            fail_free_lapic:
              kvm_free_lapic() {
                /* vcpu->arch.apic_base is not yet initialized when something_went_wrong is true. */
                if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE))
                  static_branch_slow_dec_deferred(&apic_hw_disabled); // <= underflow bug.
              }
              return r;
            }
          }
        }
      
      This (mostly) reverts commit 42122123.
      
      Fixes: 42122123 ("KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET")
      Reported-by: syzbot+9fc046ab2b0cf295a063@syzkaller.appspotmail.com
      Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211013003554.47705-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f7d8a19f
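
      A hedged sketch of the reverted-to shape of kvm_lapic_reset() described
      above (heavily abbreviated; only the lines relevant to the discussion):

        void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
        {
                struct kvm_lapic *apic = vcpu->arch.apic;

                if (!init_event) {
                        /*
                         * Use the common helper so vcpu->arch.apic_base and
                         * the apic_hw_disabled key stay in sync, and so a
                         * future RESET outside of vCPU creation gets all the
                         * side effects (e.g. the x2APIC => xAPIC transition).
                         */
                        kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE |
                                                 MSR_IA32_APICBASE_ENABLE);
                }

                /*
                 * The xAPIC ID must be stuffed at RESET regardless of whether
                 * the APIC was previously enabled, so do it explicitly rather
                 * than relying on kvm_lapic_set_base() having done it.
                 */
                kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);

                /* remaining register reset elided */
        }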
    • KVM: X86: fix lazy allocation of rmaps · fa13843d
      Paolo Bonzini authored
      If allocation of rmaps fails, but some of the pointers have already been written,
      those pointers can be cleaned up when the memslot is freed, or even reused later
      for another attempt at allocating the rmaps.  Therefore there is no need to
      WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
      skipped lest KVM overwrite the previous pointer and leak the memory it points to.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fa13843d
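
      A hedged sketch of the guard described in the entry above (approximating
      memslot_rmap_alloc() in arch/x86/kvm/x86.c; helper names assumed):

        static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
                                      unsigned long npages)
        {
                const int sz = sizeof(*slot->arch.rmap[0]);
                int i;

                for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
                        int level = i + 1;
                        int lpages = __kvm_mmu_slot_lpages(slot, npages, level);

                        /* A previous, partially failed attempt may have left
                         * this pointer populated; reuse it rather than
                         * overwriting (and leaking) the old allocation. */
                        if (slot->arch.rmap[i])
                                continue;

                        slot->arch.rmap[i] = kvcalloc(lpages, sz,
                                                      GFP_KERNEL_ACCOUNT);
                        if (!slot->arch.rmap[i])
                                return -ENOMEM;
                }

                return 0;
        }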
    • KVM: SEV-ES: Set guest_state_protected after VMSA update · baa1e5ca
      Peter Gonda authored
      The refactoring in commit bb18a677 ("KVM: SEV: Acquire
      vcpu mutex when updating VMSA") left behind the assignment to
      svm->vcpu.arch.guest_state_protected; add it back.
      Signed-off-by: Peter Gonda <pgonda@google.com>
      [Delta between v2 and v3 of Peter's patch, which had already been
       committed; the commit message is my own. - Paolo]
      Fixes: bb18a677 ("KVM: SEV: Acquire vcpu mutex when updating VMSA")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      baa1e5ca
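
      A hedged sketch of where the restored assignment sits (approximating
      __sev_launch_update_vmsa() in arch/x86/kvm/svm/sev.c, abbreviated):

        static int __sev_launch_update_vmsa(struct kvm *kvm,
                                            struct kvm_vcpu *vcpu, int *error)
        {
                struct sev_data_launch_update_vmsa vmsa;
                struct vcpu_svm *svm = to_svm(vcpu);
                int ret;

                /* Sync the guest register state into the VMSA page. */
                ret = sev_es_sync_vmsa(svm);
                if (ret)
                        return ret;

                vmsa.reserved = 0;
                vmsa.handle = to_kvm_svm(kvm)->sev_info.handle;
                vmsa.address = __sme_pa(svm->vmsa);
                vmsa.len = PAGE_SIZE;
                ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error);
                if (ret)
                        return ret;

                /* The line the refactoring dropped: from here on the register
                 * state lives only in the encrypted VMSA, so KVM must treat
                 * it as protected/opaque. */
                vcpu->arch.guest_state_protected = true;
                return 0;
        }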
  7. 16 October 2021, 1 commit
  8. 15 October 2021, 2 commits
  9. 12 October 2021, 1 commit
    • x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically · 71188590
      Borislav Petkov authored
      This Kconfig option was added initially so that memory encryption is
      enabled by default on machines which support it.
      
      However, devices which have DMA masks that are less than the bit
      position of the encryption bit, aka C-bit, require the use of an IOMMU
      or the use of SWIOTLB.
      
      If the IOMMU is disabled or in passthrough mode, the kernel would switch
      to SWIOTLB bounce-buffering for those transfers.
      
      In order to avoid that,
      
        2cc13bb4 ("iommu: Disable passthrough mode when SME is active")
      
      disables the default IOMMU passthrough mode so that devices for which the
      default 256K DMA is insufficient, can use the IOMMU instead.
      
      However, there are also cases where the IOMMU is disabled in the BIOS, etc.
      (think the usual hardware folk "oops, I dropped the ball there" cases), or a
      driver doesn't properly use the DMA APIs, or a device has a firmware or
      hardware bug, e.g.:
      
        ea68573d ("drm/amdgpu: Fail to load on RAVEN if SME is active")
      
      Moreover, in the above GPU use case, there are APIs like Vulkan and some
      OpenGL/OpenCL extensions which assume that user-allocated memory can be
      passed in to the kernel driver and that both the GPU and CPU can access
      the same memory coherently and concurrently.
      That cannot work with SWIOTLB bounce buffers, of course.
      
      So, in order for those devices to function, drop the "default y" for the
      SME-active-by-default option; users who want SME enabled will need to
      either enable it in their config or use "mem_encrypt=on" on the kernel
      command line.
      
       [ tlendacky: Generalize commit message. ]
      
      Fixes: 7744ccdb ("x86/mm: Add Secure Memory Encryption (SME) support")
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df09c@molgen.mpg.de
      71188590
  10. 08 October 2021, 1 commit
    • x86/fpu: Restore the masking out of reserved MXCSR bits · d298b035
      Borislav Petkov authored
      Ser Olmy reported a boot failure:
      
        init[1] bad frame in sigreturn frame:(ptrval) ip:b7c9fbe6 sp:bf933310 orax:ffffffff \
      	  in libc-2.33.so[b7bed000+156000]
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
        CPU: 0 PID: 1 Comm: init Tainted: G        W         5.14.9 #1
        Hardware name: Hewlett-Packard HP PC/HP Board, BIOS  JD.00.06 12/06/2001
        Call Trace:
         dump_stack_lvl
         dump_stack
         panic
         do_exit.cold
         do_group_exit
         get_signal
         arch_do_signal_or_restart
         ? force_sig_info_to_task
         ? force_sig
         exit_to_user_mode_prepare
         syscall_exit_to_user_mode
         do_int80_syscall_32
         entry_INT80_32
      
      on an old 32-bit Intel CPU:
      
        vendor_id       : GenuineIntel
        cpu family      : 6
        model           : 6
        model name      : Celeron (Mendocino)
        stepping        : 5
        microcode       : 0x3
      
      Ser bisected the problem to the commit in Fixes.
      
      tglx suggested reverting the rejection of invalid MXCSR values which
      this commit introduced and replacing it with what the old code did -
      simply masking them out to zero.
      
      Further debugging confirmed his suggestion:
      
        fpu->state.fxsave.mxcsr: 0xb7be13b4, mxcsr_feature_mask: 0xffbf
        WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/signal.c:384 __fpu_restore_sig+0x51f/0x540
      
      so restore the original behavior only for 32-bit kernels where you have
      ancient machines with buggy hardware. For 32-bit programs on 64-bit
      kernels, user space which supplies wrong MXCSR values is considered
      malicious so fail the sigframe restoration there.
      
      Fixes: 6f9866a1 ("x86/fpu/signal: Let xrstor handle the features to init")
      Reported-by: Ser Olmy <ser.olmy@protonmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Ser Olmy <ser.olmy@protonmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/YVtA67jImg3KlBTw@zn.tnic
      d298b035
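
      A hedged sketch of the restored behaviour described in the entry above
      (the helper name and its exact placement in arch/x86/kernel/fpu/signal.c
      are assumptions):

        static bool sanitize_or_reject_mxcsr(struct fxregs_state *fx)
        {
                if (IS_ENABLED(CONFIG_X86_64)) {
                        /* 32-bit programs on 64-bit kernels: reserved MXCSR
                         * bits are treated as malicious, fail sigreturn. */
                        return !(fx->mxcsr & ~mxcsr_feature_mask);
                }

                /* 32-bit kernels: ancient/buggy hardware may hand us garbage,
                 * so just mask the reserved bits out (the old behaviour). */
                fx->mxcsr &= mxcsr_feature_mask;
                return true;
        }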
  11. 07 October 2021, 7 commits
  12. 06 October 2021, 1 commit
  13. 05 October 2021, 5 commits