1. 25 May, 2022 15 commits
    • Y
      KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest · ffd1925a
      Yanfei Xu committed
      When the kernel handles a VM-exit caused by an external interrupt or NMI,
      it always sets kvm_intr_type to indicate whether it is dealing with an IRQ
      or an NMI. A PMI may arrive as either an IRQ or an NMI.
      
      However, intel_pt PMIs are only generated for HARDWARE perf events, and
      HARDWARE events are always configured to generate NMIs.  Use
      kvm_handling_nmi_from_guest() to precisely identify if the intel_pt PMI
      came from the guest; this avoids false positives if an intel_pt PMI/NMI
      arrives while the host is handling an unrelated IRQ VM-Exit.
      
      Fixes: db215756 ("KVM: x86: More precisely identify NMI from guest when handling PMI")
      Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
      Message-Id: <20220523140821.1345605-1-yanfei.xu@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
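      The difference between the loose and the precise check described in
      this commit can be sketched in plain C. This is a simplified model, not
      kernel code: the enum and both helpers are hypothetical stand-ins that
      only mirror the idea of kvm_intr_type and kvm_handling_nmi_from_guest().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-CPU "which VM-exit is being handled" state. */
enum intr_type {
    HANDLING_NONE = 0,
    HANDLING_IRQ,
    HANDLING_NMI,
};

/* Loose check: true while handling any interrupt-induced VM-exit. */
static bool handling_intr_from_guest(enum intr_type t)
{
    return t != HANDLING_NONE;
}

/* Precise check: true only while handling an NMI-induced VM-exit. */
static bool handling_nmi_from_guest(enum intr_type t)
{
    return t == HANDLING_NMI;
}

/*
 * Self-check: a PMI delivered as an NMI while the host handles an
 * unrelated IRQ VM-exit must not be attributed to the guest.
 */
static void check_pmi_attribution(void)
{
    assert(handling_intr_from_guest(HANDLING_IRQ));  /* loose: false positive */
    assert(!handling_nmi_from_guest(HANDLING_IRQ));  /* precise: rejected */
    assert(handling_nmi_from_guest(HANDLING_NMI));   /* precise: accepted */
}
```

      Under these assumptions, only the precise check gives the right answer
      for an NMI that lands during an IRQ VM-exit.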
    • L
      KVM: selftests: x86: Sync the new name of the test case to .gitignore · 366d4a12
      Like Xu committed
      Fix a side effect of the so-called opportunistic change in the commit
      named in the Fixes tag.
      
      Fixes: dc8a9febbab0 ("KVM: selftests: x86: Fix test failure on arch lbr capable platforms")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220518170118.66263-2-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • P
      x86, kvm: use correct GFP flags for preemption disabled · baec4f5a
      Paolo Bonzini committed
      Commit ddd7ed842627 ("x86/kvm: Alloc dummy async #PF token outside of
      raw spinlock") leads to the following Smatch static checker warning:
      
      	arch/x86/kernel/kvm.c:212 kvm_async_pf_task_wake()
      	warn: sleeping in atomic context
      
      arch/x86/kernel/kvm.c
          202         raw_spin_lock(&b->lock);
          203         n = _find_apf_task(b, token);
          204         if (!n) {
          205                 /*
          206                  * Async #PF not yet handled, add a dummy entry for the token.
          207                  * Allocating the token must be down outside of the raw lock
          208                  * as the allocator is preemptible on PREEMPT_RT kernels.
          209                  */
          210                 if (!dummy) {
          211                         raw_spin_unlock(&b->lock);
      --> 212                         dummy = kzalloc(sizeof(*dummy), GFP_KERNEL);
                                                                      ^^^^^^^^^^
      Smatch thinks the caller has preempt disabled.  The `smdb.py preempt
      kvm_async_pf_task_wake` output call tree is:
      
      sysvec_kvm_asyncpf_interrupt() <- disables preempt
      -> __sysvec_kvm_asyncpf_interrupt()
         -> kvm_async_pf_task_wake()
      
      The caller is this:
      
      arch/x86/kernel/kvm.c
         290        DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt)
         291        {
         292                struct pt_regs *old_regs = set_irq_regs(regs);
         293                u32 token;
         294
         295                ack_APIC_irq();
         296
         297                inc_irq_stat(irq_hv_callback_count);
         298
         299                if (__this_cpu_read(apf_reason.enabled)) {
         300                        token = __this_cpu_read(apf_reason.token);
         301                        kvm_async_pf_task_wake(token);
         302                        __this_cpu_write(apf_reason.token, 0);
         303                        wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
         304                }
         305
         306                set_irq_regs(old_regs);
         307        }
      
      DEFINE_IDTENTRY_SYSVEC() is a wrapper that calls this function from
      call_on_irqstack_cond().  Preemption is disabled inside
      call_on_irqstack_cond() (unless it is already disabled).  The
      irq_enter/exit_rcu() functions disable/enable preemption.

      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • W
      KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer · 619f51da
      Wanpeng Li committed
      The timer is disarmed when switching between TSC deadline and other modes;
      however, the pending timer is still in-flight, so let's accurately remove
      any traces of the previous mode.
      
      Fixes: 44275932 ("KVM: x86: thoroughly disarm LAPIC timer around TSC deadline switch")
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • S
      x86/kvm: Alloc dummy async #PF token outside of raw spinlock · 0547758a
      Sean Christopherson committed
      Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the
      dummy async #PF token; the allocator is preemptible on PREEMPT_RT
      kernels and must not be called from truly atomic contexts.
      
      Opportunistically document why it's ok to loop on allocation failure,
      i.e. why the function won't get stuck in an infinite loop.

      Reported-by: Yajun Deng <yajun.deng@linux.dev>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
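      The "drop the lock, allocate, then re-check" pattern described in this
      commit can be sketched as a single-threaded C model. Everything here is
      a hypothetical stand-in: the toy lock only records whether it is held,
      and task_wake() is not kvm_async_pf_task_wake(). Unlike the commit,
      which loops on allocation failure, this sketch simply gives up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    unsigned int token;
    struct node *next;
};

/* Toy single-threaded stand-in for a bucket protected by a raw spinlock. */
struct bucket {
    bool locked;
    struct node *head;
};

static void bucket_lock(struct bucket *b)
{
    assert(!b->locked);
    b->locked = true;
}

static void bucket_unlock(struct bucket *b)
{
    assert(b->locked);
    b->locked = false;
}

/* Allocation is only legal while the lock is not held. */
static struct node *alloc_node(const struct bucket *b)
{
    assert(!b->locked);
    return calloc(1, sizeof(struct node));
}

/* Return the link that points at the node for 'token', or NULL. */
static struct node **find_link(struct bucket *b, unsigned int token)
{
    struct node **pp;

    for (pp = &b->head; *pp; pp = &(*pp)->next)
        if ((*pp)->token == token)
            return pp;
    return NULL;
}

/* 1: existing entry consumed, 0: dummy entry queued, -1: out of memory. */
static int task_wake(struct bucket *b, unsigned int token)
{
    struct node *dummy = NULL;
    struct node *n;
    struct node **pp;

again:
    bucket_lock(b);
    pp = find_link(b, token);
    if (!pp) {
        if (!dummy) {
            /* Drop the lock, allocate, then re-check from scratch. */
            bucket_unlock(b);
            dummy = alloc_node(b);
            if (!dummy)
                return -1; /* the sketch gives up; it does not retry */
            goto again;
        }
        dummy->token = token;
        dummy->next = b->head;
        b->head = dummy;
        bucket_unlock(b);
        return 0;
    }

    /* Entry found: unlink it and free a dummy that turned out unneeded. */
    n = *pp;
    *pp = n->next;
    bucket_unlock(b);
    free(n);
    free(dummy);
    return 1;
}
```

      The lookup is repeated after the lock is retaken because the bucket may
      have changed while it was unlocked.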
    • S
      KVM: x86: avoid calling x86 emulator without a decoded instruction · fee060cd
      Sean Christopherson committed
      Whenever x86_decode_emulated_instruction() detects a breakpoint, it
      returns the value that kvm_vcpu_check_breakpoint() writes into its
      pass-by-reference second argument.  Unfortunately this is completely
      bogus because the expected outcome of x86_decode_emulated_instruction
      is an EMULATION_* value.
      
      Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to
      a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK
      and x86_emulate_instruction() is called without having decoded the
      instruction.  This causes various havoc from running with a stale
      emulation context.
      
      The fix is to move the call to kvm_vcpu_check_breakpoint() where it was
      before commit 4aa2691d ("KVM: x86: Factor out x86 instruction
      emulation with decoding") introduced x86_decode_emulated_instruction().
      The other caller of the function does not need breakpoint checks,
      because it is invoked as part of a vmexit and the processor has already
      checked those before executing the instruction that #GP'd.
      
      This fixes CVE-2022-1852.

      Reported-by: Qiuhao Li <qiuhao@sysec.org>
      Reported-by: Gaoning Pan <pgn@zju.edu.cn>
      Reported-by: Yongkang Jia <kangel@zju.edu.cn>
      Fixes: 4aa2691d ("KVM: x86: Factor out x86 instruction emulation with decoding")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220311032801.3467418-2-seanjc@google.com>
      [Rewrote commit message according to Qiuhao's report, since a patch
       already existed to fix the bug. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
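      The return-value confusion described in this commit can be illustrated
      with a small C model. All names below are hypothetical and greatly
      simplified; the only property taken from the text is that "0" means
      both "exit to userspace" for the breakpoint check and "decoded fine"
      for the decoder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoder results; 0 means "decoded fine", as in the text. */
enum { DECODE_OK = 0, DECODE_FAILED = -1 };

struct vcpu {
    bool breakpoint_pending;
    bool insn_decoded;
};

/*
 * Returns true if a breakpoint fired; *r is then the run-loop result
 * (0 = exit to userspace).
 */
static bool check_breakpoint(const struct vcpu *v, int *r)
{
    if (!v->breakpoint_pending)
        return false;
    *r = 0;
    return true;
}

/* Buggy shape: the run-loop value leaks out as a decoder result. */
static int decode_buggy(struct vcpu *v)
{
    int r;

    if (check_breakpoint(v, &r))
        return r; /* 0 looks exactly like DECODE_OK */
    v->insn_decoded = true;
    return DECODE_OK;
}

/* Fixed shape: the decoder only decodes. */
static int decode_fixed(struct vcpu *v)
{
    v->insn_decoded = true;
    return DECODE_OK;
}

/* Returns true if the emulator would run on an undecoded instruction. */
static bool runs_stale(struct vcpu *v, bool fixed)
{
    int r;

    if (fixed) {
        if (check_breakpoint(v, &r))
            return false; /* caller bails out before decoding */
        r = decode_fixed(v);
    } else {
        r = decode_buggy(v);
    }
    return r == DECODE_OK && !v->insn_decoded;
}

/* Self-check: only the buggy shape emulates without a decoded instruction. */
static void check_decode_paths(void)
{
    struct vcpu buggy = { true, false };
    struct vcpu fixed = { true, false };

    assert(runs_stale(&buggy, false));
    assert(!runs_stale(&fixed, true));
}
```

      In this model, moving the breakpoint check out of the decoder keeps the
      two result domains apart, which is the shape of the fix described above.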
    • A
      KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak · d22d2474
      Ashish Kalra committed
      For some SEV ioctl interfaces, the length parameter that is passed may be
      less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data
      that the PSP firmware returns. In this case, kmalloc will allocate memory
      that is the size of the input rather than the size of the data.
      Since the PSP firmware doesn't fully overwrite the allocated buffer, these
      SEV ioctl interfaces may return uninitialized kernel slab memory.

      Reported-by: Andy Nguyen <theflow@google.com>
      Suggested-by: David Rientjes <rientjes@google.com>
      Suggested-by: Peter Gonda <pgonda@google.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Fixes: eaf78265 ("KVM: SVM: Move SEV code to separate file")
      Fixes: 2c07ded0 ("KVM: SVM: add support for SEV attestation command")
      Fixes: 4cfdd47d ("KVM: SVM: Add KVM_SEV SEND_START command")
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Fixes: eba04b20 ("KVM: x86: Account a variety of miscellaneous allocations")
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Reviewed-by: Peter Gonda <pgonda@google.com>
      Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
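      The leak described in this commit can be modelled in userspace C. This
      is a sketch under stated assumptions: stale slab contents are simulated
      with a fixed fill byte, firmware_fill() is a hypothetical stand-in for
      the PSP firmware call, and alloc_blob() only imitates the difference
      between kmalloc and kzalloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define STALE_BYTE 0xAA /* models leftover slab contents */

/* Allocate 'len' bytes; model kmalloc (stale) or kzalloc (zeroed). */
static unsigned char *alloc_blob(size_t len, bool zeroed)
{
    unsigned char *p = malloc(len);

    if (p)
        memset(p, zeroed ? 0 : STALE_BYTE, len);
    return p;
}

/* Hypothetical firmware call: fills only the first 'fw_len' bytes. */
static void firmware_fill(unsigned char *buf, size_t fw_len)
{
    memset(buf, 0x11, fw_len);
}

/* Count stale bytes that a copy of the full 'len' bytes would expose. */
static size_t leaked_bytes(size_t len, size_t fw_len, bool zeroed)
{
    unsigned char *buf = alloc_blob(len, zeroed);
    size_t i, leaked = 0;

    assert(buf && fw_len <= len);
    firmware_fill(buf, fw_len);
    for (i = 0; i < len; i++)
        if (buf[i] == STALE_BYTE)
            leaked++;
    free(buf);
    return leaked;
}
```

      With a 64-byte request and 16 bytes written by the firmware, the
      non-zeroed buffer exposes 48 stale bytes in this model, while the
      zeroed buffer exposes none.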
    • S
      x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave) · d187ba53
      Sean Christopherson committed
      Set the starting uABI size of KVM's guest FPU to 'struct kvm_xsave',
      i.e. to KVM's historical uABI size.  When saving FPU state for userspace,
      KVM (well, now the FPU) sets the FP+SSE bits in the XSAVE header even if
      the host doesn't support XSAVE.  Setting the XSAVE header allows the VM
      to be migrated to a host that does support XSAVE without the new host
      having to handle FPU state that may or may not be compatible with XSAVE.
      
      Setting the uABI size to the host's default size results in out-of-bounds
      writes (setting the FP+SSE bits) and data corruption (that is thankfully
      caught by KASAN) when running on hosts without XSAVE, e.g. on Core2 CPUs.
      
      WARN if the default size is larger than KVM's historical uABI size; all
      features that can push the FPU size beyond the historical size must be
      opt-in.
      
        ==================================================================
        BUG: KASAN: slab-out-of-bounds in fpu_copy_uabi_to_guest_fpstate+0x86/0x130
        Read of size 8 at addr ffff888011e33a00 by task qemu-build/681
        CPU: 1 PID: 681 Comm: qemu-build Not tainted 5.18.0-rc5-KASAN-amd64 #1
        Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
        Call Trace:
         <TASK>
         dump_stack_lvl+0x34/0x45
         print_report.cold+0x45/0x575
         kasan_report+0x9b/0xd0
         fpu_copy_uabi_to_guest_fpstate+0x86/0x130
         kvm_arch_vcpu_ioctl+0x72a/0x1c50 [kvm]
         kvm_vcpu_ioctl+0x47f/0x7b0 [kvm]
         __x64_sys_ioctl+0x5de/0xc90
         do_syscall_64+0x31/0x50
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        Allocated by task 0:
        (stack is not available)
        The buggy address belongs to the object at ffff888011e33800
         which belongs to the cache kmalloc-512 of size 512
        The buggy address is located 0 bytes to the right of
         512-byte region [ffff888011e33800, ffff888011e33a00)
        The buggy address belongs to the physical page:
        page:0000000089cd4adb refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e30
        head:0000000089cd4adb order:2 compound_mapcount:0 compound_pincount:0
        flags: 0x4000000000010200(slab|head|zone=1)
        raw: 4000000000010200 dead000000000100 dead000000000122 ffff888001041c80
        raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
        Memory state around the buggy address:
         ffff888011e33900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         ffff888011e33980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        >ffff888011e33a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                           ^
         ffff888011e33a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         ffff888011e33b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ==================================================================
        Disabling lock debugging due to kernel taint
      
      Fixes: be50b206 ("kvm: x86: Add support for getting/setting expanded xstate buffer")
      Fixes: c60427dd ("x86/fpu: Add uabi_size to guest_fpu")
      Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
      Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Zdenek Kaspar <zkaspar82@gmail.com>
      Message-Id: <20220504001219.983513-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • P
      s390/uv_uapi: depend on CONFIG_S390 · eb3de2d8
      Paolo Bonzini committed
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • P
      Merge tag 'kvm-s390-next-5.19-1' of... · 1644e270
      Paolo Bonzini committed
      Merge tag 'kvm-s390-next-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: Fix and feature for 5.19
      
      - ultravisor communication device driver
      - fix TEID on terminating storage key ops
    • P
      Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD · b699da3d
      Paolo Bonzini committed
      KVM/riscv changes for 5.19
      
      - Added Sv57x4 support for G-stage page table
      - Added range based local HFENCE functions
      - Added remote HFENCE functions based on VCPU requests
      - Added ISA extension registers in ONE_REG interface
      - Updated KVM RISC-V maintainers entry to cover selftests support
    • P
      Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 47e8eec8
      Paolo Bonzini committed
      KVM/arm64 updates for 5.19
      
      - Add support for the ARMv8.6 WFxT extension
      
      - Guard pages for the EL2 stacks
      
      - Trap and emulate AArch32 ID registers to hide unsupported features
      
      - Ability to select and save/restore the set of hypercalls exposed
        to the guest
      
      - Support for PSCI-initiated suspend in collaboration with userspace
      
      - GICv3 register-based LPI invalidation support
      
      - Move host PMU event merging into the vcpu data structure
      
      - GICv3 ITS save/restore fixes
      
      - The usual set of small-scale cleanups and fixes
      
      [Due to the conflict, KVM_SYSTEM_EVENT_SEV_TERM is relocated
       from 4 to 6. - Paolo]
    • Y
      KVM: selftests: x86: Fix test failure on arch lbr capable platforms · 825be3b5
      Yang Weijiang committed
      On Arch LBR capable platforms, LBR_FMT in the perf capability MSR is 0x3f,
      so the last format test will fail. Use a truly invalid format (0x30) for
      the test if it's running on these platforms. Opportunistically change
      the file name to reflect the tests actually carried out.

      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
      Message-Id: <20220512084046.105479-1-weijiang.yang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • W
      KVM: LAPIC: Trace LAPIC timer expiration on every vmentry · e0ac5351
      Wanpeng Li committed
      In commit ec0671d5 ("KVM: LAPIC: Delay trace_kvm_wait_lapic_expire
      tracepoint to after vmexit", 2019-06-04), trace_kvm_wait_lapic_expire
      was moved after guest_exit_irqoff() because invoking tracepoints within
      kvm_guest_enter/kvm_guest_exit caused a lockdep splat.
      
      These days this is not necessary, because commit 87fa7f3e ("x86/kvm:
      Move context tracking where it belongs", 2020-07-09) restricted
      the RCU extended quiescent state to be closer to vmentry/vmexit.
      Moving the tracepoint back to __kvm_wait_lapic_expire is more accurate,
      because it will be reported even if vcpu_enter_guest causes multiple
      vmentries via the IPI/Timer fast paths, and it allows the removal of
      advance_expire_delta.

      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1650961551-38390-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 20 May, 2022 15 commits
  3. 17 May, 2022 7 commits
    • M
      Merge branch kvm-arm64/its-save-restore-fixes-5.19 into kvmarm-master/next · 5c0ad551
      Marc Zyngier committed
      * kvm-arm64/its-save-restore-fixes-5.19:
        : .
        : Tighten the ITS save/restore infrastructure to fail early rather
        : than late. Patches courtesy of Ricardo Koller.
        : .
        KVM: arm64: vgic: Undo work in failed ITS restores
        KVM: arm64: vgic: Do not ignore vgic_its_restore_cte failures
        KVM: arm64: vgic: Add more checks when restoring ITS tables
        KVM: arm64: vgic: Check that new ITEs could be saved in guest memory
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • M
      Merge branch kvm-arm64/misc-5.19 into kvmarm-master/next · 822ca7f8
      Marc Zyngier committed
      * kvm-arm64/misc-5.19:
        : .
        : Misc fixes and general improvements for KVM/arm64:
        :
        : - Better handle out of sequence sysregs in the global tables
        :
        : - Remove a couple of unnecessary loads from constant pool
        :
        : - Drop unnecessary pKVM checks
        :
        : - Add all known M1 implementations to the SEIS workaround
        :
        : - Cleanup kerneldoc warnings
        : .
        KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround
        KVM: arm64: pkvm: Don't mask already zeroed FEAT_SVE
        KVM: arm64: pkvm: Drop unnecessary FP/SIMD trap handler
        KVM: arm64: nvhe: Eliminate kernel-doc warnings
        KVM: arm64: Avoid unnecessary absolute addressing via literals
        KVM: arm64: Print emulated register table name when it is unsorted
        KVM: arm64: Don't BUG_ON() if emulated register table is unsorted
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • M
      Merge branch kvm-arm64/per-vcpu-host-pmu-data into kvmarm-master/next · 8794b4f5
      Marc Zyngier committed
      * kvm-arm64/per-vcpu-host-pmu-data:
        : .
        : Pass the host PMU state in the vcpu to avoid the use of additional
        : shared memory between EL1 and EL2 (this obviously only applies
        : to nVHE and Protected setups).
        :
        : Patches courtesy of Fuad Tabba.
        : .
        KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected
        KVM: arm64: Reenable pmu in Protected Mode
        KVM: arm64: Pass pmu events to hyp via vcpu
        KVM: arm64: Repack struct kvm_pmu to reduce size
        KVM: arm64: Wrapper for getting pmu_events
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • M
      Merge branch kvm-arm64/vgic-invlpir into kvmarm-master/next · ec2cff6c
      Marc Zyngier committed
      * kvm-arm64/vgic-invlpir:
        : .
        : Implement MMIO-based LPI invalidation for vGICv3.
        : .
        KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision
        KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation
        KVM: arm64: vgic-v3: Expose GICR_CTLR.RWP when disabling LPIs
        irqchip/gic-v3: Exposes bit values for GICR_CTLR.{IR, CES}
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • M
      Merge branch kvm-arm64/psci-suspend into kvmarm-master/next · 3b8e21e3
      Marc Zyngier committed
      * kvm-arm64/psci-suspend:
        : .
        : Add support for PSCI SYSTEM_SUSPEND and allow userspace to
        : filter the wake-up events.
        :
        : Patches courtesy of Oliver.
        : .
        Documentation: KVM: Fix title level for PSCI_SUSPEND
        selftests: KVM: Test SYSTEM_SUSPEND PSCI call
        selftests: KVM: Refactor psci_test to make it amenable to new tests
        selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test
        selftests: KVM: Create helper for making SMCCC calls
        selftests: KVM: Rename psci_cpu_on_test to psci_test
        KVM: arm64: Implement PSCI SYSTEM_SUSPEND
        KVM: arm64: Add support for userspace to suspend a vCPU
        KVM: arm64: Return a value from check_vcpu_requests()
        KVM: arm64: Rename the KVM_REQ_SLEEP handler
        KVM: arm64: Track vCPU power state using MP state values
        KVM: arm64: Dedupe vCPU power off helpers
        KVM: arm64: Don't depend on fallthrough to hide SYSTEM_RESET2
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • M
      Merge branch kvm-arm64/hcall-selection into kvmarm-master/next · 0586e28a
      Marc Zyngier committed
      * kvm-arm64/hcall-selection:
        : .
        : Introduce a new set of virtual sysregs for userspace to
        : select the hypercalls it wants to see exposed to the guest.
        :
        : Patches courtesy of Raghavendra and Oliver.
        : .
        KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
        KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
        Documentation: Fix index.rst after psci.rst renaming
        selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list
        selftests: KVM: aarch64: Introduce hypercall ABI test
        selftests: KVM: Create helper for making SMCCC calls
        selftests: KVM: Rename psci_cpu_on_test to psci_test
        tools: Import ARM SMCCC definitions
        Docs: KVM: Add doc for the bitmap firmware registers
        Docs: KVM: Rename psci.rst to hypercalls.rst
        KVM: arm64: Add vendor hypervisor firmware register
        KVM: arm64: Add standard hypervisor firmware register
        KVM: arm64: Setup a framework for hypercall bitmap firmware registers
        KVM: arm64: Factor out firmware register handling from psci.c
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • M
      KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run · 528ada28
      Marc Zyngier committed
      We generally want to disallow hypercall bitmaps being changed
      once vcpus have already run. But we must allow the write if
      the written value is unchanged so that userspace can rewrite
      the register file on reboot, for example.
      
      Without this, a QEMU-based VM will fail to reboot correctly.
      
      The original code was correct, and it is me that introduced
      the regression.
      
      Fixes: 05714cab ("KVM: arm64: Setup a framework for hypercall bitmap firmware registers")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
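      The write rule described in this commit (reject changes once vcpus have
      run, but accept a rewrite of the same value) can be sketched in C. The
      struct, the function name, and the choice of -EBUSY as the error code
      are assumptions made for illustration, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM state: one hypercall bitmap and a "has run" flag. */
struct vm {
    bool has_run_once;
    uint64_t hcall_bmap;
};

/*
 * Accept the write unless it would change the bitmap after a vcpu has
 * already run. Rewriting the current value is always allowed.
 */
static int set_hcall_bmap(struct vm *vm, uint64_t val)
{
    if (vm->has_run_once && val != vm->hcall_bmap)
        return -EBUSY; /* assumed error code */
    vm->hcall_bmap = val;
    return 0;
}

/* Self-check modelled on the reboot case: userspace rewrites the register. */
static void check_reboot_rewrite(void)
{
    struct vm vm = { .has_run_once = false, .hcall_bmap = 0 };

    assert(set_hcall_bmap(&vm, 0x3) == 0);      /* before first run */
    vm.has_run_once = true;
    assert(set_hcall_bmap(&vm, 0x3) == 0);      /* unchanged value: allowed */
    assert(set_hcall_bmap(&vm, 0x1) == -EBUSY); /* changed value: rejected */
    assert(vm.hcall_bmap == 0x3);
}
```

      Comparing against the stored value is what lets userspace replay the
      register file on reboot without tripping the "already run" guard.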
  4. 16 May, 2022 3 commits