1. 28 October 2021, 1 commit
    • KVM: x86: Take srcu lock in post_kvm_run_save() · f3d1436d
      David Woodhouse authored
      The Xen interrupt injection for event channels relies on accessing the
      guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
      gfn_to_hva_cache.
      
      This requires the srcu lock to be held, which is mostly the case except
      for this code path:
      
      [   11.822877] WARNING: suspicious RCU usage
      [   11.822965] -----------------------------
      [   11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
      [   11.823131]
      [   11.823131] other info that might help us debug this:
      [   11.823131]
      [   11.823196]
      [   11.823196] rcu_scheduler_active = 2, debug_locks = 1
      [   11.823253] 1 lock held by dom:0/90:
      [   11.823292]  #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
      [   11.823379]
      [   11.823379] stack backtrace:
      [   11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
      [   11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.823612] Call Trace:
      [   11.823645]  dump_stack+0x7a/0xa5
      [   11.823681]  lockdep_rcu_suspicious+0xc5/0x100
      [   11.823726]  __kvm_xen_has_interrupt+0x179/0x190
      [   11.823773]  kvm_cpu_has_extint+0x6d/0x90
      [   11.823813]  kvm_cpu_accept_dm_intr+0xd/0x40
      [   11.823853]  kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
                    < post_kvm_run_save() inlined here >
      [   11.823906]  kvm_arch_vcpu_ioctl_run+0x135/0x6a0
      [   11.823947]  kvm_vcpu_ioctl+0x263/0x680
      
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f3d1436d
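
      A hedged sketch of the change described in the entry above, assuming the
      shape of post_kvm_run_save() in arch/x86/kvm/x86.c (simplified, not the
      exact upstream diff):

        static void post_kvm_run_save(struct kvm_vcpu *vcpu)
        {
                struct kvm_run *kvm_run = vcpu->run;
                int idx;

                /*
                 * Hold kvm->srcu for reading across the readiness check so
                 * that __kvm_xen_has_interrupt(), reached via
                 * kvm_cpu_has_extint(), can safely dereference the memslots
                 * behind its gfn_to_hva_cache.
                 */
                idx = srcu_read_lock(&vcpu->kvm->srcu);
                kvm_run->ready_for_interrupt_injection =
                        kvm_vcpu_ready_for_interrupt_injection(vcpu);
                srcu_read_unlock(&vcpu->kvm->srcu, idx);
        }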
  2. 27 October 2021, 1 commit
  3. 25 October 2021, 2 commits
    • KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() · 0985dba8
      David Woodhouse authored
      In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
      making a final check whether the vCPU should be woken from HLT by any
      incoming interrupt.
      
      This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
      really shouldn't be sleeping when the task state has already been set.
      I think it's actually harmless as it would just manifest itself as a
      spurious wakeup, but it's causing a debug warning:
      
      [  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
      
      Fix the warning by turning it into an *explicit* spurious wakeup. When
      invoked with !task_is_running(current) (and we might as well add
      in_atomic() there while we're at it), just return 1 to indicate that
      an IRQ is pending, which will cause a wakeup and then something will
      call it again in a context that *can* sleep so it can fault the page
      back in.
      
      Cc: stable@vger.kernel.org
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

      Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0985dba8
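
      A hedged sketch of the early-out described in the entry above (the helper
      name and surrounding structure are assumptions; the real logic lives in
      __kvm_xen_has_interrupt() in arch/x86/kvm/xen.c):

        /* Hypothetical helper: may the slow path fault a page in and sleep? */
        static bool xen_slow_path_may_sleep(void)
        {
                /*
                 * kvm_vcpu_block() sets TASK_INTERRUPTIBLE before its final
                 * interrupt check, and the sched-out path runs in atomic
                 * context, so anything that might fault is off limits there.
                 */
                return !in_atomic() && task_is_running(current);
        }

        /*
         * Used roughly like this when the gfn_to_hva_cache needs
         * revalidation: if sleeping is not allowed, return 1 ("IRQ pending").
         * The resulting spurious wakeup is harmless, and a later call from a
         * sleepable context will fault the vcpu_info page back in.
         */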
    • KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d
      David Woodhouse authored
      On the preemption path when updating a Xen guest's runstate times, this
      lock is taken inside the scheduler rq->lock, which is a raw spinlock.
      This was shown in a lockdep warning:
      
      [   89.138354] =============================
      [   89.138356] [ BUG: Invalid wait context ]
      [   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
      [   89.138360] -----------------------------
      [   89.138361] xen_shinfo_test/2575 is trying to lock:
      [   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138442] other info that might help us debug this:
      [   89.138444] context-{5:5}
      [   89.138445] 4 locks held by xen_shinfo_test/2575:
      [   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
      [   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
      [   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
      [   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
      ...
      [   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
      [   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
      [   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
      [   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
      [   89.138900]  __schedule+0x5de/0xbd0
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8228c77d
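
      Raw spinlocks keep spinning even on PREEMPT_RT, so they may legitimately
      be taken under rq->lock. A hedged sketch of the conversion described in
      the entry above (abbreviated; the *_sketch function is illustrative only):

        /* In struct kvm_arch:
         *
         *      -  spinlock_t      pvclock_gtod_sync_lock;
         *      +  raw_spinlock_t  pvclock_gtod_sync_lock;
         *
         * with every locking site switched to the raw_ variants, e.g.:
         */
        static u64 get_kvmclock_ns_sketch(struct kvm *kvm)
        {
                struct kvm_arch *ka = &kvm->arch;
                unsigned long flags;
                u64 ns;

                raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
                ns = ka->master_kernel_ns;  /* stand-in for the real snapshot logic */
                raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);

                return ns;
        }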
  4. 22 October 2021, 9 commits
  5. 21 October 2021, 3 commits
  6. 19 October 2021, 6 commits
    • KVM: SEV-ES: reduce ghcb_sa_len to 32 bits · 9f1ee7b1
      Paolo Bonzini authored
      The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT),
      so there is no need for it to be a u64.  This fixes a build error on 32-bit
      systems:
      
      i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io':
      sev.c:(.text+0x110f): undefined reference to `__udivdi3'
      
      Cc: stable@vger.kernel.org
      Fixes: 019057bd ("KVM: SEV-ES: fix length of string I/O")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9f1ee7b1
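
      The undefined reference comes from gcc lowering a 64-bit division on
      i686 to the libgcc helper __udivdi3, which the kernel does not link
      against; narrowing the length to 32 bits avoids the 64-bit division
      entirely. A small illustration of the principle (hypothetical functions,
      not the actual sev.c code):

        unsigned int count_from_u64(u64 len, unsigned int size)
        {
                /* 64-by-32 division: on i686 this becomes a call to
                 * __udivdi3, hence the link error. */
                return len / size;
        }

        unsigned int count_from_u32(u32 len, unsigned int size)
        {
                /* Plain 32-bit division: a single div instruction. */
                return len / size;
        }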
    • KVM: VMX: Remove redundant handling of bus lock vmexit · d61863c6
      Hao Xiang authored
      Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK
      VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit
      could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK.
      
      Remove the redundant handling of bus lock VM-Exits: unconditionally set
      exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with
      KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit().
      Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com>
      Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d61863c6
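
      A hedged sketch of the restructuring described in the entry above
      (simplified from arch/x86/kvm/vmx/vmx.c):

        static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu)
        {
                /*
                 * Hardware may or may not have set the flag for this exit,
                 * so set it unconditionally here; vmx_handle_exit() is then
                 * the single place that turns it into KVM_RUN_X86_BUS_LOCK
                 * for userspace.
                 */
                to_vmx(vcpu)->exit_reason.bus_lock_detected = true;
                return 1;
        }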
    • KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload · 9139a7a6
      Sean Christopherson authored
      WARN if the static keys used to track if any vCPU has disabled its APIC
      are left elevated at module exit.  Unlike the underflow case, nothing in
      the static key infrastructure will complain if a key is left elevated,
      and because an elevated key only affects performance, nothing in KVM will
      fail if either key is improperly incremented.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211013003554.47705-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9139a7a6
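
      A hedged sketch of the unload-time check described in the entry above
      (approximating kvm_lapic_exit() in arch/x86/kvm/lapic.c):

        void kvm_lapic_exit(void)
        {
                /*
                 * Flush any deferred decrements first, then complain if a key
                 * was left elevated: unlike the underflow case, an over-count
                 * would otherwise go unnoticed since it only costs speed.
                 */
                static_key_deferred_flush(&apic_hw_disabled);
                WARN_ON(static_branch_unlikely(&apic_hw_disabled.key));
                static_key_deferred_flush(&apic_sw_disabled);
                WARN_ON(static_branch_unlikely(&apic_sw_disabled.key));
        }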
    • Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET" · f7d8a19f
      Sean Christopherson authored
      Revert a change to open code bits of kvm_lapic_set_base() when emulating
      APIC RESET to fix an apic_hw_disabled underflow bug due to arch.apic_base
      and apic_hw_disabled being unsynchronized when the APIC is created.  If
      kvm_arch_vcpu_create() fails after creating the APIC, kvm_free_lapic()
      will see the initialized-to-zero vcpu->arch.apic_base and decrement
      apic_hw_disabled without KVM ever having incremented apic_hw_disabled.
      
      Using kvm_lapic_set_base() in kvm_lapic_reset() is also desirable for a
      potential future where KVM supports RESET outside of vCPU creation, in
      which case all the side effects of kvm_lapic_set_base() are needed, e.g.
      to handle the transition from x2APIC => xAPIC.
      
      Alternatively, KVM could temporarily increment apic_hw_disabled (and call
      kvm_lapic_set_base() at RESET), but that's a waste of cycles and would
      impact the performance of other vCPUs and VMs.  The other subtle side
      effect is that updating the xAPIC ID needs to be done at RESET regardless
      of whether the APIC was previously enabled, i.e. kvm_lapic_reset() needs
      an explicit call to kvm_apic_set_xapic_id() regardless of whether or not
      kvm_lapic_set_base() also performs the update.  That makes stuffing the
      enable bit at vCPU creation slightly more palatable, as doing so affects
      only the apic_hw_disabled key.
      
      Opportunistically tweak the comment to explicitly call out the connection
      between vcpu->arch.apic_base and apic_hw_disabled, and add a comment to
      call out the need to always do kvm_apic_set_xapic_id() at RESET.
      
      Underflow scenario:
      
        kvm_vm_ioctl() {
          kvm_vm_ioctl_create_vcpu() {
            kvm_arch_vcpu_create() {
              if (something_went_wrong)
                goto fail_free_lapic;
              /* vcpu->arch.apic_base is initialized when something_went_wrong is false. */
              kvm_vcpu_reset() {
                kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) {
                  vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
                }
              }
              return 0;
            fail_free_lapic:
              kvm_free_lapic() {
                /* vcpu->arch.apic_base is not yet initialized when something_went_wrong is true. */
                if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE))
                  static_branch_slow_dec_deferred(&apic_hw_disabled); // <= underflow bug.
              }
              return r;
            }
          }
        }
      
      This (mostly) reverts commit 42122123.
      
      Fixes: 42122123 ("KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET")
      Reported-by: syzbot+9fc046ab2b0cf295a063@syzkaller.appspotmail.com
      Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211013003554.47705-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f7d8a19f
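
      A hedged sketch of the reverted-to shape of kvm_lapic_reset() described
      above (heavily abbreviated; only the lines relevant to the discussion):

        void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
        {
                struct kvm_lapic *apic = vcpu->arch.apic;

                if (!init_event) {
                        /*
                         * Use the common helper so vcpu->arch.apic_base and
                         * the apic_hw_disabled key stay in sync, and so a
                         * future RESET outside of vCPU creation gets all the
                         * side effects (e.g. the x2APIC => xAPIC transition).
                         */
                        kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE |
                                                 MSR_IA32_APICBASE_ENABLE);
                }

                /*
                 * The xAPIC ID must be stuffed at RESET regardless of whether
                 * the APIC was previously enabled, so do it explicitly rather
                 * than relying on kvm_lapic_set_base() having done it.
                 */
                kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);

                /* remaining register reset elided */
        }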
    • KVM: X86: fix lazy allocation of rmaps · fa13843d
      Paolo Bonzini authored
      If allocation of rmaps fails, but some of the pointers have already been written,
      those pointers can be cleaned up when the memslot is freed, or even reused later
      for another attempt at allocating the rmaps.  Therefore there is no need to
      WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
      skipped lest KVM overwrite the previous pointer and leak the memory it points to.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fa13843d
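
      A hedged sketch of the guard described in the entry above (approximating
      memslot_rmap_alloc() in arch/x86/kvm/x86.c; helper names assumed):

        static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
                                      unsigned long npages)
        {
                const int sz = sizeof(*slot->arch.rmap[0]);
                int i;

                for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
                        int level = i + 1;
                        int lpages = __kvm_mmu_slot_lpages(slot, npages, level);

                        /* A previous, partially failed attempt may have left
                         * this pointer populated; reuse it rather than
                         * overwriting (and leaking) the old allocation. */
                        if (slot->arch.rmap[i])
                                continue;

                        slot->arch.rmap[i] = kvcalloc(lpages, sz,
                                                      GFP_KERNEL_ACCOUNT);
                        if (!slot->arch.rmap[i])
                                return -ENOMEM;
                }

                return 0;
        }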
    • KVM: SEV-ES: Set guest_state_protected after VMSA update · baa1e5ca
      Peter Gonda authored
      The refactoring in commit bb18a677 ("KVM: SEV: Acquire
      vcpu mutex when updating VMSA") left behind the assignment to
      svm->vcpu.arch.guest_state_protected; add it back.
      Signed-off-by: Peter Gonda <pgonda@google.com>
      [Delta between v2 and v3 of Peter's patch, which had already been
       committed; the commit message is my own. - Paolo]
      Fixes: bb18a677 ("KVM: SEV: Acquire vcpu mutex when updating VMSA")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      baa1e5ca
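
      A hedged sketch of where the restored assignment sits (approximating
      __sev_launch_update_vmsa() in arch/x86/kvm/svm/sev.c, abbreviated):

        static int __sev_launch_update_vmsa(struct kvm *kvm,
                                            struct kvm_vcpu *vcpu, int *error)
        {
                struct sev_data_launch_update_vmsa vmsa;
                struct vcpu_svm *svm = to_svm(vcpu);
                int ret;

                /* Sync the guest register state into the VMSA page. */
                ret = sev_es_sync_vmsa(svm);
                if (ret)
                        return ret;

                vmsa.reserved = 0;
                vmsa.handle = to_kvm_svm(kvm)->sev_info.handle;
                vmsa.address = __sme_pa(svm->vmsa);
                vmsa.len = PAGE_SIZE;
                ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error);
                if (ret)
                        return ret;

                /* The line the refactoring dropped: from here on the register
                 * state lives only in the encrypted VMSA, so KVM must treat
                 * it as protected/opaque. */
                vcpu->arch.guest_state_protected = true;
                return 0;
        }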
  7. 16 October 2021, 1 commit
  8. 15 October 2021, 2 commits
  9. 12 October 2021, 1 commit
    • x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically · 71188590
      Borislav Petkov authored
      This Kconfig option was added initially so that memory encryption is
      enabled by default on machines which support it.
      
      However, devices which have DMA masks that are less than the bit
      position of the encryption bit, aka C-bit, require the use of an IOMMU
      or the use of SWIOTLB.
      
      If the IOMMU is disabled or in passthrough mode, the kernel would switch
      to SWIOTLB bounce-buffering for those transfers.
      
      In order to avoid that,
      
        2cc13bb4 ("iommu: Disable passthrough mode when SME is active")
      
      disables the default IOMMU passthrough mode so that devices for which the
      default 256K DMA is insufficient, can use the IOMMU instead.
      
      However, there are also cases where the IOMMU is disabled in the BIOS, etc.
      (think the usual hardware folk "oops, I dropped the ball there" cases), or a
      driver doesn't properly use the DMA APIs, or a device has a firmware or
      hardware bug, e.g.:
      
        ea68573d ("drm/amdgpu: Fail to load on RAVEN if SME is active")
      
      Moreover, in the above GPU use case, there are APIs like Vulkan and some
      OpenGL/OpenCL extensions which assume that user-allocated memory can be
      passed in to the kernel driver and that both the GPU and CPU can access
      the same memory coherently and concurrently.
      That cannot work with SWIOTLB bounce buffers, of course.
      
      So, in order for those devices to function, drop the "default y" for the
      SME-active-by-default option; users who want SME enabled will need to
      either enable it in their config or use "mem_encrypt=on" on the kernel
      command line.
      
       [ tlendacky: Generalize commit message. ]
      
      Fixes: 7744ccdb ("x86/mm: Add Secure Memory Encryption (SME) support")
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df09c@molgen.mpg.de
      71188590
  10. 08 October 2021, 1 commit
    • x86/fpu: Restore the masking out of reserved MXCSR bits · d298b035
      Borislav Petkov authored
      Ser Olmy reported a boot failure:
      
        init[1] bad frame in sigreturn frame:(ptrval) ip:b7c9fbe6 sp:bf933310 orax:ffffffff \
      	  in libc-2.33.so[b7bed000+156000]
        Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
        CPU: 0 PID: 1 Comm: init Tainted: G        W         5.14.9 #1
        Hardware name: Hewlett-Packard HP PC/HP Board, BIOS  JD.00.06 12/06/2001
        Call Trace:
         dump_stack_lvl
         dump_stack
         panic
         do_exit.cold
         do_group_exit
         get_signal
         arch_do_signal_or_restart
         ? force_sig_info_to_task
         ? force_sig
         exit_to_user_mode_prepare
         syscall_exit_to_user_mode
         do_int80_syscall_32
         entry_INT80_32
      
      on an old 32-bit Intel CPU:
      
        vendor_id       : GenuineIntel
        cpu family      : 6
        model           : 6
        model name      : Celeron (Mendocino)
        stepping        : 5
        microcode       : 0x3
      
      Ser bisected the problem to the commit in Fixes.
      
      tglx suggested reverting the rejection of invalid MXCSR values which
      this commit introduced and replacing it with what the old code did -
      simply masking them out to zero.
      
      Further debugging confirmed his suggestion:
      
        fpu->state.fxsave.mxcsr: 0xb7be13b4, mxcsr_feature_mask: 0xffbf
        WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/signal.c:384 __fpu_restore_sig+0x51f/0x540
      
      so restore the original behavior only for 32-bit kernels where you have
      ancient machines with buggy hardware. For 32-bit programs on 64-bit
      kernels, user space which supplies wrong MXCSR values is considered
      malicious so fail the sigframe restoration there.
      
      Fixes: 6f9866a1 ("x86/fpu/signal: Let xrstor handle the features to init")
      Reported-by: Ser Olmy <ser.olmy@protonmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Ser Olmy <ser.olmy@protonmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/YVtA67jImg3KlBTw@zn.tnic
      d298b035
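
      A hedged sketch of the restored behaviour described in the entry above
      (the helper name and its exact placement in arch/x86/kernel/fpu/signal.c
      are assumptions):

        static bool sanitize_or_reject_mxcsr(struct fxregs_state *fx)
        {
                if (IS_ENABLED(CONFIG_X86_64)) {
                        /* 32-bit programs on 64-bit kernels: reserved MXCSR
                         * bits are treated as malicious, fail sigreturn. */
                        return !(fx->mxcsr & ~mxcsr_feature_mask);
                }

                /* 32-bit kernels: ancient/buggy hardware may hand us garbage,
                 * so just mask the reserved bits out (the old behaviour). */
                fx->mxcsr &= mxcsr_feature_mask;
                return true;
        }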
  11. 07 October 2021, 7 commits
  12. 06 October 2021, 1 commit
  13. 05 October 2021, 5 commits