1. 02 August 2021: 16 commits
  2. 30 July 2021: 1 commit
    • KVM: x86: accept userspace interrupt only if no event is injected · fa7a549d
      Paolo Bonzini authored
      Once an exception has been injected, any side effects related to
      the exception (such as setting CR2 or DR6) have already taken place.
      Therefore, once KVM sets the VM-entry interruption information
      field or the AMD EVENTINJ field, the next VM-entry must deliver that
      exception.
      
      Pending interrupts are processed after injected exceptions, so
      in theory it would not be a problem to use KVM_INTERRUPT when
      an injected exception is present.  However, DOSEMU is using
      run->ready_for_interrupt_injection to detect interrupt windows
      and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
      interrupt manually.  For this to work, the interrupt window
      must be delayed until the previous event injection has
      completed.
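
      For context, here is a minimal userspace sketch of the pattern being
      protected (illustrative only; it is not DOSEMU's code, which injects
      via KVM_SET_SREGS/KVM_SET_REGS rather than KVM_INTERRUPT):

        /* Sketch: assumes a vCPU created with KVM_CREATE_VCPU and `run`
         * being the mmap()ed struct kvm_run for vcpu_fd. */
        #include <linux/kvm.h>
        #include <sys/ioctl.h>
        #include <stdbool.h>

        static bool try_inject_irq(int vcpu_fd, struct kvm_run *run, unsigned int irq)
        {
            struct kvm_interrupt intr = { .irq = irq };

            /* With this fix, ready_for_interrupt_injection stays 0 while a
             * previously injected event is still pending delivery, so the
             * caller cannot stomp on it. */
            if (!run->ready_for_interrupt_injection || !run->if_flag)
                return false;

            return ioctl(vcpu_fd, KVM_INTERRUPT, &intr) == 0;
        }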
      
      Cc: stable@vger.kernel.org
      Reported-by: Stas Sergeev <stsp2@yandex.ru>
      Tested-by: Stas Sergeev <stsp2@yandex.ru>
      Fixes: 71cc849b ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fa7a549d
  3. 28 July 2021: 6 commits
  4. 26 July 2021: 3 commits
  5. 17 July 2021: 8 commits
  6. 16 July 2021: 4 commits
    • arm64: entry: fix KCOV suppression · e6f85cbe
      Mark Rutland authored
      We suppress KCOV for entry.o rather than entry-common.o. As entry.o is
      built from entry.S, this is pointless, and permits instrumentation of
      entry-common.o, which is built from entry-common.c.
      
      Fix the Makefile to suppress KCOV for entry-common.o, as we had intended
      to begin with. I've verified with objdump that this is working as
      expected.
      
      Fixes: bf6fa2c0 ("arm64: entry: don't instrument entry code with KCOV")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20210715123049.9990-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      e6f85cbe
    • arm64: entry: add missing noinstr · 31a7f0f6
      Mark Rutland authored
      We intend that all the early exception handling code is marked as
      `noinstr`, but we forgot this for __el0_error_handler_common(), which is
      called before we have completed entry from user mode. If it were
      instrumented, we could run into problems with RCU, lockdep, etc.
      
      Mark it as `noinstr` to prevent this.
      
      The few other functions in entry-common.c which do not have `noinstr` are
      called once we've completed entry, and are safe to instrument.
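
      As a rough illustration of the rule (simplified; not the exact arm64
      source, and the entry accounting calls are elided), anything that can
      run before entry from user mode has completed must carry `noinstr`:

        /* Sketch only: until entry from user mode is complete (RCU may not
         * be watching, lockdep state is stale), instrumentation is unsafe,
         * so the whole handler is marked noinstr. */
        static void noinstr el0_error_handler_sketch(struct pt_regs *regs)
        {
            unsigned long esr = read_sysreg(esr_el1);

            /* ... complete entry from user mode here ... */

            /* only now is it safe to call instrumentable code */
            do_serror(regs, esr);
        }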
      
      Fixes: bb8e93a2 ("arm64: entry: convert SError handlers to C")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Joey Gouly <joey.gouly@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20210714172801.16475-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      31a7f0f6
    • arm64: mte: fix restoration of GCR_EL1 from suspend · 59f44069
      Mark Rutland authored
      Since commit:
      
        bad1e1c6 ("arm64: mte: switch GCR_EL1 in kernel entry and exit")
      
      we saved/restored the user GCR_EL1 value at exception boundaries, and
      update_gcr_el1_excl() is no longer used for this. However it is used to
      restore the kernel's GCR_EL1 value when returning from a suspend state.
      Thus, the comment is misleading (and an ISB is necessary).
      
      When restoring the kernel's GCR value, we need an ISB to ensure it is
      used by subsequent instructions. We don't necessarily get an ISB by
      other means (e.g. if the kernel is built without support for pointer
      authentication), and as __cpu_setup() initialised GCR_EL1.Exclude to
      0xffff, allocation tag 0 may be used rather than the desired set of
      tags until a context synchronization event occurs.
      
      This patch drops the misleading comment, adds the missing ISB, and for
      clarity folds update_gcr_el1_excl() into its only user.
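
      A simplified sketch of the resulting restore path (illustrative, not
      the verbatim patch; it assumes the usual MTE/sysreg helpers):

        /* Restore the kernel's GCR_EL1 exclude mask on suspend exit and
         * synchronize, so later tag allocations observe the new mask. */
        static void restore_kernel_gcr_sketch(u64 kernel_excl)
        {
            if (!system_supports_mte())
                return;

            sysreg_clear_set_s(SYS_GCR_EL1, SYS_GCR_EL1_EXCL_MASK, kernel_excl);
            isb();    /* context synchronization event */
        }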
      
      Fixes: bad1e1c6 ("arm64: mte: switch GCR_EL1 in kernel entry and exit")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20210714143843.56537-2-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      59f44069
    • arm64: Avoid premature usercopy failure · 295cf156
      Robin Murphy authored
      Al reminds us that the usercopy API must only return complete failure
      if absolutely nothing could be copied. Currently, if userspace does
      something silly like giving us an unaligned pointer to Device memory,
      or a size which overruns MTE tag bounds, we may fail to honour that
      requirement when faulting on a multi-byte access even though a smaller
      access could have succeeded.
      
      Add a mitigation to the fixup routines to fall back to a single-byte
      copy if we faulted on a larger access before anything has been written
      to the destination, to guarantee making *some* forward progress. We
      needn't be too concerned about the overall performance since this should
      only occur when callers are doing something a bit dodgy in the first
      place. Particularly broken userspace might still be able to trick
      generic_perform_write() into an infinite loop by targeting write() at
      an mmap() of some read-only device register where the fault-in load
      succeeds but any store synchronously aborts such that copy_to_user() is
      genuinely unable to make progress, but, well, don't do that...
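
      A simplified model of the fallback (try_bulk_copy()/try_byte_copy()
      are hypothetical stand-ins for the real copy loop and its exception
      fixups, which operate on multi-byte accesses):

        /* Hypothetical helpers: try_bulk_copy() returns how many bytes were
         * copied before a fault; try_byte_copy() reports whether one byte
         * could be copied. */
        static size_t copy_with_byte_fallback(char *dst, const char *src, size_t n)
        {
            size_t done = try_bulk_copy(dst, src, n);

            if (done == 0 && n != 0) {
                /* A wide access faulted before anything was written: retry
                 * one byte at a time so *some* progress can be reported. */
                while (done < n && try_byte_copy(&dst[done], &src[done]))
                    done++;
            }
            return done;    /* 0 only if absolutely nothing could be copied */
        }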
      
      CC: stable@vger.kernel.org
      Reported-by: Chen Huang <chenhuang5@huawei.com>
      Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Link: https://lore.kernel.org/r/dc03d5c675731a1f24a62417dba5429ad744234e.1626098433.git.robin.murphy@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      295cf156
  7. 15 July 2021: 2 commits
    • KVM: nSVM: Restore nested control upon leaving SMM · bb00bd9c
      Vitaly Kuznetsov authored
      If the VM was migrated while in SMM, no nested state was saved/restored,
      and therefore svm_leave_smm() has to load both the save and control areas
      of vmcb12. The save area is already loaded from the HSAVE area,
      so now load the control area as well from vmcb12.
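
      Roughly, the RSM path now does the following (sketch with simplified
      error handling; names follow KVM's nSVM code of this era, but the exact
      signatures are approximate):

        /* Map vmcb12 from guest memory and reload its control area before
         * re-entering guest (L2) mode; the save area already came from HSAVE. */
        static int leave_smm_reload_vmcb12_sketch(struct kvm_vcpu *vcpu, u64 vmcb12_gpa)
        {
            struct vcpu_svm *svm = to_svm(vcpu);
            struct kvm_host_map map;
            struct vmcb *vmcb12;
            int ret;

            if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map))
                return 1;

            vmcb12 = map.hva;
            nested_load_control_from_vmcb12(svm, &vmcb12->control);  /* the missing step */
            ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12);
            kvm_vcpu_unmap(vcpu, &map, true);
            return ret;
        }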
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-6-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bb00bd9c
    • KVM: nSVM: Fix L1 state corruption upon return from SMM · 37be407b
      Vitaly Kuznetsov authored
      The VMCB split commit 4995a368 ("KVM: SVM: Use a separate vmcb for the
      nested L2 guest") broke return from SMM when SMM was entered from guest
      (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
      manifests itself like this:
      
        kvm_exit:             reason EXIT_RSM rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_smm_transition:   vcpu 0: leaving SMM, smbase 0x7ffb3000
        kvm_nested_vmrun:     rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
          nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
          npt: on
        kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
          intercepts: fd44bfeb 0000217f 00000000
        kvm_entry:            vcpu 0, rip 0xffffffffffbbe119
        kvm_exit:             reason EXIT_NPF rip 0xffffffffffbbe119 info
          200000006 1ab000
        kvm_nested_vmexit:    vcpu 0 reason npf rip 0xffffffffffbbe119 info1
          0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
          error_code 0x00000000
        kvm_page_fault:       address 1ab000 error_code 6
        kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
          int_info 0 int_info_err 0
        kvm_entry:            vcpu 0, rip 0x7ffbb280
        kvm_exit:             reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_inj_exception:    #GP (0x0)
      
      Note: the return to L2 succeeded, but upon the first exit to L1 its RIP
      points to the 'RSM' instruction even though we are no longer in SMM.
      
      The problem appears to be that VMCB01 gets irreversibly destroyed during
      SMM execution. Previously, we had the 'hsave' VMCB where the regular
      (pre-SMM) L1 state was saved upon nested_svm_vmexit(), but now we just
      switch to VMCB01 from VMCB02.
      
      Pre-split (working) flow looked like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() restores L1's state from 'hsave'
      - SMM -> RSM
      - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
        pre-SMM (and pre L2 VMRUN) L1's state there
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from 'hsave'.
      
      This was always broken with regards to svm_get_nested_state()/
      svm_set_nested_state(): 'hsave' was never part of what is being
      saved and restored, so migration happening during SMM triggered from L2 would
      never restore L1's state correctly.
      
      Post-split flow (broken) looks like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() switches to VMCB01 from VMCB02
      - SMM -> RSM
      - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
        is already lost.
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from VMCB01 but it is corrupted
       (reflects the state during 'RSM' execution).
      
      VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
      and host state so when we switch back to VMCS02 L1's state is intact there.
      
      To resolve the issue we need to save L1's state somewhere. We could have
      created a third VMCB for SMM, but that would require us to modify the saved
      state format. L1's architectural HSAVE area (pointed to by MSR_VM_HSAVE_PA)
      seems appropriate: L0 is free to save any (or none) of L1's state there.
      Currently, KVM saves none of it.
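
      A sketch of the chosen approach (illustrative only; error handling is
      simplified and the exact layout within the HSAVE page is elided): on
      SMM entry with L2 active, stash VMCB01's save area into the guest page
      pointed to by MSR_VM_HSAVE_PA, and read it back on RSM.

        /* Illustrative sketch, not the verbatim patch. */
        static int stash_l1_state_in_hsave(struct kvm_vcpu *vcpu)
        {
            struct vcpu_svm *svm = to_svm(vcpu);
            struct kvm_host_map map;

            /* L0 is architecturally free to use the HSAVE area as it likes. */
            if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr), &map))
                return -EINVAL;

            memcpy(map.hva, &svm->vmcb01.ptr->save, sizeof(struct vmcb_save_area));
            kvm_vcpu_unmap(vcpu, &map, true);
            return 0;
        }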
      
      Note, for nested state migration to succeed, both the source and
      destination hypervisors must have the fix. We don't, however, need a new
      flag indicating that the HSAVE area is now populated, as migration
      during SMM triggered from L2 was always broken.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      37be407b