1. 02 Apr, 2022 9 commits
    • KVM: Remove dirty handling from gfn_to_pfn_cache completely · cf1d88b3
      David Woodhouse committed
      It isn't OK to cache the dirty status of a page in internal structures
      for an indefinite period of time.
      
      Any time a vCPU exits the run loop to userspace might be its last; the
      VMM might do its final check of the dirty log, flush the last remaining
      dirty pages to the destination and complete a live migration. If we
      have internal 'dirty' state which doesn't get flushed until the vCPU
      is finally destroyed on the source after migration is complete, then
      we have lost data because that will escape the final copy.
      
      This problem already exists with the use of kvm_vcpu_unmap() to mark
      pages dirty in e.g. VMX nesting.
      
      Note that the actual Linux MM already considers the page to be dirty
      since we have a writeable mapping of it. This is just about the KVM
      dirty logging.
      
      For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
      track which gfn_to_pfn_caches have been used and explicitly mark the
      corresponding pages dirty before returning to userspace. But we would
      have needed external tracking of that anyway, rather than walking the
      full list of GPCs to find those belonging to this vCPU which are dirty.
      
      So let's rely *solely* on that external tracking, and keep it simple
      rather than laying a tempting trap for callers to fall into.

      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20220303154127.202856-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
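The external tracking described above can be sketched in plain C. This is a minimal userspace model, not kernel code: the `sketch_` names, the fixed-size arrays and the single 64-bit dirty bitmap are all assumptions made for illustration. The point it shows is that pages written on behalf of a vCPU are recorded outside the cache and pushed to the dirty log before control returns to userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_PAGES    64 /* hypothetical memslot size, in pages */
#define SKETCH_MAX_TRACKED 8  /* hypothetical per-vCPU tracking limit */

/* Hypothetical per-vCPU record of guest frames written via a cached mapping. */
struct sketch_vcpu {
	uint64_t touched_gfns[SKETCH_MAX_TRACKED];
	size_t nr_touched;
};

/* Hypothetical dirty log: one bit per guest page. */
static uint64_t sketch_dirty_bitmap;

/* Remember that this vCPU wrote to gfn; nothing is marked dirty yet. */
static bool sketch_track_write(struct sketch_vcpu *vcpu, uint64_t gfn)
{
	if (gfn >= SKETCH_NR_PAGES || vcpu->nr_touched >= SKETCH_MAX_TRACKED)
		return false;
	vcpu->touched_gfns[vcpu->nr_touched++] = gfn;
	return true;
}

/* Called on every exit to userspace, so no dirty state outlives the exit. */
static void sketch_flush_before_exit(struct sketch_vcpu *vcpu)
{
	assert(vcpu->nr_touched <= SKETCH_MAX_TRACKED);
	for (size_t i = 0; i < vcpu->nr_touched; i++)
		sketch_dirty_bitmap |= UINT64_C(1) << vcpu->touched_gfns[i];
	vcpu->nr_touched = 0;
}
```

Because the flush runs on every return to userspace, the final dirty-log pass of a live migration cannot miss a page that is only remembered in internal state.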
    • KVM: Use enum to track if cached PFN will be used in guest and/or host · d0d96121
      Sean Christopherson committed
      Replace the guest_uses_pa and kernel_map booleans in the PFN cache code
      with a unified enum/bitmask. Using explicit names makes it easier to
      review and audit call sites.
      
      Opportunistically add a WARN to prevent passing garbage; instantiating a
      cache without declaring its usage is either buggy or pointless.

      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20220303154127.202856-2-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
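A rough illustration of replacing two booleans with one usage bitmask follows. The enum and function names are hypothetical (prefixed `sketch_`) and only loosely modeled on the commit text; the kernel would WARN where this sketch returns an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the unified usage enum described above. */
enum sketch_pfn_cache_usage {
	SKETCH_GUEST_USES_PFN = 1u << 0,
	SKETCH_HOST_USES_PFN  = 1u << 1,
	SKETCH_GUEST_AND_HOST_USE_PFN =
		SKETCH_GUEST_USES_PFN | SKETCH_HOST_USES_PFN,
};

struct sketch_cache {
	unsigned int usage;
};

/*
 * Reject a cache that declares no usage, or that passes bits outside the
 * enum. Returning -EINVAL is an assumption of this sketch.
 */
static int sketch_cache_init(struct sketch_cache *cache, unsigned int usage)
{
	assert(cache != NULL);
	if (!usage || (usage & ~(unsigned int)SKETCH_GUEST_AND_HOST_USE_PFN))
		return -EINVAL;
	cache->usage = usage;
	return 0;
}
```

With named flags, a call site reads `sketch_cache_init(&c, SKETCH_HOST_USES_PFN)` instead of a pair of anonymous `true`/`false` arguments, which is what makes auditing easier.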
    • KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode() · 4a9e7b9e
      Peter Gonda committed
      Include kvm_cache_regs.h to pick up the definition of is_guest_mode(),
      which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove the
      include from svm_onhyperv.c, which was added only because svm.h lacked
      it.
      
      Fixes: 883b0a91 ("KVM: SVM: Move Nested SVM Implementation to nested.c")
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Message-Id: <20220304161032.2270688-1-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Use different raw event masks for AMD and Intel · 95b065bf
      Jim Mattson committed
      The third nybble of AMD's event select overlaps with Intel's IN_TX and
      IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
      platforms that support TSX.
      
      Declare a raw_event_mask in the kvm_pmu structure, initialize it in
      the vendor-specific pmu_refresh() functions, and use that mask for
      PERF_TYPE_RAW configurations in reprogram_gp_counter().
      
      Fixes: 710c4765 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20220308012452.3468611-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
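The overlap can be made concrete with a small model. The constants below are simplified assumptions (only the event, unit-mask, AMD high-nybble and TSX bits are modeled, and all `sketch_` names are hypothetical); the sketch shows why one shared mask cannot serve both vendors and how a per-PMU mask avoids the problem.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified event-select layout; other fields are deliberately omitted. */
#define SKETCH_EVENTSEL_EVENT UINT64_C(0x00FF)
#define SKETCH_EVENTSEL_UMASK UINT64_C(0xFF00)
#define SKETCH_AMD_EVENT_HI   (UINT64_C(0xF) << 32) /* third event nybble */
#define SKETCH_INTEL_IN_TX    (UINT64_C(1) << 32)
#define SKETCH_INTEL_IN_TXCP  (UINT64_C(1) << 33)

#define SKETCH_INTEL_RAW_MASK (SKETCH_EVENTSEL_EVENT | SKETCH_EVENTSEL_UMASK)
#define SKETCH_AMD_RAW_MASK   (SKETCH_INTEL_RAW_MASK | SKETCH_AMD_EVENT_HI)

/* The overlap the commit message describes, checked at compile time. */
static_assert((SKETCH_AMD_EVENT_HI & SKETCH_INTEL_IN_TX) != 0,
	      "AMD event bits alias Intel IN_TX");
static_assert((SKETCH_AMD_EVENT_HI & SKETCH_INTEL_IN_TXCP) != 0,
	      "AMD event bits alias Intel IN_TXCP");

/* Hypothetical stand-in for the PMU state that carries the mask. */
struct sketch_pmu {
	uint64_t raw_event_mask;
};

static void sketch_amd_pmu_refresh(struct sketch_pmu *pmu)
{
	pmu->raw_event_mask = SKETCH_AMD_RAW_MASK;
}

static void sketch_intel_pmu_refresh(struct sketch_pmu *pmu)
{
	pmu->raw_event_mask = SKETCH_INTEL_RAW_MASK;
}

/* Raw configuration: keep only the bits that are event bits for this vendor. */
static uint64_t sketch_raw_config(const struct sketch_pmu *pmu, uint64_t eventsel)
{
	return eventsel & pmu->raw_event_mask;
}
```

In this model an Intel guest event with IN_TX set no longer leaks bit 32 into the raw event encoding, while an AMD event keeps its full 12-bit event select.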
    • KVM: Don't actually set a request when evicting vCPUs for GFN cache invd · df06dae3
      Sean Christopherson committed
      Don't actually set a request bit in vcpu->requests when making a request
      purely to force a vCPU to exit the guest.  Logging a request but not
      actually consuming it would cause the vCPU to get stuck in an infinite
      loop during KVM_RUN because KVM would see the pending request and bail
      from VM-Enter to service the request.
      
      Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
      nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
      too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
      without implementing handling of the request, especially since getting
      test coverage of MMU notifier interaction with specific KVM features
      usually requires a directed test.
      
      Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
      to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
      mode, it's supposed to _avoid_ waking vCPUs that are blocking.
      
      Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
      what it wants to accomplish, and to genericize the name so that it can be
      used for similar but unrelated scenarios, should they arise in the future.
      Add a comment and documentation to explain why the "no action" request
      exists.
      
      Add compile-time assertions to help detect improper usage.  Use the inner
      assertless helper in the one s390 path that makes requests without a
      hardcoded request.
      
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220223165302.3205276-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
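The "no action" idea can be sketched as follows. All names, the flag value and the request numbering are hypothetical; the sketch only models the behaviour described above: a request that exists purely to evict a vCPU kicks it without leaving a pending bit behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REQUEST_MASK      0xffu       /* low bits: request number */
#define SKETCH_REQUEST_NO_ACTION (1u << 10)  /* flag: nothing to service */

#define SKETCH_REQ_TLB_FLUSH  0u                        /* ordinary request */
#define SKETCH_REQ_EVICT_ONLY SKETCH_REQUEST_NO_ACTION  /* kick, no bit set */

/* Compile-time check that the flag cannot alias a request number. */
static_assert((SKETCH_REQUEST_NO_ACTION & SKETCH_REQUEST_MASK) == 0,
	      "flag bits must not overlap request numbers");

struct sketch_req_vcpu {
	uint64_t requests; /* pending request bits */
	bool kicked;       /* models forcing the vCPU out of guest mode */
};

static void sketch_make_request(struct sketch_req_vcpu *vcpu, unsigned int req)
{
	/* Only requests that will be consumed later get a pending bit. */
	if (!(req & SKETCH_REQUEST_NO_ACTION))
		vcpu->requests |= UINT64_C(1) << (req & SKETCH_REQUEST_MASK);
	vcpu->kicked = true;
}

/* The run loop refuses to enter the guest while this returns true. */
static bool sketch_request_pending(const struct sketch_req_vcpu *vcpu)
{
	return vcpu->requests != 0;
}
```

If the eviction-only request did set a bit, `sketch_request_pending()` would stay true forever because nothing ever clears it, which is the infinite loop the commit message warns about.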
    • KVM: avoid double put_page with gfn-to-pfn cache · 79593c08
      David Woodhouse committed
      If the cache's user host virtual address becomes invalid, there
      is still a path from kvm_gfn_to_pfn_cache_refresh() where __release_gpc()
      can release the pfn without the gpc->pfn field being overwritten
      with an error value.  If this happens, kvm_gfn_to_pfn_cache_unmap() will
      call put_page() again on the same page.
      
      Cc: stable@vger.kernel.org
      Fixes: 982ed0de ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
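The general pattern behind this fix can be sketched in userspace C. The types and names here are hypothetical stand-ins, not the kernel's structures: once a reference is dropped, the cached value is overwritten with an error marker so that a later release path sees nothing left to put.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PFN_ERROR UINT64_MAX /* hypothetical "no valid pfn" marker */

struct sketch_page {
	int refcount;
};

struct sketch_gpc {
	uint64_t pfn;
	struct sketch_page *page;
};

/*
 * Drop the cache's reference at most once. Overwriting pfn with the error
 * marker is what makes a second call harmless.
 */
static void sketch_release(struct sketch_gpc *gpc)
{
	if (gpc->pfn == SKETCH_PFN_ERROR)
		return;
	assert(gpc->page != NULL && gpc->page->refcount > 0);
	gpc->page->refcount--;
	gpc->pfn = SKETCH_PFN_ERROR;
	gpc->page = NULL;
}
```

Calling `sketch_release()` from both a refresh path and an unmap path then drops exactly one reference, whichever runs first.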
    • KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmap · f47e5bbb
      Sean Christopherson committed
      Re-introduce zapping only leaf SPTEs in kvm_zap_gfn_range() and
      kvm_tdp_mmu_unmap_gfn_range(), this time without losing a pending TLB
      flush when processing multiple roots (including nested TDP shadow roots).
      Dropping the TLB flush resulted in random crashes when running Hyper-V
      Server 2019 in a guest with KSM enabled in the host (or any source of
      mmu_notifier invalidations, KSM is just the easiest to force).
      
      This effectively reverts commits 873dd122 and fcb93eb6, and thus
      restores commit cf3e2642, plus this delta on top:
      
      bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
              struct kvm_mmu_page *root;
      
              for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
      -               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);
      +               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
      
              return flush;
       }
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220325230348.2587437-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
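The one-word delta above is easy to misread, so here is a small standalone model of it. The helper names are hypothetical and each root is reduced to a single boolean saying whether zapping it requires a TLB flush; the only thing modeled is how the `flush` accumulator is threaded through the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models zapping one root: result is the incoming flush state OR'd with
 * whether this root itself needs a flush. */
static bool sketch_zap_root(bool root_needs_flush, bool flush)
{
	return flush || root_needs_flush;
}

/* Buggy variant: passes false, so an earlier root's pending flush is lost. */
static bool sketch_zap_all_buggy(const bool *roots, size_t n)
{
	bool flush = false;

	assert(roots != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		flush = sketch_zap_root(roots[i], false);
	return flush;
}

/* Fixed variant: carries the accumulated flush state into each call. */
static bool sketch_zap_all_fixed(const bool *roots, size_t n)
{
	bool flush = false;

	assert(roots != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		flush = sketch_zap_root(roots[i], flush);
	return flush;
}
```

With two roots where only the first needs a flush, the buggy loop reports that no flush is needed, while the fixed loop correctly reports one.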
    • KVM: SVM: fix panic on out-of-bounds guest IRQ · a80ced6e
      Yi Wang committed
      Because guest_irq comes from the KVM_IRQFD API call, an out-of-bounds
      value can trigger a crash in svm_update_pi_irte():
      
      crash> bt
      PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
       #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
       #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
       #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
       #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
       #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
       #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
       #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
          [exception RIP: svm_update_pi_irte+227]
          RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
          RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
          RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
          RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
          R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
          R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
       #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
       #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
          RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
          RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
          RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
          RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
          R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
          R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
          ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
      
      VMX already fixed this in commit 3a8b0677 ("KVM: VMX: Do not BUG() on
      out-of-bounds guest IRQ"), so we can simply apply the same fix here.

      Co-developed-by: Yi Liu <liu.yi24@zte.com.cn>
      Signed-off-by: Yi Liu <liu.yi24@zte.com.cn>
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
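The class of fix involved is a bounds check on a userspace-supplied index before it is used to look up a routing entry. The sketch below is a generic illustration with hypothetical types and names; in particular, returning `-EINVAL` is an assumption of the sketch and not a statement about what the kernel fix returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, flattened stand-in for an IRQ routing table. */
struct sketch_routing_table {
	unsigned int nr_rt_entries;
	const int *map;
};

/*
 * Look up the routing entry for guest_irq. The index originates from
 * userspace, so it is validated before the array is touched.
 */
static int sketch_lookup_irq(const struct sketch_routing_table *rt,
			     unsigned int guest_irq, int *out)
{
	assert(rt != NULL && out != NULL);
	if (guest_irq >= rt->nr_rt_entries)
		return -EINVAL; /* assumption: reject instead of faulting */
	*out = rt->map[guest_irq];
	return 0;
}
```

In the backtrace above, RSI and RBP hold 0x19a; if that is the guest IRQ number, an index of that size against a small table is exactly the case this check rejects.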
    • KVM: MMU: propagate alloc_workqueue failure · a1a39128
      Paolo Bonzini committed
      If kvm->arch.tdp_mmu_zap_wq cannot be created, the failure has
      to be propagated up to kvm_mmu_init_vm and kvm_arch_init_vm.
      kvm_arch_init_vm also has to undo all the initialization, so
      group all the MMU initialization code at the beginning and
      handle cleaning up of kvm_page_track_init.

      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
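The shape of this change, propagating an allocation failure upward and unwinding earlier initialization, is the usual C `goto`-based cleanup pattern. The sketch below uses hypothetical names and a boolean parameter to simulate the workqueue allocation failing; it is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_vm {
	bool page_track_ready;
	bool zap_wq_ready;
};

static int sketch_page_track_init(struct sketch_vm *vm)
{
	vm->page_track_ready = true;
	return 0;
}

static void sketch_page_track_cleanup(struct sketch_vm *vm)
{
	vm->page_track_ready = false;
}

/* Models MMU init; wq_alloc_fails simulates the workqueue allocation failing. */
static int sketch_mmu_init_vm(struct sketch_vm *vm, bool wq_alloc_fails)
{
	if (wq_alloc_fails)
		return -ENOMEM;
	vm->zap_wq_ready = true;
	return 0;
}

/* MMU-related setup is grouped first so failure needs only a short unwind. */
static int sketch_arch_init_vm(struct sketch_vm *vm, bool wq_alloc_fails)
{
	int ret;

	assert(vm != NULL);
	ret = sketch_page_track_init(vm);
	if (ret)
		goto out;

	ret = sketch_mmu_init_vm(vm, wq_alloc_fails);
	if (ret)
		goto out_page_track;

	return 0;

out_page_track:
	sketch_page_track_cleanup(vm);
out:
	return ret;
}
```

Grouping the fallible steps at the start keeps the unwind labels short: each label undoes exactly the steps that succeeded before the failure.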
  2. 30 Mar, 2022 11 commits
  3. 21 Mar, 2022 8 commits
  4. 19 Mar, 2022 1 commit
    • Merge tag 'kvmarm-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 714797c9
      Paolo Bonzini committed
      KVM/arm64 updates for 5.18
      
      - Proper emulation of the OSLock feature of the debug architecture
      
      - Scalability improvements for the MMU lock when dirty logging is on
      
      - New VMID allocator, which will eventually help with SVA in VMs
      
      - Better support for PMUs in heterogeneous systems
      
      - PSCI 1.1 support, enabling support for SYSTEM_RESET2
      
      - Implement CONFIG_DEBUG_LIST at EL2
      
      - Make CONFIG_ARM64_ERRATUM_2077057 default y
      
      - Reduce the overhead of VM exit when no interrupt is pending
      
      - Remove traces of 32bit ARM host support from the documentation
      
      - Updated vgic selftests
      
      - Various cleanups, doc updates and spelling fixes
  5. 18 Mar, 2022 2 commits
  6. 16 Mar, 2022 2 commits
  7. 14 Mar, 2022 6 commits
  8. 11 Mar, 2022 1 commit