1. 22 10月, 2021 21 次提交
  2. 19 10月, 2021 6 次提交
    • O
      KVM: x86: Expose TSC offset controls to userspace · 828ca896
      Oliver Upton 提交于
      To date, VMM-directed TSC synchronization and migration has been a bit
      messy. KVM has some baked-in heuristics around TSC writes to infer if
      the VMM is attempting to synchronize. This is problematic, as it depends
      on host userspace writing to the guest's TSC within 1 second of the last
      write.
      
      A much cleaner approach to configuring the guest's views of the TSC is to
      simply migrate the TSC offset for every vCPU. Offsets are idempotent,
      and thus not subject to change depending on when the VMM actually
      reads/writes values from/to KVM. The VMM can then read the TSC once with
      KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
      the guest is paused.
      
      Cc: David Matlack <dmatlack@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: NOliver Upton <oupton@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-8-oupton@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      828ca896
    • O
      KVM: x86: Refactor tsc synchronization code · 58d4277b
      Oliver Upton 提交于
      Refactor kvm_synchronize_tsc to make a new function that allows callers
      to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
      for the sake of participating in TSC synchronization.
      Signed-off-by: NOliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-7-oupton@google.com>
      [Make sure kvm->arch.cur_tsc_generation and vcpu->arch.this_tsc_generation are
       equal at the end of __kvm_synchronize_tsc, if matched is false. Reported by
       Maxim Levitsky. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      58d4277b
    • P
      kvm: x86: protect masterclock with a seqcount · 869b4421
      Paolo Bonzini 提交于
      Protect the reference point for kvmclock with a seqcount, so that
      kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
      updates will also run in parallel and not bounce the kvmclock cacheline.
      
      Of the variables that were protected by pvclock_gtod_sync_lock,
      nr_vcpus_matched_tsc is different because it is updated outside
      pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
      need to keep it protected by a spinlock.  In fact it must now
      be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
      write-side of a seqcount, is non-preemptible.  Since we already
      have tsc_write_lock which is a raw spinlock, we can just use
      tsc_write_lock as the lock that protects the write-side of the
      seqcount.
      Co-developed-by: NOliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-6-oupton@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      869b4421
    • O
      KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK · c68dc1b5
      Oliver Upton 提交于
      Handling the migration of TSCs correctly is difficult, in part because
      Linux does not provide userspace with the ability to retrieve a (TSC,
      realtime) clock pair for a single instant in time. In lieu of a more
      convenient facility, KVM can report similar information in the kvm_clock
      structure.
      
      Provide userspace with a host TSC & realtime pair iff the realtime clock
      is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
      realtime value, advance the KVM clock by the amount of elapsed time. Do
      not step the KVM clock backwards, though, as it is a monotonic
      oscillator.
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NOliver Upton <oupton@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-5-oupton@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c68dc1b5
    • P
      KVM: x86: avoid warning with -Wbitwise-instead-of-logical · 3d5e7a28
      Paolo Bonzini 提交于
      This is a new warning in clang top-of-tree (will be clang 14):
      
      In file included from arch/x86/kvm/mmu/mmu.c:27:
      arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
              return __is_bad_mt_xwr(rsvd_check, spte) |
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                       ||
      arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning
      
      The code is fine, but change it anyway to shut up this clever clogs
      of a compiler.
      
      Reported-by: torvic9@mailbox.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3d5e7a28
    • P
      KVM: X86: fix lazy allocation of rmaps · fa13843d
      Paolo Bonzini 提交于
      If allocation of rmaps fails, but some of the pointers have already been written,
      those pointers can be cleaned up when the memslot is freed, or even reused later
      for another attempt at allocating the rmaps.  Therefore there is no need to
      WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
      skipped lest KVM will overwrite the previous pointer and will indeed leak memory.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fa13843d
  3. 18 10月, 2021 1 次提交
  4. 15 10月, 2021 1 次提交
  5. 01 10月, 2021 11 次提交
    • D
      KVM: x86: only allocate gfn_track when necessary · deae4a10
      David Stevens 提交于
      Avoid allocating the gfn_track arrays if nothing needs them. If there
      are no external to KVM users of the API (i.e. no GVT-g), then page
      tracking is only needed for shadow page tables. This means that when tdp
      is enabled and there are no external users, then the gfn_track arrays
      can be lazily allocated when the shadow MMU is actually used. This avoid
      allocations equal to .05% of guest memory when nested virtualization is
      not used, if the kernel is compiled without GVT-g.
      Signed-off-by: NDavid Stevens <stevensd@chromium.org>
      Message-Id: <20210922045859.2011227-3-stevensd@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      deae4a10
    • D
      KVM: x86: add config for non-kvm users of page tracking · e9d0c0c4
      David Stevens 提交于
      Add a config option that allows kvm to determine whether or not there
      are any external users of page tracking.
      Signed-off-by: NDavid Stevens <stevensd@chromium.org>
      Message-Id: <20210922045859.2011227-2-stevensd@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e9d0c0c4
    • K
      nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB · 174a921b
      Krish Sadhukhan 提交于
      According to section "TLB Flush" in APM vol 2,
      
          "Support for TLB_CONTROL commands other than the first two, is
           optional and is indicated by CPUID Fn8000_000A_EDX[FlushByAsid].
      
           All encodings of TLB_CONTROL not defined in the APM are reserved."
      Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20210920235134.101970-3-krish.sadhukhan@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      174a921b
    • J
      kvm: use kvfree() in kvm_arch_free_vm() · 78b497f2
      Juergen Gross 提交于
      By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can
      use the common variant. This can be accomplished by adding another
      macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.
      
      Further simplification can be achieved by adding __kvm_arch_free_vm()
      doing the common part.
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Message-Id: <20210903130808.30142-5-jgross@suse.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      78b497f2
    • B
      KVM: x86: Expose Predictive Store Forwarding Disable · b73a5432
      Babu Moger 提交于
      Predictive Store Forwarding: AMD Zen3 processors feature a new
      technology called Predictive Store Forwarding (PSF).
      
      PSF is a hardware-based micro-architectural optimization designed
      to improve the performance of code execution by predicting address
      dependencies between loads and stores.
      
      How PSF works:
      
      It is very common for a CPU to execute a load instruction to an address
      that was recently written by a store. Modern CPUs implement a technique
      known as Store-To-Load-Forwarding (STLF) to improve performance in such
      cases. With STLF, data from the store is forwarded directly to the load
      without having to wait for it to be written to memory. In a typical CPU,
      STLF occurs after the address of both the load and store are calculated
      and determined to match.
      
      PSF expands on this by speculating on the relationship between loads and
      stores without waiting for the address calculation to complete. With PSF,
      the CPU learns over time the relationship between loads and stores. If
      STLF typically occurs between a particular store and load, the CPU will
      remember this.
      
      In typical code, PSF provides a performance benefit by speculating on
      the load result and allowing later instructions to begin execution
      sooner than they otherwise would be able to.
      
      The details of security analysis of AMD predictive store forwarding is
      documented here.
      https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf
      
      Predictive Store Forwarding controls:
      There are two hardware control bits which influence the PSF feature:
      - MSR 48h bit 2 – Speculative Store Bypass (SSBD)
      - MSR 48h bit 7 – Predictive Store Forwarding Disable (PSFD)
      
      The PSF feature is disabled if either of these bits are set.  These bits
      are controllable on a per-thread basis in an SMT system. By default, both
      SSBD and PSFD are 0 meaning that the speculation features are enabled.
      
      While the SSBD bit disables PSF and speculative store bypass, PSFD only
      disables PSF.
      
      PSFD may be desirable for software which is concerned with the
      speculative behavior of PSF but desires a smaller performance impact than
      setting SSBD.
      
      Support for PSFD is indicated in CPUID Fn8000_0008 EBX[28].
      All processors that support PSF will also support PSFD.
      
      Linux kernel does not have the interface to enable/disable PSFD yet. Plan
      here is to expose the PSFD technology to KVM so that the guest kernel can
      make use of it if they wish to.
      Signed-off-by: NBabu Moger <Babu.Moger@amd.com>
      Message-Id: <163244601049.30292.5855870305350227855.stgit@bmoger-ubuntu>
      [Keep feature private to KVM, as requested by Borislav Petkov. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b73a5432
    • D
      KVM: x86/mmu: Avoid memslot lookup in make_spte and mmu_try_to_unsync_pages · 53597858
      David Matlack 提交于
      mmu_try_to_unsync_pages checks if page tracking is active for the given
      gfn, which requires knowing the memslot. We can pass down the memslot
      via make_spte to avoid this lookup.
      
      The memslot is also handy for make_spte's marking of the gfn as dirty:
      we can test whether dirty page tracking is enabled, and if so ensure that
      pages are mapped as writable with 4K granularity.  Apart from the warning,
      no functional change is intended.
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      53597858
    • D
      KVM: x86/mmu: Avoid memslot lookup in rmap_add · 8a9f566a
      David Matlack 提交于
      Avoid the memslot lookup in rmap_add, by passing it down from the fault
      handling code to mmu_set_spte and then to rmap_add.
      
      No functional change intended.
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Message-Id: <20210813203504.2742757-6-dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8a9f566a
    • P
      KVM: MMU: pass struct kvm_page_fault to mmu_set_spte · a12f4381
      Paolo Bonzini 提交于
      mmu_set_spte is called for either PTE prefetching or page faults.  The
      three boolean arguments write_fault, speculative and host_writable are
      always respectively false/true/true for prefetching and coming from
      a struct kvm_page_fault for page faults.
      
      Let mmu_set_spte distinguish these two situation by accepting a
      possibly NULL struct kvm_page_fault argument.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a12f4381
    • P
      KVM: MMU: pass kvm_mmu_page struct to make_spte · 7158bee4
      Paolo Bonzini 提交于
      The level and A/D bit support of the new SPTE can be found in the role,
      which is stored in the kvm_mmu_page struct.  This merges two arguments
      into one.
      
      For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does
      not use it if the SPTE is already present) so we fetch it just before
      calling make_spte.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7158bee4
    • P
      KVM: MMU: set ad_disabled in TDP MMU role · 87e888ea
      Paolo Bonzini 提交于
      Prepare for removing the ad_disabled argument of make_spte; instead it can
      be found in the role of a struct kvm_mmu_page.  First of all, the TDP MMU
      must set the role accurately.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      87e888ea
    • P
      KVM: MMU: remove unnecessary argument to mmu_set_spte · eb5cd7ff
      Paolo Bonzini 提交于
      The level of the new SPTE can be found in the kvm_mmu_page struct; there
      is no need to pass it down.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      eb5cd7ff