1. 25 6月, 2021 8 次提交
    • S
      KVM: x86: Properly reset MMU context at vCPU RESET/INIT · 0aa18375
      Sean Christopherson 提交于
      Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
      was set prior to INIT.  Simply re-initializing the current MMU is not
      sufficient as the current root HPA may not be usable in the new context.
      E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
      KVM will fail to switch to the 32-bit pae_root and bomb on the next
      VM-Enter due to running with a 64-bit CR3 in 32-bit mode.
      
      This bug was papered over in both VMX and SVM, but still managed to rear
      its head in the MMU role on VMX.  Because EFER.LMA=1 requires CR0.PG=1,
      kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
      checking CR0.PG.  VMX's RESET/INIT flow writes CR0 before EFER, and so
      an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
      generate the wrong MMU role.
      
      In VMX, the INIT issue is specific to running without unrestricted guest
      since unrestricted guest is available if and only if EPT is enabled.
      Commit 8668a3c4 ("KVM: VMX: Reset mmu context when entering real
      mode") resolved the issue by forcing a reset when entering emulated real
      mode.
      
      In SVM, commit ebae871a ("kvm: svm: reset mmu on VCPU reset") forced
      a MMU reset on every INIT to workaround the flaw in common x86.  Note, at
      the time the bug was fixed, the SVM problem was exacerbated by a complete
      lack of a CR4 update.
      
      The vendor resets will be reverted in future patches, primarily to aid
      bisection in case there are non-INIT flows that rely on the existing VMX
      logic.
      
      Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
      all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
      CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
      context reset.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-4-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0aa18375
    • S
      KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs · 112022bd
      Sean Christopherson 提交于
      Mark NX as being used for all non-nested shadow MMUs, as KVM will set the
      NX bit for huge SPTEs if the iTLB mutli-hit mitigation is enabled.
      Checking the mitigation itself is not sufficient as it can be toggled on
      at any time and KVM doesn't reset MMU contexts when that happens.  KVM
      could reset the contexts, but that would require purging all SPTEs in all
      MMUs, for no real benefit.  And, KVM already forces EFER.NX=1 when TDP is
      disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved
      for shadow MMUs.
      
      Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-3-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      112022bd
    • S
      KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested EPT · f0d43790
      Sean Christopherson 提交于
      Remove a misguided WARN that attempts to detect the scenario where using
      a special A/D tracking flag will set reserved bits on a non-MMIO spte.
      The WARN triggers false positives when using EPT with 32-bit KVM because
      of the !64-bit clause, which is just flat out wrong.  The whole A/D
      tracking goo is specific to EPT, and one of the big selling points of EPT
      is that EPT is decoupled from the host's native paging mode.
      
      Drop the WARN instead of trying to salvage the check.  Keeping a check
      specific to A/D tracking bits would essentially regurgitate the same code
      that led to KVM needed the tracking bits in the first place.
      
      A better approach would be to add a generic WARN on reserved bits being
      set, which would naturally cover the A/D tracking bits, work for all
      flavors of paging, and be self-documenting to some extent.
      
      Fixes: 8a406c89 ("KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-2-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f0d43790
    • J
      KVM: debugfs: Reuse binary stats descriptors · bc9e9e67
      Jing Zhang 提交于
      To remove code duplication, use the binary stats descriptors in the
      implementation of the debugfs interface for statistics. This unifies
      the definition of statistics for the binary and debugfs interfaces.
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-8-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bc9e9e67
    • J
      KVM: selftests: Add selftest for KVM statistics data binary interface · 0b45d587
      Jing Zhang 提交于
      Add selftest to check KVM stats descriptors validity.
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NRicardo Koller <ricarkol@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Tested-by: Fuad Tabba <tabba@google.com> #arm64
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-7-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0b45d587
    • J
      KVM: stats: Add documentation for binary statistics interface · fdc09ddd
      Jing Zhang 提交于
      This new API provides a file descriptor for every VM and VCPU to read
      KVM statistics data in binary format.
      It is meant to provide a lightweight, flexible, scalable and efficient
      lock-free solution for user space telemetry applications to pull the
      statistics data periodically for large scale systems. The pulling
      frequency could be as high as a few times per second.
      The statistics descriptors are defined by KVM in kernel and can be
      by userspace to discover VM/VCPU statistics during the one-time setup
      stage.
      The statistics data itself could be read out by userspace telemetry
      periodically without any extra parsing or setup effort.
      There are a few existed interface protocols and definitions, but no
      one can fulfil all the requirements this interface implemented as
      below:
      1. During high frequency periodic stats reading, there should be no
         extra efforts except the stats data read itself.
      2. Support stats annotation, like type (cumulative, instantaneous,
         peak, histogram, etc) and unit (counter, time, size, cycles, etc).
      3. The stats data reading should be free of lock/synchronization. We
         don't care about the consistency between all the stats data. All
         stats data can not be read out at exactly the same time. We really
         care about the change or trend of the stats data. The lock-free
         solution is not just for efficiency and scalability, also for the
         stats data accuracy and usability. For example, in the situation
         that all the stats data readings are protected by a global lock,
         if one VCPU died somehow with that lock held, then all stats data
         reading would be blocked, then we have no way from stats data that
         which VCPU has died.
      4. The stats data reading workload can be handed over to other
         unprivileged process.
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NRicardo Koller <ricarkol@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NFuad Tabba <tabba@google.com>
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-6-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fdc09ddd
    • J
      KVM: stats: Support binary stats retrieval for a VCPU · ce55c049
      Jing Zhang 提交于
      Add a VCPU ioctl to get a statistics file descriptor by which a read
      functionality is provided for userspace to read out VCPU stats header,
      descriptors and data.
      Define VCPU statistics descriptors and header for all architectures.
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NRicardo Koller <ricarkol@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NFuad Tabba <tabba@google.com>
      Tested-by: Fuad Tabba <tabba@google.com> #arm64
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-5-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ce55c049
    • J
      KVM: stats: Support binary stats retrieval for a VM · fcfe1bae
      Jing Zhang 提交于
      Add a VM ioctl to get a statistics file descriptor by which a read
      functionality is provided for userspace to read out VM stats header,
      descriptors and data.
      Define VM statistics descriptors and header for all architectures.
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NRicardo Koller <ricarkol@google.com>
      Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NFuad Tabba <tabba@google.com>
      Tested-by: Fuad Tabba <tabba@google.com> #arm64
      Signed-off-by: NJing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-4-jingzhangos@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fcfe1bae
  2. 24 6月, 2021 26 次提交
  3. 23 6月, 2021 1 次提交
  4. 22 6月, 2021 5 次提交