1. 03 7月, 2019 1 次提交
  2. 21 6月, 2019 1 次提交
    • P
      KVM: nVMX: reorganize initial steps of vmx_set_nested_state · 9fd58877
      Paolo Bonzini 提交于
      Commit 332d0797 ("KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS
      state before setting new state", 2019-05-02) broke evmcs_test because the
      eVMCS setup must be performed even if there is no VMXON region defined,
      as long as the eVMCS bit is set in the assist page.
      
      While the simplest possible fix would be to add a check on
      kvm_state->flags & KVM_STATE_NESTED_EVMCS in the initial "if" that
      covers kvm_state->hdr.vmx.vmxon_pa == -1ull, that is quite ugly.
      
      Instead, this patch moves checks earlier in the function and
      conditionalizes them on kvm_state->hdr.vmx.vmxon_pa, so that
      vmx_set_nested_state always goes through vmx_leave_nested
      and nested_enable_evmcs.
      
      Fixes: 332d0797 ("KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state")
      Cc: Aaron Lewis <aaronlewis@google.com>
      Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9fd58877
  3. 19 6月, 2019 1 次提交
    • L
      KVM: x86: Modify struct kvm_nested_state to have explicit fields for data · 6ca00dfa
      Liran Alon 提交于
      Improve the KVM_{GET,SET}_NESTED_STATE structs by detailing the format
      of VMX nested state data in a struct.
      
      In order to avoid changing the ioctl values of
      KVM_{GET,SET}_NESTED_STATE, there is a need to preserve
      sizeof(struct kvm_nested_state). This is done by defining the data
      struct as "data.vmx[0]". It was the most elegant way I found to
      preserve struct size while still keeping struct readable and easy to
      maintain. It does have a misfortunate side-effect that now it has to be
      accessed as "data.vmx[0]" rather than just "data.vmx".
      
      Because we are already modifying these structs, I also modified the
      following:
      * Define the "format" field values as macros.
      * Rename vmcs_pa to vmcs12_pa for better readability.
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      [Remove SVM stubs, add KVM_STATE_NESTED_VMX_VMCS12_SIZE. - Paolo]
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6ca00dfa
  4. 13 6月, 2019 1 次提交
  5. 25 5月, 2019 3 次提交
  6. 16 5月, 2019 1 次提交
  7. 08 5月, 2019 2 次提交
  8. 01 5月, 2019 11 次提交
  9. 16 4月, 2019 8 次提交
  10. 06 4月, 2019 2 次提交
    • M
      KVM: x86: nVMX: fix x2APIC VTPR read intercept · c73f4c99
      Marc Orr 提交于
      Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
      SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
      virtualization" is 0, a RDMSR of 808H should return the VTPR from the
      virtual APIC page.
      
      However, for nested, KVM currently fails to disable the read intercept
      for this MSR. This means that a RDMSR exit takes precedence over
      "virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
      instead of sourcing the value from L2's virtual APIC page.
      
      This patch fixes the issue by disabling the read intercept, in VMCS02,
      for the VTPR when "APIC-register virtualization" is 0.
      
      The issue described above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
      mode w/ nested".
      Signed-off-by: NMarc Orr <marcorr@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Fixes: c992384b ("KVM: vmx: speed up MSR bitmap merge")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c73f4c99
    • M
      KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887) · acff7847
      Marc Orr 提交于
      The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
      x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a
      result, we discovered the potential for a buggy or malicious L1 to get
      access to L0's x2APIC MSRs, via an L2, as follows.
      
      1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
      variable, in nested_vmx_prepare_msr_bitmap() to become true.
      2. L1 disables "virtualize x2APIC mode" in VMCS12.
      3. L1 enables "APIC-register virtualization" in VMCS12.
      
      Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
      set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
      
      This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
      intercepts with VMCS12's "virtualize x2APIC mode" control.
      
      The scenario outlined above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Add leak scenario to
      virt_x2apic_mode_test".
      
      Note, it looks like this issue may have been introduced inadvertently
      during a merge---see 15303ba5.
      Signed-off-by: NMarc Orr <marcorr@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      acff7847
  11. 29 3月, 2019 1 次提交
  12. 21 2月, 2019 7 次提交
    • B
      kvm: vmx: Add memcg accounting to KVM allocations · 41836839
      Ben Gardon 提交于
      There are many KVM kernel memory allocations which are tied to the life of
      the VM process and should be charged to the VM process's cgroup. If the
      allocations aren't tied to the process, the OOM killer will not know
      that killing the process will free the associated kernel memory.
      Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
      charged to the VM process's cgroup.
      
      Tested:
      	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
      	introduced no new failures.
      	Ran a kernel memory accounting test which creates a VM to touch
      	memory and then checks that the kernel memory allocated for the
      	process is within certain bounds.
      	With this patch we account for much more of the vmalloc and slab memory
      	allocated for the VM.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      41836839
    • P
      KVM: nVMX: do not start the preemption timer hrtimer unnecessarily · 359a6c3d
      Paolo Bonzini 提交于
      The preemption timer can be started even if there is a vmentry
      failure during or after loading guest state.  That is pointless,
      move the call after all conditions have been checked.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      359a6c3d
    • P
      KVM: x86: cleanup freeing of nested state · b4b65b56
      Paolo Bonzini 提交于
      Ensure that the VCPU free path goes through vmx_leave_nested and
      thus nested_vmx_vmexit, so that the cancellation of the timer does
      not have to be in free_nested.  In addition, because some paths through
      nested_vmx_vmexit do not go through sync_vmcs12, the cancellation of
      the timer is moved there.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b4b65b56
    • P
      KVM: nVMX: remove useless is_protmode check · e0dfacbf
      Paolo Bonzini 提交于
      VMX is only accessible in protected mode, remove a confusing check
      that causes the conditional to lack a final "else" branch.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e0dfacbf
    • S
      KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 34333cc6
      Sean Christopherson 提交于
      Regarding segments with a limit==0xffffffff, the SDM officially states:
      
          When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
          or may not cause the indicated exceptions.  Behavior is
          implementation-specific and may vary from one execution to another.
      
      In practice, all CPUs that support VMX ignore limit checks for "flat
      segments", i.e. an expand-up data or code segment with base=0 and
      limit=0xffffffff.  This is subtly different than wrapping the effective
      address calculation based on the address size, as the flat segment
      behavior also applies to accesses that would wrap the 4g boundary, e.g.
      a 4-byte access starting at 0xffffffff will access linear addresses
      0xffffffff, 0x0, 0x1 and 0x2.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      34333cc6
    • S
      KVM: nVMX: Apply addr size mask to effective address for VMX instructions · 8570f9e8
      Sean Christopherson 提交于
      The address size of an instruction affects the effective address, not
      the virtual/linear address.  The final address may still be truncated,
      e.g. to 32-bits outside of long mode, but that happens irrespective of
      the address size, e.g. a 32-bit address size can yield a 64-bit virtual
      address when using FS/GS with a non-zero base.
      
      Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8570f9e8
    • S
      KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 946c522b
      Sean Christopherson 提交于
      The VMCS.EXIT_QUALIFCATION field reports the displacements of memory
      operands for various instructions, including VMX instructions, as a
      naturally sized unsigned value, but masks the value by the addr size,
      e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
      reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
      wording regarding sign extension, the SDM explicitly states that bits
      beyond the instructions address size are undefined:
      
          In all cases, bits of this field beyond the instruction’s address
          size are undefined.
      
      Failure to sign extend the displacement results in KVM incorrectly
      treating a negative displacement as a large positive displacement when
      the address size of the VMX instruction is smaller than KVM's native
      size, e.g. a 32-bit address size on a 64-bit KVM.
      
      The very original decoding, added by commit 064aea77 ("KVM: nVMX:
      Decoding memory operands of VMX instructions"), sort of modeled sign
      extension by truncating the final virtual/linear address for a 32-bit
      address size.  I.e. it messed up the effective address but made it work
      by adjusting the final address.
      
      When segmentation checks were added, the truncation logic was kept
      as-is and no sign extension logic was introduced.  In other words, it
      kept calculating the wrong effective address while mostly generating
      the correct virtual/linear address.  As the effective address is what's
      used in the segment limit checks, this results in KVM incorreclty
      injecting #GP/#SS faults due to non-existent segment violations when
      a nested VMM uses negative displacements with an address size smaller
      than KVM's native address size.
      
      Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
      KVM using 0x100000fd8 as the effective address when checking for a
      segment limit violation.  This causes a 100% failure rate when running
      a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      946c522b
  13. 14 2月, 2019 1 次提交