1. 21 2月, 2019 36 次提交
    • S
      KVM: Explicitly define the "memslot update in-progress" bit · 361209e0
      Sean Christopherson 提交于
      KVM uses bit 0 of the memslots generation as an "update in-progress"
      flag, which is used by x86 to prevent caching MMIO access while the
      memslots are changing.  Although the intended behavior is flag-like,
      e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid
      caching data from in-flux memslots, the implementation oftentimes treats
      the bit as part of the generation number itself, e.g. incrementing the
      generation increments twice, once to set the flag and once to clear it.
      
      Prior to commit 4bd518f1 ("KVM: use separate generations for
      each address space"), incorporating the "update in-progress" bit into
      the generation number largely made sense, e.g. "real" generations are
      even, "bogus" generations are odd, most code doesn't need to be aware of
      the bit, etc...
      
      Now that unique memslots generation numbers are assigned to each address
      space, stealthing the in-progress status into the generation number
      results in a wide variety of subtle code, e.g. kvm_create_vm() jumps
      over bit 0 when initializing the memslots generation without any hint as
      to why.
      
      Explicitly define the flag and convert as much code as possible (which
      isn't much) to actually treat it like a flag.  This paves the way for
      eventually using a different bit for "update in-progress" so that it can
      be a flag in truth instead of a awkward extension to the generation
      number.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      361209e0
    • S
      KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux · ddfd1730
      Sean Christopherson 提交于
      When installing new memslots, KVM sets bit 0 of the generation number to
      indicate that an update is in-progress.  Until the update is complete,
      there are no guarantees as to whether a vCPU will see the old or the new
      memslots.  Explicity prevent caching MMIO accesses so as to avoid using
      an access cached from the old memslots after the new memslots have been
      installed.
      
      Note that it is unclear whether or not disabling caching during the
      update window is strictly necessary as there is no definitive
      documentation as to what ordering guarantees KVM provides with respect
      to updating memslots.  That being said, the MMIO spte code does not
      allow reusing sptes created while an update is in-progress, and the
      associated documentation explicitly states:
      
          We do not want to use an MMIO sptes created with an odd generation
          number, ...  If KVM is unlucky and creates an MMIO spte while the
          low bit is 1, the next access to the spte will always be a cache miss.
      
      At the very least, disabling the per-vCPU MMIO cache during updates will
      make its behavior consistent with the MMIO spte behavior and
      documentation.
      
      Fixes: 56f17dd3 ("kvm: x86: fix stale mmio cache bug")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ddfd1730
    • S
      KVM: x86/mmu: Detect MMIO generation wrap in any address space · e1359e2b
      Sean Christopherson 提交于
      The check to detect a wrap of the MMIO generation explicitly looks for a
      generation number of zero.  Now that unique memslots generation numbers
      are assigned to each address space, only address space 0 will get a
      generation number of exactly zero when wrapping.  E.g. when address
      space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will
      wrap to 0x2.  Adjust the MMIO generation to strip the address space
      modifier prior to checking for a wrap.
      
      Fixes: 4bd518f1 ("KVM: use separate generations for each address space")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e1359e2b
    • S
      KVM: Call kvm_arch_memslots_updated() before updating memslots · 15248258
      Sean Christopherson 提交于
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      15248258
    • B
      kvm: vmx: Add memcg accounting to KVM allocations · 41836839
      Ben Gardon 提交于
      There are many KVM kernel memory allocations which are tied to the life of
      the VM process and should be charged to the VM process's cgroup. If the
      allocations aren't tied to the process, the OOM killer will not know
      that killing the process will free the associated kernel memory.
      Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
      charged to the VM process's cgroup.
      
      Tested:
      	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
      	introduced no new failures.
      	Ran a kernel memory accounting test which creates a VM to touch
      	memory and then checks that the kernel memory allocated for the
      	process is within certain bounds.
      	With this patch we account for much more of the vmalloc and slab memory
      	allocated for the VM.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      41836839
    • B
      kvm: svm: Add memcg accounting to KVM allocations · 1ec69647
      Ben Gardon 提交于
      There are many KVM kernel memory allocations which are tied to the life of
      the VM process and should be charged to the VM process's cgroup. If the
      allocations aren't tied to the process, the OOM killer will not know
      that killing the process will free the associated kernel memory.
      Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
      charged to the VM process's cgroup.
      
      Tested:
      	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
      	introduced no new failures.
      	Ran a kernel memory accounting test which creates a VM to touch
      	memory and then checks that the kernel memory allocated for the
      	process is within certain bounds.
      	With this patch we account for much more of the vmalloc and slab memory
      	allocated for the VM.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1ec69647
    • B
      kvm: x86: Add memcg accounting to KVM allocations · 254272ce
      Ben Gardon 提交于
      There are many KVM kernel memory allocations which are tied to the life of
      the VM process and should be charged to the VM process's cgroup. If the
      allocations aren't tied to the process, the OOM killer will not know
      that killing the process will free the associated kernel memory.
      Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
      charged to the VM process's cgroup.
      
      Tested:
      	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
      	introduced no new failures.
      	Ran a kernel memory accounting test which creates a VM to touch
      	memory and then checks that the kernel memory allocated for the
      	process is within certain bounds.
      	With this patch we account for much more of the vmalloc and slab memory
      	allocated for the VM.
      
      There remain a few allocations which should be charged to the VM's
      cgroup but are not. In x86, they include:
      	vcpu->arch.pio_data
      There allocations are unaccounted in this patch because they are mapped
      to userspace, and accounting them to a cgroup causes problems. This
      should be addressed in a future patch.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      254272ce
    • B
      kvm: Add memcg accounting to KVM allocations · b12ce36a
      Ben Gardon 提交于
      There are many KVM kernel memory allocations which are tied to the life of
      the VM process and should be charged to the VM process's cgroup. If the
      allocations aren't tied to the process, the OOM killer will not know
      that killing the process will free the associated kernel memory.
      Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
      charged to the VM process's cgroup.
      
      Tested:
      	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
      	introduced no new failures.
      	Ran a kernel memory accounting test which creates a VM to touch
      	memory and then checks that the kernel memory allocated for the
      	process is within certain bounds.
      	With this patch we account for much more of the vmalloc and slab memory
      	allocated for the VM.
      
      There remain a few allocations which should be charged to the VM's
      cgroup but are not. In they include:
              vcpu->run
              kvm->coalesced_mmio_ring
      There allocations are unaccounted in this patch because they are mapped
      to userspace, and accounting them to a cgroup causes problems. This
      should be addressed in a future patch.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b12ce36a
    • P
      KVM: nVMX: do not start the preemption timer hrtimer unnecessarily · 359a6c3d
      Paolo Bonzini 提交于
      The preemption timer can be started even if there is a vmentry
      failure during or after loading guest state.  That is pointless,
      move the call after all conditions have been checked.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      359a6c3d
    • Y
      kvm: vmx: Fix typos in vmentry/vmexit control setting · d9293597
      Yu Zhang 提交于
      Previously, 'commit f99e3daf ("KVM: x86: Add Intel PT
      virtualization work mode")' work mode' offered framework
      to support Intel PT virtualization. However, the patch has
      some typos in vmx_vmentry_ctrl() and vmx_vmexit_ctrl(), e.g.
      used wrong flags and wrong variable, which will cause the
      VM entry failure later.
      
      Fixes: 'commit f99e3daf ("KVM: x86: Add Intel PT virtualization work mode")'
      Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d9293597
    • P
      KVM: x86: cleanup freeing of nested state · b4b65b56
      Paolo Bonzini 提交于
      Ensure that the VCPU free path goes through vmx_leave_nested and
      thus nested_vmx_vmexit, so that the cancellation of the timer does
      not have to be in free_nested.  In addition, because some paths through
      nested_vmx_vmexit do not go through sync_vmcs12, the cancellation of
      the timer is moved there.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b4b65b56
    • L
      KVM: x86: Sync the pending Posted-Interrupts · 81b01667
      Luwei Kang 提交于
      Some Posted-Interrupts from passthrough devices may be lost or
      overwritten when the vCPU is in runnable state.
      
      The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
      be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
      but not running on physical CPU). If a posted interrupt coming at this
      time, the irq remmaping facility will set the bit of PIR (Posted
      Interrupt Requests) without ON (Outstanding Notification).
      So this interrupt can't be sync to APIC virtualization register and
      will not be handled by Guest because ON is zero.
      Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
      [Eliminate the pi_clear_sn fast path. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      81b01667
    • L
      KVM: x86: expose MOVDIR64B CPU feature into VM. · c029b5de
      Liu Jingqi 提交于
      MOVDIR64B moves 64-bytes as direct-store with 64-bytes write atomicity.
      Direct store is implemented by using write combining (WC) for writing
      data directly into memory without caching the data.
      
      Availability of the MOVDIR64B instruction is indicated by the presence
      of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).
      
      This patch exposes the movdir64b feature to the guest.
      
      The release document ref below link:
      https://software.intel.com/sites/default/files/managed/c5/15/\
      architecture-instruction-set-extensions-programming-reference.pdf
      Signed-off-by: NLiu Jingqi <jingqi.liu@intel.com>
      Cc: Xu Tao <tao3.xu@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c029b5de
    • L
      KVM: x86: expose MOVDIRI CPU feature into VM. · 74f2370b
      Liu Jingqi 提交于
      MOVDIRI moves doubleword or quadword from register to memory through
      direct store which is implemented by using write combining (WC) for
      writing data directly into memory without caching the data.
      
      Availability of the MOVDIRI instruction is indicated by the presence of
      the CPUID feature flag MOVDIRI(CPUID.0x07.0x0:ECX[bit 27]).
      
      This patch exposes the movdiri feature to the guest.
      
      The release document ref below link:
      https://software.intel.com/sites/default/files/managed/c5/15/\
      architecture-instruction-set-extensions-programming-reference.pdf
      Signed-off-by: NLiu Jingqi <jingqi.liu@intel.com>
      Cc: Xu Tao <tao3.xu@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      74f2370b
    • K
      kvm, x86, mmu: Use kernel generic dynamic physical address mask · 8acc0993
      Kai Huang 提交于
      AMD's SME/SEV is no longer the only case which reduces supported
      physical address bits, since Intel introduced Multi-key Total Memory
      Encryption (MKTME), which repurposes high bits of physical address as
      keyID, thus effectively shrinks supported physical address bits. To
      cover both cases (and potential similar future features), kernel MM
      introduced generic dynamaic physical address mask instead of hard-coded
      __PHYSICAL_MASK in 'commit 94d49eb3 ("x86/mm: Decouple dynamic
      __PHYSICAL_MASK from AMD SME")'. KVM should use that too.
      
      Change PT64_BASE_ADDR_MASK to use kernel dynamic physical address mask
      when it is enabled, instead of sme_clr. PT64_DIR_BASE_ADDR_MASK is also
      deleted since it is not used at all.
      Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8acc0993
    • P
      KVM: nVMX: remove useless is_protmode check · e0dfacbf
      Paolo Bonzini 提交于
      VMX is only accessible in protected mode, remove a confusing check
      that causes the conditional to lack a final "else" branch.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e0dfacbf
    • S
      KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 34333cc6
      Sean Christopherson 提交于
      Regarding segments with a limit==0xffffffff, the SDM officially states:
      
          When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
          or may not cause the indicated exceptions.  Behavior is
          implementation-specific and may vary from one execution to another.
      
      In practice, all CPUs that support VMX ignore limit checks for "flat
      segments", i.e. an expand-up data or code segment with base=0 and
      limit=0xffffffff.  This is subtly different than wrapping the effective
      address calculation based on the address size, as the flat segment
      behavior also applies to accesses that would wrap the 4g boundary, e.g.
      a 4-byte access starting at 0xffffffff will access linear addresses
      0xffffffff, 0x0, 0x1 and 0x2.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      34333cc6
    • S
      KVM: nVMX: Apply addr size mask to effective address for VMX instructions · 8570f9e8
      Sean Christopherson 提交于
      The address size of an instruction affects the effective address, not
      the virtual/linear address.  The final address may still be truncated,
      e.g. to 32-bits outside of long mode, but that happens irrespective of
      the address size, e.g. a 32-bit address size can yield a 64-bit virtual
      address when using FS/GS with a non-zero base.
      
      Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8570f9e8
    • S
      KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 946c522b
      Sean Christopherson 提交于
      The VMCS.EXIT_QUALIFCATION field reports the displacements of memory
      operands for various instructions, including VMX instructions, as a
      naturally sized unsigned value, but masks the value by the addr size,
      e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
      reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
      wording regarding sign extension, the SDM explicitly states that bits
      beyond the instructions address size are undefined:
      
          In all cases, bits of this field beyond the instruction’s address
          size are undefined.
      
      Failure to sign extend the displacement results in KVM incorrectly
      treating a negative displacement as a large positive displacement when
      the address size of the VMX instruction is smaller than KVM's native
      size, e.g. a 32-bit address size on a 64-bit KVM.
      
      The very original decoding, added by commit 064aea77 ("KVM: nVMX:
      Decoding memory operands of VMX instructions"), sort of modeled sign
      extension by truncating the final virtual/linear address for a 32-bit
      address size.  I.e. it messed up the effective address but made it work
      by adjusting the final address.
      
      When segmentation checks were added, the truncation logic was kept
      as-is and no sign extension logic was introduced.  In other words, it
      kept calculating the wrong effective address while mostly generating
      the correct virtual/linear address.  As the effective address is what's
      used in the segment limit checks, this results in KVM incorreclty
      injecting #GP/#SS faults due to non-existent segment violations when
      a nested VMM uses negative displacements with an address size smaller
      than KVM's native address size.
      
      Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
      KVM using 0x100000fd8 as the effective address when checking for a
      segment limit violation.  This causes a 100% failure rate when running
      a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      946c522b
    • S
      svm: Fix improper check when deactivate AVIC · c57cd3c8
      Suthikulpanit, Suravee 提交于
      The function svm_refresh_apicv_exec_ctrl() always returning prematurely
      as kvm_vcpu_apicv_active() always return false when calling from
      the function arch/x86/kvm/x86.c:kvm_vcpu_deactivate_apicv().
      This is because the apicv_active is set to false just before calling
      refresh_apicv_exec_ctrl().
      
      Also, we need to mark VMCB_AVIC bit as dirty instead of VMCB_INTR.
      
      So, fix svm_refresh_apicv_exec_ctrl() to properly deactivate AVIC.
      
      Fixes: 67034bb9 ('KVM: SVM: Add irqchip_split() checks before enabling AVIC')
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c57cd3c8
    • P
      KVM: x86: cull apicv code when userspace irqchip is requested · f7589cca
      Paolo Bonzini 提交于
      Currently apicv_active can be true even if in-kernel LAPIC
      emulation is disabled.  Avoid this by properly initializing
      it in kvm_arch_vcpu_init, and then do not do anything to
      deactivate APICv when it is actually not used
      
      (Currently APICv is only deactivated by SynIC code that in turn
      is only reachable when in-kernel LAPIC is in use.  However, it is
      cleaner if kvm_vcpu_deactivate_apicv avoids relying on this.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f7589cca
    • S
      svm: Fix AVIC DFR and LDR handling · 98d90582
      Suthikulpanit, Suravee 提交于
      Current SVM AVIC driver makes two incorrect assumptions:
        1. APIC LDR register cannot be zero
        2. APIC DFR for all vCPUs must be the same
      
      LDR=0 means the local APIC does not support logical destination mode.
      Therefore, the driver should mark any previously assigned logical APIC ID
      table entry as invalid, and return success.  Also, DFR is specific to
      a particular local APIC, and can be different among all vCPUs
      (as observed on Windows 10).
      
      These incorrect assumptions cause Windows 10 and FreeBSD VMs to fail
      to boot with AVIC enabled. So, instead of flush the whole logical APIC ID
      table, handle DFR and LDR for each vCPU independently.
      
      Fixes: 18f40c53 ('svm: Add VMEXIT handlers for AVIC')
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: NJulian Stecklina <jsteckli@amazon.de>
      Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      98d90582
    • G
      kvm: Use struct_size() in kmalloc() · 90952cd3
      Gustavo A. R. Silva 提交于
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct foo {
          int stuff;
          void *entry[];
      };
      
      instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
      
      This code was detected with the help of Coccinelle.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      90952cd3
    • P
      x86/kvmclock: set offset for kvm unstable clock · b5179ec4
      Pavel Tatashin 提交于
      VMs may show incorrect uptime and dmesg printk offsets on hypervisors with
      unstable clock. The problem is produced when VM is rebooted without exiting
      from qemu.
      
      The fix is to calculate clock offset not only for stable clock but for
      unstable clock as well, and use kvm_sched_clock_read() which substracts
      the offset for both clocks.
      
      This is safe, because pvclock_clocksource_read() does the right thing and
      makes sure that clock always goes forward, so once offset is calculated
      with unstable clock, we won't get new reads that are smaller than offset,
      and thus won't get negative results.
      
      Thank you Jon DeVree for helping to reproduce this issue.
      
      Fixes: 857baa87 ("sched/clock: Enable sched clock early")
      Cc: stable@vger.kernel.org
      Reported-by: NDominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b5179ec4
    • S
      KVM: VMX: Reorder clearing of registers in the vCPU-run assembly flow · 4f44c4ee
      Sean Christopherson 提交于
      Move the clearing of the common registers (not 64-bit-only) to the start
      of the flow that clears registers holding guest state.  This is
      purely a cosmetic change so that the label doesn't point at a blank line
      and a #define.
      
      No functional change intended.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4f44c4ee
    • S
      KVM: VMX: Call vCPU-run asm sub-routine from C and remove clobbering · fc2ba5a2
      Sean Christopherson 提交于
      ...now that the sub-routine follows standard calling conventions.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fc2ba5a2
    • S
      KVM: VMX: Preserve callee-save registers in vCPU-run asm sub-routine · 3b895ef4
      Sean Christopherson 提交于
      ...to make it callable from C code.
      
      Note that because KVM chooses to be ultra paranoid about guest register
      values, all callee-save registers are still cleared after VM-Exit even
      though the host's values are now reloaded from the stack.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3b895ef4
    • S
      KVM: VMX: Return VM-Fail from vCPU-run assembly via standard ABI reg · e75c3c3a
      Sean Christopherson 提交于
      ...to prepare for making the assembly sub-routine callable from C code.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e75c3c3a
    • S
      KVM: VMX: Pass @launched to the vCPU-run asm via standard ABI regs · 77df5495
      Sean Christopherson 提交于
      ...to prepare for making the sub-routine callable from C code.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      77df5495
    • S
      KVM: VMX: Use RAX as the scratch register during vCPU-run · a62fd5a7
      Sean Christopherson 提交于
      ...to prepare for making the sub-routine callable from C code.  That
      means returning the result in RAX.  Since RAX will be used to return the
      result, use it as the scratch register as well to make the code readable
      and to document that the scratch register is more or less arbitrary.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a62fd5a7
    • S
      KVM: VMX: Rename ____vmx_vcpu_run() to __vmx_vcpu_run() · ee2fc635
      Sean Christopherson 提交于
      ...now that the name is no longer usurped by a defunct helper function.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ee2fc635
    • S
      KVM: VMX: Fold __vmx_vcpu_run() back into vmx_vcpu_run() · c823dd5c
      Sean Christopherson 提交于
      ...now that the code is no longer tagged with STACK_FRAME_NON_STANDARD.
      Arguably, providing __vmx_vcpu_run() to break up vmx_vcpu_run() is
      valuable on its own, but the previous split was purposely made as small
      as possible to limit the effects STACK_FRAME_NON_STANDARD.  In other
      words, the current split is now completely arbitrary and likely not the
      most logical.
      
      This also allows renaming ____vmx_vcpu_run() to __vmx_vcpu_run() in a
      future patch.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c823dd5c
    • S
      KVM: VMX: Move vCPU-run code to a proper assembly routine · 5e0781df
      Sean Christopherson 提交于
      As evidenced by the myriad patches leading up to this moment, using
      an inline asm blob for vCPU-run is nothing short of horrific.  It's also
      been called "unholy", "an abomination" and likely a whole host of other
      names that would violate the Code of Conduct if recorded here and now.
      
      The code is relocated nearly verbatim, e.g. quotes, newlines, tabs and
      __stringify need to be dropped, but other than those cosmetic changes
      the only functional changees are to add the "call" and replace the final
      "jmp" with a "ret".
      
      Note that STACK_FRAME_NON_STANDARD is also dropped from __vmx_vcpu_run().
      Suggested-by: NAndi Kleen <ak@linux.intel.com>
      Suggested-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5e0781df
    • S
      KVM: VMX: Create a stack frame in vCPU-run · 63c73aa0
      Sean Christopherson 提交于
      ...in preparation for moving to a proper assembly sub-routnine.
      vCPU-run isn't a leaf function since it calls vmx_update_host_rsp()
      and vmx_vmenter().  And since we need to save/restore RBP anyways,
      unconditionally creating the frame costs a single MOV, i.e. don't
      bother keying off CONFIG_FRAME_POINTER or using FRAME_BEGIN, etc...
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      63c73aa0
    • S
      KVM: VMX: Use #defines in place of immediates in VM-Enter inline asm · c14f9dd5
      Sean Christopherson 提交于
      ...to prepare for moving the inline asm to a proper asm sub-routine.
      Eliminating the immediates allows a nearly verbatim move, e.g. quotes,
      newlines, tabs and __stringify need to be dropped, but other than those
      cosmetic changes the only function change will be to replace the final
      "jmp" with a "ret".
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c14f9dd5
    • S
      KVM: x86: Explicitly #define the VCPU_REGS_* indices · 95c7b77d
      Sean Christopherson 提交于
      Declaring the VCPU_REGS_* as enums allows for more robust C code, but it
      prevents using the values in assembly files.  Expliciting #define the
      indices in an asm-friendly file to prepare for VMX moving its transition
      code to a proper assembly file, but keep the enums for general usage.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      95c7b77d
  2. 12 2月, 2019 4 次提交