1. 15 Feb, 2017 13 commits
    • J
      kvm: nVMX: Refactor nested_vmx_run() · 858e25c0
      Jim Mattson committed
      nested_vmx_run() is split into two parts: the part that handles the
      VMLAUNCH/VMRESUME instruction, and the part that modifies the vcpu state
      to transition from VMX root mode to VMX non-root mode. The latter will
      be used when restoring the checkpointed state of a vCPU that was in VMX
      operation when a snapshot was taken.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      858e25c0
    • J
      kvm: nVMX: Split VMCS checks from nested_vmx_run() · ca0bde28
      Jim Mattson committed
      The checks performed on the contents of the vmcs12 are extracted from
      nested_vmx_run so that they can be used to validate a vmcs12 that has
      been restored from a checkpoint.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      [Change prepare_vmcs02 and nested_vmx_load_cr3's last argument to u32,
       to match check_vmentry_postreqs.  Update comments for singlestep
       handling. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ca0bde28
    • J
      kvm: nVMX: Refactor nested_get_vmcs12_pages() · 6beb7bd5
      Jim Mattson committed
      Perform the checks on vmcs12 state early, but defer the gpa->hpa lookups
      until after prepare_vmcs02. Later, when we restore the checkpointed
      state of a vCPU in guest mode, we will not be able to do the gpa->hpa
      lookups when the restore is done.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6beb7bd5
    • J
      kvm: nVMX: Refactor handle_vmptrld() · a8bc284e
      Jim Mattson committed
      handle_vmptrld() is split into two parts: the part that handles the
      VMPTRLD instruction, and the part that establishes the current VMCS
      pointer.  The latter will be used when restoring the checkpointed state
      of a vCPU that had a valid VMCS pointer when a snapshot was taken.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a8bc284e
    • J
      kvm: nVMX: Refactor handle_vmon() · e29acc55
      Jim Mattson committed
      handle_vmon() is split into two parts: the part that handles the VMXON
      instruction, and the part that modifies the vcpu state to transition
      from legacy mode to VMX operation. The latter will be used when
      restoring the checkpointed state of a vCPU that was in VMX operation
      when a snapshot was taken.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e29acc55
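The split described in this commit can be pictured with a heavily reduced model. This is only a sketch under stated assumptions: `struct vcpu_sketch`, the helper name `enter_vmx_operation`, the function `restore_vmx_state`, and the integer error codes are illustrative choices, not taken from the patch itself.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced vCPU state; not the kernel's real structure. */
struct vcpu_sketch {
    bool cr4_vmxe;  /* guest has set CR4.VMXE */
    bool vmxon;     /* vCPU is in VMX operation */
};

/* State transition only: shared by the instruction handler and by restore. */
static int enter_vmx_operation(struct vcpu_sketch *vcpu)
{
    vcpu->vmxon = true;
    return 0;
}

/* Instruction emulation: architectural checks first, then the transition. */
int handle_vmon(struct vcpu_sketch *vcpu)
{
    if (!vcpu->cr4_vmxe)
        return -1;  /* stands in for injecting #UD */
    if (vcpu->vmxon)
        return -2;  /* stands in for a VMfail on a repeated VMXON */
    return enter_vmx_operation(vcpu);
}

/* Checkpoint restore: no instruction was executed, so only the
 * state transition is replayed. */
int restore_vmx_state(struct vcpu_sketch *vcpu, bool was_in_vmx_operation)
{
    return was_in_vmx_operation ? enter_vmx_operation(vcpu) : 0;
}
```

The point of the shape is that the restore path reuses the transition without pretending that a VMXON instruction was executed.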
    • J
      kvm: nVMX: Prepare for checkpointing L2 state · cf8b84f4
      Jim Mattson committed
      Split prepare_vmcs12 into two parts: the part that stores the current L2
      guest state and the part that sets up the exit information fields. The
      former will be used when checkpointing the vCPU's VMX state.
      
      Modify prepare_vmcs02 so that it can construct a vmcs02 midway through
      L2 execution, using the checkpointed L2 guest state saved into the
      cached vmcs12 above.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      [Rebasing: add from_vmentry argument to prepare_vmcs02 instead of using
       vmx->nested.nested_run_pending, because it is no longer 1 at the
       point prepare_vmcs02 is called. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cf8b84f4
    • P
      kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection · b95234c8
      Paolo Bonzini committed
      Since bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU
      is blocked", 2015-09-18) the posted interrupt descriptor is checked
      unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
      trigger the scan and, if NMIs or SMIs are not involved, we can avoid
      the complicated event injection path.
      
      Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
      there since APICv was introduced.
      
      However, without the KVM_REQ_EVENT safety net KVM needs to be much
      more careful about races between vmx_deliver_posted_interrupt and
      vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
      between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
      If that happens, kvm_trigger_posted_interrupt returns true, but
      smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
      entered with PIR.ON, but the posted interrupt IPI has not been sent
      and the interrupt is only delivered to the guest on the next vmentry
      (if any).  To fix this, disable interrupts before setting vcpu->mode.
      This ensures that the IPI is delayed until the guest enters non-root mode;
      it is then trapped by the processor causing the interrupt to be injected.
      
      Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
      and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
      but it (correctly) doesn't do anything because it sees vcpu->mode ==
      OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
      posted interrupt IPI is pending; this time, the fix for this is to move
      the RVI update after IN_GUEST_MODE.
      
      Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
      though the second could actually happen with VT-d posted interrupts.
      In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
      in another vmentry which would inject the interrupt.
      
      This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b95234c8
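The second race in this commit message (a sender arriving between the PIR-to-IRR sync and the switch to IN_GUEST_MODE) can be replayed with a small single-threaded model. Everything here is an assumption made for illustration: `struct vcpu_model`, its fields, and `entry_delivers` are hypothetical, and real hardware behaviour (IPIs, interrupt masking, RVI) is collapsed into three booleans.

```c
#include <stdbool.h>

enum vcpu_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

/* Toy model of the state involved; not the kernel's data structures. */
struct vcpu_model {
    enum vcpu_mode mode;
    bool pir_on;      /* PIR.ON: a vector is waiting in the PI descriptor */
    bool irr_synced;  /* vector was copied to IRR/RVI before vmentry */
    bool ipi_held;    /* posted-interrupt IPI held until non-root mode */
};

/* Sender: post the vector, then notify only a vCPU that is in guest mode. */
static void post_interrupt(struct vcpu_model *v)
{
    v->pir_on = true;
    if (v->mode == IN_GUEST_MODE)
        v->ipi_held = true;  /* interrupts are off, so it fires after entry */
    /* else: the kick does nothing for a vCPU outside guest mode */
}

static void sync_pir_to_irr(struct vcpu_model *v)
{
    if (v->pir_on) {
        v->pir_on = false;
        v->irr_synced = true;
    }
}

/*
 * Run one guest entry made of two steps. The sender fires just before
 * step `when` (0 or 1), or after both steps (2). Returns true if the
 * interrupt reaches the guest on this entry.
 */
bool entry_delivers(bool mode_before_sync, int when)
{
    struct vcpu_model v = { OUTSIDE_GUEST_MODE, false, false, false };

    for (int step = 0; step < 2; step++) {
        if (when == step)
            post_interrupt(&v);
        /* Step 0 sets the mode in the new ordering, step 1 in the old one. */
        if ((step == 0) == mode_before_sync)
            v.mode = IN_GUEST_MODE;
        else
            sync_pir_to_irr(&v);
    }
    if (when == 2)
        post_interrupt(&v);
    return v.irr_synced || v.ipi_held;
}
```

With the old ordering (`mode_before_sync == false`) the model loses the interrupt only when the sender lands between the two steps; with the new ordering every arrival point is covered either by the sync or by the held IPI.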
    • P
      KVM: x86: do not scan IRR twice on APICv vmentry · 76dfafd5
      Paolo Bonzini committed
      Calls to apic_find_highest_irr are scanning IRR twice, once
      in vmx_sync_pir_to_irr and once in apic_search_irr.  Change
      sync_pir_to_irr to get the new maximum IRR from kvm_apic_update_irr;
      now that it does the computation, it can also do the RVI write.
      
      In order to avoid complications in svm.c, make the callback optional.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      76dfafd5
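The idea of avoiding the second scan can be sketched as a merge that reports the highest pending vector as a by-product. This is a minimal model, assuming a flat array of eight 32-bit words for both PIR and IRR; the function name `merge_pir_into_irr` and the layout are illustrative and do not reproduce the kernel's APIC register page.

```c
#include <stdint.h>

#define NR_WORDS 8  /* 256 vectors, 32 per word (assumed flat layout) */

/* Index of the highest set bit in w, or -1 if w is zero. */
static int highest_bit(uint32_t w)
{
    int bit = -1;

    while (w) {
        bit++;
        w >>= 1;
    }
    return bit;
}

/*
 * Fold posted vectors into IRR and clear PIR. Because the words are
 * visited in ascending order, the last non-zero IRR word seen holds the
 * highest pending vector, so no separate scan is needed afterwards.
 * Returns that vector, or -1 if nothing is pending.
 */
int merge_pir_into_irr(uint32_t pir[NR_WORDS], uint32_t irr[NR_WORDS])
{
    int max_irr = -1;

    for (int i = 0; i < NR_WORDS; i++) {
        irr[i] |= pir[i];
        pir[i] = 0;
        if (irr[i])
            max_irr = i * 32 + highest_bit(irr[i]);
    }
    return max_irr;
}
```

A caller that needs to program RVI can use the return value directly instead of searching IRR again.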
    • P
    • P
      KVM: x86: preparatory changes for APICv cleanups · 810e6def
      Paolo Bonzini committed
      Add return value to __kvm_apic_update_irr/kvm_apic_update_irr.
      Move vmx_sync_pir_to_irr around.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      810e6def
    • P
      kvm: nVMX: move nested events check to kvm_vcpu_running · 0ad3bed6
      Paolo Bonzini committed
      vcpu_run calls kvm_vcpu_running, not kvm_arch_vcpu_runnable,
      and the former does not call check_nested_events.
      
      Once KVM_REQ_EVENT is removed from the APICv interrupt injection
      path, however, this would leave no place to trigger a vmexit
      from L2 to L1, causing a missed interrupt delivery while in guest
      mode.  This is caught by the "ack interrupt on exit" test in
      vmx.flat.
      
      [This does not change the calls to check_nested_events in
       inject_pending_event.  That is material for a separate cleanup.]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0ad3bed6
    • P
      KVM: vmx: clear pending interrupts on KVM_SET_LAPIC · 967235d3
      Paolo Bonzini committed
      Pending interrupts might be in the PI descriptor when the
      LAPIC is restored from an external state; we do not want
      them to be injected.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      967235d3
    • P
      kvm: vmx: Use the hardware provided GPA instead of page walk · db1c056c
      Paolo Bonzini committed
      As in the SVM patch, the guest physical address is passed by
      VMX to x86_emulate_instruction already, so mark the GPA as available
      in vcpu->arch.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      db1c056c
  2. 09 Feb, 2017 3 commits
  3. 08 Feb, 2017 10 commits
  4. 07 Feb, 2017 2 commits
  5. 06 Feb, 2017 2 commits
  6. 03 Feb, 2017 10 commits
    • J
      KVM: MIPS: Allow multiple VCPUs to be created · 12ed1fae
      James Hogan committed
      Increase the maximum number of MIPS KVM VCPUs to 8, and implement the
      KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS capabilities which expose the
      recommended and maximum number of VCPUs to userland. The previous
      maximum of 1 didn't allow for any form of SMP guests.
      
      We calculate the values similarly to ARM, recommending as many VCPUs as
      there are CPUs online in the system. This will allow userland to know
      how many VCPUs it is possible to create.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      12ed1fae
    • J
      KVM: MIPS/T&E: Expose read-only CP0_IntCtl register · ad58d4d4
      James Hogan committed
      Expose the CP0_IntCtl register through the KVM register access API,
      which is a required register since MIPS32r2. It is currently read-only
      since the VS field isn't implemented due to lack of Config3.VInt or
      Config3.VEIC.
      
      It is implemented in trap_emul.c so that a VZ implementation can allow
      writes.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      ad58d4d4
    • J
      KVM: MIPS/T&E: Expose CP0_EntryLo0/1 registers · 013044cc
      James Hogan committed
      Expose the CP0_EntryLo0 and CP0_EntryLo1 registers through the KVM
      register access API. This is fairly straightforward for trap & emulate
      since we don't support the RI and XI bits. For the sake of future
      proofing (particularly for VZ) it is explicitly specified that the API
      always exposes the 64-bit version of these registers (i.e. with the RI
      and XI bits in bit positions 63 and 62 respectively), and they are
      implemented in trap_emul.c rather than mips.c to allow them to be
      implemented differently for VZ.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      013044cc
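The 64-bit register layout mentioned in this commit (RI and XI in bit positions 63 and 62) can be illustrated with a pair of conversion helpers. This is a sketch for a hypothetical implementation that keeps a 32-bit EntryLo internally with RI and XI in bits 31 and 30; the macro and function names are made up here, and trap & emulate itself does not need such a conversion since it does not support the RI and XI bits.

```c
#include <stdint.h>

/* 32-bit register layout (assumed internal representation). */
#define ENTRYLO32_RI  (UINT32_C(1) << 31)
#define ENTRYLO32_XI  (UINT32_C(1) << 30)
/* 64-bit layout exposed through the register access API. */
#define ENTRYLO64_RI  (UINT64_C(1) << 63)
#define ENTRYLO64_XI  (UINT64_C(1) << 62)

/* Widen a 32-bit EntryLo value, moving RI/XI up to bits 63/62. */
uint64_t entrylo_to_api(uint32_t lo32)
{
    uint64_t v = lo32 & ~(ENTRYLO32_RI | ENTRYLO32_XI);

    if (lo32 & ENTRYLO32_RI)
        v |= ENTRYLO64_RI;
    if (lo32 & ENTRYLO32_XI)
        v |= ENTRYLO64_XI;
    return v;
}

/* Narrow an API value back to the 32-bit layout. */
uint32_t entrylo_from_api(uint64_t api)
{
    uint32_t v = (uint32_t)(api & UINT64_C(0x3fffffff));

    if (api & ENTRYLO64_RI)
        v |= ENTRYLO32_RI;
    if (api & ENTRYLO64_XI)
        v |= ENTRYLO32_XI;
    return v;
}
```

Fixing the API on the wide layout means userland sees the same bit positions regardless of how a given implementation stores the register.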
    • J
      KVM: MIPS/T&E: Default to reset vector · be67a0be
      James Hogan committed
      Set the default VCPU state closer to the architectural reset state, with
      PC pointing at the reset vector (uncached PA 0x1fc00000, which for KVM
      T&E is VA 0x5fc00000), and with CP0_Status.BEV and CP0_Status.ERL set to 1.
      
      Although QEMU at least will overwrite this state, it makes sense to do
      this now that CP0_EBase is properly implemented to check BEV, and now
      that we support a sparse GPA layout potentially with a boot ROM at GPA
      0x1fc00000.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      be67a0be
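The default state described in this commit can be written out as a few constants. This is a sketch, not the patch: the segment base `GUEST_KSEG_BASE` is an assumption inferred from the two addresses quoted in the message (0x5fc00000 minus 0x1fc00000), and `struct reset_state` and `default_vcpu_state` are hypothetical names. The Status bit positions (ERL at bit 2, BEV at bit 22) follow the MIPS32 architecture.

```c
#include <stdint.h>

/* Assumption: guest segment base inferred from the addresses in the message. */
#define GUEST_KSEG_BASE  UINT32_C(0x40000000)
#define RESET_VECTOR_PA  UINT32_C(0x1fc00000)

#define ST0_ERL  (UINT32_C(1) << 2)   /* CP0_Status.ERL: error level */
#define ST0_BEV  (UINT32_C(1) << 22)  /* CP0_Status.BEV: bootstrap vectors */

/* Hypothetical container for the two pieces of state discussed. */
struct reset_state {
    uint32_t pc;
    uint32_t status;
};

struct reset_state default_vcpu_state(void)
{
    struct reset_state s;

    s.pc = GUEST_KSEG_BASE + RESET_VECTOR_PA;  /* guest VA of the reset vector */
    s.status = ST0_BEV | ST0_ERL;
    return s;
}
```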
    • J
      KVM: MIPS/T&E: Implement CP0_EBase register · 7801bbe1
      James Hogan committed
      The CP0_EBase register is a standard feature of MIPS32r2, so we should
      always have been implementing it properly. However the register value
      was ignored and wasn't exposed to userland.
      
      Fix the emulation of exceptions and interrupts to use the value stored
      in guest CP0_EBase, and fix the masks so that the top 3 bits (rather
      than the standard 2) are fixed, so that it is always in the guest KSeg0
      segment.
      
      Also add CP0_EBASE to the KVM one_reg interface so it can be accessed by
      userland, also allowing the CPU number field to be written (which isn't
      permitted by the guest).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      7801bbe1
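The masking rules in this commit (top 3 bits fixed, CPU number writable only from userland) can be modelled as a single write helper. This is a sketch under assumptions: the fixed top bits are taken to be binary 010 so that the value stays at or above 0x40000000, the exception base is taken to occupy bits 28..12, the CPU number bits 9..0, and `ebase_write` plus the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EBASE_FIXED_MASK   UINT32_C(0xe0000000)  /* top 3 bits, not writable */
#define EBASE_FIXED_VALUE  UINT32_C(0x40000000)  /* assumed guest KSeg0 base */
#define EBASE_BASE_MASK    UINT32_C(0x1ffff000)  /* exception base, bits 28..12 */
#define EBASE_CPUNUM_MASK  UINT32_C(0x000003ff)  /* CPU number, bits 9..0 */

/*
 * Apply a write to the guest's EBase. The guest may change only the
 * exception base; userland may also change the CPU number field.
 */
uint32_t ebase_write(uint32_t old, uint32_t val, bool from_userland)
{
    uint32_t mask = EBASE_BASE_MASK;
    uint32_t merged;

    if (from_userland)
        mask |= EBASE_CPUNUM_MASK;

    merged = (old & ~mask) | (val & mask);
    /* Force the fixed bits so the base cannot leave the guest segment. */
    return (merged & ~EBASE_FIXED_MASK) | EBASE_FIXED_VALUE;
}
```

For example, a guest write of 0xfffff3ff over an initial 0x40000000 yields 0x5ffff000 in this model: the base bits are taken, the CPU number is left alone, and the top bits stay fixed.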
    • J
      KVM: MIPS/T&E: Move CP0 register access into T&E · 654229a0
      James Hogan committed
      Access to various CP0 registers via the KVM register access API needs to
      be implementation specific to allow restrictions to be made on changes,
      for example when VZ guest registers aren't present, so move them all
      into trap_emul.c in preparation for VZ.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      654229a0
    • J
      KVM: MIPS: Claim KVM_CAP_READONLY_MEM support · 230c5724
      James Hogan committed
      Now that load/store faults due to read only memory regions are treated
      as MMIO accesses it is safe to claim support for read only memory
      regions (KVM_CAP_READONLY_MEM).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      230c5724
    • J
      KVM: MIPS/MMU: Implement KVM_CAP_SYNC_MMU · 411740f5
      James Hogan committed
      Implement the SYNC_MMU capability for KVM MIPS, allowing changes in the
      underlying user host virtual address (HVA) mappings to be promptly
      reflected in the corresponding guest physical address (GPA) mappings.
      
      This allows for several features to work with guest RAM which require
      mappings to be altered or protected, such as copy-on-write, KSM (Kernel
      Samepage Merging), idle page tracking, memory swapping, and guest memory
      ballooning.
      
      There are two main aspects of this change, described below.
      
      The KVM MMU notifier architecture callbacks are implemented so we can be
      notified of changes in the HVA mappings. These arrange for the guest
      physical address (GPA) page tables to be modified and possibly for
      derived mappings (GVA page tables and TLBs) to be flushed.
      
       - kvm_unmap_hva[_range]() - These deal with HVA mappings being removed,
         for example before a copy-on-write takes place, which requires the
         corresponding GPA page table mappings to be removed too.
      
       - kvm_set_spte_hva() - This updates a GPA page table entry to match the
         new HVA entry, but must be careful to respect KVM specific
         configuration such as not dirtying a clean guest page which is dirty
         to the host, and write protecting writable pages in read only
         memslots (which will soon be supported).
      
       - kvm[_test]_age_hva() - These update GPA page table entries to be old
         (invalid) so that access can be tracked, making them young again.
      
      The GPA page fault handling (kvm_mips_map_page) is updated to use
      gfn_to_pfn_prot() (which may provide read-only pages), to handle
      asynchronous page table invalidation from MMU notifier callbacks, and to
      handle more cases in the fast path.
      
       - mmu_notifier_seq is used to detect asynchronous page table
         invalidations while we're holding a pfn from gfn_to_pfn_prot()
         outside of kvm->mmu_lock, retrying if invalidations have taken place,
         e.g. a COW or a KSM page merge.
      
       - The fast path (_kvm_mips_map_page_fast) now handles marking old pages
         as young / accessed, and disallowing dirtying of clean pages that
         aren't actually writable (e.g. shared pages that should COW, and
         read-only memory regions when they are enabled in a future patch).
      
       - Due to the use of MMU notifications we no longer need to keep the
         page references after we've updated the GPA page tables.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      411740f5
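The use of mmu_notifier_seq described in this commit follows a snapshot-and-retry pattern, which can be sketched in a single-threaded model. All names below (`struct kvm_sketch`, `map_page`, `race_hook`, `cow_on_first_attempt`) are hypothetical, the lock and memory barriers of real code are reduced to comments, and the "page table" is a single entry.

```c
#include <stddef.h>

/* Toy stand-in for the relevant VM state. */
struct kvm_sketch {
    unsigned long mmu_notifier_seq;  /* bumped by every invalidation */
    unsigned long gpa_pte;           /* one-entry "GPA page table" */
};

/* Hook standing in for a concurrent MMU notifier callback. */
typedef void (*race_hook)(struct kvm_sketch *kvm, int attempt);

static void invalidate(struct kvm_sketch *kvm)
{
    kvm->gpa_pte = 0;
    kvm->mmu_notifier_seq++;
}

/* Example hook: simulate a COW invalidation during the first attempt only. */
void cow_on_first_attempt(struct kvm_sketch *kvm, int attempt)
{
    if (attempt == 1)
        invalidate(kvm);
}

/* Install pfn into the GPA page table; returns the attempts needed. */
int map_page(struct kvm_sketch *kvm, unsigned long pfn, race_hook hook)
{
    for (int attempt = 1; ; attempt++) {
        /* Snapshot the sequence count before the unlocked pfn lookup. */
        unsigned long seq = kvm->mmu_notifier_seq;

        /* ... the pfn lookup would run here, outside the MMU lock ... */
        if (hook != NULL)
            hook(kvm, attempt);

        /* ... take the MMU lock here ... */
        if (kvm->mmu_notifier_seq != seq)
            continue;  /* the pfn may be stale: drop the lock and retry */
        kvm->gpa_pte = pfn;
        /* ... release the MMU lock here ... */
        return attempt;
    }
}
```

The sequence check is what makes it safe to hold a pfn outside the lock: any invalidation in that window changes the counter and forces a fresh lookup.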
    • J
      KVM: MIPS/MMU: Pass GPA PTE bits to mapped GVA PTEs · f9b11e51
      James Hogan committed
      Propagate the GPA PTE protection bits on to the GVA PTEs on a mapped
      fault (except _PAGE_WRITE, and filtered by the guest TLB entry), rather
      than always overriding the protection. This allows dirty page tracking
      to work in mapped guest segments as a clear dirty bit in the GPA PTE
      will propagate to the GVA PTEs even when the guest TLB has the dirty bit
      set.
      
      Since the filtering of protection bits is now abstracted, if the buddy
      GVA PTE is also valid, we obtain the corresponding GPA PTE using a
      simple non-allocating walk and load that into the GVA PTE similarly
      (which may itself be invalid).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      f9b11e51
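The propagation rule in this commit (copy the GPA PTE protection bits, drop _PAGE_WRITE, and filter by the guest TLB entry) can be sketched as a pure function. The bit assignments below are hypothetical and chosen only for illustration; they are not the real MIPS _PAGE_* values, and `gva_pte_bits` is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for illustration only. */
#define PTE_PRESENT  (UINT32_C(1) << 0)
#define PTE_WRITE    (UINT32_C(1) << 1)  /* stands in for _PAGE_WRITE */
#define PTE_DIRTY    (UINT32_C(1) << 2)  /* dirty: writes proceed without faulting */

/*
 * Derive GVA PTE protection bits from the GPA PTE on a mapped fault.
 * The write-permission bit is never copied, and the dirty bit survives
 * only if the guest TLB entry is dirty as well.
 */
uint32_t gva_pte_bits(uint32_t gpa_pte, bool guest_tlb_dirty)
{
    uint32_t bits = gpa_pte & ~PTE_WRITE;

    if (!guest_tlb_dirty)
        bits &= ~PTE_DIRTY;
    return bits;
}
```

In this model a clean GPA PTE produces a clean GVA PTE even when the guest TLB entry is dirty, so the first guest write still faults and can be recorded for dirty page tracking.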
    • J
      KVM: MIPS/MMU: Pass GPA PTE bits to KSeg0 GVA PTEs · b584f460
      James Hogan committed
      Propagate the GPA PTE protection bits on to the GVA PTEs on a KSeg0
      fault (except _PAGE_WRITE), rather than always overriding the
      protection. This allows dirty page tracking to work in KSeg0 as a clear
      dirty bit in the GPA PTE will propagate to the GVA PTEs.
      
      This makes it simpler to use a single kvm_mips_map_page() to obtain both
      the main GPA PTE and its buddy (which may be invalid), which also allows
      memory regions to be fully accessible when they don't start and end on a
      2*PAGE_SIZE boundary.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      b584f460