1. 25 May, 2018 2 commits
    • D
      KVM: arm64: Save host SVE context as appropriate · 85acda3b
      Dave Martin authored
      This patch adds SVE context saving to the hyp FPSIMD context switch
      path.  This means that it is no longer necessary to save the host
      SVE state in advance of entering the guest, when in use.
      
      In order to avoid adding pointless complexity to the code, VHE is
      assumed if SVE is in use.  VHE is an architectural prerequisite for
      SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
      kernels that support both SVE and KVM.
      
      Historically, software models exist that can expose the
      architecturally invalid configuration of SVE without VHE, so if
      this situation is detected at kvm_init() time then KVM will be
      disabled.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • D
      KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing · e6b673b7
      Dave Martin authored
      This patch refactors KVM to align the host and guest FPSIMD
      save/restore logic with each other for arm64.  This reduces the
      number of redundant save/restore operations that must occur, and
      reduces the common-case IRQ blackout time during guest exit storms
      by saving the host state lazily and optimising away the need to
      restore the host state before returning to the run loop.
      
      Four hooks are defined in order to enable this:
      
       * kvm_arch_vcpu_run_map_fp():
         Called on PID change to map necessary bits of current to Hyp.
      
       * kvm_arch_vcpu_load_fp():
         Set up FP/SIMD for entering the KVM run loop (parse as
         "vcpu_load fp").
      
       * kvm_arch_vcpu_ctxsync_fp():
         Get FP/SIMD into a safe state for re-enabling interrupts after a
         guest exit back to the run loop.
      
         For arm64 specifically, this involves updating the host kernel's
         FPSIMD context tracking metadata so that kernel-mode NEON use
         will cause the vcpu's FPSIMD state to be saved back correctly
         into the vcpu struct.  This must be done before re-enabling
         interrupts because kernel-mode NEON may be used by softirqs.
      
       * kvm_arch_vcpu_put_fp():
         Save guest FP/SIMD state back to memory and dissociate from the
         CPU ("vcpu_put fp").
      
      Also, the arm64 FPSIMD context switch code is updated to enable it
      to save back FPSIMD state for a vcpu, not just current.  A few
      helpers drive this:
      
       * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
         mark this CPU as having context fp (which may belong to a vcpu)
         currently loaded in its registers.  This is the non-task
         equivalent of the static function fpsimd_bind_to_cpu() in
         fpsimd.c.
      
       * task_fpsimd_save():
         exported to allow KVM to save the guest's FPSIMD state back to
         memory on exit from the run loop.
      
       * fpsimd_flush_state():
         invalidate any context's FPSIMD state that is currently loaded.
         Used to disassociate the vcpu from the CPU regs on run loop exit.
      
      These changes allow the run loop to enable interrupts (and thus
      softirqs that may use kernel-mode NEON) without having to save the
      guest's FPSIMD state eagerly.
      
      Some new vcpu_arch fields are added to make all this work.  Because
      host FPSIMD state can now be saved back directly into current's
      thread_struct as appropriate, host_cpu_context is no longer used
      for preserving the FPSIMD state.  However, it is still needed for
      preserving other things such as the host's system registers.  To
      avoid ABI churn, the redundant storage space in host_cpu_context is
      not removed for now.
      
      arch/arm is not addressed by this patch and continues to use its
      current save/restore logic.  It could provide implementations of
      the helpers later if desired.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  2. 04 May, 2018 1 commit
  3. 27 Apr, 2018 3 commits
  4. 20 Apr, 2018 1 commit
    • M
      arm/arm64: KVM: Add PSCI version selection API · 85bd0ba1
      Marc Zyngier authored
      Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
      or 1.0 to a guest, defaulting to the latest version of the PSCI
      implementation that is compatible with the requested version. This is
      no different from doing a firmware upgrade on KVM.
      
      But in order to give a chance to hypothetical badly implemented guests
      that would have a fit by discovering something other than PSCI 0.2,
      let's provide a new API that allows userspace to pick one particular
      version of the API.
      
      This is implemented as a new class of "firmware" registers, where
      we expose the PSCI version. This allows the PSCI version to be
      save/restored as part of a guest migration, and also set to
      any supported version if the guest requires it.
      
      Cc: stable@vger.kernel.org #4.16
      Reviewed-by: Christoffer Dall <cdall@kernel.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  5. 17 Apr, 2018 2 commits
    • A
      KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration · bf9a4137
      Andre Przywara authored
      When vgic_prune_ap_list() finds an interrupt that needs to be migrated
      to a new VCPU, we should notify this VCPU of the pending interrupt,
      since it requires immediate action.
      Kick this VCPU once we have added the new IRQ to the list, but only
      after dropping the locks.
      Reported-by: Stefano Stabellini <sstabellini@kernel.org>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • M
      KVM: arm/arm64: Close VMID generation race · f0cf47d9
      Marc Zyngier authored
      Before entering the guest, we check whether our VMID is still
      part of the current generation. In order to avoid taking a lock,
      we start with checking that the generation is still current, and
      only if not current do we take the lock, recheck, and update the
      generation and VMID.
      
      This leaves open a small race: A vcpu can bump up the global
      generation number as well as the VM's, but has not updated
      the VMID itself yet.
      
      At that point another vcpu from the same VM comes in, checks
      the generation (and finds it not needing anything), and jumps
      into the guest. At this point, we end up with two vcpus belonging
      to the same VM running with two different VMIDs. Eventually, the
      VMID used by the second vcpu will get reassigned, and things will
      really go wrong...
      
      A simple solution would be to drop this initial check, and always take
      the lock. This is likely to cause performance issues. A middle ground
      is to convert the spinlock to a rwlock, and only take the read lock
      on the fast path. If the check fails at that point, drop it and
      acquire the write lock, rechecking the condition.
      
      This ensures that the above scenario doesn't occur.
      
      Cc: stable@vger.kernel.org
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Shannon Zhao <zhaoshenglong@huawei.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  6. 26 Mar, 2018 2 commits
    • M
      KVM: arm/arm64: vgic-its: Fix potential overrun in vgic_copy_lpi_list · 7d8b44c5
      Marc Zyngier authored
      vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting
      a given vcpu. We allocate the array containing the intids before taking
      the lpi_list_lock, which means we can have an array size that is not
      equal to the number of LPIs.
      
      This is particularly obvious when looking at the path coming from
      vgic_enable_lpis, which is not a command, and thus can run in parallel
      with commands:
      
      vcpu 0:                                        vcpu 1:
      vgic_enable_lpis
        its_sync_lpi_pending_table
          vgic_copy_lpi_list
            intids = kmalloc_array(irq_count)
                                                     MAPI(lpi targeting vcpu 0)
            list_for_each_entry(lpi_list_head)
              intids[i++] = irq->intid;
      
      At that stage, we will happily overrun the intids array. Boo. An easy
      fix is to break once the array is full. The MAPI command will update
      the config anyway, and we won't miss a thing. We also make sure that
      lpi_list_count is read exactly once, so that further updates of that
      value will not affect the array bound check.
      
      Cc: stable@vger.kernel.org
      Fixes: ccb1d791 ("KVM: arm64: vgic-its: Fix pending table sync")
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
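
      The fix described in this commit message (snapshot the count once, then
      stop copying when the array is full) can be sketched as plain C. This
      is an illustration under stated assumptions rather than the vgic code:
      struct lpi, copy_lpi_list() and the count_snapshot parameter are
      hypothetical, a singly linked list stands in for the kernel list, and
      locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on the LPI list. */
struct lpi {
    uint32_t intid;
    int target_vcpu;
    struct lpi *next;
};

/*
 * Copy the intids of the LPIs targeting `vcpu` into a newly allocated array.
 * `count_snapshot` is the list length read exactly once before allocating;
 * the list may have grown since then, so the loop stops once the array is
 * full. Returns the number of entries written, or -1 if allocation fails.
 */
static int copy_lpi_list(const struct lpi *head, int vcpu,
                         size_t count_snapshot, uint32_t **out)
{
    uint32_t *intids = calloc(count_snapshot ? count_snapshot : 1,
                              sizeof(*intids));
    size_t i = 0;

    if (!intids)
        return -1;

    for (const struct lpi *p = head; p; p = p->next) {
        if (i == count_snapshot)    /* array full: never write past the end */
            break;
        if (p->target_vcpu == vcpu)
            intids[i++] = p->intid;
    }

    *out = intids;
    return (int)i;
}
```

      Passing the snapshot as a parameter makes the bound explicit: the same
      value sizes the allocation and limits the loop, so a later change to
      the live count cannot move the bound.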
    • M
      KVM: arm/arm64: vgic: Disallow Active+Pending for level interrupts · 67b5b673
      Marc Zyngier authored
      It was recently reported that VFIO mediated devices, and anything
      that VFIO exposes as level interrupts, do not strictly follow the
      expected logic of such interrupts as it only lowers the input
      line when the guest has EOId the interrupt at the GIC level, rather
      than when it Acked the interrupt at the device level.
      
      The GIC's Active+Pending state is fundamentally incompatible with
      this behaviour, as it prevents KVM from observing the EOI, and in
      turn results in VFIO never dropping the line. This results in an
      interrupt storm in the guest, which it really never expected.
      
      As we cannot really change VFIO to follow the strict rules of level
      signalling, let's forbid the A+P state altogether, as it is in the
      end only an optimization. It ensures that we will transition via
      an invalid state, which we can use to notify VFIO of the EOI.
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  7. 19 Mar, 2018 22 commits
  8. 15 Mar, 2018 6 commits
    • M
      kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3 · 27e91ad1
      Marc Zyngier authored
      On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
      force synchronization between the memory-mapped guest view and
      the system-register view that the hypervisor uses.
      
      This is incorrect, as the spec calls out the need for "a DSB whose
      required access type is both loads and stores with any Shareability
      attribute", while we're only synchronizing stores.
      
      We also lack an isb after the dsb to ensure that the latter has
      actually been executed before we start reading stuff from the sysregs.
      
      The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
      just after.
      
      Cc: stable@vger.kernel.org
      Fixes: f68d2b1b ("arm64: KVM: Implement vgic-v3 save/restore")
      Acked-by: Christoffer Dall <cdall@kernel.org>
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • M
      KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid · 16ca6a60
      Marc Zyngier authored
      The vgic code is trying to be clever when injecting GICv2 SGIs,
      and will happily populate LRs with the same interrupt number if
      they come from multiple vcpus (after all, they are distinct
      interrupt sources).
      
      Unfortunately, this is against the letter of the architecture,
      and the GICv2 architecture spec says "Each valid interrupt stored
      in the List registers must have a unique VirtualID for that
      virtual CPU interface.". GICv3 has similar (although slightly
      ambiguous) restrictions.
      
      This results in guests locking up when using GICv2-on-GICv3, for
      example. The obvious fix is to stop trying so hard, and inject
      a single vcpu per SGI per guest entry. After all, pending SGIs
      with multiple source vcpus are pretty rare, and are mostly seen
      in scenarios where the physical CPUs are severely overcommitted.
      
      But as we now only inject a single instance of a multi-source SGI per
      vcpu entry, we may delay those interrupts for longer than strictly
      necessary, and run the risk of injecting lower priority interrupts
      in the meantime.
      
      In order to address this, we adopt a three stage strategy:
      - If we encounter a multi-source SGI in the AP list while computing
        its depth, we force the list to be sorted
      - When populating the LRs, we prevent the injection of any interrupt
        of lower priority than that of the first multi-source SGI we've
        injected.
      - Finally, the injection of a multi-source SGI triggers the request
        of a maintenance interrupt when there will be no pending interrupt
        in the LRs (HCR_NPIE).
      
      At the point where the last pending interrupt in the LRs switches
      from Pending to Active, the maintenance interrupt will be delivered,
      allowing us to add the remaining SGIs using the same process.
      
      Cc: stable@vger.kernel.org
      Fixes: 0919e84c ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
      Acked-by: Christoffer Dall <cdall@kernel.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • A
      KVM: arm/arm64: Reduce verbosity of KVM init log · 76600428
      Ard Biesheuvel authored
      On my GICv3 system, the following is printed to the kernel log at boot:
      
         kvm [1]: 8-bit VMID
         kvm [1]: IDMAP page: d20e35000
         kvm [1]: HYP VA range: 800000000000:ffffffffffff
         kvm [1]: vgic-v2@2c020000
         kvm [1]: GIC system register CPU interface enabled
         kvm [1]: vgic interrupt IRQ1
         kvm [1]: virtual timer IRQ4
         kvm [1]: Hyp mode initialized successfully
      
      The KVM IDMAP is a mapping of a statically allocated kernel structure,
      and so printing its physical address leaks the physical placement of
      the kernel when physical KASLR is in effect. So change the kvm_info() to
      kvm_debug() to remove it from the log output.
      
      While at it, trim the output a bit more: IRQ numbers can be found in
      /proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
      informational either.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Christoffer Dall <cdall@kernel.org>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • C
      KVM: arm/arm64: Reset mapped IRQs on VM reset · 413aa807
      Christoffer Dall authored
      We currently don't allow resetting mapped IRQs from userspace, because
      their state is controlled by the hardware.  But we do need to reset the
      state when the VM is reset, so we provide a function for the 'owner' of
      the mapped interrupt to reset the interrupt state.
      
      Currently only the timer uses mapped interrupts, so we call this
      function from the timer reset logic.
      
      Cc: stable@vger.kernel.org
      Fixes: 4c60e360 ("KVM: arm/arm64: Provide a get_input_level for the arch timer")
      Signed-off-by: Christoffer Dall <cdall@kernel.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • C
      KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN · e21a4f3a
      Christoffer Dall authored
      Calling vcpu_load() registers preempt notifiers for this vcpu and calls
      kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
      lifting on arm/arm64 and will try to do things such as enabling the
      virtual timer and setting us up to handle interrupts from the timer
      hardware.
      
      Loading state onto hardware registers and enabling hardware to signal
      interrupts can be problematic when we're not actually about to run the
      VCPU, because it makes it difficult to establish the right context when
      handling interrupts from the timer, and it makes the register access
      code difficult to reason about.
      
      Luckily, now that we call vcpu_load in each ioctl implementation, we can
      simply remove the call from the non-KVM_RUN vcpu ioctls, and our
      kvm_arch_vcpu_load() is only used for loading vcpu content to the
      physical CPU when we're actually going to run the vcpu.
      
      Cc: stable@vger.kernel.org
      Fixes: 9b062471 ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl")
      Reviewed-by: Julien Grall <julien.grall@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • A
      KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pending · 62b06f8f
      Andre Przywara authored
      Our irq_is_pending() helper function accesses multiple members of the
      vgic_irq struct, so we need to hold the lock when calling it.
      Add that requirement as a comment to the definition and take the lock
      around the call in vgic_mmio_read_pending(), where we were missing it
      before.
      
      Fixes: 96b29800 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
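
      The locking rule in this commit message (a helper that reads several
      fields must run with the lock held) can be sketched in userspace C.
      This is a simplified model under stated assumptions, not the vgic
      code: struct irq_state, irq_is_pending_locked() and read_pending() are
      hypothetical names, and a pthread mutex stands in for the kernel's
      irq_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct vgic_irq. */
struct irq_state {
    pthread_mutex_t lock;
    bool level_triggered;
    bool line_level;       /* current level of the input line */
    bool pending_latch;    /* latched edge or software-set pending */
};

/* Reads several fields, so the caller must hold irq->lock. */
static bool irq_is_pending_locked(const struct irq_state *irq)
{
    if (irq->level_triggered)
        return irq->pending_latch || irq->line_level;
    return irq->pending_latch;
}

/* Read path: take the lock so all fields come from one consistent state. */
static bool read_pending(struct irq_state *irq)
{
    bool pending;

    pthread_mutex_lock(&irq->lock);
    pending = irq_is_pending_locked(irq);
    pthread_mutex_unlock(&irq->lock);
    return pending;
}
```

      Without the lock, a concurrent writer could change the trigger type or
      the line level between the two reads, and the result would describe a
      state the interrupt was never actually in.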
  9. 26 Feb, 2018 1 commit