1. 20 4月, 2018 1 次提交
    • M
      arm/arm64: KVM: Add PSCI version selection API · 85bd0ba1
      Marc Zyngier 提交于
      Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
      or 1.0 to a guest, defaulting to the latest version of the PSCI
      implementation that is compatible with the requested version. This is
      no different from doing a firmware upgrade on KVM.
      
      But in order to give a chance to hypothetical badly implemented guests
      that would have a fit by discovering something other than PSCI 0.2,
      let's provide a new API that allows userspace to pick one particular
      version of the API.
      
      This is implemented as a new class of "firmware" registers, where
      we expose the PSCI version. This allows the PSCI version to be
      save/restored as part of a guest migration, and also set to
      any supported version if the guest requires it.
      
      Cc: stable@vger.kernel.org #4.16
      Reviewed-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      85bd0ba1
  2. 17 4月, 2018 2 次提交
    • A
      KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration · bf9a4137
      Andre Przywara 提交于
      When vgic_prune_ap_list() finds an interrupt that needs to be migrated
      to a new VCPU, we should notify this VCPU of the pending interrupt,
      since it requires immediate action.
      Kick this VCPU once we have added the new IRQ to the list, but only
      after dropping the locks.
      Reported-by: NStefano Stabellini <sstabellini@kernel.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      bf9a4137
    • M
      KVM: arm/arm64: Close VMID generation race · f0cf47d9
      Marc Zyngier 提交于
      Before entering the guest, we check whether our VMID is still
      part of the current generation. In order to avoid taking a lock,
      we start with checking that the generation is still current, and
      only if not current do we take the lock, recheck, and update the
      generation and VMID.
      
      This leaves open a small race: A vcpu can bump up the global
      generation number as well as the VM's, but has not updated
      the VMID itself yet.
      
      At that point another vcpu from the same VM comes in, checks
      the generation (and finds it not needing anything), and jumps
      into the guest. At this point, we end-up with two vcpus belonging
      to the same VM running with two different VMIDs. Eventually, the
      VMID used by the second vcpu will get reassigned, and things will
      really go wrong...
      
      A simple solution would be to drop this initial check, and always take
      the lock. This is likely to cause performance issues. A middle ground
      is to convert the spinlock to a rwlock, and only take the read lock
      on the fast path. If the check fails at that point, drop it and
      acquire the write lock, rechecking the condition.
      
      This ensures that the above scenario doesn't occur.
      
      Cc: stable@vger.kernel.org
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NShannon Zhao <zhaoshenglong@huawei.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      f0cf47d9
  3. 26 3月, 2018 2 次提交
    • M
      KVM: arm/arm64: vgic-its: Fix potential overrun in vgic_copy_lpi_list · 7d8b44c5
      Marc Zyngier 提交于
      vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting
      a given vcpu. We allocate the array containing the intids before taking
      the lpi_list_lock, which means we can have an array size that is not
      equal to the number of LPIs.
      
      This is particularly obvious when looking at the path coming from
      vgic_enable_lpis, which is not a command, and thus can run in parallel
      with commands:
      
      vcpu 0:                                        vcpu 1:
      vgic_enable_lpis
        its_sync_lpi_pending_table
          vgic_copy_lpi_list
            intids = kmalloc_array(irq_count)
                                                     MAPI(lpi targeting vcpu 0)
            list_for_each_entry(lpi_list_head)
              intids[i++] = irq->intid;
      
      At that stage, we will happily overrun the intids array. Boo. An easy
      fix is is to break once the array is full. The MAPI command will update
      the config anyway, and we won't miss a thing. We also make sure that
      lpi_list_count is read exactly once, so that further updates of that
      value will not affect the array bound check.
      
      Cc: stable@vger.kernel.org
      Fixes: ccb1d791 ("KVM: arm64: vgic-its: Fix pending table sync")
      Reviewed-by: NAndre Przywara <andre.przywara@arm.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      7d8b44c5
    • M
      KVM: arm/arm64: vgic: Disallow Active+Pending for level interrupts · 67b5b673
      Marc Zyngier 提交于
      It was recently reported that VFIO mediated devices, and anything
      that VFIO exposes as level interrupts, do no strictly follow the
      expected logic of such interrupts as it only lowers the input
      line when the guest has EOId the interrupt at the GIC level, rather
      than when it Acked the interrupt at the device level.
      
      THe GIC's Active+Pending state is fundamentally incompatible with
      this behaviour, as it prevents KVM from observing the EOI, and in
      turn results in VFIO never dropping the line. This results in an
      interrupt storm in the guest, which it really never expected.
      
      As we cannot really change VFIO to follow the strict rules of level
      signalling, let's forbid the A+P state altogether, as it is in the
      end only an optimization. It ensures that we will transition via
      an invalid state, which we can use to notify VFIO of the EOI.
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Tested-by: NEric Auger <eric.auger@redhat.com>
      Tested-by: NShunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      67b5b673
  4. 19 3月, 2018 22 次提交
  5. 15 3月, 2018 6 次提交
    • M
      kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3 · 27e91ad1
      Marc Zyngier 提交于
      On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
      force synchronization between the memory-mapped guest view and
      the system-register view that the hypervisor uses.
      
      This is incorrect, as the spec calls out the need for "a DSB whose
      required access type is both loads and stores with any Shareability
      attribute", while we're only synchronizing stores.
      
      We also lack an isb after the dsb to ensure that the latter has
      actually been executed before we start reading stuff from the sysregs.
      
      The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
      just after.
      
      Cc: stable@vger.kernel.org
      Fixes: f68d2b1b ("arm64: KVM: Implement vgic-v3 save/restore")
      Acked-by: NChristoffer Dall <cdall@kernel.org>
      Reviewed-by: NAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      27e91ad1
    • M
      KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid · 16ca6a60
      Marc Zyngier 提交于
      The vgic code is trying to be clever when injecting GICv2 SGIs,
      and will happily populate LRs with the same interrupt number if
      they come from multiple vcpus (after all, they are distinct
      interrupt sources).
      
      Unfortunately, this is against the letter of the architecture,
      and the GICv2 architecture spec says "Each valid interrupt stored
      in the List registers must have a unique VirtualID for that
      virtual CPU interface.". GICv3 has similar (although slightly
      ambiguous) restrictions.
      
      This results in guests locking up when using GICv2-on-GICv3, for
      example. The obvious fix is to stop trying so hard, and inject
      a single vcpu per SGI per guest entry. After all, pending SGIs
      with multiple source vcpus are pretty rare, and are mostly seen
      in scenario where the physical CPUs are severely overcomitted.
      
      But as we now only inject a single instance of a multi-source SGI per
      vcpu entry, we may delay those interrupts for longer than strictly
      necessary, and run the risk of injecting lower priority interrupts
      in the meantime.
      
      In order to address this, we adopt a three stage strategy:
      - If we encounter a multi-source SGI in the AP list while computing
        its depth, we force the list to be sorted
      - When populating the LRs, we prevent the injection of any interrupt
        of lower priority than that of the first multi-source SGI we've
        injected.
      - Finally, the injection of a multi-source SGI triggers the request
        of a maintenance interrupt when there will be no pending interrupt
        in the LRs (HCR_NPIE).
      
      At the point where the last pending interrupt in the LRs switches
      from Pending to Active, the maintenance interrupt will be delivered,
      allowing us to add the remaining SGIs using the same process.
      
      Cc: stable@vger.kernel.org
      Fixes: 0919e84c ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
      Acked-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      16ca6a60
    • A
      KVM: arm/arm64: Reduce verbosity of KVM init log · 76600428
      Ard Biesheuvel 提交于
      On my GICv3 system, the following is printed to the kernel log at boot:
      
         kvm [1]: 8-bit VMID
         kvm [1]: IDMAP page: d20e35000
         kvm [1]: HYP VA range: 800000000000:ffffffffffff
         kvm [1]: vgic-v2@2c020000
         kvm [1]: GIC system register CPU interface enabled
         kvm [1]: vgic interrupt IRQ1
         kvm [1]: virtual timer IRQ4
         kvm [1]: Hyp mode initialized successfully
      
      The KVM IDMAP is a mapping of a statically allocated kernel structure,
      and so printing its physical address leaks the physical placement of
      the kernel when physical KASLR in effect. So change the kvm_info() to
      kvm_debug() to remove it from the log output.
      
      While at it, trim the output a bit more: IRQ numbers can be found in
      /proc/interrupts, and the HYP VA and vgic-v2 lines are not highly
      informational either.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      76600428
    • C
      KVM: arm/arm64: Reset mapped IRQs on VM reset · 413aa807
      Christoffer Dall 提交于
      We currently don't allow resetting mapped IRQs from userspace, because
      their state is controlled by the hardware.  But we do need to reset the
      state when the VM is reset, so we provide a function for the 'owner' of
      the mapped interrupt to reset the interrupt state.
      
      Currently only the timer uses mapped interrupts, so we call this
      function from the timer reset logic.
      
      Cc: stable@vger.kernel.org
      Fixes: 4c60e360 ("KVM: arm/arm64: Provide a get_input_level for the arch timer")
      Signed-off-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      413aa807
    • C
      KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN · e21a4f3a
      Christoffer Dall 提交于
      Calling vcpu_load() registers preempt notifiers for this vcpu and calls
      kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
      lifting on arm/arm64 and will try to do things such as enabling the
      virtual timer and setting us up to handle interrupts from the timer
      hardware.
      
      Loading state onto hardware registers and enabling hardware to signal
      interrupts can be problematic when we're not actually about to run the
      VCPU, because it makes it difficult to establish the right context when
      handling interrupts from the timer, and it makes the register access
      code difficult to reason about.
      
      Luckily, now when we call vcpu_load in each ioctl implementation, we can
      simply remove the call from the non-KVM_RUN vcpu ioctls, and our
      kvm_arch_vcpu_load() is only used for loading vcpu content to the
      physical CPU when we're actually going to run the vcpu.
      
      Cc: stable@vger.kernel.org
      Fixes: 9b062471 ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl")
      Reviewed-by: NJulien Grall <julien.grall@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      e21a4f3a
    • A
      KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pending · 62b06f8f
      Andre Przywara 提交于
      Our irq_is_pending() helper function accesses multiple members of the
      vgic_irq struct, so we need to hold the lock when calling it.
      Add that requirement as a comment to the definition and take the lock
      around the call in vgic_mmio_read_pending(), where we were missing it
      before.
      
      Fixes: 96b29800 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers")
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      62b06f8f
  6. 26 2月, 2018 1 次提交
  7. 16 2月, 2018 1 次提交
    • C
      KVM: arm/arm64: Fix arch timers with userspace irqchips · d60d8b64
      Christoffer Dall 提交于
      When introducing support for irqchip in userspace we needed a way to
      mask the timer signal to prevent the guest continuously exiting due to a
      screaming timer.
      
      We did this by disabling the corresponding percpu interrupt on the
      host interrupt controller, because we cannot rely on the host system
      having a GIC, and therefore cannot make any assumptions about having an
      active state to hide the timer signal.
      
      Unfortunately, when introducing this feature, it became entirely
      possible that a VCPU which belongs to a VM that has a userspace irqchip
      can disable the vtimer irq on the host on some physical CPU, and then go
      away without ever enabling the vtimer irq on that physical CPU again.
      
      This means that using irqchips in userspace on a system that also
      supports running VMs with an in-kernel GIC can prevent forward progress
      from in-kernel GIC VMs.
      
      Later on, when we started taking virtual timer interrupts in the arch
      timer code, we would also leave this timer state active for userspace
      irqchip VMs, because we leave it up to a VGIC-enabled guest to
      deactivate the hardware IRQ using the HW bit in the LR.
      
      Both issues are solved by only using the enable/disable trick on systems
      that do not have a host GIC which supports the active state, because all
      VMs on such systems must use irqchips in userspace.  Systems that have a
      working GIC with support for an active state use the active state to
      mask the timer signal for both userspace and in-kernel irqchips.
      
      Cc: Alexander Graf <agraf@suse.de>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: <stable@vger.kernel.org> # v4.12+
      Fixes: d9e13977 ("KVM: arm/arm64: Support arch timers with a userspace gic")
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d60d8b64
  8. 07 2月, 2018 5 次提交