1. 09 4月, 2017 3 次提交
    • C
      KVM: arm/arm64: Report PMU overflow interrupts to userspace irqchip · 3dbbdf78
      Christoffer Dall 提交于
      When not using an in-kernel VGIC, but instead emulating an interrupt
      controller in userspace, we should report the PMU overflow status to
      that userspace interrupt controller using the KVM_CAP_ARM_USER_IRQ
      feature.
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3dbbdf78
    • A
      KVM: arm/arm64: Support arch timers with a userspace gic · d9e13977
      Alexander Graf 提交于
      If you're running with a userspace gic or other interrupt controller
      (that is no vgic in the kernel), then you have so far not been able to
      use the architected timers, because the output of the architected
      timers, which are driven inside the kernel, was a kernel-only construct
      between the arch timer code and the vgic.
      
      This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a
      side channel on the kvm_run structure, run->s.regs.device_irq_level, to
      always notify userspace of the timer output levels when using a userspace
      irqchip.
      
      This works by ensuring that before we enter the guest, if the timer
      output level has changed compared to what we last told userspace, we
      don't enter the guest, but instead return to userspace to notify it of
      the new level.  If we are exiting, because of an MMIO for example, and
      the level changed at the same time, the value is also updated and
      userspace can sample the line as it needs.  This is nicely achieved
      simply always updating the timer_irq_level field after the main run
      loop.
      
      Note that the kvm_timer_update_irq trace event is changed to show the
      host IRQ number for the timer instead of the guest IRQ number, because
      the kernel no longer know which IRQ userspace wires up the timer signal
      to.
      
      Also note that this patch implements all required functionality but does
      not yet advertise the capability.
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d9e13977
    • C
      KVM: arm/arm64: Cleanup the arch timer code's irqchip checking · b22e7df2
      Christoffer Dall 提交于
      Currently we check if we have an in-kernel irqchip and if the vgic was
      properly implemented several places in the arch timer code.  But, we
      already predicate our enablement of the arm timers on having a valid
      and initialized gic, so we can simply check if the timers are enabled or
      not.
      
      This also gets rid of the ugly "error that's not an error but used to
      signal that the timer shouldn't poke the gic" construct we have.
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      b22e7df2
  2. 08 2月, 2017 8 次提交
  3. 01 2月, 2017 1 次提交
  4. 13 1月, 2017 2 次提交
    • J
      KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems · 488f94d7
      Jintack Lim 提交于
      Current KVM world switch code is unintentionally setting wrong bits to
      CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical
      timer.  Bit positions of CNTHCTL_EL2 are changing depending on
      HCR_EL2.E2H bit.  EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is
      not set, but they are 11th and 10th bits respectively when E2H is set.
      
      In fact, on VHE we only need to set those bits once, not for every world
      switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
      1, which makes those bits have no effect for the host kernel execution.
      So we just set those bits once for guests, and that's it.
      Signed-off-by: NJintack Lim <jintack@cs.columbia.edu>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      488f94d7
    • C
      KVM: arm/arm64: Fix occasional warning from the timer work function · 63e41226
      Christoffer Dall 提交于
      When a VCPU blocks (WFI) and has programmed the vtimer, we program a
      soft timer to expire in the future to wake up the vcpu thread when
      appropriate.  Because such as wake up involves a vcpu kick, and the
      timer expire function can get called from interrupt context, and the
      kick may sleep, we have to schedule the kick in the work function.
      
      The work function currently has a warning that gets raised if it turns
      out that the timer shouldn't fire when it's run, which was added because
      the idea was that in that case the work should never have been cancelled.
      
      However, it turns out that this whole thing is racy and we can get
      spurious warnings.  The problem is that we clear the armed flag in the
      work function, which may run in parallel with the
      kvm_timer_unschedule->timer_disarm() call.  This results in a possible
      situation where the timer_disarm() call does not call
      cancel_work_sync(), which effectively synchronizes the completion of the
      work function with running the VCPU.  As a result, the VCPU thread
      proceeds before the work function completees, causing changes to the
      timer state such that kvm_timer_should_fire(vcpu) returns false in the
      work function.
      
      All we do in the work function is to kick the VCPU, and an occasional
      rare extra kick never harmed anyone.  Since the race above is extremely
      rare, we don't bother checking if the race happens but simply remove the
      check and the clearing of the armed flag from the work function.
      Reported-by: NMatthias Brugger <mbrugger@suse.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      63e41226
  5. 25 12月, 2016 2 次提交
  6. 09 12月, 2016 1 次提交
  7. 15 11月, 2016 1 次提交
  8. 08 9月, 2016 2 次提交
    • P
      KVM: ARM: cleanup kvm_timer_hyp_init · 5d947a14
      Paolo Bonzini 提交于
      Remove two unnecessary labels now that kvm_timer_hyp_init is not
      creating its own workqueue anymore.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      5d947a14
    • B
      KVM: Remove deprecated create_singlethread_workqueue · 3706feac
      Bhaktipriya Shridhar 提交于
      The workqueue "irqfd_cleanup_wq" queues a single work item
      &irqfd->shutdown and hence doesn't require ordering. It is a host-wide
      workqueue for issuing deferred shutdown requests aggregated from all
      vm* instances. It is not being used on a memory reclaim path.
      Hence, it has been converted to use system_wq.
      The work item has been flushed in kvm_irqfd_release().
      
      The workqueue "wqueue" queues a single work item &timer->expired
      and hence doesn't require ordering. Also, it is not being used on
      a memory reclaim path. Hence, it has been converted to use system_wq.
      
      System workqueues have been able to handle high level of concurrency
      for a long time now and hence it's not required to have a singlethreaded
      workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
      created with create_singlethread_workqueue(), system_wq allows multiple
      work items to overlap executions even on the same CPU; however, a
      per-cpu workqueue doesn't have any CPU locality or global ordering
      guarantee unless the target CPU is explicitly specified and thus the
      increase of local concurrency shouldn't make any difference.
      Signed-off-by: NBhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3706feac
  9. 17 8月, 2016 1 次提交
  10. 15 7月, 2016 1 次提交
  11. 20 5月, 2016 7 次提交
  12. 03 5月, 2016 1 次提交
  13. 06 4月, 2016 1 次提交
    • M
      KVM: arm/arm64: Handle forward time correction gracefully · 1c5631c7
      Marc Zyngier 提交于
      On a host that runs NTP, corrections can have a direct impact on
      the background timer that we program on the behalf of a vcpu.
      
      In particular, NTP performing a forward correction will result in
      a timer expiring sooner than expected from a guest point of view.
      Not a big deal, we kick the vcpu anyway.
      
      But on wake-up, the vcpu thread is going to perform a check to
      find out whether or not it should block. And at that point, the
      timer check is going to say "timer has not expired yet, go back
      to sleep". This results in the timer event being lost forever.
      
      There are multiple ways to handle this. One would be record that
      the timer has expired and let kvm_cpu_has_pending_timer return
      true in that case, but that would be fairly invasive. Another is
      to check for the "short sleep" condition in the hrtimer callback,
      and restart the timer for the remaining time when the condition
      is detected.
      
      This patch implements the latter, with a bit of refactoring in
      order to avoid too much code duplication.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NAlexander Graf <agraf@suse.de>
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      1c5631c7
  14. 03 3月, 2016 1 次提交
  15. 01 3月, 2016 1 次提交
    • M
      KVM: arm/arm64: timer: Add active state caching · 9b4a3004
      Marc Zyngier 提交于
      Programming the active state in the (re)distributor can be an
      expensive operation so it makes some sense to try and reduce
      the number of accesses as much as possible. So far, we
      program the active state on each VM entry, but there is some
      opportunity to do less.
      
      An obvious solution is to cache the active state in memory,
      and only program it in the HW when conditions change. But
      because the HW can also change things under our feet (the active
      state can transition from 1 to 0 when the guest does an EOI),
      some precautions have to be taken, which amount to only caching
      an "inactive" state, and always programing it otherwise.
      
      With this in place, we observe a reduction of around 700 cycles
      on a 2GHz GICv2 platform for a NULL hypercall.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      9b4a3004
  16. 08 2月, 2016 1 次提交
    • A
      KVM: arm/arm64: Fix reference to uninitialised VGIC · b3aff6cc
      Andre Przywara 提交于
      Commit 4b4b4512 ("arm/arm64: KVM: Rework the arch timer to use
      level-triggered semantics") brought the virtual architected timer
      closer to the VGIC. There is one occasion were we don't properly
      check for the VGIC actually having been initialized before, but
      instead go on to check the active state of some IRQ number.
      If userland hasn't instantiated a virtual GIC, we end up with a
      kernel NULL pointer dereference:
      =========
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = ffffffc9745c5000
      [00000000] *pgd=00000009f631e003, *pud=00000009f631e003, *pmd=0000000000000000
      Internal error: Oops: 96000006 [#2] PREEMPT SMP
      Modules linked in:
      CPU: 0 PID: 2144 Comm: kvm_simplest-ar Tainted: G      D 4.5.0-rc2+ #1300
      Hardware name: ARM Juno development board (r1) (DT)
      task: ffffffc976da8000 ti: ffffffc976e28000 task.ti: ffffffc976e28000
      PC is at vgic_bitmap_get_irq_val+0x78/0x90
      LR is at kvm_vgic_map_is_active+0xac/0xc8
      pc : [<ffffffc0000b7e28>] lr : [<ffffffc0000b972c>] pstate: 20000145
      ....
      =========
      
      Fix this by bailing out early of kvm_timer_flush_hwstate() if we don't
      have a VGIC at all.
      Reported-by: NCosmin Gorgovan <cosmin@linux-geek.org>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: <stable@vger.kernel.org> # 4.4.x
      b3aff6cc
  17. 27 1月, 2016 1 次提交
  18. 25 11月, 2015 1 次提交
    • C
      KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active · 0e3dfda9
      Christoffer Dall 提交于
      We were incorrectly removing the active state from the physical
      distributor on the timer interrupt when the timer output level was
      deasserted.  We shouldn't be doing this without considering the virtual
      interrupt's active state, because the architecture requires that when an
      LR has the HW bit set and the pending or active bits set, then the
      physical interrupt must also have the corresponding bits set.
      
      This addresses an issue where we have been observing an inconsistency
      between the LR state and the physical distributor state where the LR
      state was active and the physical distributor was not active, which
      shouldn't happen.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      0e3dfda9
  19. 23 10月, 2015 3 次提交
    • C
      arm/arm64: KVM: Add tracepoints for vgic and timer · e21f0910
      Christoffer Dall 提交于
      The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
      or tracepoint infrastructure defined.  Rewriting some of the timer code
      handling showed me how much we need this, so let's add these simple
      trace points once and for all and we can easily expand with additional
      trace points in these files as we go along.
      
      Cc: Wei Huang <wei@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      e21f0910
    • C
      arm/arm64: KVM: Rework the arch timer to use level-triggered semantics · 4b4b4512
      Christoffer Dall 提交于
      The arch timer currently uses edge-triggered semantics in the sense that
      the line is never sampled by the vgic and lowering the line from the
      timer to the vgic doesn't have any effect on the pending state of
      virtual interrupts in the vgic.  This means that we do not support a
      guest with the otherwise valid behavior of (1) disable interrupts (2)
      enable the timer (3) disable the timer (4) enable interrupts.  Such a
      guest would validly not expect to see any interrupts on real hardware,
      but will see interrupts on KVM.
      
      This patch fixes this shortcoming through the following series of
      changes.
      
      First, we change the flow of the timer/vgic sync/flush operations.  Now
      the timer is always flushed/synced before the vgic, because the vgic
      samples the state of the timer output.  This has the implication that we
      move the timer operations in to non-preempible sections, but that is
      fine after the previous commit getting rid of hrtimer schedules on every
      entry/exit.
      
      Second, we change the internal behavior of the timer, letting the timer
      keep track of its previous output state, and only lower/raise the line
      to the vgic when the state changes.  Note that in theory this could have
      been accomplished more simply by signalling the vgic every time the
      state *potentially* changed, but we don't want to be hitting the vgic
      more often than necessary.
      
      Third, we get rid of the use of the map->active field in the vgic and
      instead simply set the interrupt as active on the physical distributor
      whenever the input to the GIC is asserted and conversely clear the
      physical active state when the input to the GIC is deasserted.
      
      Fourth, and finally, we now initialize the timer PPIs (and all the other
      unused PPIs for now), to be level-triggered, and modify the sync code to
      sample the line state on HW sync and re-inject a new interrupt if it is
      still pending at that time.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      4b4b4512
    • C
      arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block · d35268da
      Christoffer Dall 提交于
      We currently schedule a soft timer every time we exit the guest if the
      timer did not expire while running the guest.  This is really not
      necessary, because the only work we do in the timer work function is to
      kick the vcpu.
      
      Kicking the vcpu does two things:
      (1) If the vpcu thread is on a waitqueue, make it runnable and remove it
      from the waitqueue.
      (2) If the vcpu is running on a different physical CPU from the one
      doing the kick, it sends a reschedule IPI.
      
      The second case cannot happen, because the soft timer is only ever
      scheduled when the vcpu is not running.  The first case is only relevant
      when the vcpu thread is on a waitqueue, which is only the case when the
      vcpu thread has called kvm_vcpu_block().
      
      Therefore, we only need to make sure a timer is scheduled for
      kvm_vcpu_block(), which we do by encapsulating all calls to
      kvm_vcpu_block() with kvm_timer_{un}schedule calls.
      
      Additionally, we only schedule a soft timer if the timer is enabled and
      unmasked, since it is useless otherwise.
      
      Note that theoretically userspace can use the SET_ONE_REG interface to
      change registers that should cause the timer to fire, even if the vcpu
      is blocked without a scheduled timer, but this case was not supported
      before this patch and we leave it for future work for now.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d35268da
  20. 21 10月, 2015 1 次提交
    • C
      arm/arm64: KVM: Fix arch timer behavior for disabled interrupts · cff9211e
      Christoffer Dall 提交于
      We have an interesting issue when the guest disables the timer interrupt
      on the VGIC, which happens when turning VCPUs off using PSCI, for
      example.
      
      The problem is that because the guest disables the virtual interrupt at
      the VGIC level, we never inject interrupts to the guest and therefore
      never mark the interrupt as active on the physical distributor.  The
      host also never takes the timer interrupt (we only use the timer device
      to trigger a guest exit and everything else is done in software), so the
      interrupt does not become active through normal means.
      
      The result is that we keep entering the guest with a programmed timer
      that will always fire as soon as we context switch the hardware timer
      state and run the guest, preventing forward progress for the VCPU.
      
      Since the active state on the physical distributor is really part of the
      timer logic, it is the job of our virtual arch timer driver to manage
      this state.
      
      The timer->map->active boolean field indicates whether we have signalled
      this interrupt to the vgic and if that interrupt is still pending or
      active.  As long as that is the case, the hardware doesn't have to
      generate physical interrupts and therefore we mark the interrupt as
      active on the physical distributor.
      
      We also have to restore the pending state of an interrupt that was
      queued to an LR but was retired from the LR for some reason, while
      remaining pending in the LR.
      
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Reported-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      cff9211e