1. 08 2月, 2017 8 次提交
  2. 01 2月, 2017 1 次提交
  3. 13 1月, 2017 2 次提交
    • J
      KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems · 488f94d7
      Jintack Lim 提交于
      Current KVM world switch code is unintentionally setting wrong bits to
      CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical
      timer.  Bit positions of CNTHCTL_EL2 are changing depending on
      HCR_EL2.E2H bit.  EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is
      not set, but they are 11th and 10th bits respectively when E2H is set.
      
      In fact, on VHE we only need to set those bits once, not for every world
      switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
      1, which makes those bits have no effect for the host kernel execution.
      So we just set those bits once for guests, and that's it.
      Signed-off-by: NJintack Lim <jintack@cs.columbia.edu>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      488f94d7
    • C
      KVM: arm/arm64: Fix occasional warning from the timer work function · 63e41226
      Christoffer Dall 提交于
      When a VCPU blocks (WFI) and has programmed the vtimer, we program a
      soft timer to expire in the future to wake up the vcpu thread when
      appropriate.  Because such as wake up involves a vcpu kick, and the
      timer expire function can get called from interrupt context, and the
      kick may sleep, we have to schedule the kick in the work function.
      
      The work function currently has a warning that gets raised if it turns
      out that the timer shouldn't fire when it's run, which was added because
      the idea was that in that case the work should never have been cancelled.
      
      However, it turns out that this whole thing is racy and we can get
      spurious warnings.  The problem is that we clear the armed flag in the
      work function, which may run in parallel with the
      kvm_timer_unschedule->timer_disarm() call.  This results in a possible
      situation where the timer_disarm() call does not call
      cancel_work_sync(), which effectively synchronizes the completion of the
      work function with running the VCPU.  As a result, the VCPU thread
      proceeds before the work function completees, causing changes to the
      timer state such that kvm_timer_should_fire(vcpu) returns false in the
      work function.
      
      All we do in the work function is to kick the VCPU, and an occasional
      rare extra kick never harmed anyone.  Since the race above is extremely
      rare, we don't bother checking if the race happens but simply remove the
      check and the clearing of the armed flag from the work function.
      Reported-by: NMatthias Brugger <mbrugger@suse.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      63e41226
  4. 25 12月, 2016 2 次提交
  5. 09 12月, 2016 1 次提交
  6. 15 11月, 2016 1 次提交
  7. 08 9月, 2016 2 次提交
    • P
      KVM: ARM: cleanup kvm_timer_hyp_init · 5d947a14
      Paolo Bonzini 提交于
      Remove two unnecessary labels now that kvm_timer_hyp_init is not
      creating its own workqueue anymore.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      5d947a14
    • B
      KVM: Remove deprecated create_singlethread_workqueue · 3706feac
      Bhaktipriya Shridhar 提交于
      The workqueue "irqfd_cleanup_wq" queues a single work item
      &irqfd->shutdown and hence doesn't require ordering. It is a host-wide
      workqueue for issuing deferred shutdown requests aggregated from all
      vm* instances. It is not being used on a memory reclaim path.
      Hence, it has been converted to use system_wq.
      The work item has been flushed in kvm_irqfd_release().
      
      The workqueue "wqueue" queues a single work item &timer->expired
      and hence doesn't require ordering. Also, it is not being used on
      a memory reclaim path. Hence, it has been converted to use system_wq.
      
      System workqueues have been able to handle high level of concurrency
      for a long time now and hence it's not required to have a singlethreaded
      workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
      created with create_singlethread_workqueue(), system_wq allows multiple
      work items to overlap executions even on the same CPU; however, a
      per-cpu workqueue doesn't have any CPU locality or global ordering
      guarantee unless the target CPU is explicitly specified and thus the
      increase of local concurrency shouldn't make any difference.
      Signed-off-by: NBhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3706feac
  8. 17 8月, 2016 1 次提交
  9. 15 7月, 2016 1 次提交
  10. 20 5月, 2016 7 次提交
  11. 03 5月, 2016 1 次提交
  12. 06 4月, 2016 1 次提交
    • M
      KVM: arm/arm64: Handle forward time correction gracefully · 1c5631c7
      Marc Zyngier 提交于
      On a host that runs NTP, corrections can have a direct impact on
      the background timer that we program on the behalf of a vcpu.
      
      In particular, NTP performing a forward correction will result in
      a timer expiring sooner than expected from a guest point of view.
      Not a big deal, we kick the vcpu anyway.
      
      But on wake-up, the vcpu thread is going to perform a check to
      find out whether or not it should block. And at that point, the
      timer check is going to say "timer has not expired yet, go back
      to sleep". This results in the timer event being lost forever.
      
      There are multiple ways to handle this. One would be record that
      the timer has expired and let kvm_cpu_has_pending_timer return
      true in that case, but that would be fairly invasive. Another is
      to check for the "short sleep" condition in the hrtimer callback,
      and restart the timer for the remaining time when the condition
      is detected.
      
      This patch implements the latter, with a bit of refactoring in
      order to avoid too much code duplication.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NAlexander Graf <agraf@suse.de>
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      1c5631c7
  13. 03 3月, 2016 1 次提交
  14. 01 3月, 2016 1 次提交
    • M
      KVM: arm/arm64: timer: Add active state caching · 9b4a3004
      Marc Zyngier 提交于
      Programming the active state in the (re)distributor can be an
      expensive operation so it makes some sense to try and reduce
      the number of accesses as much as possible. So far, we
      program the active state on each VM entry, but there is some
      opportunity to do less.
      
      An obvious solution is to cache the active state in memory,
      and only program it in the HW when conditions change. But
      because the HW can also change things under our feet (the active
      state can transition from 1 to 0 when the guest does an EOI),
      some precautions have to be taken, which amount to only caching
      an "inactive" state, and always programing it otherwise.
      
      With this in place, we observe a reduction of around 700 cycles
      on a 2GHz GICv2 platform for a NULL hypercall.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      9b4a3004
  15. 08 2月, 2016 1 次提交
    • A
      KVM: arm/arm64: Fix reference to uninitialised VGIC · b3aff6cc
      Andre Przywara 提交于
      Commit 4b4b4512 ("arm/arm64: KVM: Rework the arch timer to use
      level-triggered semantics") brought the virtual architected timer
      closer to the VGIC. There is one occasion were we don't properly
      check for the VGIC actually having been initialized before, but
      instead go on to check the active state of some IRQ number.
      If userland hasn't instantiated a virtual GIC, we end up with a
      kernel NULL pointer dereference:
      =========
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = ffffffc9745c5000
      [00000000] *pgd=00000009f631e003, *pud=00000009f631e003, *pmd=0000000000000000
      Internal error: Oops: 96000006 [#2] PREEMPT SMP
      Modules linked in:
      CPU: 0 PID: 2144 Comm: kvm_simplest-ar Tainted: G      D 4.5.0-rc2+ #1300
      Hardware name: ARM Juno development board (r1) (DT)
      task: ffffffc976da8000 ti: ffffffc976e28000 task.ti: ffffffc976e28000
      PC is at vgic_bitmap_get_irq_val+0x78/0x90
      LR is at kvm_vgic_map_is_active+0xac/0xc8
      pc : [<ffffffc0000b7e28>] lr : [<ffffffc0000b972c>] pstate: 20000145
      ....
      =========
      
      Fix this by bailing out early of kvm_timer_flush_hwstate() if we don't
      have a VGIC at all.
      Reported-by: NCosmin Gorgovan <cosmin@linux-geek.org>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: <stable@vger.kernel.org> # 4.4.x
      b3aff6cc
  16. 27 1月, 2016 1 次提交
  17. 25 11月, 2015 1 次提交
    • C
      KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active · 0e3dfda9
      Christoffer Dall 提交于
      We were incorrectly removing the active state from the physical
      distributor on the timer interrupt when the timer output level was
      deasserted.  We shouldn't be doing this without considering the virtual
      interrupt's active state, because the architecture requires that when an
      LR has the HW bit set and the pending or active bits set, then the
      physical interrupt must also have the corresponding bits set.
      
      This addresses an issue where we have been observing an inconsistency
      between the LR state and the physical distributor state where the LR
      state was active and the physical distributor was not active, which
      shouldn't happen.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      0e3dfda9
  18. 23 10月, 2015 3 次提交
    • C
      arm/arm64: KVM: Add tracepoints for vgic and timer · e21f0910
      Christoffer Dall 提交于
      The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
      or tracepoint infrastructure defined.  Rewriting some of the timer code
      handling showed me how much we need this, so let's add these simple
      trace points once and for all and we can easily expand with additional
      trace points in these files as we go along.
      
      Cc: Wei Huang <wei@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      e21f0910
    • C
      arm/arm64: KVM: Rework the arch timer to use level-triggered semantics · 4b4b4512
      Christoffer Dall 提交于
      The arch timer currently uses edge-triggered semantics in the sense that
      the line is never sampled by the vgic and lowering the line from the
      timer to the vgic doesn't have any effect on the pending state of
      virtual interrupts in the vgic.  This means that we do not support a
      guest with the otherwise valid behavior of (1) disable interrupts (2)
      enable the timer (3) disable the timer (4) enable interrupts.  Such a
      guest would validly not expect to see any interrupts on real hardware,
      but will see interrupts on KVM.
      
      This patch fixes this shortcoming through the following series of
      changes.
      
      First, we change the flow of the timer/vgic sync/flush operations.  Now
      the timer is always flushed/synced before the vgic, because the vgic
      samples the state of the timer output.  This has the implication that we
      move the timer operations in to non-preempible sections, but that is
      fine after the previous commit getting rid of hrtimer schedules on every
      entry/exit.
      
      Second, we change the internal behavior of the timer, letting the timer
      keep track of its previous output state, and only lower/raise the line
      to the vgic when the state changes.  Note that in theory this could have
      been accomplished more simply by signalling the vgic every time the
      state *potentially* changed, but we don't want to be hitting the vgic
      more often than necessary.
      
      Third, we get rid of the use of the map->active field in the vgic and
      instead simply set the interrupt as active on the physical distributor
      whenever the input to the GIC is asserted and conversely clear the
      physical active state when the input to the GIC is deasserted.
      
      Fourth, and finally, we now initialize the timer PPIs (and all the other
      unused PPIs for now), to be level-triggered, and modify the sync code to
      sample the line state on HW sync and re-inject a new interrupt if it is
      still pending at that time.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      4b4b4512
    • C
      arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block · d35268da
      Christoffer Dall 提交于
      We currently schedule a soft timer every time we exit the guest if the
      timer did not expire while running the guest.  This is really not
      necessary, because the only work we do in the timer work function is to
      kick the vcpu.
      
      Kicking the vcpu does two things:
      (1) If the vpcu thread is on a waitqueue, make it runnable and remove it
      from the waitqueue.
      (2) If the vcpu is running on a different physical CPU from the one
      doing the kick, it sends a reschedule IPI.
      
      The second case cannot happen, because the soft timer is only ever
      scheduled when the vcpu is not running.  The first case is only relevant
      when the vcpu thread is on a waitqueue, which is only the case when the
      vcpu thread has called kvm_vcpu_block().
      
      Therefore, we only need to make sure a timer is scheduled for
      kvm_vcpu_block(), which we do by encapsulating all calls to
      kvm_vcpu_block() with kvm_timer_{un}schedule calls.
      
      Additionally, we only schedule a soft timer if the timer is enabled and
      unmasked, since it is useless otherwise.
      
      Note that theoretically userspace can use the SET_ONE_REG interface to
      change registers that should cause the timer to fire, even if the vcpu
      is blocked without a scheduled timer, but this case was not supported
      before this patch and we leave it for future work for now.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d35268da
  19. 21 10月, 2015 1 次提交
    • C
      arm/arm64: KVM: Fix arch timer behavior for disabled interrupts · cff9211e
      Christoffer Dall 提交于
      We have an interesting issue when the guest disables the timer interrupt
      on the VGIC, which happens when turning VCPUs off using PSCI, for
      example.
      
      The problem is that because the guest disables the virtual interrupt at
      the VGIC level, we never inject interrupts to the guest and therefore
      never mark the interrupt as active on the physical distributor.  The
      host also never takes the timer interrupt (we only use the timer device
      to trigger a guest exit and everything else is done in software), so the
      interrupt does not become active through normal means.
      
      The result is that we keep entering the guest with a programmed timer
      that will always fire as soon as we context switch the hardware timer
      state and run the guest, preventing forward progress for the VCPU.
      
      Since the active state on the physical distributor is really part of the
      timer logic, it is the job of our virtual arch timer driver to manage
      this state.
      
      The timer->map->active boolean field indicates whether we have signalled
      this interrupt to the vgic and if that interrupt is still pending or
      active.  As long as that is the case, the hardware doesn't have to
      generate physical interrupts and therefore we mark the interrupt as
      active on the physical distributor.
      
      We also have to restore the pending state of an interrupt that was
      queued to an LR but was retired from the LR for some reason, while
      remaining pending in the LR.
      
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Reported-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      cff9211e
  20. 04 9月, 2015 1 次提交
  21. 12 8月, 2015 1 次提交
  22. 14 3月, 2015 1 次提交
    • C
      arm/arm64: KVM: Fix migration race in the arch timer · 1a748478
      Christoffer Dall 提交于
      When a VCPU is no longer running, we currently check to see if it has a
      timer scheduled in the future, and if it does, we schedule a host
      hrtimer to notify is in case the timer expires while the VCPU is still
      not running.  When the hrtimer fires, we mask the guest's timer and
      inject the timer IRQ (still relying on the guest unmasking the time when
      it receives the IRQ).
      
      This is all good and fine, but when migration a VM (checkpoint/restore)
      this introduces a race.  It is unlikely, but possible, for the following
      sequence of events to happen:
      
       1. Userspace stops the VM
       2. Hrtimer for VCPU is scheduled
       3. Userspace checkpoints the VGIC state (no pending timer interrupts)
       4. The hrtimer fires, schedules work in a workqueue
       5. Workqueue function runs, masks the timer and injects timer interrupt
       6. Userspace checkpoints the timer state (timer masked)
      
      At restore time, you end up with a masked timer without any timer
      interrupts and your guest halts never receiving timer interrupts.
      
      Fix this by only kicking the VCPU in the workqueue function, and sample
      the expired state of the timer when entering the guest again and inject
      the interrupt and mask the timer only then.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      1a748478