1. 23 10月, 2015 5 次提交
    • C
      arm/arm64: KVM: Use appropriate define in VGIC reset code · 54723bb3
      Christoffer Dall 提交于
      We currently initialize the SGIs to be enabled in the VGIC code, but we
      use the VGIC_NR_PPIS define for this purpose, instead of the the more
      natural VGIC_NR_SGIS.  Change this slightly confusing use of the
      defines.
      
      Note: This should have no functional change, as both names are defined
      to the number 16.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      54723bb3
    • C
      arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs · 8bf9a701
      Christoffer Dall 提交于
      The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
      We currently simulate this behavior by writing a hardcoded value to the
      register for the SGIs and PPIs on every write of these bits to the
      register (ignoring what the guest actually wrote), and by writing the
      same value as the reset value to the register.
      
      This is a bit counter-intuitive, as the register is RO for these bits,
      and we can just implement it that way, allowing us to control the value
      of the bits purely in the reset code.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      8bf9a701
    • C
      arm/arm64: KVM: vgic: Factor out level irq processing on guest exit · 9103617d
      Christoffer Dall 提交于
      Currently vgic_process_maintenance() processes dealing with a completed
      level-triggered interrupt directly, but we are soon going to reuse this
      logic for level-triggered mapped interrupts with the HW bit set, so
      move this logic into a separate static function.
      
      Probably the most scary part of this commit is convincing yourself that
      the current flow is safe compared to the old one.  In the following I
      try to list the changes and why they are harmless:
      
        Move vgic_irq_clear_queued after kvm_notify_acked_irq:
          Harmless because the only potential effect of clearing the queued
          flag wrt.  kvm_set_irq is that vgic_update_irq_pending does not set
          the pending bit on the emulated CPU interface or in the
          pending_on_cpu bitmask if the function is called with level=1.
          However, the point of kvm_notify_acked_irq is to call kvm_set_irq
          with level=0, and we set the queued flag again in
          __kvm_vgic_sync_hwstate later on if the level is stil high.
      
        Move vgic_set_lr before kvm_notify_acked_irq:
          Also, harmless because the LR are cpu-local operations and
          kvm_notify_acked only affects the dist
      
        Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
          Also harmless, because now we check the level state in the
          clear_soft_pend function and lower the pending bits if the level is
          low.
      Reviewed-by: NEric Auger <eric.auger@linaro.org>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      9103617d
    • C
      arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block · d35268da
      Christoffer Dall 提交于
      We currently schedule a soft timer every time we exit the guest if the
      timer did not expire while running the guest.  This is really not
      necessary, because the only work we do in the timer work function is to
      kick the vcpu.
      
      Kicking the vcpu does two things:
      (1) If the vpcu thread is on a waitqueue, make it runnable and remove it
      from the waitqueue.
      (2) If the vcpu is running on a different physical CPU from the one
      doing the kick, it sends a reschedule IPI.
      
      The second case cannot happen, because the soft timer is only ever
      scheduled when the vcpu is not running.  The first case is only relevant
      when the vcpu thread is on a waitqueue, which is only the case when the
      vcpu thread has called kvm_vcpu_block().
      
      Therefore, we only need to make sure a timer is scheduled for
      kvm_vcpu_block(), which we do by encapsulating all calls to
      kvm_vcpu_block() with kvm_timer_{un}schedule calls.
      
      Additionally, we only schedule a soft timer if the timer is enabled and
      unmasked, since it is useless otherwise.
      
      Note that theoretically userspace can use the SET_ONE_REG interface to
      change registers that should cause the timer to fire, even if the vcpu
      is blocked without a scheduled timer, but this case was not supported
      before this patch and we leave it for future work for now.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d35268da
    • C
      KVM: Add kvm_arch_vcpu_{un}blocking callbacks · 3217f7c2
      Christoffer Dall 提交于
      Some times it is useful for architecture implementations of KVM to know
      when the VCPU thread is about to block or when it comes back from
      blocking (arm/arm64 needs to know this to properly implement timers, for
      example).
      
      Therefore provide a generic architecture callback function in line with
      what we do elsewhere for KVM generic-arch interactions.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3217f7c2
  2. 21 10月, 2015 4 次提交
    • C
      arm/arm64: KVM: Fix disabled distributor operation · 0d997491
      Christoffer Dall 提交于
      We currently do a single update of the vgic state when the distributor
      enable/disable control register is accessed and then bypass updating the
      state for as long as the distributor remains disabled.
      
      This is incorrect, because updating the state does not consider the
      distributor enable bit, and this you can end up in a situation where an
      interrupt is marked as pending on the CPU interface, but not pending on
      the distributor, which is an impossible state to be in, and triggers a
      warning.  Consider for example the following sequence of events:
      
      1. An interrupt is marked as pending on the distributor
         - the interrupt is also forwarded to the CPU interface
      2. The guest turns off the distributor (it's about to do a reboot)
         - we stop updating the CPU interface state from now on
      3. The guest disables the pending interrupt
         - we remove the pending state from the distributor, but don't touch
           the CPU interface, see point 2.
      
      Since the distributor disable bit really means that no interrupts should
      be forwarded to the CPU interface, we modify the code to keep updating
      the internal VGIC state, but always set the CPU interface pending bits
      to zero when the distributor is disabled.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      0d997491
    • C
      arm/arm64: KVM: Clear map->active on pend/active clear · 544c572e
      Christoffer Dall 提交于
      When a guest reboots or offlines/onlines CPUs, it is not uncommon for it
      to clear the pending and active states of an interrupt through the
      emulated VGIC distributor.  However, since the architected timers are
      defined by the architecture to be level triggered and the guest
      rightfully expects them to be that, but we emulate them as
      edge-triggered, we have to mimic level-triggered behavior for an
      edge-triggered virtual implementation.
      
      We currently do not signal the VGIC when the map->active field is true,
      because it indicates that the guest has already been signalled of the
      interrupt as required.  Normally this field is set to false when the
      guest deactivates the virtual interrupt through the sync path.
      
      We also need to catch the case where the guest deactivates the interrupt
      through the emulated distributor, again allowing guests to boot even if
      the original virtual timer signal hit before the guest's GIC
      initialization sequence is run.
      Reviewed-by: NEric Auger <eric.auger@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      544c572e
    • C
      arm/arm64: KVM: Fix arch timer behavior for disabled interrupts · cff9211e
      Christoffer Dall 提交于
      We have an interesting issue when the guest disables the timer interrupt
      on the VGIC, which happens when turning VCPUs off using PSCI, for
      example.
      
      The problem is that because the guest disables the virtual interrupt at
      the VGIC level, we never inject interrupts to the guest and therefore
      never mark the interrupt as active on the physical distributor.  The
      host also never takes the timer interrupt (we only use the timer device
      to trigger a guest exit and everything else is done in software), so the
      interrupt does not become active through normal means.
      
      The result is that we keep entering the guest with a programmed timer
      that will always fire as soon as we context switch the hardware timer
      state and run the guest, preventing forward progress for the VCPU.
      
      Since the active state on the physical distributor is really part of the
      timer logic, it is the job of our virtual arch timer driver to manage
      this state.
      
      The timer->map->active boolean field indicates whether we have signalled
      this interrupt to the vgic and if that interrupt is still pending or
      active.  As long as that is the case, the hardware doesn't have to
      generate physical interrupts and therefore we mark the interrupt as
      active on the physical distributor.
      
      We also have to restore the pending state of an interrupt that was
      queued to an LR but was retired from the LR for some reason, while
      remaining pending in the LR.
      
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Reported-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      cff9211e
    • P
      KVM: arm/arm64: Do not inject spurious interrupts · 437f9963
      Pavel Fedin 提交于
      When lowering a level-triggered line from userspace, we forgot to lower
      the pending bit on the emulated CPU interface and we also did not
      re-compute the pending_on_cpu bitmap for the CPU affected by the change.
      
      Update vgic_update_irq_pending() to fix the two issues above and also
      raise a warning in vgic_quue_irq_to_lr if we encounter an interrupt
      pending on a CPU which is neither marked active nor pending.
      
        [ Commit text reworked completely - Christoffer ]
      Signed-off-by: NPavel Fedin <p.fedin@samsung.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      437f9963
  3. 25 9月, 2015 1 次提交
  4. 17 9月, 2015 1 次提交
    • M
      arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' · ef748917
      Ming Lei 提交于
      This patch removes config option of KVM_ARM_MAX_VCPUS,
      and like other ARCHs, just choose the maximum allowed
      value from hardware, and follows the reasons:
      
      1) from distribution view, the option has to be
      defined as the max allowed value because it need to
      meet all kinds of virtulization applications and
      need to support most of SoCs;
      
      2) using a bigger value doesn't introduce extra memory
      consumption, and the help text in Kconfig isn't accurate
      because kvm_vpu structure isn't allocated until request
      of creating VCPU is sent from QEMU;
      
      3) the main effect is that the field of vcpus[] in 'struct kvm'
      becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
      lines to hold the structure, but 'struct kvm' is one generic struct,
      and it has worked well on other ARCHs already in this way. Also,
      the world switch frequecy is often low, for example, it is ~2000
      when running kernel building load in VM from APM xgene KVM host,
      so the effect is very small, and the difference can't be observed
      in my test at all.
      
      Cc: Dann Frazier <dann.frazier@canonical.com>
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      ef748917
  5. 16 9月, 2015 1 次提交
    • P
      KVM: add halt_attempted_poll to VCPU stats · 62bea5bf
      Paolo Bonzini 提交于
      This new statistic can help diagnosing VCPUs that, for any reason,
      trigger bad behavior of halt_poll_ns autotuning.
      
      For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
      like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
      10+20+40+80+160+320+480 = 1110 microseconds out of every
      479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
      is consuming about 30% more CPU than it would use without
      polling.  This would show as an abnormally high number of
      attempted polling compared to the successful polls.
      
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      62bea5bf
  6. 15 9月, 2015 5 次提交
    • J
      kvm: fix zero length mmio searching · 8f4216c7
      Jason Wang 提交于
      Currently, if we had a zero length mmio eventfd assigned on
      KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it
      always compares the kvm_io_range() with the length that guest
      wrote. This will cause e.g for vhost, kick will be trapped by qemu
      userspace instead of vhost. Fixing this by using zero length if an
      iodevice is zero length.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8f4216c7
    • J
      kvm: fix double free for fast mmio eventfd · eefd6b06
      Jason Wang 提交于
      We register wildcard mmio eventfd on two buses, once for KVM_MMIO_BUS
      and once on KVM_FAST_MMIO_BUS but with a single iodev
      instance. This will lead to an issue: kvm_io_bus_destroy() knows
      nothing about the devices on two buses pointing to a single dev. Which
      will lead to double free[1] during exit. Fix this by allocating two
      instances of iodevs then registering one on KVM_MMIO_BUS and another
      on KVM_FAST_MMIO_BUS.
      
      CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
      Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
      task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
      RIP: 0010:[<ffffffffc07e25d8>]  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
      RSP: 0018:ffff88020e7f3bc8  EFLAGS: 00010292
      RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
      RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
      RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
      R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
      R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
      FS:  00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
      Stack:
      ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
      ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
       0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
      Call Trace:
      [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
      [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
      [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
      [<ffffffff811f69f7>] __fput+0xe7/0x250
      [<ffffffff811f6bae>] ____fput+0xe/0x10
      [<ffffffff81093f04>] task_work_run+0xd4/0xf0
      [<ffffffff81079358>] do_exit+0x368/0xa50
      [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
      [<ffffffff81079ad5>] do_group_exit+0x45/0xb0
      [<ffffffff81085c71>] get_signal+0x291/0x750
      [<ffffffff810144d8>] do_signal+0x28/0xab0
      [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
      [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
      [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
      [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
      [<ffffffff817cb9af>] int_signal+0x12/0x17
      Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
       RIP  [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
       RSP <ffff88020e7f3bc8>
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      eefd6b06
    • J
      kvm: factor out core eventfd assign/deassign logic · 85da11ca
      Jason Wang 提交于
      This patch factors out core eventfd assign/deassign logic and leaves
      the argument checking and bus index selection to callers.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      85da11ca
    • J
      kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd · 8453fecb
      Jason Wang 提交于
      We only want zero length mmio eventfd to be registered on
      KVM_FAST_MMIO_BUS. So check this explicitly when arg->len is zero to
      make sure this.
      
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8453fecb
    • W
      KVM: make the declaration of functions within 80 characters · 81523aac
      Wei Yang 提交于
      After 'commit 0b8ba4a2 ("KVM: fix checkpatch.pl errors in
      kvm/coalesced_mmio.h")', the declaration of the two function will exceed 80
      characters.
      
      This patch reduces the TAPs to make each line in 80 characters.
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      81523aac
  7. 14 9月, 2015 1 次提交
  8. 11 9月, 2015 1 次提交
    • V
      mmu-notifier: add clear_young callback · 1d7715c6
      Vladimir Davydov 提交于
      In the scope of the idle memory tracking feature, which is introduced by
      the following patch, we need to clear the referenced/accessed bit not only
      in primary, but also in secondary ptes.  The latter is required in order
      to estimate wss of KVM VMs.  At the same time we want to avoid flushing
      tlb, because it is quite expensive and it won't really affect the final
      result.
      
      Currently, there is no function for clearing pte young bit that would meet
      our requirements, so this patch introduces one.  To achieve that we have
      to add a new mmu-notifier callback, clear_young, since there is no method
      for testing-and-clearing a secondary pte w/o flushing tlb.  The new method
      is not mandatory and currently only implemented by KVM.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Reviewed-by: NAndres Lagar-Cavilla <andreslc@google.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d7715c6
  9. 08 9月, 2015 1 次提交
  10. 06 9月, 2015 3 次提交
    • W
      KVM: trace kvm_halt_poll_ns grow/shrink · 2cbd7824
      Wanpeng Li 提交于
      Tracepoint for dynamic halt_pool_ns, fired on every potential change.
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2cbd7824
    • W
      KVM: dynamic halt-polling · aca6ff29
      Wanpeng Li 提交于
      There is a downside of always-poll since poll is still happened for idle
      vCPUs which can waste cpu usage. This patchset add the ability to adjust
      halt_poll_ns dynamically, to grow halt_poll_ns when shot halt is detected,
      and to shrink halt_poll_ns when long halt is detected.
      
      There are two new kernel parameters for changing the halt_poll_ns:
      halt_poll_ns_grow and halt_poll_ns_shrink.
      
                              no-poll      always-poll    dynamic-poll
      -----------------------------------------------------------------------
      Idle (nohz) vCPU %c0     0.15%        0.3%            0.2%
      Idle (250HZ) vCPU %c0    1.1%         4.6%~14%        1.2%
      TCP_RR latency           34us         27us            26.7us
      
      "Idle (X) vCPU %c0" is the percent of time the physical cpu spent in
      c0 over 60 seconds (each vCPU is pinned to a pCPU). (nohz) means the
      guest was tickless. (250HZ) means the guest was ticking at 250HZ.
      
      The big win is with ticking operating systems. Running the linux guest
      with nohz=off (and HZ=250), we save 3.4%~12.8% CPUs/second and get close
      to no-polling overhead levels by using the dynamic-poll. The savings
      should be even higher for higher frequency ticks.
      Suggested-by: NDavid Matlack <dmatlack@google.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      [Simplify the patch. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      aca6ff29
    • W
      KVM: make halt_poll_ns per-vCPU · 19020f8a
      Wanpeng Li 提交于
      Change halt_poll_ns into per-VCPU variable, seeded from module parameter,
      to allow greater flexibility.
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      19020f8a
  11. 04 9月, 2015 2 次提交
  12. 12 8月, 2015 7 次提交
  13. 30 7月, 2015 1 次提交
  14. 29 7月, 2015 1 次提交
  15. 10 7月, 2015 1 次提交
  16. 04 7月, 2015 1 次提交
  17. 19 6月, 2015 3 次提交
  18. 18 6月, 2015 1 次提交