1. 26 May 2015, 1 commit
  2. 07 May 2015, 1 commit
  3. 22 Apr 2015, 1 commit
    • KVM: arm/arm64: check IRQ number on userland injection · fd1d0ddf
      Committed by Andre Przywara
      When userland injects an SPI via the KVM_IRQ_LINE ioctl, we currently
      only check it against a fixed limit, which historically is set
      to 127. With the new dynamic IRQ allocation the effective limit may
      actually be smaller (64).
      So if a malicious or buggy userland now injects an SPI in that
      range, we spill over into our VGIC bitmap and bytemap memory.
      I could trigger a host kernel NULL pointer dereference with current
      mainline by injecting some bogus IRQ number from a hacked kvmtool:
      -----------------
      ....
      DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
      DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
      DEBUG: IRQ #114 still in the game, writing to bytemap now...
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = ffffffc07652e000
      [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
      Internal error: Oops: 96000006 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
      Hardware name: FVP Base (DT)
      task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
      PC is at kvm_vgic_inject_irq+0x234/0x310
      LR is at kvm_vgic_inject_irq+0x30c/0x310
      pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
      .....
      
      This patch fixes that by checking the SPI number against the
      actual limit. We also remove the former legacy hard limit of
      127 from the ioctl code.
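      The shape of such a bounds check might look like the sketch below
      (a sketch only: the helper name is invented, and the location of
      the nr_irqs field is an assumption drawn from the commit text):

      /* Hypothetical sketch of the added range check */
      static int vgic_validate_injection_sketch(struct kvm *kvm,
                                                unsigned int irq_num)
      {
              /* SPIs start after the 32 private interrupts (SGIs + PPIs) */
              if (irq_num < VGIC_NR_PRIVATE_IRQS)
                      return -EINVAL;

              /* reject anything beyond what was actually allocated */
              if (irq_num >= kvm->arch.vgic.nr_irqs)
                      return -EINVAL;

              return 0;
      }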
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
      [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
      as suggested by Christopher Covington]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  4. 31 Mar 2015, 2 commits
    • KVM: arm/arm64: enable KVM_CAP_IOEVENTFD · d44758c0
      Committed by Nikolay Nikolaev
      As the infrastructure for eventfd has now been merged, report the
      ioeventfd capability as being supported.
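      Per Marc's note below, the change amounts to a new case entry in
      kvm_vm_ioctl_check_extension(); a sketch (which capabilities it
      is grouped with is an assumption):

      int check_extension_sketch(long ext)
      {
              int r;

              switch (ext) {
              case KVM_CAP_IRQCHIP:
              case KVM_CAP_IOEVENTFD: /* now reported as supported */
                      r = 1;
                      break;
              default:
                      r = 0;
                      break;
              }
              return r;
      }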
      Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
      [maz: grouped the case entry with the others, fixed commit log]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus · 950324ab
      Committed by Andre Przywara
      Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
      data to be passed on from syndrome decoding all the way down to the
      VGIC register handlers. Now that we switch MMIO handling to be
      routed through the KVM MMIO bus, it no longer makes sense to use
      that structure from the very beginning. So we keep the data in
      local variables until we put it into the kvm_io_bus framework.
      Then we fill kvm_exit_mmio in the VGIC only, making it a
      VGIC-private structure. Along the way we replace the data buffer
      in that structure with a pointer to a single location in a local
      variable, so we get rid of some copying.
      With all of the virtual GIC emulation code now being registered with
      the kvm_io_bus, we can remove all of the old MMIO handling code and
      its dispatching functionality.
      
      I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
      because that touches a lot of code lines without any good reason.
      
      This is based on an original patch by Nikolay.
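      For illustration, registering a region on the KVM MMIO bus looks
      roughly like the sketch below (the device name and the bodies of
      the ops are assumptions, not the patch itself):

      static int vgic_sketch_read(struct kvm_vcpu *vcpu,
                                  struct kvm_io_device *dev,
                                  gpa_t addr, int len, void *val)
      {
              /* decode the offset within the region and fill *val */
              return 0;
      }

      static int vgic_sketch_write(struct kvm_vcpu *vcpu,
                                   struct kvm_io_device *dev,
                                   gpa_t addr, int len, const void *val)
      {
              /* decode the offset and apply the guest's write */
              return 0;
      }

      static const struct kvm_io_device_ops vgic_sketch_ops = {
              .read   = vgic_sketch_read,
              .write  = vgic_sketch_write,
      };

      static int vgic_sketch_register(struct kvm *kvm,
                                      struct kvm_io_device *dev,
                                      gpa_t base, int len)
      {
              int ret;

              kvm_iodevice_init(dev, &vgic_sketch_ops);
              mutex_lock(&kvm->slots_lock);
              ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS,
                                            base, len, dev);
              mutex_unlock(&kvm->slots_lock);
              return ret;
      }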
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  5. 27 Mar 2015, 2 commits
  6. 23 Mar 2015, 1 commit
  7. 20 Mar 2015, 1 commit
    • ARM, arm64: kvm: get rid of the bounce page · 06f75a1f
      Committed by Ard Biesheuvel
      The HYP init bounce page is a runtime construct that ensures that the
      HYP init code does not cross a page boundary. However, this is something
      we can do perfectly well at build time, by aligning the code appropriately.
      
      For arm64, we just align to 4 KB, and enforce that the code size is less
      than 4 KB, regardless of the chosen page size.
      
      For ARM, the whole code is less than 256 bytes, so we tweak the
      linker script to align at a power-of-2 upper bound of the code size.
      
      Note that this also fixes a benign off-by-one error in the original bounce
      page code, where a bounce page would be allocated unnecessarily if the code
      was exactly 1 page in size.
      
      On ARM, it also fixes an issue with very large kernels reported by Arnd
      Bergmann, where stub sections with linker emitted veneers could erroneously
      trigger the size/alignment ASSERT() in the linker script.
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  8. 14 Mar 2015, 2 commits
    • arm/arm64: KVM: Fix migration race in the arch timer · 1a748478
      Committed by Christoffer Dall
      When a VCPU is no longer running, we currently check to see if it has a
      timer scheduled in the future, and if it does, we schedule a host
      hrtimer to notify us in case the timer expires while the VCPU is still
      not running.  When the hrtimer fires, we mask the guest's timer and
      inject the timer IRQ (still relying on the guest unmasking the timer
      when it receives the IRQ).
      
      This is all good and fine, but when migrating a VM (checkpoint/restore)
      this introduces a race.  It is unlikely, but possible, for the following
      sequence of events to happen:
      
       1. Userspace stops the VM
       2. Hrtimer for VCPU is scheduled
       3. Userspace checkpoints the VGIC state (no pending timer interrupts)
       4. The hrtimer fires, schedules work in a workqueue
       5. Workqueue function runs, masks the timer and injects timer interrupt
       6. Userspace checkpoints the timer state (timer masked)
      
      At restore time, you end up with a masked timer without any timer
      interrupts, and your guest halts, never receiving timer interrupts.
      
      Fix this by only kicking the VCPU in the workqueue function, sampling
      the expired state of the timer when entering the guest again, and
      injecting the interrupt and masking the timer only then.
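      A minimal sketch of that entry-time sampling (field and helper names
      are approximations drawn from the commit text, not the patch):

      /* called on the vcpu entry path, before running the guest */
      static void timer_sketch_on_guest_entry(struct kvm_vcpu *vcpu)
      {
              struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
              bool enabled = timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE;
              bool masked  = timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK;
              bool expired = kvm_phys_timer_read() >= timer->cntv_cval;

              if (enabled && !masked && expired) {
                      /* only now mask the timer and inject its IRQ */
                      timer->cntv_ctl |= ARCH_TIMER_CTRL_IT_MASK;
                      kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
                                          timer->irq->irq, 1);
              }
      }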
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • arm/arm64: KVM: export VCPU power state via MP_STATE ioctl · ecccf0cc
      Committed by Alex Bennée
      To cleanly restore an SMP VM we need to ensure that the current pause
      state of each vcpu is correctly recorded. Things could get confused if
      a CPU that was paused before its state was captured starts running
      after the migration restore completes.
      
      We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64
      interface is a lot simpler as the only valid states are
      KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED.
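      With only two valid states, the handlers reduce to something like
      the sketch below (vcpu->arch.pause is an assumed field name):

      int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
                                          struct kvm_mp_state *mp_state)
      {
              mp_state->mp_state = vcpu->arch.pause ? KVM_MP_STATE_STOPPED
                                                    : KVM_MP_STATE_RUNNABLE;
              return 0;
      }

      int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
                                          struct kvm_mp_state *mp_state)
      {
              switch (mp_state->mp_state) {
              case KVM_MP_STATE_RUNNABLE:
                      vcpu->arch.pause = false;
                      break;
              case KVM_MP_STATE_STOPPED:
                      vcpu->arch.pause = true;
                      break;
              default:
                      return -EINVAL;
              }
              return 0;
      }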
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
  9. 13 Mar 2015, 3 commits
  10. 12 Mar 2015, 4 commits
  11. 11 Mar 2015, 3 commits
    • KVM: arm/arm64: prefer IS_ENABLED to a static variable · 69ff5c61
      Committed by Paolo Bonzini
      IS_ENABLED gives compile-time checking and keeps the code clearer.
      
      The one exception is inside kvm_vm_ioctl_check_extension, where
      the established idiom is to wrap the case labels with an #ifdef.
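      As a sketch of the pattern (the config symbol and function are
      illustrative, not the exact ones touched by the patch):

      /* before: a runtime flag set once during init */
      static bool vgic_present;

      /* after: a compile-time check; dead branches are discarded,
       * but the code is still parsed and type-checked */
      static int vgic_sketch_create(struct kvm *kvm)
      {
              if (!IS_ENABLED(CONFIG_KVM_ARM_VGIC))
                      return -ENXIO;
              return 0;
      }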
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • arm64: KVM: Do not use pgd_index to index stage-2 pgd · 04b8dc85
      Committed by Marc Zyngier
      The kernel's pgd_index macro is designed to index a normal,
      page-sized array. KVM is a bit different, as we can use concatenated
      pages to get a bigger address space (for example, a 40bit IPA with
      4kB pages gives us an 8kB PGD).
      
      In the above case, the use of pgd_index will always return an index
      inside the first 4kB, which makes a guest that has memory above
      0x8000000000 rather unhappy, as it spins forever in a page fault,
      whilst the host happily corrupts the lower pgd.
      
      The obvious fix is to get our own kvm_pgd_index that does the right
      thing(tm).
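      A sketch of what such a macro can look like, masking with the
      number of entries in the (possibly concatenated) stage-2 PGD
      instead of PTRS_PER_PGD (PTRS_PER_S2_PGD is an assumed name):

      #define kvm_pgd_index(addr) \
              (((addr) >> PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))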
      
      Tested on X-Gene with a hacked kvmtool that put memory at a stupidly
      high address.
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting · a987370f
      Committed by Marc Zyngier
      We're using __get_free_pages to allocate the guest's stage-2 PGD.
      The standard behaviour of this function is to return a set of
      pages where only the head page has a valid refcount.
      
      This behaviour gets us into trouble when we're trying to increment
      the refcount on a non-head page:
      
      page:ffff7c00cfb693c0 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x4000000000000000()
      page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
      BUG: failure at include/linux/mm.h:548/get_page()!
      Kernel panic - not syncing: BUG!
      CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
      Hardware name: APM X-Gene Mustang board (DT)
      Call trace:
      [<ffff80000008a09c>] dump_backtrace+0x0/0x13c
      [<ffff80000008a1e8>] show_stack+0x10/0x1c
      [<ffff800000691da8>] dump_stack+0x74/0x94
      [<ffff800000690d78>] panic+0x100/0x240
      [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc
      [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0
      [<ffff8000000a420c>] handle_exit+0x58/0x180
      [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c
      [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754
      [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8
      [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78
      CPU0: stopping
      
      A possible approach for this is to split the compound page using
      split_page() at allocation time, and change the teardown path to
      free one page at a time.  It turns out that alloc_pages_exact() and
      free_pages_exact() do exactly that.
      
      While we're at it, the PGD allocation code is reworked to reduce
      duplication.
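      A minimal sketch of the allocation change (S2_PGD_SIZE is an
      assumed name for the possibly concatenated PGD size):

      static pgd_t *s2_pgd_alloc_sketch(void)
      {
              /* every page comes back individually refcounted, so a
               * later get_page() on any of them is safe */
              return alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
      }

      static void s2_pgd_free_sketch(pgd_t *pgd)
      {
              free_pages_exact(pgd, S2_PGD_SIZE);
      }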
      
      This has been tested on an X-Gene platform with a 4kB/48bit-VA host
      kernel, and kvmtool hacked to place memory in the second page of
      the hardware PGD (PUD for the host kernel). Also regression-tested
      on a Cubietruck (Cortex-A7).
      
       [ Reworked to use alloc_pages_exact() and free_pages_exact() and to
         return pointers directly instead of by reference as arguments
          - Christoffer ]
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
  12. 24 Feb 2015, 1 commit
  13. 30 Jan 2015, 3 commits
    • arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault · 0d3e4d4f
      Committed by Marc Zyngier
      When handling a fault in stage-2, we need to resync I$ and D$, just
      to be sure we don't leave any old cache line behind.
      
      That's very good, except that we do so using the *user* address.
      Under heavy load (swapping like crazy), we may end up in a situation
      where the page gets mapped in stage-2 while being unmapped from
      userspace by another CPU.
      
      At that point, the DC/IC instructions can generate a fault, which
      we handle with kvm->mmu_lock held. The box quickly deadlocks, and
      the user is unhappy.
      
      Instead, perform this invalidation through the kernel mapping,
      which is guaranteed to be present. The box is much happier, and so
      am I.
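      The gist, as a sketch (the flush helper is a stand-in for the
      arch-specific D$/I$ maintenance, and the single-page scope is a
      simplification):

      static void coherent_cache_guest_page_sketch(unsigned long pfn)
      {
              /* the kernel alias cannot fault under kvm->mmu_lock */
              void *va = kmap_atomic(pfn_to_page(pfn));

              sketch_flush_dcache_icache(va, PAGE_SIZE);

              kunmap_atomic(va);
      }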
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • arm/arm64: KVM: Invalidate data cache on unmap · 363ef89f
      Committed by Marc Zyngier
      Let's assume a guest has created an uncached mapping, and written
      to that page. Let's also assume that the host uses a cache-coherent
      IO subsystem. Let's finally assume that the host is under memory
      pressure and starts to swap things out.
      
      Before this "uncached" page is evicted, we need to make sure we
      invalidate potentially speculated, clean cache lines that are
      sitting there, or the IO subsystem is going to swap out the
      cached view, losing the data that has been written directly
      into memory.
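      Sketched, the unmap path gains an invalidation of the page's
      kernel alias (the helper name is a stand-in for the arch cache op):

      static void stage2_flush_on_unmap_sketch(struct page *page)
      {
              void *va = page_address(page);

              /* drop any clean, speculatively fetched lines so the
               * swap-out reads what the guest wrote to memory */
              sketch_dcache_inval_area(va, PAGE_SIZE);
      }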
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • arm/arm64: KVM: Use set/way op trapping to track the state of the caches · 3c1e7165
      Committed by Marc Zyngier
      Trying to emulate the behaviour of set/way cache ops is fairly
      pointless, as there are too many ways we can end up missing stuff.
      Also, there are some system caches out there that simply ignore
      set/way operations.
      
      So instead of trying to implement them, let's convert them to VA
      ops, and use them as a way to re-enable the trapping of VM ops.
      That way, we can detect the point when the MMU/caches are turned
      off, and do a full VM flush (which is what the guest was trying
      to do anyway).
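      A sketch of the trap side (HCR_TVM is real; the helpers and the
      per-vcpu HCR field are illustrative):

      static void set_way_trap_sketch(struct kvm_vcpu *vcpu)
      {
              /* do the equivalent maintenance by VA, once, VM-wide */
              sketch_flush_vm(vcpu->kvm);

              /* re-arm trapping of VM register writes so we notice
               * the guest turning its MMU/caches off */
              vcpu->arch.hcr_el2 |= HCR_TVM;
      }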
      
      This allows a 32bit zImage to boot on the APM thingy, and will
      probably help bootloaders in general.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
  14. 29 Jan 2015, 1 commit
  15. 23 Jan 2015, 2 commits
  16. 21 Jan 2015, 4 commits
  17. 16 Jan 2015, 5 commits
  18. 15 Jan 2015, 1 commit
  19. 07 Jan 2015, 1 commit
    • rcu: Make SRCU optional by using CONFIG_SRCU · 83fe27ea
      Committed by Pranith Kumar
      SRCU does not need to be compiled in by default in all cases. For
      tinification efforts, not compiling SRCU unless necessary is desirable.
      
      The current patch makes compiling SRCU optional by introducing a new
      Kconfig option, CONFIG_SRCU, which is selected when any of the
      components making use of SRCU are selected.
      
      If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
      
         text    data     bss     dec     hex filename
         2007       0       0    2007     7d7 kernel/rcu/srcu.o
      
      Size of arch/powerpc/boot/zImage changes from
      
         text    data     bss     dec     hex filename
       831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
       829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after
      
      so the savings are about 2000 bytes.
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Josh Triplett <josh@joshtriplett.org>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
  20. 15 Dec 2014, 1 commit
    • arm/arm64: KVM: Require in-kernel vgic for the arch timers · 05971120
      Committed by Christoffer Dall
      It is currently possible to run a VM with architected timers support
      without creating an in-kernel VGIC, which will result in interrupts from
      the virtual timer going nowhere.
      
      To address this issue, move the architected timers initialization to the
      time when we run a VCPU for the first time, and then only initialize
      (and enable) the architected timers if we have a properly created and
      initialized in-kernel VGIC.
      
      When injecting interrupts from the virtual timer to the vgic, the
      current setup should ensure that this never calls an on-demand init of
      the VGIC, which is the only call path that could return an error from
      kvm_vgic_inject_irq(), so capture the return value and raise a warning
      if there's an error there.
      
      We also change kvm_timer_init() from returning an int to being a
      void function, since it always succeeds.
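      The gating can be sketched as below (the enabled flag's exact home
      and the helper name are assumptions based on the commit text):

      static void kvm_timer_enable_sketch(struct kvm *kvm)
      {
              /* without a properly initialized in-kernel VGIC, the
               * virtual timer's interrupt would go nowhere, so leave
               * the timer disabled */
              if (!irqchip_in_kernel(kvm) || !vgic_initialized(kvm))
                      return;

              kvm->arch.timer.enabled = 1;    /* assumed field */
      }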
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>