1. 11 3月, 2015 1 次提交
    • M
      arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting · a987370f
      Marc Zyngier 提交于
      We're using __get_free_pages with to allocate the guest's stage-2
      PGD. The standard behaviour of this function is to return a set of
      pages where only the head page has a valid refcount.
      
      This behaviour gets us into trouble when we're trying to increment
      the refcount on a non-head page:
      
      page:ffff7c00cfb693c0 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x4000000000000000()
      page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
      BUG: failure at include/linux/mm.h:548/get_page()!
      Kernel panic - not syncing: BUG!
      CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
      Hardware name: APM X-Gene Mustang board (DT)
      Call trace:
      [<ffff80000008a09c>] dump_backtrace+0x0/0x13c
      [<ffff80000008a1e8>] show_stack+0x10/0x1c
      [<ffff800000691da8>] dump_stack+0x74/0x94
      [<ffff800000690d78>] panic+0x100/0x240
      [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc
      [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0
      [<ffff8000000a420c>] handle_exit+0x58/0x180
      [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c
      [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754
      [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8
      [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78
      CPU0: stopping
      
      A possible approach for this is to split the compound page using
      split_page() at allocation time, and change the teardown path to
      free one page at a time.  It turns out that alloc_pages_exact() and
      free_pages_exact() does exactly that.
      
      While we're at it, the PGD allocation code is reworked to reduce
      duplication.
      
      This has been tested on an X-Gene platform with a 4kB/48bit-VA host
      kernel, and kvmtool hacked to place memory in the second page of
      the hardware PGD (PUD for the host kernel). Also regression-tested
      on a Cubietruck (Cortex-A7).
      
       [ Reworked to use alloc_pages_exact() and free_pages_exact() and to
         return pointers directly instead of by reference as arguments
          - Christoffer ]
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      a987370f
  2. 30 1月, 2015 3 次提交
    • M
      arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault · 0d3e4d4f
      Marc Zyngier 提交于
      When handling a fault in stage-2, we need to resync I$ and D$, just
      to be sure we don't leave any old cache line behind.
      
      That's very good, except that we do so using the *user* address.
      Under heavy load (swapping like crazy), we may end up in a situation
      where the page gets mapped in stage-2 while being unmapped from
      userspace by another CPU.
      
      At that point, the DC/IC instructions can generate a fault, which
      we handle with kvm->mmu_lock held. The box quickly deadlocks, user
      is unhappy.
      
      Instead, perform this invalidation through the kernel mapping,
      which is guaranteed to be present. The box is much happier, and so
      am I.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      0d3e4d4f
    • M
      arm/arm64: KVM: Invalidate data cache on unmap · 363ef89f
      Marc Zyngier 提交于
      Let's assume a guest has created an uncached mapping, and written
      to that page. Let's also assume that the host uses a cache-coherent
      IO subsystem. Let's finally assume that the host is under memory
      pressure and starts to swap things out.
      
      Before this "uncached" page is evicted, we need to make sure
      we invalidate potential speculated, clean cache lines that are
      sitting there, or the IO subsystem is going to swap out the
      cached view, loosing the data that has been written directly
      into memory.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      363ef89f
    • M
      arm/arm64: KVM: Use set/way op trapping to track the state of the caches · 3c1e7165
      Marc Zyngier 提交于
      Trying to emulate the behaviour of set/way cache ops is fairly
      pointless, as there are too many ways we can end-up missing stuff.
      Also, there is some system caches out there that simply ignore
      set/way operations.
      
      So instead of trying to implement them, let's convert it to VA ops,
      and use them as a way to re-enable the trapping of VM ops. That way,
      we can detect the point when the MMU/caches are turned off, and do
      a full VM flush (which is what the guest was trying to do anyway).
      
      This allows a 32bit zImage to boot on the APM thingy, and will
      probably help bootloaders in general.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3c1e7165
  3. 29 1月, 2015 1 次提交
  4. 23 1月, 2015 1 次提交
  5. 16 1月, 2015 4 次提交
  6. 13 12月, 2014 1 次提交
    • C
      arm/arm64: KVM: Introduce stage2_unmap_vm · 957db105
      Christoffer Dall 提交于
      Introduce a new function to unmap user RAM regions in the stage2 page
      tables.  This is needed on reboot (or when the guest turns off the MMU)
      to ensure we fault in pages again and make the dcache, RAM, and icache
      coherent.
      
      Using unmap_stage2_range for the whole guest physical range does not
      work, because that unmaps IO regions (such as the GIC) which will not be
      recreated or in the best case faulted in on a page-by-page basis.
      
      Call this function on secondary and subsequent calls to the
      KVM_ARM_VCPU_INIT ioctl so that a reset VCPU will detect the guest
      Stage-1 MMU is off when faulting in pages and make the caches coherent.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      957db105
  7. 26 11月, 2014 2 次提交
    • A
      arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn() · bb55e9b1
      Ard Biesheuvel 提交于
      Instead of using kvm_is_mmio_pfn() to decide whether a host region
      should be stage 2 mapped with device attributes, add a new static
      function kvm_is_device_pfn() that disregards RAM pages with the
      reserved bit set, as those should usually not be mapped as device
      memory.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bb55e9b1
    • M
      arm64: KVM: fix unmapping with 48-bit VAs · 7cbb87d6
      Mark Rutland 提交于
      Currently if using a 48-bit VA, tearing down the hyp page tables (which
      can happen in the absence of a GICH or GICV resource) results in the
      rather nasty splat below, evidently becasue we access a table that
      doesn't actually exist.
      
      Commit 38f791a4 (arm64: KVM: Implement 48 VA support for KVM EL2
      and Stage-2) added a pgd_none check to __create_hyp_mappings to account
      for the additional level of tables, but didn't add a corresponding check
      to unmap_range, and this seems to be the source of the problem.
      
      This patch adds the missing pgd_none check, ensuring we don't try to
      access tables that don't exist.
      
      Original splat below:
      
      kvm [1]: Using HYP init bounce page @83fe94a000
      kvm [1]: Cannot obtain GICH resource
      Unable to handle kernel paging request at virtual address ffff7f7fff000000
      pgd = ffff800000770000
      [ffff7f7fff000000] *pgd=0000000000000000
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc2+ #89
      task: ffff8003eb500000 ti: ffff8003eb45c000 task.ti: ffff8003eb45c000
      PC is at unmap_range+0x120/0x580
      LR is at free_hyp_pgds+0xac/0xe4
      pc : [<ffff80000009b768>] lr : [<ffff80000009cad8>] pstate: 80000045
      sp : ffff8003eb45fbf0
      x29: ffff8003eb45fbf0 x28: ffff800000736000
      x27: ffff800000735000 x26: ffff7f7fff000000
      x25: 0000000040000000 x24: ffff8000006f5000
      x23: 0000000000000000 x22: 0000007fffffffff
      x21: 0000800000000000 x20: 0000008000000000
      x19: 0000000000000000 x18: ffff800000648000
      x17: ffff800000537228 x16: 0000000000000000
      x15: 000000000000001f x14: 0000000000000000
      x13: 0000000000000001 x12: 0000000000000020
      x11: 0000000000000062 x10: 0000000000000006
      x9 : 0000000000000000 x8 : 0000000000000063
      x7 : 0000000000000018 x6 : 00000003ff000000
      x5 : ffff800000744188 x4 : 0000000000000001
      x3 : 0000000040000000 x2 : ffff800000000000
      x1 : 0000007fffffffff x0 : 000000003fffffff
      
      Process swapper/0 (pid: 1, stack limit = 0xffff8003eb45c058)
      Stack: (0xffff8003eb45fbf0 to 0xffff8003eb460000)
      fbe0:                                     eb45fcb0 ffff8003 0009cad8 ffff8000
      fc00: 00000000 00000080 00736140 ffff8000 00736000 ffff8000 00000000 00007c80
      fc20: 00000000 00000080 006f5000 ffff8000 00000000 00000080 00743000 ffff8000
      fc40: 00735000 ffff8000 006d3030 ffff8000 006fe7b8 ffff8000 00000000 00000080
      fc60: ffffffff 0000007f fdac1000 ffff8003 fd94b000 ffff8003 fda47000 ffff8003
      fc80: 00502b40 ffff8000 ff000000 ffff7f7f fdec6000 00008003 fdac1630 ffff8003
      fca0: eb45fcb0 ffff8003 ffffffff 0000007f eb45fd00 ffff8003 0009b378 ffff8000
      fcc0: ffffffea 00000000 006fe000 ffff8000 00736728 ffff8000 00736120 ffff8000
      fce0: 00000040 00000000 00743000 ffff8000 006fe7b8 ffff8000 0050cd48 00000000
      fd00: eb45fd60 ffff8003 00096070 ffff8000 006f06e0 ffff8000 006f06e0 ffff8000
      fd20: fd948b40 ffff8003 0009a320 ffff8000 00000000 00000000 00000000 00000000
      fd40: 00000ae0 00000000 006aa25c ffff8000 eb45fd60 ffff8003 0017ca44 00000002
      fd60: eb45fdc0 ffff8003 0009a33c ffff8000 006f06e0 ffff8000 006f06e0 ffff8000
      fd80: fd948b40 ffff8003 0009a320 ffff8000 00000000 00000000 00735000 ffff8000
      fda0: 006d3090 ffff8000 006aa25c ffff8000 00735000 ffff8000 006d3030 ffff8000
      fdc0: eb45fdd0 ffff8003 000814c0 ffff8000 eb45fe50 ffff8003 006aaac4 ffff8000
      fde0: 006ddd90 ffff8000 00000006 00000000 006d3000 ffff8000 00000095 00000000
      fe00: 006a1e90 ffff8000 00735000 ffff8000 006d3000 ffff8000 006aa25c ffff8000
      fe20: 00735000 ffff8000 006d3030 ffff8000 eb45fe50 ffff8003 006fac68 ffff8000
      fe40: 00000006 00000006 fe293ee6 ffff8003 eb45feb0 ffff8003 004f8ee8 ffff8000
      fe60: 004f8ed4 ffff8000 00735000 ffff8000 00000000 00000000 00000000 00000000
      fe80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      fea0: 00000000 00000000 00000000 00000000 00000000 00000000 000843d0 ffff8000
      fec0: 004f8ed4 ffff8000 00000000 00000000 00000000 00000000 00000000 00000000
      fee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      ff00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      ff20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      ff40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      ff60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      ff80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      ffa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000005 00000000
      ffe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      Call trace:
      [<ffff80000009b768>] unmap_range+0x120/0x580
      [<ffff80000009cad4>] free_hyp_pgds+0xa8/0xe4
      [<ffff80000009b374>] kvm_arch_init+0x268/0x44c
      [<ffff80000009606c>] kvm_init+0x24/0x260
      [<ffff80000009a338>] arm_init+0x18/0x24
      [<ffff8000000814bc>] do_one_initcall+0x88/0x1a0
      [<ffff8000006aaac0>] kernel_init_freeable+0x148/0x1e8
      [<ffff8000004f8ee4>] kernel_init+0x10/0xd4
      Code: 8b000263 92628479 d1000720 eb01001f (f9400340)
      ---[ end trace 3bc230562e926fa4 ]---
      Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jungseok Lee <jungseoklee85@gmail.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7cbb87d6
  8. 25 11月, 2014 3 次提交
  9. 15 10月, 2014 1 次提交
  10. 14 10月, 2014 2 次提交
    • C
      arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE · c3058d5d
      Christoffer Dall 提交于
      When creating or moving a memslot, make sure the IPA space is within the
      addressable range of the guest.  Otherwise, user space can create too
      large a memslot and KVM would try to access potentially unallocated page
      table entries when inserting entries in the Stage-2 page tables.
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      c3058d5d
    • C
      arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2 · 38f791a4
      Christoffer Dall 提交于
      This patch adds the necessary support for all host kernel PGSIZE and
      VA_SPACE configuration options for both EL2 and the Stage-2 page tables.
      
      However, for 40bit and 42bit PARange systems, the architecture mandates
      that VTCR_EL2.SL0 is maximum 1, resulting in fewer levels of stage-2
      pagge tables than levels of host kernel page tables.  At the same time,
      systems with a PARange > 42bit, we limit the IPA range by always setting
      VTCR_EL2.T0SZ to 24.
      
      To solve the situation with different levels of page tables for Stage-2
      translation than the host kernel page tables, we allocate a dummy PGD
      with pointers to our actual inital level Stage-2 page table, in order
      for us to reuse the kernel pgtable manipulation primitives.  Reproducing
      all these in KVM does not look pretty and unnecessarily complicates the
      32-bit side.
      
      Systems with a PARange < 40bits are not yet supported.
      
       [ I have reworked this patch from its original form submitted by
         Jungseok to take the architecture constraints into consideration.
         There were too many changes from the original patch for me to
         preserve the authorship.  Thanks to Catalin Marinas for his help in
         figuring out a good solution to this challenge.  I have also fixed
         various bugs and missing error code handling from the original
         patch. - Christoffer ]
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJungseok Lee <jungseoklee85@gmail.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      38f791a4
  11. 13 10月, 2014 1 次提交
  12. 10 10月, 2014 3 次提交
  13. 26 9月, 2014 1 次提交
  14. 11 9月, 2014 1 次提交
  15. 28 8月, 2014 1 次提交
  16. 11 7月, 2014 3 次提交
  17. 28 4月, 2014 1 次提交
    • M
      arm: KVM: fix possible misalignment of PGDs and bounce page · 5d4e08c4
      Mark Salter 提交于
      The kvm/mmu code shared by arm and arm64 uses kalloc() to allocate
      a bounce page (if hypervisor init code crosses page boundary) and
      hypervisor PGDs. The problem is that kalloc() does not guarantee
      the proper alignment. In the case of the bounce page, the page sized
      buffer allocated may also cross a page boundary negating the purpose
      and leading to a hang during kvm initialization. Likewise the PGDs
      allocated may not meet the minimum alignment requirements of the
      underlying MMU. This patch uses __get_free_page() to guarantee the
      worst case alignment needs of the bounce page and PGDs on both arm
      and arm64.
      
      Cc: <stable@vger.kernel.org> # 3.10+
      Signed-off-by: NMark Salter <msalter@redhat.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      5d4e08c4
  18. 03 3月, 2014 4 次提交
  19. 09 1月, 2014 1 次提交
    • M
      arm/arm64: KVM: relax the requirements of VMA alignment for THP · 136d737f
      Marc Zyngier 提交于
      The THP code in KVM/ARM is a bit restrictive in not allowing a THP
      to be used if the VMA is not 2MB aligned. Actually, it is not so much
      the VMA that matters, but the associated memslot:
      
      A process can perfectly mmap a region with no particular alignment
      restriction, and then pass a 2MB aligned address to KVM. In this
      case, KVM will only use this 2MB aligned region, and will ignore
      the range between vma->vm_start and memslot->userspace_addr.
      
      It can also choose to place this memslot at whatever alignment it
      wants in the IPA space. In the end, what matters is the relative
      alignment of the user space and IPA mappings with respect to a
      2M page. They absolutely must be the same if you want to use THP.
      
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      136d737f
  20. 12 12月, 2013 1 次提交
  21. 17 11月, 2013 1 次提交
  22. 18 10月, 2013 2 次提交
  23. 14 8月, 2013 1 次提交