1. 22 12月, 2013 6 次提交
  2. 17 11月, 2013 1 次提交
  3. 08 11月, 2013 2 次提交
  4. 29 10月, 2013 1 次提交
  5. 22 10月, 2013 4 次提交
  6. 18 10月, 2013 4 次提交
    • C
      KVM: ARM: Transparent huge page (THP) support · 9b5fdb97
      Christoffer Dall 提交于
      Support transparent huge pages in KVM/ARM and KVM/ARM64.  The
      transparent_hugepage_adjust is not very pretty, but this is also how
      it's solved on x86 and seems to be simply an artifact on how THPs
      behave.  This should eventually be shared across architectures if
      possible, but that can always be changed down the road.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      9b5fdb97
    • C
      KVM: ARM: Support hugetlbfs backed huge pages · ad361f09
      Christoffer Dall 提交于
      Support huge pages in KVM/ARM and KVM/ARM64.  The pud_huge checking on
      the unmap path may feel a bit silly as the pud_huge check is always
      defined to false, but the compiler should be smart about this.
      
      Note: This deals only with VMAs marked as huge which are allocated by
      users through hugetlbfs only.  Transparent huge pages can only be
      detected by looking at the underlying pages (or the page tables
      themselves) and this patch so far simply maps these on a page-by-page
      level in the Stage-2 page tables.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      ad361f09
    • C
      KVM: ARM: Update comments for kvm_handle_wfi · 86ed81aa
      Christoffer Dall 提交于
      Update comments to reflect what is really going on and add the TWE bit
      to the comments in kvm_arm.h.
      
      Also renames the function to kvm_handle_wfx like is done on arm64 for
      consistency and uber-correctness.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      86ed81aa
    • M
      ARM: KVM: Yield CPU when vcpu executes a WFE · 58d5ec8f
      Marc Zyngier 提交于
      On an (even slightly) oversubscribed system, spinlocks are quickly
      becoming a bottleneck, as some vcpus are spinning, waiting for a
      lock to be released, while the vcpu holding the lock may not be
      running at all.
      
      This creates contention, and the observed slowdown is 40x for
      hackbench. No, this isn't a typo.
      
      The solution is to trap blocking WFEs and tell KVM that we're
      now spinning. This ensures that other vpus will get a scheduling
      boost, allowing the lock to be released more quickly. Also, using
      CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
      when the VM is severely overcommited.
      
      Quick test to estimate the performance: hackbench 1 process 1000
      
      2xA15 host (baseline):	1.843s
      
      2xA15 guest w/o patch:	2.083s
      4xA15 guest w/o patch:	80.212s
      8xA15 guest w/o patch:	Could not be bothered to find out
      
      2xA15 guest w/ patch:	2.102s
      4xA15 guest w/ patch:	3.205s
      8xA15 guest w/ patch:	6.887s
      
      So we go from a 40x degradation to 1.5x in the 2x overcommit case,
      which is vaguely more acceptable.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      58d5ec8f
  7. 17 10月, 2013 1 次提交
  8. 16 10月, 2013 2 次提交
    • C
      KVM: ARM: Update comments for kvm_handle_wfi · 82ea046c
      Christoffer Dall 提交于
      Update comments to reflect what is really going on and add the TWE bit
      to the comments in kvm_arm.h.
      
      Also renames the function to kvm_handle_wfx like is done on arm64 for
      consistency and uber-correctness.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      82ea046c
    • M
      ARM: KVM: Yield CPU when vcpu executes a WFE · 1f558098
      Marc Zyngier 提交于
      On an (even slightly) oversubscribed system, spinlocks are quickly
      becoming a bottleneck, as some vcpus are spinning, waiting for a
      lock to be released, while the vcpu holding the lock may not be
      running at all.
      
      This creates contention, and the observed slowdown is 40x for
      hackbench. No, this isn't a typo.
      
      The solution is to trap blocking WFEs and tell KVM that we're
      now spinning. This ensures that other vpus will get a scheduling
      boost, allowing the lock to be released more quickly. Also, using
      CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
      when the VM is severely overcommited.
      
      Quick test to estimate the performance: hackbench 1 process 1000
      
      2xA15 host (baseline):	1.843s
      
      2xA15 guest w/o patch:	2.083s
      4xA15 guest w/o patch:	80.212s
      8xA15 guest w/o patch:	Could not be bothered to find out
      
      2xA15 guest w/ patch:	2.102s
      4xA15 guest w/ patch:	3.205s
      8xA15 guest w/ patch:	6.887s
      
      So we go from a 40x degradation to 1.5x in the 2x overcommit case,
      which is vaguely more acceptable.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      1f558098
  9. 13 10月, 2013 2 次提交
  10. 03 10月, 2013 3 次提交
  11. 25 9月, 2013 1 次提交
  12. 31 8月, 2013 3 次提交
  13. 14 8月, 2013 1 次提交
  14. 12 8月, 2013 2 次提交
  15. 08 8月, 2013 2 次提交
    • M
      arm64: KVM: fix 2-level page tables unmapping · 979acd5e
      Marc Zyngier 提交于
      When using 64kB pages, we only have two levels of page tables,
      meaning that PGD, PUD and PMD are fused. In this case, trying
      to refcount PUDs and PMDs independently is a a complete disaster,
      as they are the same.
      
      We manage to get it right for the allocation (stage2_set_pte uses
      {pmd,pud}_none), but the unmapping path clears both pud and pmd
      refcounts, which fails spectacularly with 2-level page tables.
      
      The fix is to avoid calling clear_pud_entry when both the pmd and
      pud pages are empty. For this, and instead of introducing another
      pud_empty function, consolidate both pte_empty and pmd_empty into
      page_empty (the code is actually identical) and use that to also
      test the validity of the pud.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      979acd5e
    • C
      ARM: KVM: Fix unaligned unmap_range leak · d3840b26
      Christoffer Dall 提交于
      The unmap_range function did not properly cover the case when the start
      address was not aligned to PMD_SIZE or PUD_SIZE and an entire pte table
      or pmd table was cleared, causing us to leak memory when incrementing
      the addr.
      
      The fix is to always move onto the next page table entry boundary
      instead of adding the full size of the VA range covered by the
      corresponding table level entry.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d3840b26
  16. 07 8月, 2013 1 次提交
    • C
      ARM: KVM: Fix 64-bit coprocessor handling · 240e99cb
      Christoffer Dall 提交于
      The PAR was exported as CRn == 7 and CRm == 0, but in fact the primary
      coprocessor register number was determined by CRm for 64-bit coprocessor
      registers as the user space API was modeled after the coprocessor
      access instructions (see the ARM ARM rev. C - B3-1445).
      
      However, just changing the CRn to CRm breaks the sorting check when
      booting the kernel, because the internal kernel logic always treats CRn
      as the primary register number, and it makes the table sorting
      impossible to understand for humans.
      
      Alternatively we could change the logic to always have CRn == CRm, but
      that becomes unclear in the number of ways we do look up of a coprocessor
      register.  We could also have a separate 64-bit table but that feels
      somewhat over-engineered.  Instead, keep CRn the primary representation
      of the primary coproc. register number in-kernel and always export the
      primary number as CRm as per the existing user space ABI.
      
      Note: The TTBR registers just magically worked because they happened to
      follow the CRn(0) regs and were considered CRn(0) in the in-kernel
      representation.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      240e99cb
  17. 18 7月, 2013 1 次提交
  18. 27 6月, 2013 3 次提交