1. 31 October 2013, 1 commit
    • B
      kvm: Add KVM_GET_EMULATED_CPUID · 9c15bb1d
      Committed by Borislav Petkov
      Add a kvm ioctl that reports which system functionality kvm emulates.
      The format used is that of CPUID, and we return the corresponding CPUID
      bits set for the functionality we emulate.
      
      Make sure ->padding is passed in clean from userspace so that we can
      use it for something in the future, once the ioctl is cast in stone.
      
      s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
      it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9c15bb1d
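      For reference, a minimal userspace sketch of querying the new ioctl
      (assumes a <linux/kvm.h> that already defines KVM_GET_EMULATED_CPUID;
      error handling trimmed):

      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      int main(void)
      {
              int kvm = open("/dev/kvm", O_RDWR);
              if (kvm < 0) {
                      perror("open /dev/kvm");
                      return 1;
              }

              /* Room for up to 64 entries; KVM trims nent on return.  calloc()
               * also leaves the padding fields zeroed, which the ioctl expects
               * to be passed in clean. */
              int nent = 64;
              struct kvm_cpuid2 *cpuid =
                      calloc(1, sizeof(*cpuid) + nent * sizeof(struct kvm_cpuid_entry2));
              cpuid->nent = nent;

              if (ioctl(kvm, KVM_GET_EMULATED_CPUID, cpuid) < 0) {
                      perror("KVM_GET_EMULATED_CPUID");
                      return 1;
              }

              /* Each returned entry is a CPUID leaf with only the emulated bits set. */
              for (unsigned int i = 0; i < cpuid->nent; i++)
                      printf("leaf %#x index %#x: eax=%#x ebx=%#x ecx=%#x edx=%#x\n",
                             cpuid->entries[i].function, cpuid->entries[i].index,
                             cpuid->entries[i].eax, cpuid->entries[i].ebx,
                             cpuid->entries[i].ecx, cpuid->entries[i].edx);
              return 0;
      }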
  2. 28 October 2013, 3 commits
  3. 18 October 2013, 4 commits
    • C
      KVM: ARM: Transparent huge page (THP) support · 9b5fdb97
      Committed by Christoffer Dall
      Support transparent huge pages in KVM/ARM and KVM/ARM64.  The
      transparent_hugepage_adjust code is not very pretty, but this is also
      how it is solved on x86 and seems to be simply an artifact of how THPs
      behave.  This should eventually be shared across architectures if
      possible, but that can always be changed down the road.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      9b5fdb97
    • C
      KVM: ARM: Support hugetlbfs backed huge pages · ad361f09
      Committed by Christoffer Dall
      Support huge pages in KVM/ARM and KVM/ARM64.  The pud_huge check on
      the unmap path may feel a bit silly, as the pud_huge check is always
      defined to false, but the compiler should be smart about this.
      
      Note: this deals only with VMAs marked as huge that are allocated by
      users through hugetlbfs.  Transparent huge pages can only be detected
      by looking at the underlying pages (or the page tables themselves),
      and this patch so far simply maps these page by page in the Stage-2
      page tables.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      ad361f09
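      A hedged sketch of the distinction this patch relies on in the Stage-2
      fault path (is_vm_hugetlb_page() is the stock mm helper; the rest is
      simplified and illustrative, not the upstream function):

      /* hugetlbfs-backed VMAs are flagged VM_HUGETLB and get a single
       * Stage-2 PMD (section) mapping; everything else, including THP-backed
       * anonymous memory, is mapped page by page for now. */
      static bool fault_backed_by_hugetlbfs(struct vm_area_struct *vma)
      {
              return vma && is_vm_hugetlb_page(vma);
      }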
    • C
      KVM: ARM: Update comments for kvm_handle_wfi · 86ed81aa
      Committed by Christoffer Dall
      Update the comments to reflect what is really going on, and add the TWE
      bit to the comments in kvm_arm.h.
      
      Also rename the function to kvm_handle_wfx, as is done on arm64, for
      consistency and uber-correctness.
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      86ed81aa
    • M
      ARM: KVM: Yield CPU when vcpu executes a WFE · 58d5ec8f
      Committed by Marc Zyngier
      On an (even slightly) oversubscribed system, spinlocks quickly become
      a bottleneck, as some vcpus are spinning, waiting for a lock to be
      released, while the vcpu holding the lock may not be running at all.
      
      This creates contention, and the observed slowdown is 40x for
      hackbench. No, this isn't a typo.
      
      The solution is to trap blocking WFEs and tell KVM that we're
      now spinning. This ensures that other vcpus will get a scheduling
      boost, allowing the lock to be released more quickly. Also, using
      CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
      when the VM is severely overcommitted.
      
      Quick test to estimate the performance: hackbench 1 process 1000
      
      2xA15 host (baseline):	1.843s
      
      2xA15 guest w/o patch:	2.083s
      4xA15 guest w/o patch:	80.212s
      8xA15 guest w/o patch:	Could not be bothered to find out
      
      2xA15 guest w/ patch:	2.102s
      4xA15 guest w/ patch:	3.205s
      8xA15 guest w/ patch:	6.887s
      
      So we go from a 40x degradation to 1.5x in the 2x overcommit case,
      which is vaguely more acceptable.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      58d5ec8f
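      A hedged sketch of the trap handler this leads to (simplified from the
      arch/arm KVM code of this era; not the exact upstream function):

      static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
      {
              if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE) {
                      /* Blocking WFE trapped via HCR.TWE: the guest is most
                       * likely spinning on a lock held by another vcpu, so
                       * ask KVM for a directed yield towards the holder. */
                      kvm_vcpu_on_spin(vcpu);
              } else {
                      /* Plain WFI: block until an interrupt is pending. */
                      kvm_vcpu_block(vcpu);
              }
              return 1;
      }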
  4. 15 October 2013, 3 commits
  5. 14 October 2013, 9 commits
  6. 13 October 2013, 10 commits
  7. 12 October 2013, 1 commit
    • V
      ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc" · 5b242828
      Committed by Vineet Gupta
      The ARCompact TRAP_S insn, used for breakpoints, commits before the
      exception is taken (updating the architectural PC), so ptregs->ret
      contains the next PC and not the breakpoint PC itself. This is
      different from other restartable exceptions such as TLB Miss, where
      ptregs->ret holds the exact faulting PC. gdb needs to know the exact
      PC, hence the ARC ptrace GETREGSET provides the synthetic @stop_pc,
      which returns ptregs->ret or EFA depending on the situation.
      
      However, writing stop_pc (a SETREGSET request), which updates
      ptregs->ret, doesn't make sense, since stop_pc doesn't always
      correspond to that register, as described above.
      
      This was not an issue so far, since user_regs->ret and
      user_regs->stop_pc had the same value, and having both write to
      ptregs->ret was needless, but NOT broken, hence not observed.
      
      With gdb "jump", the two diverge, and the update of ptregs->ret from
      user_regs->ret is immediately overwritten by stop_pc; this patch fixes
      that by ignoring the SETREGSET write to stop_pc.
      Reported-by: Anton Kolesov <akolesov@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      5b242828
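      For context, a minimal debugger-side sketch of reading the synthetic
      stop_pc through GETREGSET (assumes an ARC target whose user_regs_struct
      exposes a stop_pc field as described above, and a tracee already stopped
      under ptrace; error handling trimmed):

      #include <stdio.h>
      #include <sys/ptrace.h>
      #include <sys/types.h>
      #include <sys/uio.h>
      #include <linux/elf.h>    /* NT_PRSTATUS */
      #include <asm/ptrace.h>   /* ARC struct user_regs_struct */

      static long read_stop_pc(pid_t pid)
      {
              struct user_regs_struct regs;
              struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };

              if (ptrace(PTRACE_GETREGSET, pid, (void *)NT_PRSTATUS, &iov) < 0) {
                      perror("PTRACE_GETREGSET");
                      return -1;
              }
              /* stop_pc is the "where did we stop" value: ptregs->ret or EFA,
               * depending on the exception, as explained above. */
              return regs.stop_pc;
      }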
  8. 11 October 2013, 2 commits
  9. 10 October 2013, 4 commits
    • F
      xen: Fix possible user space selector corruption · 7cde9b27
      Committed by Frediano Ziglio
      Due to the way the kernel is initialized under Xen, it is possible that
      the ring1 selector used by the kernel for the boot cpu ends up being
      copied to userspace, leading to a segmentation fault in userspace.
      
      The Xen code in the kernel initializes non-boot cpus with the correct
      selectors (ds and es set to __USER_DS), but the boot cpu keeps the
      ring1 selectors passed by Xen.  On task context switch (switch_to) we
      assume that ds, es and cs already point to __USER_DS and __KERNEL_CS,
      so these selectors are not changed.
      
      If the processor is an Intel part supporting the sysenter instruction,
      sysenter/sysexit is used, so ds and es are not restored when switching
      back from kernel to userspace.  If the selectors point to ring1 instead
      of __USER_DS, the userspace code will crash on its first memory access
      attempt (to be precise, Xen, in the emulated iret used to do sysexit,
      will detect this and set ds and es to zero, which leads to a GPF anyway).
      
      Now, if a userspace process calls into the kernel using sysenter and
      gets rescheduled (for me it happened with a specific init calling
      wait4), it can happen that ds and es are left set to the ring1 selector.
      
      This is quite hard to detect because after a while these selectors get
      fixed up (__USER_DS seems sticky).
      
      Bisecting the code, commit 7076aada appears to be the first one to have
      this issue.
      Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      7cde9b27
    • B
      kvm: ppc: booke: check range page invalidation progress on page setup · 40fde70d
      Committed by Bharat Bhushan
      When the MM code is invalidating a range of pages, it calls the KVM
      kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
      kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages.
      However, the Linux PTEs for the range being flushed are still valid at
      that point.  We are not supposed to establish any new references to pages
      in the range until the ...range_end() notifier gets called.
      The PPC-specific KVM code doesn't get any explicit notification of that;
      instead, we are supposed to use mmu_notifier_retry() to test whether we
      are or have been inside a range flush notifier pair while we have been
      referencing a page.
      
      This patch calls mmu_notifier_retry() while mapping the guest page to
      ensure we are not referencing a page while a range invalidation is in
      progress.
      
      This call is made inside a region locked with kvm->mmu_lock, which is
      the same lock taken by the KVM MMU notifier functions, thus ensuring
      that no new notification can proceed while we are in the locked region.
      Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Acked-by: Alexander Graf <agraf@suse.de>
      [Backported to 3.12 - Paolo]
      Reviewed-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      40fde70d
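      A hedged sketch of the generic mmu_notifier_retry() idiom being applied
      here (the shape of the common KVM pattern, not the exact booke code; the
      final mapping step is left as a comment):

      unsigned long mmu_seq;
      pfn_t pfn;

      mmu_seq = kvm->mmu_notifier_seq;
      smp_rmb();                    /* read mmu_seq before looking up the pfn */

      pfn = gfn_to_pfn(kvm, gfn);   /* may sleep; takes a reference on the page */

      spin_lock(&kvm->mmu_lock);
      if (mmu_notifier_retry(kvm, mmu_seq)) {
              /* A range invalidation started (or ran) after mmu_seq was
               * sampled: drop the reference and let the fault be retried
               * instead of installing a possibly stale mapping. */
              spin_unlock(&kvm->mmu_lock);
              kvm_release_pfn_clean(pfn);
      } else {
              /* ... install the guest TLB entry here, still under mmu_lock ... */
              spin_unlock(&kvm->mmu_lock);
      }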
    • P
      KVM: PPC: Book3S HV: Fix typo in saving DSCR · cfc86025
      Committed by Paul Mackerras
      This fixes a typo in the code that saves the guest DSCR (Data Stream
      Control Register) into the kvm_vcpu_arch struct on guest exit.  The
      effect of the typo was that the DSCR value was saved in the wrong place,
      so changes to the DSCR by the guest didn't persist across guest exit
      and entry, and some host kernel memory got corrupted.
      
      Cc: stable@vger.kernel.org [v3.1+]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cfc86025
    • G
      KVM: nVMX: fix shadow on EPT · d0d538b9
      Committed by Gleb Natapov
      Commit 72f85795 broke shadow on EPT. This patch reverts it and fixes
      PAE on nEPT (which the reverted commit had fixed) in another way.
      
      Shadow on EPT is currently broken because, while L1 builds a shadow
      page table for L2 (which is PAE while L2 is in real mode), it never
      loads L2's GUEST_PDPTR[0-3].  They do not need to be loaded, because
      without nested virtualization the hardware does this during guest
      entry if EPT is disabled; but in our case L0 emulates L2's vmentry
      while EPT is enabled, so we cannot rely on vmcs12->guest_pdptr[0-3]
      to contain up-to-date values, and we need to re-read the PDPTEs from
      L2 memory.  This is what kvm_set_cr3() is doing, but by clearing the
      cache bits during L2 vmentry we drop the values that kvm_set_cr3()
      read from memory.
      
      So why does the same code not work for PAE on nEPT?  kvm_set_cr3()
      reads the PDPTEs into vcpu->arch.walk_mmu->pdptrs[].  walk_mmu points
      to vcpu->arch.nested_mmu while a nested guest is running, but
      ept_load_pdptrs() uses vcpu->arch.mmu, which contains incorrect
      values.  Fix that by using walk_mmu in ept_(load|save)_pdptrs.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d0d538b9
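      For reference, a hedged sketch of what the fixed ept_load_pdptrs() ends
      up looking like, reconstructed from the description above (the save side
      is switched to walk_mmu in the same way):

      static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
      {
              /* walk_mmu points at arch.nested_mmu while a nested guest runs,
               * which is where kvm_set_cr3() stored the PDPTEs it re-read
               * from L2 memory. */
              struct kvm_mmu *mmu = vcpu->arch.walk_mmu;

              if (!test_bit(VCPU_EXREG_PDPTR,
                            (unsigned long *)&vcpu->arch.regs_dirty))
                      return;

              if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
                      vmcs_write64(GUEST_PDPTR0, mmu->pdptrs[0]);
                      vmcs_write64(GUEST_PDPTR1, mmu->pdptrs[1]);
                      vmcs_write64(GUEST_PDPTR2, mmu->pdptrs[2]);
                      vmcs_write64(GUEST_PDPTR3, mmu->pdptrs[3]);
              }
      }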
  10. 09 October 2013, 3 commits