1. 26 Sep, 2011 4 commits
    • KVM: L1 TSC handling · d5c1785d
      Committed by Nadav Har'El
      KVM assumed in several places that reading the TSC MSR returns the value for
      L1. This is incorrect, because when L2 is running, the correct TSC read exit
      emulation is to return L2's value.
      
      We therefore add a new x86_ops function, read_l1_tsc, to use in places that
      specifically need to read the L1 TSC, NOT the TSC of the current level of
      guest.
      
      Note that one change, of one line in kvm_arch_vcpu_load, is made redundant
      by a different patch sent by Zachary Amsden (and not yet applied):
      kvm_arch_vcpu_load() should not read the guest TSC, and if it did not, we
      would of course not have had to change the call of kvm_get_msr() to
      read_l1_tsc().
      
      [avi: moved callback to kvm_x86_ops tsc block]
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Acked-by: Zachary Amsden <zamsden@gmail.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
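The difference between "the current level's TSC" and "the L1 TSC" can be sketched as follows. This assumes the usual model in which each guest level sees the TSC of the level below plus an offset. The struct and function names are hypothetical (only `read_l1_tsc` is named in the commit message), and this is an illustration, not the kernel's code.

```c
#include <stdint.h>

/* Hypothetical per-vCPU TSC state; each offset is relative to the level below. */
struct vcpu_tsc {
    int64_t l1_offset;  /* L1 TSC = host TSC + l1_offset */
    int64_t l2_offset;  /* L2 TSC = L1 TSC + l2_offset */
    int in_l2;          /* nonzero while the nested (L2) guest is running */
};

/* TSC as seen by whichever guest level is currently running. */
static uint64_t read_current_tsc(const struct vcpu_tsc *v, uint64_t host_tsc)
{
    uint64_t tsc = host_tsc + (uint64_t)v->l1_offset;

    if (v->in_l2)
        tsc += (uint64_t)v->l2_offset;
    return tsc;
}

/* TSC as seen by L1, regardless of which level is currently running. */
static uint64_t read_l1_tsc_sketch(const struct vcpu_tsc *v, uint64_t host_tsc)
{
    return host_tsc + (uint64_t)v->l1_offset;
}
```

Callers that need L1's view of time would use the second helper, since the first one changes its answer when L2 is entered.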
    • KVM: MMU: Do not unconditionally read PDPTE from guest memory · e4e517b4
      Committed by Avi Kivity
      Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
      On SVM, it is not possible to implement this, but on VMX this is possible
      and was indeed implemented until nested SVM changed this to unconditionally
      read PDPTEs dynamically.  This has a noticeable impact when running PAE guests.
      
      Fix by changing the MMU to read PDPTRs from the cache, falling back to
      reading from memory for the nested MMU.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Tested-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
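The fix described above amounts to choosing the PDPTE source per MMU. Here is a minimal sketch of that choice; the struct layout and the names `mmu_sketch` and `get_pdptr_sketch` are assumptions made for illustration and do not reproduce the kernel's data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PDPTRS 4  /* PAE paging uses four page-directory-pointer entries */

/* Hypothetical MMU state for illustration only. */
struct mmu_sketch {
    int nested;                       /* nonzero for the nested MMU */
    uint64_t pdptr_cache[NR_PDPTRS];  /* filled when CR3 is reloaded */
    const uint64_t *guest_pdpt;       /* stand-in for the PDPT in guest memory */
};

/* Regular MMU: use the cached PDPTRs. Nested MMU: fall back to memory. */
static uint64_t get_pdptr_sketch(const struct mmu_sketch *mmu, unsigned index)
{
    assert(index < NR_PDPTRS);
    if (mmu->nested)
        return mmu->guest_pdpt[index];
    return mmu->pdptr_cache[index];
}
```

With this split, a guest that modifies its in-memory PDPT without reloading CR3 keeps seeing the cached values on the regular path, which matches the architectural behaviour the message describes.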
    • KVM: Use __print_symbolic() for vmexit tracepoints · 0d460ffc
      Committed by Stefan Hajnoczi
      The vmexit tracepoints format the exit_reason to make it human-readable.
      Since the exit_reason depends on the instruction set (vmx or svm),
      formatting is handled with ftrace_print_symbols_seq() by referring to
      the appropriate exit reason table.
      
      However, the ftrace_print_symbols_seq() function is not meant to be used
      directly in tracepoints since it does not export the formatting table
      which userspace tools like trace-cmd and perf use to format traces.
      
      In practice perf dies when formatting vmexit-related events and
      trace-cmd falls back to printing the numeric value (with extra
      formatting code in the kvm plugin to paper over this limitation).  Other
      userspace consumers of vmexit-related tracepoints would be in similar
      trouble.
      
      To avoid significant changes to the kvm_exit tracepoint, this patch
      moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and
      selects the right table with __print_symbolic() depending on the
      instruction set.  Note that __print_symbolic() is designed for exporting
      the formatting table to userspace and allows trace-cmd and perf to work.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
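The idea of selecting a value-to-name table by instruction set can be sketched in plain userspace C. This is not the kernel's `__print_symbolic()` macro; the `enum isa` selector, the helper names, and the abbreviated tables are assumptions for illustration.

```c
#include <stddef.h>

struct sym {
    unsigned long value;
    const char *name;
};

enum isa { ISA_VMX, ISA_SVM };  /* hypothetical selector */

/* Abbreviated tables; the real ones list many more exit reasons. */
static const struct sym vmx_exit_reasons[] = {
    { 0, "EXCEPTION_NMI" }, { 1, "EXTERNAL_INTERRUPT" },
    { 10, "CPUID" }, { 12, "HLT" },
};

static const struct sym svm_exit_reasons[] = {
    { 0x072, "CPUID" }, { 0x078, "HLT" }, { 0x400, "NPF" },
};

static const char *lookup(const struct sym *tbl, size_t n, unsigned long v)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].value == v)
            return tbl[i].name;
    return NULL;  /* unknown value: caller prints the number instead */
}

/* Pick the table for the instruction set, then translate the exit reason. */
static const char *exit_reason_name(enum isa isa, unsigned long reason)
{
    if (isa == ISA_VMX)
        return lookup(vmx_exit_reasons,
                      sizeof(vmx_exit_reasons) / sizeof(vmx_exit_reasons[0]),
                      reason);
    return lookup(svm_exit_reasons,
                  sizeof(svm_exit_reasons) / sizeof(svm_exit_reasons[0]),
                  reason);
}
```

Keeping the tables as plain data is what lets a tool outside the kernel reproduce the same formatting, which is the point the commit message makes about exporting the table.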
    • KVM: x86: Raise the hard VCPU count limit · 8c3ba334
      Committed by Sasha Levin
      The patch raises the hard limit of VCPU count to 254.
      
      This will allow developers to easily work on scalability
      and will allow users to test high VCPU setups easily without
      patching the kernel.
      
      To prevent possible issues with current setups, KVM_CAP_NR_VCPUS
      now returns the recommended VCPU limit (which is still 64) - this
      should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS
      returns the hard limit which is now 254.
      
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
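The two limits can be modelled as below. The values 64 and 254 come from the commit message; the enum, macro, and function names are hypothetical. Real userspace would obtain both numbers through the `KVM_CHECK_EXTENSION` ioctl rather than hardcoding them.

```c
/* Values from the commit message; the names are illustrative assumptions. */
#define SKETCH_SOFT_MAX_VCPUS 64   /* recommended limit */
#define SKETCH_HARD_MAX_VCPUS 254  /* hard limit */

enum sketch_cap { SKETCH_CAP_NR_VCPUS, SKETCH_CAP_MAX_VCPUS };

/* Stand-in for a capability query: each capability reports one limit. */
static int sketch_check_extension(enum sketch_cap cap)
{
    return cap == SKETCH_CAP_NR_VCPUS ? SKETCH_SOFT_MAX_VCPUS
                                      : SKETCH_HARD_MAX_VCPUS;
}

/*
 * Classify a requested VCPU count:
 *    0  within the recommended limit
 *    1  allowed, but above the recommended limit
 *   -1  rejected (not positive, or above the hard limit)
 */
static int classify_vcpu_request(int requested)
{
    if (requested < 1 ||
        requested > sketch_check_extension(SKETCH_CAP_MAX_VCPUS))
        return -1;
    if (requested > sketch_check_extension(SKETCH_CAP_NR_VCPUS))
        return 1;
    return 0;
}
```

A VMM could use the middle case to print a warning while still allowing the configuration.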
  2. 16 Sep, 2011 1 commit
    • asm alternatives: remove incorrect alignment notes · a7f934d4
      Committed by Linus Torvalds
      On x86-64, they were just wasteful: with the explicitly added (now
      unnecessary) padding, the size of the alternatives structure was 16
      bytes, and an alignment of 8 bytes didn't hurt much.
      
      However, it was still silly, since the natural size and alignment for
      the structure is actually just 12 bytes, 4-byte aligned since commit
      59e97e4d ("x86: Make alternative instruction pointers relative").
      So removing the padding, and removing the extra alignment is just a good
      idea.
      
      On x86-32, the alignment of 4 bytes was correct, but was incorrectly
      hardcoded as 8 bytes in <asm/alternative-asm.h>.  That header file used
      to be an x86-64 only header file, but various unification efforts have
      caused it to be used for x86-32 too (i.e. the unification of rwlock and
      rwsem).
      
      That in turn caused x86-32 boot failures, because the extra alignment
      would result in random zero-filled words in the altinstructions section,
      causing oopses early at boot when doing alternative instruction
      replacement.
      
      So just remove all the alignment noise entirely.  It's wrong, and it's
      unnecessary.  The section itself is already properly aligned by the
      linker scripts, and all additions to the section had better be of the
      proper 12-byte format, keeping it aligned.  So if the align directive
      were to ever make a difference, that would be an indication of a serious
      bug to begin with.
      Reported-by: Werner Landgraf <w.landgraf@ru.r>
      Acked-by: Andrew Lutomirski <luto@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
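The size and alignment argument can be checked with a small C11 sketch. The field names below are assumptions chosen to give the 12-byte, 4-byte-aligned layout the message describes (two relative 32-bit offsets plus three small fields); they are not copied from the kernel header.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative entry layout: 4 + 4 + 2 + 1 + 1 = 12 bytes. */
struct alt_entry {
    int32_t  instr_offset;  /* relative offset of the original instruction */
    int32_t  repl_offset;   /* relative offset of the replacement */
    uint16_t feature;       /* CPU feature bit that enables the replacement */
    uint8_t  instr_len;     /* length of the original instruction */
    uint8_t  repl_len;      /* length of the replacement */
};

static_assert(sizeof(struct alt_entry) == 12, "entry should be 12 bytes");
static_assert(alignof(struct alt_entry) == 4, "entry should be 4-byte aligned");

/* Distance between consecutive entries if each one is aligned to `align`
 * (which must be a power of two). */
static size_t stride_with_align(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

With an 8-byte alignment directive, `stride_with_align(12, 8)` is 16, so every entry is followed by 4 bytes of zero fill, while code walking the section in 12-byte steps lands on that fill. That is the mismatch the message blames for the x86-32 boot failures.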
  3. 30 Aug, 2011 1 commit
    • KVM: Fix instruction size issue in pvclock scaling · 3b217116
      Committed by Duncan Sands
      Commit de2d1a52 ("KVM: Fix register corruption in pvclock_scale_delta")
      introduced a mul instruction that may have only a memory operand; the
      assembler therefore cannot select the correct size:
      
         pvclock.s:229: Error: no instruction mnemonic suffix given and no register
      operands; can't size instruction
      
      In this example the generated assembly is:
      
               #APP
               mul -48(%rbp) ; shrd $32, %rdx, %rax
               #NO_APP
      
      A simple solution is to use mulq.
      Signed-off-by: Duncan Sands <baldrick@free.fr>
      Signed-off-by: Avi Kivity <avi@redhat.com>
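A sketch of the multiply-and-shift step with the explicit `mulq` suffix follows, using GCC/Clang extended asm on x86-64 and a portable fallback elsewhere. The function name is hypothetical, the fallback relies on the `unsigned __int128` compiler extension, and the surrounding pvclock logic (such as the pre-shift of the delta) is omitted.

```c
#include <stdint.h>

/* Compute (delta * mul_frac) >> 32, keeping the low 64 bits of the result. */
static uint64_t scale_delta_sketch(uint64_t delta, uint32_t mul_frac)
{
    uint64_t product;

#if defined(__x86_64__)
    uint64_t hi;

    /*
     * "mulq" carries an explicit 64-bit size suffix, so the assembler can
     * size the instruction even when the compiler picks a memory operand
     * for mul_frac. RDX:RAX holds the 128-bit product; shrd shifts it
     * right by 32 bits into RAX.
     */
    __asm__("mulq %[mul_frac] ; shrd $32, %[hi], %[lo]"
            : [lo] "=a"(product), [hi] "=d"(hi)
            : "0"(delta), [mul_frac] "rm"((uint64_t)mul_frac));
#else
    /* Portable equivalent using a 128-bit intermediate. */
    product = (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
#endif
    return product;
}
```

The `"rm"` constraint is what allows the memory-only operand form that triggered the original assembler error when the mnemonic had no suffix.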
  4. 27 Aug, 2011 1 commit
  5. 17 Aug, 2011 1 commit
    • xen/x86: replace order-based range checking of M2P table by linear one · ccbcdf7c
      Committed by Jan Beulich
      The order-based approach is not only less efficient (requiring a shift
      and a compare, typical generated code looking like this
      
      	mov	eax, [machine_to_phys_order]
      	mov	ecx, eax
      	shr	ebx, cl
      	test	ebx, ebx
      	jnz	...
      
      whereas a direct check requires just a compare, like in
      
      	cmp	ebx, [machine_to_phys_nr]
      	jae	...
      
      ), but also slightly dangerous in the 32-on-64 case - the element
      address calculation can wrap if the next power of two boundary is
      sufficiently far away from the actual upper limit of the table, and
      hence can result in user space addresses being accessed (with it being
      unknown what may actually be mapped there).
      
      Additionally, the elimination of the mistaken use of fls() here (should
      have been __fls()) fixes a latent issue on x86-64 that would trigger
      if the code was run on a system with memory extending beyond the 44-bit
      boundary.
      
      CC: stable@kernel.org
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      [v1: Based on Jeremy's feedback]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
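The gap between the two checks can be shown with a small sketch. The function names and the example table size are assumptions for illustration; the kernel variables mentioned in the message (`machine_to_phys_order`, `machine_to_phys_nr`) are represented here by plain parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Order-based check: accepts any index below the next power of two. */
static int in_range_by_order(uint64_t mfn, unsigned order)
{
    assert(order < 64);  /* shifting by the full width would be undefined */
    return (mfn >> order) == 0;
}

/* Linear check: accepts only indices below the actual entry count. */
static int in_range_linear(uint64_t mfn, uint64_t nr_entries)
{
    return mfn < nr_entries;
}
```

For a table of 0x5000 entries the covering order is 15 (0x8000 slots), so an index such as 0x6000 passes the order-based check but is rejected by the linear one. Accesses in that gap are the ones the message warns can reach past the end of the table.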
  6. 11 Aug, 2011 2 commits
  7. 05 Aug, 2011 1 commit
  8. 04 Aug, 2011 2 commits
  9. 27 Jul, 2011 5 commits
  10. 24 Jul, 2011 4 commits
  11. 23 Jul, 2011 1 commit
  12. 22 Jul, 2011 4 commits
  13. 21 Jul, 2011 3 commits
  14. 19 Jul, 2011 1 commit
  15. 15 Jul, 2011 4 commits
  16. 14 Jul, 2011 5 commits