1. 23 12月, 2012 1 次提交
  2. 18 12月, 2012 1 次提交
    • N
      kvm: fix i8254 counter 0 wraparound · d4b06c2d
      Nickolai Zeldovich 提交于
      The kvm i8254 emulation for counter 0 (but not for counters 1 and 2)
      has at least two bugs in mode 0:
      
      1. The OUT bit, computed by pit_get_out(), is never set high.
      
      2. The counter value, computed by pit_get_count(), wraps back around to
         the initial counter value, rather than wrapping back to 0xFFFF
         (which is the behavior described in the comment in __kpit_elapsed,
         the behavior implemented by qemu, and the behavior observed on AMD
         hardware).
      
      The bug stems from __kpit_elapsed computing the elapsed time mod the
      initial counter value (stored as nanoseconds in ps->period).  This is both
      unnecessary (none of the callers of kpit_elapsed expect the value to be
      at most the initial counter value) and incorrect (it causes pit_get_count
      to appear to wrap around to the initial counter value rather than 0xFFFF).
      Removing this mod from __kpit_elapsed fixes both of the above bugs.
      Signed-off-by: NNickolai Zeldovich <nickolai@csail.mit.edu>
      Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      d4b06c2d
  3. 15 12月, 2012 1 次提交
  4. 14 12月, 2012 5 次提交
  5. 12 12月, 2012 3 次提交
  6. 07 12月, 2012 1 次提交
  7. 06 12月, 2012 2 次提交
    • X
      KVM: MMU: optimize for set_spte · c2193463
      Xiao Guangrong 提交于
      There are two cases we need to adjust page size in set_spte:
      1): the one is other vcpu creates new sp in the window between mapping_level()
          and acquiring mmu-lock.
      2): the another case is the new sp is created by itself (page-fault path) when
          guest uses the target gfn as its page table.
      
      In current code, set_spte drop the spte and emulate the access for these case,
      it works not good:
      - for the case 1, it may destroy the mapping established by other vcpu, and
        do expensive instruction emulation.
      - for the case 2, it may emulate the access even if the guest is accessing
        the page which not used as page table. There is a example, 0~2M is used as
        huge page in guest, in this huge page, only page 3 used as page table, then
        guest read/writes on other pages can cause instruction emulation.
      
      Both of these cases can be fixed by allowing guest to retry the access, it
      will refault, then we can establish the mapping by using small page
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      c2193463
    • J
      KVM: x86: Make register state after reset conform to specification · 66f7b72e
      Julian Stecklina 提交于
      VMX behaves now as SVM wrt to FPU initialization. Code has been moved to
      generic code path. General-purpose registers are now cleared on reset and
      INIT.  SVM code properly initializes EDX.
      Signed-off-by: NJulian Stecklina <jsteckli@os.inf.tu-dresden.de>
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      66f7b72e
  8. 05 12月, 2012 2 次提交
  9. 02 12月, 2012 1 次提交
  10. 01 12月, 2012 2 次提交
    • W
      KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Will Auld 提交于
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      Basic design is to emulate the MSR by allowing reads and writes to a guest
      vcpu specific location to store the value of the emulated MSR while adding
      the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
      be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
      is of course as long as the "use TSC counter offsetting" VM-execution control
      is enabled as well as the IA32_TSC_ADJUST control.
      
      However, because hardware will only return the TSC + IA32_TSC_ADJUST +
      vmsc tsc_offset for a guest process when it does and rdtsc (with the correct
      settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
      of these three locations. The argument against storing it in the actual MSR
      is performance. This is likely to be seldom used while the save/restore is
      required on every transition. IA32_TSC_ADJUST was created as a way to solve
      some issues with writing TSC itself so that is not an option either.
      
      The remaining option, defined above as our solution has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
      done here) as mentioned above. However, more problematic is that storing the
      data in vmcs tsc_offset will have a different semantic effect on the system
      than does using the actual MSR. This is illustrated in the following example:
      
      The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest
      process performs a rdtsc. In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
      IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics
      as seen by the guest do not and hence this will not cause a problem.
      Signed-off-by: NWill Auld <will.auld@intel.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      ba904635
    • W
      KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Will Auld 提交于
      In order to track who initiated the call (host or guest) to modify an msr
      value I have changed function call parameters along the call path. The
      specific change is to add a struct pointer parameter that points to (index,
      data, caller) information rather than having this information passed as
      individual parameters.
      
      The initial use for this capability is for updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that this capability is
      useful for other tasks.
      Signed-off-by: NWill Auld <will.auld@intel.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      8fe8ab46
  11. 30 11月, 2012 1 次提交
  12. 29 11月, 2012 1 次提交
  13. 28 11月, 2012 7 次提交
  14. 27 11月, 2012 1 次提交
  15. 17 11月, 2012 1 次提交
    • T
      KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update() · 29282fde
      Takashi Iwai 提交于
      The commit [ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with
      EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL,
      and this triggers kernel warnings like below on old CPUs:
      
          vmwrite error: reg 401e value a0568000 (err 12)
          Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154
          Call Trace:
           [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel]
           [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel]
           [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel]
           [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm]
           [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm]
           [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110
           [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel]
           [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm]
           [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm]
           [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm]
           [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm]
           [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530
           [<ffffffff81139d76>] ? remove_vma+0x56/0x60
           [<ffffffff8113b708>] ? do_munmap+0x328/0x400
           [<ffffffff81187c8c>] ? fget_light+0x4c/0x100
           [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0
           [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f
      
      This patch adds a check for the availability of secondary exec
      control to avoid these warnings.
      
      Cc: <stable@vger.kernel.org> [v3.6+]
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      29282fde
  16. 14 11月, 2012 3 次提交
  17. 13 11月, 2012 1 次提交
    • P
      KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) · 6d1068b3
      Petr Matousek 提交于
      On hosts without the XSAVE support unprivileged local user can trigger
      oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
      cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
      ioctl.
      
      invalid opcode: 0000 [#2] SMP
      Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
      ...
      Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
      EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
      EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
      EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
      ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
      task.ti=d7c62000)
      Stack:
       00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
       ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
       c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
      Call Trace:
       [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
      ...
       [<c12bfb44>] ? syscall_call+0x7/0xb
      Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
      1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
      d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
      EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
      0068:d7c63e70
      
      QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
      and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
      out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
      X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
      X86_FEATURE_XSAVE even on hosts that do not support it, might be
      susceptible to this attack from inside the guest as well.
      
      Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
      Signed-off-by: NPetr Matousek <pmatouse@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      6d1068b3
  18. 01 11月, 2012 1 次提交
    • X
      KVM: x86: fix vcpu->mmio_fragments overflow · 87da7e66
      Xiao Guangrong 提交于
      After commit b3356bf0 (KVM: emulator: optimize "rep ins" handling),
      the pieces of io data can be collected and write them to the guest memory
      or MMIO together
      
      Unfortunately, kvm splits the mmio access into 8 bytes and store them to
      vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it
      will cause vcpu->mmio_fragments overflow
      
      The bug can be exposed by isapc (-M isapc):
      
      [23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ ......]
      [23154.858083] Call Trace:
      [23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
      [23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
      [23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
      
      Actually, we can use one mmio_fragment to store a large mmio access then
      split it when we pass the mmio-exit-info to userspace. After that, we only
      need two entries to store mmio info for the cross-mmio pages access
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      87da7e66
  19. 30 10月, 2012 1 次提交
  20. 23 10月, 2012 3 次提交
  21. 22 10月, 2012 1 次提交