1. 03 Nov, 2014 15 commits
  2. 02 Nov, 2014 3 commits
    • KVM: vmx: defer load of APIC access page address during reset · a73896cb
      Paolo Bonzini committed
      Most call paths to vmx_vcpu_reset do not hold the SRCU lock.  Defer loading
      the APIC access page to the next vmentry.
      
      This avoids the following lockdep splat:
      
      [ INFO: suspicious RCU usage. ]
      3.18.0-rc2-test2+ #70 Not tainted
      -------------------------------
      include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-x86/2371:
       #0:  (&vcpu->mutex){+.+...}, at: [<ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm]
      
      stack backtrace:
      CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
      Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
       0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000
       ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00
       ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08
      Call Trace:
       [<ffffffff816f514f>] dump_stack+0x4e/0x71
       [<ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120
       [<ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm]
       [<ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm]
       [<ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm]
       [<ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
       [<ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
       [<ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
       [<ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
       [<ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm]
       [<ffffffff810bc664>] ? __lock_is_held+0x54/0x80
       [<ffffffff812231f0>] do_vfs_ioctl+0x300/0x520
       [<ffffffff8122ee45>] ? __fget+0x5/0x250
       [<ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0
       [<ffffffff81223491>] SyS_ioctl+0x81/0xa0
       [<ffffffff816fed6d>] system_call_fastpath+0x16/0x1b
      Reported-by: Takashi Iwai <tiwai@suse.de>
      Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Fixes: 38b99173
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Disable preemption while reading from shadow VMCS · 282da870
      Jan Kiszka committed
      In order to access the shadow VMCS, we need to load it. At this point,
      vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If
      we now get preempted by Linux, vmx_vcpu_put and, on return, the
      vmx_vcpu_load will work against the wrong vmcs. That can cause
      copy_shadow_to_vmcs12 to corrupt the vmcs12 state.
      
      Fix the issue by disabling preemption during the copy operation.
      copy_vmcs12_to_shadow is safe from this issue as it is executed by
      vmx_vcpu_run when preemption is already disabled before vmentry.
      
      This bug is exposed by running Jailhouse within KVM on CPUs with
      shadow VMCS support.  Jailhouse never expects an interrupt pending
      vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12
      is preempted, the active VMCS happens to have the virtual interrupt
      pending flag set in the CPU-based execution controls.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix far-jump to non-canonical check · 7e46dddd
      Nadav Amit committed
      Commit d1442d85 ("KVM: x86: Handle errors when RIP is set during far
      jumps") introduced a bug that caused the fix to be incomplete.  Due to
      incorrect evaluation, a far jump to a segment with the L bit cleared (i.e., a
      32-bit segment) and RIP with any of the high bits set (i.e., RIP[63:32] != 0)
      may not trigger #GP.  As we know, this poses a security problem.
      
      In addition, the condition for two warnings was incorrect.
      
      Fixes: d1442d85
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
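The condition described in the commit message above can be illustrated with a small user-space sketch. This is not the emulator's code: the function names, the `cs_l` parameter, and the assumption of 48-bit virtual addresses are all illustrative. It only shows the rule the message states, namely that a far jump into a segment with the L bit cleared should fault when RIP[63:32] != 0, and that a jump into a 64-bit segment should fault on a non-canonical RIP.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 48-bit virtual addresses, so an address is canonical
 * when bits 63:47 (17 bits) are all zero or all one.
 */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical helper: returns true if a far jump to `rip` should raise #GP.
 * `cs_l` stands for the L bit of the target code segment descriptor.
 */
bool far_jump_rip_faults(uint64_t rip, bool cs_l)
{
    if (cs_l)
        return !is_canonical_48(rip);   /* 64-bit segment */

    return (rip >> 32) != 0;            /* L clear: RIP must fit in 32 bits */
}
```

For example, `far_jump_rip_faults(0x100000000, false)` reports a fault, which is the case the incomplete fix could miss.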
  3. 29 Oct, 2014 4 commits
    • KVM: nVMX: Disable preemption while reading from shadow VMCS · 41e7ed64
      Jan Kiszka committed
      In order to access the shadow VMCS, we need to load it. At this point,
      vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If
      we now get preempted by Linux, vmx_vcpu_put and, on return, the
      vmx_vcpu_load will work against the wrong vmcs. That can cause
      copy_shadow_to_vmcs12 to corrupt the vmcs12 state.
      
      Fix the issue by disabling preemption during the copy operation.
      copy_vmcs12_to_shadow is safe from this issue as it is executed by
      vmx_vcpu_run when preemption is already disabled before vmentry.
      
      This bug is exposed by running Jailhouse within KVM on CPUs with
      shadow VMCS support.  Jailhouse never expects an interrupt pending
      vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12
      is preempted, the active VMCS happens to have the virtual interrupt
      pending flag set in the CPU-based execution controls.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix far-jump to non-canonical check · cd9b8e2c
      Nadav Amit committed
      Commit d1442d85 ("KVM: x86: Handle errors when RIP is set during far
      jumps") introduced a bug that caused the fix to be incomplete.  Due to
      incorrect evaluation, a far jump to a segment with the L bit cleared (i.e., a
      32-bit segment) and RIP with any of the high bits set (i.e., RIP[63:32] != 0)
      may not trigger #GP.  As we know, this poses a security problem.
      
      In addition, the condition for two warnings was incorrect.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulator: fix execution close to the segment limit · fd56e154
      Paolo Bonzini committed
      Emulation of code that is 14 bytes to the segment limit or closer
      (e.g. RIP = 0xFFFFFFF2 after reset) is broken because we try to read as
      many as 15 bytes from the beginning of the instruction, and __linearize
      fails when the passed (address, size) pair reaches out of the segment.
      
      To fix this, let __linearize return the maximum accessible size (clamped
      to 2^32-1) for usage in __do_insn_fetch_bytes, and avoid the limit check
      by passing zero for the desired size.
      
      For expand-down segments, __linearize is performing a redundant check.
      (u32)(addr.ea + size - 1) <= lim can only happen if addr.ea is close
      to 4GB; in this case, addr.ea + size - 1 will also fail the check against
      the upper bound of the segment (which is provided by the D/B bit).
      After eliminating the redundant check, it is simple to compute
      the *max_size for expand-down segments too.
      
      Now that the limit check is done in __do_insn_fetch_bytes, we want
      to inject a general protection fault there if size < op_size (like
      __linearize would have done), instead of just aborting.
      
      This fixes booting Tiano Core from emulated flash with EPT disabled.
      
      Cc: stable@vger.kernel.org
      Fixes: 719d5a9b
      Reported-by: Borislav Petkov <bp@suse.de>
      Tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
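The size computation described in the commit message above can be sketched in user-space C for the expand-up case. The names `max_accessible_size`, `fetch_size`, and `MAX_INSN_LEN` are hypothetical and do not mirror the signatures of `__linearize` or `__do_insn_fetch_bytes`; the sketch only shows the idea of returning the number of bytes reachable before the limit (clamped to 2^32-1), fetching at most 15 of them, and faulting only when fewer than `op_size` bytes are available.

```c
#include <stdint.h>

#define MAX_INSN_LEN 15u   /* longest x86 instruction, in bytes */

/*
 * Bytes accessible from `ea` up to and including `limit` in an
 * expand-up segment, clamped to 2^32-1.  Returns 0 if `ea` is
 * already beyond the limit.
 */
uint32_t max_accessible_size(uint32_t ea, uint32_t limit)
{
    uint64_t size;

    if (ea > limit)
        return 0;

    size = (uint64_t)limit - ea + 1;
    return size > UINT32_MAX ? UINT32_MAX : (uint32_t)size;
}

/*
 * Number of bytes to fetch for an instruction at `ea`, or -1 where
 * the real emulator would inject a general protection fault.
 */
int fetch_size(uint32_t ea, uint32_t limit, uint32_t op_size)
{
    uint32_t size = max_accessible_size(ea, limit);

    if (size > MAX_INSN_LEN)
        size = MAX_INSN_LEN;
    if (size < op_size)
        return -1;

    return (int)size;
}
```

With RIP = 0xFFFFFFF2 and a limit of 0xFFFFFFFF, `fetch_size` yields 14 rather than failing, which matches the reset case mentioned in the message.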
    • KVM: emulator: fix error code for __linearize · 3606189f
      Paolo Bonzini committed
      The error code for #GP and #SS is zero when the segment is used to
      access an operand or an instruction.  It is only non-zero when
      a segment register is being loaded; for limit checks this means
      cases such as:
      
      * for #GP, when RIP is beyond the limit on a far call (before the first
      instruction is executed).  We do not implement this check, but it
      would be in em_jmp_far/em_call_far.
      
      * for #SS, if the new stack overflows during an inter-privilege-level
      call to a non-conforming code segment.  We do not implement stack
      switching at all.
      
      So use an error code of zero.
      Reviewed-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 24 Oct, 2014 13 commits
  5. 19 Oct, 2014 1 commit
    • x86,kvm,vmx: Preserve CR4 across VM entry · d974baa3
      Andy Lutomirski committed
      CR4 isn't constant; at least the TSD and PCE bits can vary.
      
      TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
      like it's correct.
      
      This adds a branch and a read from cr4 to each vm entry.  Because it is
      extremely likely that consecutive entries into the same vcpu will have
      the same host cr4 value, this fixes up the vmcs instead of restoring cr4
      after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
      reducing the overhead in the common case to just two memory reads and a
      branch.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
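The "fix up the vmcs instead of restoring cr4 after the fact" approach from the commit message above can be modelled with a cached copy of the value last written to the VMCS. The following is a user-space sketch only: `struct vcpu_host_state`, its fields, and `sync_host_cr4` are hypothetical names, and the write counter stands in for the real VMCS write.

```c
#include <stdint.h>

/* Hypothetical per-vcpu bookkeeping; not the kernel's structure. */
struct vcpu_host_state {
    uint64_t vmcs_host_cr4;   /* value last written to the VMCS host CR4 field */
    unsigned vmcs_writes;     /* counts simulated VMCS writes */
};

/*
 * Called on every VM entry with the host's current CR4.
 * Costs one read and one branch in the common case; the VMCS
 * is rewritten only when the host value has changed.
 */
void sync_host_cr4(struct vcpu_host_state *s, uint64_t current_cr4)
{
    if (s->vmcs_host_cr4 != current_cr4) {
        s->vmcs_host_cr4 = current_cr4;   /* stand-in for the VMCS write */
        s->vmcs_writes++;
    }
}
```

Because consecutive entries into the same vcpu usually see the same host CR4, the branch is rarely taken, which is the trade-off the message argues for.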
  6. 03 Oct, 2014 1 commit
    • kvm: do not handle APIC access page if in-kernel irqchip is not in use · f439ed27
      Paolo Bonzini committed
      This fixes the following OOPS:
      
         loaded kvm module (v3.17-rc1-168-gcec26bc3)
         BUG: unable to handle kernel paging request at fffffffffffffffe
         IP: [<ffffffff81168449>] put_page+0x9/0x30
         PGD 1e15067 PUD 1e17067 PMD 0
         Oops: 0000 [#1] PREEMPT SMP
          [<ffffffffa063271d>] ? kvm_vcpu_reload_apic_access_page+0x5d/0x70 [kvm]
          [<ffffffffa013b6db>] vmx_vcpu_reset+0x21b/0x470 [kvm_intel]
          [<ffffffffa0658816>] ? kvm_pmu_reset+0x76/0xb0 [kvm]
          [<ffffffffa064032a>] kvm_vcpu_reset+0x15a/0x1b0 [kvm]
          [<ffffffffa06403ac>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
          [<ffffffffa062e540>] kvm_vm_ioctl+0x200/0x780 [kvm]
          [<ffffffff81212170>] do_vfs_ioctl+0x2d0/0x4b0
          [<ffffffff8108bd99>] ? __mmdrop+0x69/0xb0
          [<ffffffff812123d1>] SyS_ioctl+0x81/0xa0
          [<ffffffff8112a6f6>] ? __audit_syscall_exit+0x1f6/0x2a0
          [<ffffffff817229e9>] system_call_fastpath+0x16/0x1b
         Code: c6 78 ce a3 81 4c 89 e7 e8 d9 80 ff ff 0f 0b 4c 89 e7 e8 8f f6 ff ff e9 fa fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 1e 8b 47 1c 85 c0 74 27 f0
         RIP  [<ffffffff81193045>] put_page+0x5/0x50
      
      when not using the in-kernel irqchip ("-machine kernel_irqchip=off"
      with QEMU).  The fix is to make the same check in
      kvm_vcpu_reload_apic_access_page that we already have
      in vmx.c's vm_need_virtualize_apic_accesses().
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
      Fixes: 4256f43f
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 24 Sep, 2014 3 commits
    • kvm: x86: Unpin and remove kvm_arch->apic_access_page · c24ae0dc
      Tang Chen committed
      In order to make the APIC access page migratable, stop pinning it in
      memory.
      
      And because the APIC access page is not pinned in memory, we can
      remove kvm_arch->apic_access_page.  When we need to write its
      physical address into vmcs, we use gfn_to_page() to get its page
      struct, which is needed to call page_to_phys(); the page is then
      immediately unpinned.
      Suggested-by: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: vmx: Implement set_apic_access_page_addr · 38b99173
      Tang Chen committed
      Currently, the APIC access page is pinned by KVM for the entire life
      of the guest.  We want to make it migratable in order to make memory
      hot-unplug available for machines that run KVM.
      
      This patch prepares to handle this for the case where there is no nested
      virtualization, or where the nested guest does not have an APIC page of
      its own.  All accesses to kvm->arch.apic_access_page are changed to go
      through kvm_vcpu_reload_apic_access_page.
      
      If the APIC access page is invalidated when the host is running, we update
      the VMCS in the next guest entry.
      
      If it is invalidated when the guest is running, the MMU notifier will force
      an exit, after which we will handle everything as in the previous case.
      
      If it is invalidated when a nested guest is running, the request will update
      either the VMCS01 or the VMCS02.  Updating the VMCS01 is done at the
      next L2->L1 exit, while updating the VMCS02 is done in prepare_vmcs02.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Add request bit to reload APIC access page address · 4256f43f
      Tang Chen committed
      Currently, the APIC access page is pinned by KVM for the entire life
      of the guest.  We want to make it migratable in order to make memory
      hot-unplug available for machines that run KVM.
      
      This patch prepares to handle this in generic code, through a new
      request bit (that will be set by the MMU notifier) and a new hook
      that is called whenever the request bit is processed.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
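The request-bit pattern that the commit message above relies on (one party sets a bit, and the vcpu processes it before the next guest entry by calling a hook) can be sketched in user-space C11. Everything here is an illustrative assumption: `struct demo_vcpu`, `REQ_APIC_PAGE_RELOAD`, and the function names are hypothetical, and the `reloads` counter stands in for the real hook.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_APIC_PAGE_RELOAD 0u   /* hypothetical bit number */

struct demo_vcpu {
    atomic_ulong requests;   /* one bit per pending request */
    unsigned reloads;        /* counts simulated hook invocations */
};

/* Set a request bit; in the commit's design the MMU notifier would do this. */
void demo_make_request(struct demo_vcpu *v, unsigned req)
{
    atomic_fetch_or(&v->requests, 1ul << req);
}

/* Atomically test and clear a request bit; true if it was pending. */
bool demo_check_request(struct demo_vcpu *v, unsigned req)
{
    unsigned long mask = 1ul << req;

    return (atomic_fetch_and(&v->requests, ~mask) & mask) != 0;
}

/* Runs before each simulated guest entry and processes pending requests. */
void demo_before_guest_entry(struct demo_vcpu *v)
{
    if (demo_check_request(v, REQ_APIC_PAGE_RELOAD))
        v->reloads++;   /* stand-in for the reload hook */
}
```

Setting the bit several times before an entry still results in a single reload, so the expensive work is deferred and coalesced rather than done at the point of invalidation.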