1. 13 4月, 2017 5 次提交
  2. 07 4月, 2017 13 次提交
  3. 28 3月, 2017 2 次提交
  4. 24 3月, 2017 6 次提交
    • W
      KVM: VMX: Fix enable VPID conditions · 08d839c4
      Wanpeng Li 提交于
      This can be reproduced by running L2 on L1, and disable VPID on L0
      if w/o commit "KVM: nVMX: Fix nested VPID vmx exec control", the L2
      crash as below:
      
      KVM: entry failed, hardware error 0x7
      EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306c3
      ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
      EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
      ES =0000 00000000 0000ffff 00009300
      CS =f000 ffff0000 0000ffff 00009b00
      SS =0000 00000000 0000ffff 00009300
      DS =0000 00000000 0000ffff 00009300
      FS =0000 00000000 0000ffff 00009300
      GS =0000 00000000 0000ffff 00009300
      LDT=0000 00000000 0000ffff 00008200
      TR =0000 00000000 0000ffff 00008b00
      GDT=     00000000 0000ffff
      IDT=     00000000 0000ffff
      CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
      DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
      DR6=00000000ffff0ff0 DR7=0000000000000400
      EFER=0000000000000000
      
      Reference SDM 30.3 INVVPID:
      
      Protected Mode Exceptions
      - #UD
        - If not in VMX operation.
        - If the logical processor does not support VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=0).
        - If the logical processor supports VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=1) but does
          not support the INVVPID instruction (IA32_VMX_EPT_VPID_CAP[32]=0).
      
      So we should check both VPID enable bit in vmx exec control and INVVPID support bit
      in vmx capability MSRs to enable VPID. This patch adds the guarantee to not enable
      VPID if either INVVPID or single-context/all-context invalidation is not exposed in
      vmx capability MSRs.
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      08d839c4
    • W
      KVM: nVMX: Fix nested VPID vmx exec control · 63cb6d5f
      Wanpeng Li 提交于
      This can be reproduced by running kvm-unit-tests/vmx.flat on L0 w/ vpid disabled.
      
      Test suite: VPID
      Unhandled exception 6 #UD at ip 00000000004051a6
      error_code=0000      rflags=00010047      cs=00000008
      rax=0000000000000000 rcx=0000000000000001 rdx=0000000000000047 rbx=0000000000402f79
      rbp=0000000000456240 rsi=0000000000000001 rdi=0000000000000000
      r8=000000000000000a  r9=00000000000003f8 r10=0000000080010011 r11=0000000000000000
      r12=0000000000000003 r13=0000000000000708 r14=0000000000000000 r15=0000000000000000
      cr0=0000000080010031 cr2=0000000000000000 cr3=0000000007fff000 cr4=0000000000002020
      cr8=0000000000000000
      STACK: @4051a6 40523e 400f7f 402059 40028f
      
      We should hide and forbid VPID in L1 if it is disabled on L0. However, nested VPID
      enable bit is set unconditionally during setup nested vmx exec controls though VPID
      is not exposed through nested VMX capablity. This patch fixes it by don't set nested
      VPID enable bit if it is disabled on L0.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 5c614b35 (KVM: nVMX: nested VPID emulation)
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      63cb6d5f
    • W
      KVM: x86: correct async page present tracepoint · 24dccf83
      Wanpeng Li 提交于
      After async pf setup successfully, there is a broadcast wakeup w/ special
      token 0xffffffff which tells vCPU that it should wake up all processes
      waiting for APFs though there is no real process waiting at the moment.
      
      The async page present tracepoint print prematurely and fails to catch the
      special token setup. This patch fixes it by moving the async page present
      tracepoint after the special token setup.
      
      Before patch:
      
      qemu-system-x86-8499  [006] ...1  5973.473292: kvm_async_pf_ready: token 0x0 gva 0x0
      
      After patch:
      
      qemu-system-x86-8499  [006] ...1  5973.473292: kvm_async_pf_ready: token 0xffffffff gva 0x0
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      24dccf83
    • J
      kvm: vmx: Flush TLB when the APIC-access address changes · fb6c8198
      Jim Mattson 提交于
      Quoting from the Intel SDM, volume 3, section 28.3.3.4: Guidelines for
      Use of the INVEPT Instruction:
      
      If EPT was in use on a logical processor at one time with EPTP X, it
      is recommended that software use the INVEPT instruction with the
      "single-context" INVEPT type and with EPTP X in the INVEPT descriptor
      before a VM entry on the same logical processor that enables EPT with
      EPTP X and either (a) the "virtualize APIC accesses" VM-execution
      control was changed from 0 to 1; or (b) the value of the APIC-access
      address was changed.
      
      In the nested case, the burden falls on L1, unless L0 enables EPT in
      vmcs02 when L1 doesn't enable EPT in vmcs12.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      fb6c8198
    • P
      KVM: x86: use pic/ioapic destructor when destroy vm · c761159c
      Peter Xu 提交于
      We have specific destructors for pic/ioapic, we'd better use them when
      destroying the VM as well.
      Signed-off-by: NPeter Xu <peterx@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      c761159c
    • P
      KVM: x86: check existance before destroy · 950712eb
      Peter Xu 提交于
      Mostly used for split irqchip mode. In that case, these two things are
      not inited at all, so no need to release.
      Signed-off-by: NPeter Xu <peterx@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      950712eb
  5. 20 3月, 2017 3 次提交
  6. 09 3月, 2017 1 次提交
    • R
      KVM: nVMX: do not warn when MSR bitmap address is not backed · 05d8d346
      Radim Krčmář 提交于
      Before trying to do nested_get_page() in nested_vmx_merge_msr_bitmap(),
      we have already checked that the MSR bitmap address is valid (4k aligned
      and within physical limits).  SDM doesn't specify what happens if the
      there is no memory mapped at the valid address, but Intel CPUs treat the
      situation as if the bitmap was configured to trap all MSRs.
      
      KVM already does that by returning false and a correct handling doesn't
      need the guest-trigerrable warning that was reported by syzkaller:
      (The warning was originally there to catch some possible bugs in nVMX.)
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709
        nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline]
        WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709
        nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 7832 Comm: syz-executor1 Not tainted 4.10.0+ #229
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:15 [inline]
         dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
         panic+0x1fb/0x412 kernel/panic.c:179
         __warn+0x1c4/0x1e0 kernel/panic.c:540
         warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
         nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline]
         nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640
         enter_vmx_non_root_mode arch/x86/kvm/vmx.c:10471 [inline]
         nested_vmx_run+0x6186/0xaab0 arch/x86/kvm/vmx.c:10561
         handle_vmlaunch+0x1a/0x20 arch/x86/kvm/vmx.c:7312
         vmx_handle_exit+0xfc0/0x3f00 arch/x86/kvm/vmx.c:8526
         vcpu_enter_guest arch/x86/kvm/x86.c:6982 [inline]
         vcpu_run arch/x86/kvm/x86.c:7044 [inline]
         kvm_arch_vcpu_ioctl_run+0x1418/0x4840 arch/x86/kvm/x86.c:7205
         kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2570
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      [Jim Mattson explained the bare metal behavior: "I believe this behavior
       would be documented in the chipset data sheet rather than the SDM,
       since the chipset returns all 1s for an unclaimed read."]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      05d8d346
  7. 07 3月, 2017 2 次提交
    • W
      KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset · 2f707d97
      Wanpeng Li 提交于
      Reported by syzkaller:
      
          WARNING: CPU: 1 PID: 27742 at arch/x86/kvm/vmx.c:11029
          nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
          CPU: 1 PID: 27742 Comm: a.out Not tainted 4.10.0+ #229
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:15 [inline]
           dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
           panic+0x1fb/0x412 kernel/panic.c:179
           __warn+0x1c4/0x1e0 kernel/panic.c:540
           warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
           nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
           vmx_leave_nested arch/x86/kvm/vmx.c:11136 [inline]
           vmx_set_msr+0x1565/0x1910 arch/x86/kvm/vmx.c:3324
           kvm_set_msr+0xd4/0x170 arch/x86/kvm/x86.c:1099
           do_set_msr+0x11e/0x190 arch/x86/kvm/x86.c:1128
           __msr_io arch/x86/kvm/x86.c:2577 [inline]
           msr_io+0x24b/0x450 arch/x86/kvm/x86.c:2614
           kvm_arch_vcpu_ioctl+0x35b/0x46a0 arch/x86/kvm/x86.c:3497
           kvm_vcpu_ioctl+0x232/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2721
           vfs_ioctl fs/ioctl.c:43 [inline]
           do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
           SYSC_ioctl fs/ioctl.c:698 [inline]
           SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
           entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      The syzkaller folks reported a nested_run_pending warning during userspace
      clear VMX capability which is exposed to L1 before.
      
      The warning gets thrown while doing
      
      (*(uint32_t*)0x20aecfe8 = (uint32_t)0x1);
      (*(uint32_t*)0x20aecfec = (uint32_t)0x0);
      (*(uint32_t*)0x20aecff0 = (uint32_t)0x3a);
      (*(uint32_t*)0x20aecff4 = (uint32_t)0x0);
      (*(uint64_t*)0x20aecff8 = (uint64_t)0x0);
      r[29] = syscall(__NR_ioctl, r[4], 0x4008ae89ul,
      		0x20aecfe8ul, 0, 0, 0, 0, 0, 0);
      
      i.e. KVM_SET_MSR ioctl with
      
      struct kvm_msrs {
      	.nmsrs = 1,
      		.pad = 0,
      		.entries = {
      			{.index = MSR_IA32_FEATURE_CONTROL,
      			 .reserved = 0,
      			 .data = 0}
      		}
      }
      
      The VMLANCH/VMRESUME emulation should be stopped since the CPU is going to
      reset here. This patch resets the nested_run_pending since the CPU is going
      to be reset hence there should be nothing pending.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Suggested-by: NRadim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      2f707d97
    • J
      kvm: nVMX: VMCLEAR should not cause the vCPU to shut down · 587d7e72
      Jim Mattson 提交于
      VMCLEAR should silently ignore a failure to clear the launch state of
      the VMCS referenced by the operand.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      [Changed "kvm_write_guest(vcpu->kvm" to "kvm_vcpu_write_guest(vcpu".]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      587d7e72
  8. 02 3月, 2017 6 次提交
    • I
      sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled()... · 3905f9ad
      Ingo Molnar 提交于
      sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled() from <linux/sched.h> to <linux/sched/stat.h>
      
      But first update usage sites with the new header dependency.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3905f9ad
    • I
      sched/headers: Prepare to move cputime functionality from <linux/sched.h> into... · 32ef5517
      Ingo Molnar 提交于
      sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>
      
      Introduce a trivial, mostly empty <linux/sched/cputime.h> header
      to prepare for the moving of cputime functionality out of sched.h.
      
      Update all code that relies on these facilities.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      32ef5517
    • I
      sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> · b2d09103
      Ingo Molnar 提交于
      We don't actually need the full rculist.h header in sched.h anymore,
      we will be able to include the smaller rcupdate.h header instead.
      
      But first update code that relied on the implicit header inclusion.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b2d09103
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> · 3f07c014
      Ingo Molnar 提交于
      We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/signal.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3f07c014
    • W
      KVM: nVMX: Fix pending events injection · acc9ab60
      Wanpeng Li 提交于
      L2 fails to boot on a non-APICv box dues to 'commit 0ad3bed6
      ("kvm: nVMX: move nested events check to kvm_vcpu_running")'
      
      KVM internal error. Suberror: 3
      extra data[0]: 800000ef
      extra data[1]: 1
      RAX=0000000000000000 RBX=ffffffff81f36140 RCX=0000000000000000 RDX=0000000000000000
      RSI=0000000000000000 RDI=0000000000000000 RBP=ffff88007c92fe90 RSP=ffff88007c92fe90
      R8 =ffff88007fccdca0 R9 =0000000000000000 R10=00000000fffedb3d R11=0000000000000000
      R12=0000000000000003 R13=0000000000000000 R14=0000000000000000 R15=ffff88007c92c000
      RIP=ffffffff810645e6 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
      ES =0000 0000000000000000 ffffffff 00c00000
      CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
      SS =0000 0000000000000000 ffffffff 00c00000
      DS =0000 0000000000000000 ffffffff 00c00000
      FS =0000 0000000000000000 ffffffff 00c00000
      GS =0000 ffff88007fcc0000 ffffffff 00c00000
      LDT=0000 0000000000000000 ffffffff 00c00000
      TR =0040 ffff88007fcd4200 00002087 00008b00 DPL=0 TSS64-busy
      GDT=     ffff88007fcc9000 0000007f
      IDT=     ffffffffff578000 00000fff
      CR0=80050033 CR2=00000000ffffffff CR3=0000000001e0a000 CR4=003406e0
      DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
      DR6=00000000fffe0ff0 DR7=0000000000000400
      EFER=0000000000000d01
      
      We should try to reinject previous events if any before trying to inject
      new event if pending. If vmexit is triggered by L2 guest and L0 interested
      in, we should reinject IDT-vectoring info to L2 through vmcs02 if any,
      otherwise, we can consider new IRQs/NMIs which can be injected and call
      nested events callback to switch from L2 to L1 if needed and inject the
      proper vmexit events. However, 'commit 0ad3bed6 ("kvm: nVMX: move
      nested events check to kvm_vcpu_running")' results in the handle events
      order reversely on non-APICv box. This patch fixes it by bailing out for
      pending events and not consider new events in this scenario.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Fixes: 0ad3bed6 ("kvm: nVMX: move nested events check to kvm_vcpu_running")
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      acc9ab60
    • J
      x86/kvm/vmx: remove unused variable in segment_base() · 0fce546f
      Jérémy Lefaure 提交于
      The pointer 'struct desc_struct *d' is unused since commit 8c2e41f7
      ("x86/kvm/vmx: Simplify segment_base()") so let's remove it.
      Signed-off-by: NJérémy Lefaure <jeremy.lefaure@lse.epita.fr>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      0fce546f
  9. 01 3月, 2017 1 次提交
    • R
      KVM: x86: never specify a sample period for virtualized in_tx_cp counters · bba82fd7
      Robert O'Callahan 提交于
      pmc_reprogram_counter() always sets a sample period based on the value of
      pmc->counter. However, hsw_hw_config() rejects sample periods less than
      2^31 - 1. So for example, if a KVM guest does
      
          struct perf_event_attr attr;
          memset(&attr, 0, sizeof(attr));
          attr.type = PERF_TYPE_RAW;
          attr.size = sizeof(attr);
          attr.config = 0x2005101c4; // conditional branches retired IN_TXCP
          attr.sample_period = 0;
          int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
          ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
          ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
      
      the guest kernel counts some conditional branch events, then updates the
      virtual PMU register with a nonzero count. The host reaches
      pmc_reprogram_counter() with nonzero pmc->counter, triggers EOPNOTSUPP
      in hsw_hw_config(), prints "kvm_pmu: event creation failed" in
      pmc_reprogram_counter(), and silently (from the guest's point of view) stops
      counting events.
      
      We fix event counting by forcing attr.sample_period to always be zero for
      in_tx_cp counters. Sampling doesn't work, but it already didn't work and
      can't be fixed without major changes to the approach in hsw_hw_config().
      Signed-off-by: NRobert O'Callahan <robert@ocallahan.org>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      bba82fd7
  10. 28 2月, 2017 1 次提交