1. 06 12月, 2017 3 次提交
    • R
      KVM: x86: fix APIC page invalidation · b1394e74
      Radim Krčmář 提交于
      Implementation of the unpinned APIC page didn't update the VMCS address
      cache when invalidation was done through range mmu notifiers.
      This became a problem when the page notifier was removed.
      
      Re-introduce the arch-specific helper and call it from ...range_start.
      Reported-by: NFabian Grünbichler <f.gruenbichler@proxmox.com>
      Fixes: 38b99173 ("kvm: vmx: Implement set_apic_access_page_addr")
      Fixes: 369ea824 ("mm/rmap: update to new mmu_notifier semantic v2")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Tested-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Tested-by: NFabian Grünbichler <f.gruenbichler@proxmox.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      b1394e74
    • R
      x86,kvm: remove KVM emulator get_fpu / put_fpu · 6ab0b9fe
      Rik van Riel 提交于
      Now that get_fpu and put_fpu do nothing, because the scheduler will
      automatically load and restore the guest FPU context for us while we
      are in this code (deep inside the vcpu_run main loop), we can get rid
      of the get_fpu and put_fpu hooks.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Suggested-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6ab0b9fe
    • R
      x86,kvm: move qemu/guest FPU switching out to vcpu_run · f775b13e
      Rik van Riel 提交于
      Currently, every time a VCPU is scheduled out, the host kernel will
      first save the guest FPU/xstate context, then load the qemu userspace
      FPU context, only to then immediately save the qemu userspace FPU
      context back to memory. When scheduling in a VCPU, the same extraneous
      FPU loads and saves are done.
      
      This could be avoided by moving from a model where the guest FPU is
      loaded and stored with preemption disabled, to a model where the
      qemu userspace FPU is swapped out for the guest FPU context for
      the duration of the KVM_RUN ioctl.
      
      This is done under the VCPU mutex, which is also taken when other
      tasks inspect the VCPU FPU context, so the code should already be
      safe for this change. That should come as no surprise, given that
      s390 already has this optimization.
      
      This can fix a bug where KVM calls get_user_pages while owning the
      FPU, and the file system ends up requesting the FPU again:
      
          [258270.527947]  __warn+0xcb/0xf0
          [258270.527948]  warn_slowpath_null+0x1d/0x20
          [258270.527951]  kernel_fpu_disable+0x3f/0x50
          [258270.527953]  __kernel_fpu_begin+0x49/0x100
          [258270.527955]  kernel_fpu_begin+0xe/0x10
          [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
          [258270.527961]  crypto_shash_update+0x3f/0x110
          [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
          [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
          [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
          [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
          [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
          [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
          [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
          [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
          [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
          [258270.528002]  shrink_slab.part.40+0x1f5/0x420
          [258270.528004]  shrink_node+0x22c/0x320
          [258270.528006]  do_try_to_free_pages+0xf5/0x330
          [258270.528008]  try_to_free_pages+0xe9/0x190
          [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
          [258270.528011]  __alloc_pages_nodemask+0x209/0x260
          [258270.528014]  alloc_pages_vma+0x1f1/0x250
          [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
          [258270.528021]  handle_mm_fault+0xfd3/0x1330
          [258270.528025]  __get_user_pages+0x113/0x640
          [258270.528027]  get_user_pages+0x4f/0x60
          [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
          [258270.528108]  try_async_pf+0x66/0x230 [kvm]
          [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
          [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
          [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
          [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
      
      No performance changes were detected in quick ping-pong tests on
      my 4 socket system, which is expected since an FPU+xstate load is
      on the order of 0.1us, while ping-ponging between CPUs is on the
      order of 20us, and somewhat noisy.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Suggested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      [Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
       which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f775b13e
  2. 28 11月, 2017 2 次提交
    • J
      KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Jan H. Schönherr 提交于
      KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
      "any unblocked signal received [...] will cause KVM_RUN to return with
      -EINTR" and that "the signal will only be delivered if not blocked by
      the original signal mask".
      
      This, however, is only true, when the calling task has a signal handler
      registered for a signal. If not, signal evaluation is short-circuited for
      SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
      returning or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      20b7035c
    • W
      KVM: X86: Fix softlockup when get the current kvmclock · e70b57a6
      Wanpeng Li 提交于
       watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
       CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc4+ #4
       RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
       Call Trace:
        get_time_ref_counter+0x5a/0x80 [kvm]
        kvm_hv_process_stimers+0x120/0x5f0 [kvm]
        kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
        kvm_vcpu_ioctl+0x33a/0x620 [kvm]
        do_vfs_ioctl+0xa1/0x5d0
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x1e/0xa9
      
      This can be reproduced when running kvm-unit-tests/hyperv_stimer.flat and
      cpu-hotplug stress simultaneously. __this_cpu_read(cpu_tsc_khz) returns 0
      (set in kvmclock_cpu_down_prep()) when the pCPU is unhotplug which results
      in kvm_get_time_scale() gets into an infinite loop.
      
      This patch fixes it by treating the unhotplug pCPU as not using master clock.
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e70b57a6
  3. 17 11月, 2017 4 次提交
  4. 21 10月, 2017 1 次提交
  5. 19 10月, 2017 1 次提交
  6. 12 10月, 2017 7 次提交
  7. 26 9月, 2017 1 次提交
    • I
      x86/fpu: Rename fpu__activate_curr() to fpu__initialize() · 2ce03d85
      Ingo Molnar 提交于
      Rename this function to better express that it's all about
      initializing the FPU state of a task which goes hand in hand
      with the fpu::initialized field.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2ce03d85
  8. 15 9月, 2017 1 次提交
    • W
      KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously · 9a6e7c39
      Wanpeng Li 提交于
      qemu-system-x86-8600  [004] d..1  7205.687530: kvm_entry: vcpu 2
      qemu-system-x86-8600  [004] ....  7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
      qemu-system-x86-8600  [004] ....  7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
      qemu-system-x86-8600  [004] ....  7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
      qemu-system-x86-8600  [004] .N..  7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
          kworker/4:2-7814  [004] ....  7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
      qemu-system-x86-8600  [004] ....  7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
      qemu-system-x86-8600  [004] d..1  7205.687711: kvm_entry: vcpu 2
      
      After running some memory intensive workload in guest, I catch the kworker
      which completes the GUP too quickly, and queues an "Page Ready" #PF exception
      after the "Page not Present" exception before the next vmentry as the above
      trace which will result in #DF injected to guest.
      
      This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
      occurs before the next vmentry since the GUP has already got the required page
      and shadow page table has already been fixed by "Page Ready" handler.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Fixes: 7c90705b ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
      [Changed indentation and added clearing of injected. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      9a6e7c39
  9. 14 9月, 2017 2 次提交
  10. 13 9月, 2017 2 次提交
  11. 01 9月, 2017 1 次提交
  12. 25 8月, 2017 5 次提交
  13. 18 8月, 2017 1 次提交
  14. 12 8月, 2017 1 次提交
  15. 10 8月, 2017 1 次提交
    • W
      KVM: X86: Fix residual mmio emulation request to userspace · bbeac283
      Wanpeng Li 提交于
      Reported by syzkaller:
      
      The kvm-intel.unrestricted_guest=0
      
         WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         CPU: 5 PID: 1014 Comm: warn_test Tainted: G        W  OE   4.13.0-rc3+ #8
         RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
         Call Trace:
          ? put_pid+0x3a/0x50
          ? rcu_read_lock_sched_held+0x79/0x80
          ? kmem_cache_free+0x2f2/0x350
          kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
          ? __fget+0xfc/0x210
          do_vfs_ioctl+0xa4/0x6a0
          ? __fget+0x11d/0x210
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x23/0xc2
          ? __this_cpu_preempt_check+0x13/0x20
      
      The syszkaller folks reported a residual mmio emulation request to userspace
      due to vm86 fails to emulate inject real mode interrupt(fails to read CS) and
      incurs a triple fault. The vCPU returns to userspace with vcpu->mmio_needed == true
      and KVM_EXIT_SHUTDOWN exit reason. However, the syszkaller testcase constructs
      several threads to launch the same vCPU, the thread which lauch this vCPU after
      the thread whichs get the vcpu->mmio_needed == true and KVM_EXIT_SHUTDOWN will
      trigger the warning.
      
         #define _GNU_SOURCE
         #include <pthread.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/wait.h>
         #include <sys/types.h>
         #include <sys/stat.h>
         #include <sys/mman.h>
         #include <fcntl.h>
         #include <unistd.h>
         #include <linux/kvm.h>
         #include <stdio.h>
      
         int kvmcpu;
         struct kvm_run *run;
      
         void* thr(void* arg)
         {
           int res;
           res = ioctl(kvmcpu, KVM_RUN, 0);
           printf("ret1=%d exit_reason=%d suberror=%d\n",
               res, run->exit_reason, run->internal.suberror);
           return 0;
         }
      
         void test()
         {
           int i, kvm, kvmvm;
           pthread_t th[4];
      
           kvm = open("/dev/kvm", O_RDWR);
           kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
           kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
           run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0);
           srand(getpid());
           for (i = 0; i < 4; i++) {
             pthread_create(&th[i], 0, thr, 0);
             usleep(rand() % 10000);
           }
           for (i = 0; i < 4; i++)
             pthread_join(th[i], 0);
         }
      
         int main()
         {
           for (;;) {
             int pid = fork();
             if (pid < 0)
               exit(1);
             if (pid == 0) {
               test();
               exit(0);
             }
             int status;
             while (waitpid(pid, &status, __WALL) != pid) {}
           }
           return 0;
         }
      
      This patch fixes it by resetting the vcpu->mmio_needed once we receive
      the triple fault to avoid the residue.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bbeac283
  16. 08 8月, 2017 2 次提交
  17. 07 8月, 2017 3 次提交
  18. 03 8月, 2017 2 次提交
    • L
      KVM: X86: init irq->level in kvm_pv_kick_cpu_op · ebd28fcb
      Longpeng(Mike) 提交于
      'lapic_irq' is a local variable and its 'level' field isn't
      initialized, so 'level' is random, it doesn't matter but
      makes UBSAN unhappy:
      
      UBSAN: Undefined behaviour in .../lapic.c:...
      load of value 10 is not a valid value for type '_Bool'
      ...
      Call Trace:
       [<ffffffff81f030b6>] dump_stack+0x1e/0x20
       [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55
       [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162
       [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm]
       [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm]
       [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm]
       [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm]
       [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel]
       [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel]
      ...
      Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      ebd28fcb
    • W
      KVM: X86: Fix loss of pending INIT due to race · f4ef1910
      Wanpeng Li 提交于
      When SMP VM start, AP may lost INIT because of receiving INIT between
      kvm_vcpu_ioctl_x86_get/set_vcpu_events.
      
             vcpu 0                             vcpu 1
                                         kvm_vcpu_ioctl_x86_get_vcpu_events
                                           events->smi.latched_init = 0
        send INIT to vcpu1
          set vcpu1's pending_events
                                         kvm_vcpu_ioctl_x86_set_vcpu_events
                                            if (events->smi.latched_init == 0)
                                              clear INIT in pending_events
      
      This patch fixes it by just update SMM related flags if we are in SMM.
      
      Thanks Peng Hao for the report and original commit message.
      Reported-by: NPeng Hao <peng.hao2@zte.com.cn>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f4ef1910