1. 16 1月, 2018 5 次提交
    • W
      KVM: x86: fix escape of guest dr6 to the host · efdab992
      Wanpeng Li 提交于
      syzkaller reported:
      
         WARNING: CPU: 0 PID: 12927 at arch/x86/kernel/traps.c:780 do_debug+0x222/0x250
         CPU: 0 PID: 12927 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #16
         RIP: 0010:do_debug+0x222/0x250
         Call Trace:
          <#DB>
          debug+0x3e/0x70
         RIP: 0010:copy_user_enhanced_fast_string+0x10/0x20
          </#DB>
          _copy_from_user+0x5b/0x90
          SyS_timer_create+0x33/0x80
          entry_SYSCALL_64_fastpath+0x23/0x9a
      
      The testcase sets a watchpoint (with perf_event_open) on a buffer that is
      passed to timer_create() as the struct sigevent argument.  In timer_create(),
      copy_from_user()'s rep movsb triggers the BP.  The testcase also sets
      the debug registers for the guest.
      
      However, KVM only restores host debug registers when the host has active
      watchpoints, which triggers a race condition when running the testcase with
      multiple threads.  The guest's DR6.BS bit can escape to the host before
      another thread invokes timer_create(), and do_debug() complains.
      
      The fix is to respect do_debug()'s dr6 invariant when leaving KVM.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      efdab992
    • W
      KVM: X86: support paravirtualized help for TLB shootdowns · f38a7b75
      Wanpeng Li 提交于
      When running on a virtual machine, IPIs are expensive when the target
      CPU is sleeping.  Thus, it is nice to be able to avoid them for TLB
      shootdowns.  KVM can just do the flush via INVVPID on the guest's behalf
      the next time the CPU is scheduled.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      [Use "&" to test the bit instead of "==". - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f38a7b75
    • W
      KVM: X86: introduce invalidate_gpa argument to tlb flush · c2ba05cc
      Wanpeng Li 提交于
      Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush,
      it will be used by later patches to just flush guest tlb.
      
      For VMX, this will use INVVPID instead of INVEPT, which will invalidate
      combined mappings while keeping guest-physical mappings.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      c2ba05cc
    • W
      KVM: X86: use paravirtualized TLB Shootdown · 858a43aa
      Wanpeng Li 提交于
      Remote TLB flush does a busy wait which is fine in bare-metal
      scenario. But with-in the guest, the vcpus might have been pre-empted or
      blocked. In this scenario, the initator vcpu would end up busy-waiting
      for a long amount of time; it also consumes CPU unnecessarily to wake
      up the target of the shootdown.
      
      This patch set adds support for KVM's new paravirtualized TLB flush;
      remote TLB flush does not wait for vcpus that are sleeping, instead
      KVM will flush the TLB as soon as the vCPU starts running again.
      
      The improvement is clearly visible when the host is overcommitted; in this
      case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
      prevents preempted vCPUs from stealing precious execution time from the
      running ones.
      
      Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
      so 64 pCPUs, and each VM is 64 vCPUs.
      
      ebizzy -M
                    vanilla    optimized     boost
      1VM            46799       48670         4%
      2VM            23962       42691        78%
      3VM            16152       37539       132%
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      858a43aa
    • W
      KVM: X86: Add KVM_VCPU_PREEMPTED · fa55eedd
      Wanpeng Li 提交于
      The next patch will add another bit to the preempted field in
      kvm_steal_time.  Define a constant for bit 0 (the only one that is
      currently used).
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      fa55eedd
  2. 14 12月, 2017 33 次提交
  3. 09 12月, 2017 1 次提交
  4. 07 12月, 2017 1 次提交
    • A
      x86/vdso: Change time() prototype to match __vdso_time() · 88edb57d
      Arnd Bergmann 提交于
      gcc-8 warns that time() is an alias for __vdso_time() but the two
      have different prototypes:
      
        arch/x86/entry/vdso/vclock_gettime.c:327:5: error: 'time' alias between functions of incompatible types 'int(time_t *)' {aka 'int(long int *)'} and 'time_t(time_t *)' {aka 'long int(long int *)'} [-Werror=attribute-alias]
         int time(time_t *t)
             ^~~~
        arch/x86/entry/vdso/vclock_gettime.c:318:16: note: aliased declaration here
      
      I could not figure out whether this is intentional, but I see that
      changing it to return time_t avoids the warning.
      
      Returning 'int' from time() is also a bit questionable, as it causes an
      overflow in y2038 even on 64-bit architectures that use a 64-bit time_t
      type. On 32-bit architecture with 64-bit time_t, time() should always
      be implement by the C library by calling a (to be added) clock_gettime()
      variant that takes a sufficiently wide argument.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Link: http://lkml.kernel.org/r/20171204150203.852959-1-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      88edb57d