1. 26 Jan 2019, 1 commit
  2. 03 Sep 2018, 1 commit
  3. 24 Aug 2018, 1 commit
  4. 06 Aug 2018, 3 commits
  5. 20 Jul 2018, 2 commits
  6. 20 Jun 2018, 1 commit
  7. 18 May 2018, 1 commit
    • kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8
      Committed by Michael S. Tsirkin
      KVM_HINTS_DEDICATED seems to be somewhat confusing:
      
      The guest doesn't really care whether it is the only task running on a
      host CPU, as long as it is not preempted.

      And there are more reasons for the guest to be preempted than host CPU
      sharing: with memory overcommit, for example, it can get preempted on a
      memory access, and post-copy migration can cause preemption too.

      Let's call it KVM_HINTS_REALTIME, which seems to better match what
      guests expect (a guest-side sketch follows this entry).

      Also, the flag must be set on all vCPUs - current guests assume this.
      Note this in the documentation.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      633711e8
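      As a guest-side illustration, a minimal sketch assuming the hint bit
      from arch/x86/include/uapi/asm/kvm_para.h and the kvm_para_has_hint()
      helper from arch/x86/include/asm/kvm_para.h; the exact call sites in
      the real tree differ:

          #include <linux/kvm_para.h>   /* kvm_para_has_hint() */

          /*
           * Sketch: a guest that sees KVM_HINTS_REALTIME on its vCPUs may
           * assume it will not be preempted, and can therefore skip
           * paravirt spinlock slowpaths. The hint is only meaningful if
           * the hypervisor sets it on ALL vCPUs.
           */
          static bool guest_can_assume_no_preemption(void)
          {
              return kvm_para_has_hint(KVM_HINTS_REALTIME);
          }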
  8. 29 Mar 2018, 1 commit
    • KVM: X86: Fix setting up virt_spin_lock_key before the static key is initialized · 34226b6b
      Committed by Wanpeng Li
       static_key_disable_cpuslocked(): static key 'virt_spin_lock_key+0x0/0x20' used before call to jump_label_init()
       WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:161 static_key_disable_cpuslocked+0x61/0x80
       RIP: 0010:static_key_disable_cpuslocked+0x61/0x80
       Call Trace:
        static_key_disable+0x16/0x20
        start_kernel+0x192/0x4b3
        secondary_startup_64+0xa5/0xb0
      
      The qspinlock will be chosen when dedicated pCPUs are available; however,
      the static virt_spin_lock_key is set in kvm_spinlock_init() before
      jump_label_init() has been called, which results in the WARN() above.
      This patch fixes it by delaying the virt_spin_lock_key setup to
      .smp_prepare_cpus() (a sketch follows this entry).
      Reported-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Fixes: b2798ba0 ("KVM: X86: Choose qspinlock when dedicated physical CPUs are available")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      34226b6b
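      A minimal sketch of the fix's shape, assuming the virt_spin_lock_key
      static key and the x86 smp_ops hook; names follow the mainline patch
      (KVM_HINTS_DEDICATED was the hint's name at this point, see the rename
      above) but details may differ:

          static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
          {
              native_smp_prepare_cpus(max_cpus);
              /*
               * jump_label_init() has already run by the time
               * .smp_prepare_cpus() is called, so flipping the static
               * key is safe here (it was not in kvm_spinlock_init()).
               */
              if (kvm_para_has_hint(KVM_HINTS_DEDICATED))
                  static_branch_disable(&virt_spin_lock_key);
          }

          /* wired up during early KVM guest init: */
          smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;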
  9. 28 Mar 2018, 1 commit
  10. 07 Mar 2018, 3 commits
  11. 24 Feb 2018, 3 commits
  12. 16 Jan 2018, 2 commits
    • KVM: X86: use paravirtualized TLB Shootdown · 858a43aa
      Committed by Wanpeng Li
      A remote TLB flush does a busy wait, which is fine in a bare-metal
      scenario. But within a guest, the vCPUs might have been preempted or
      blocked. In this scenario, the initiator vCPU would end up busy-waiting
      for a long time; it also consumes CPU unnecessarily to wake up the
      target of the shootdown.
      
      This patch set adds support for KVM's new paravirtualized TLB flush:
      a remote TLB flush does not wait for vCPUs that are sleeping; instead,
      KVM will flush the TLB as soon as the vCPU starts running again (a
      sketch follows this entry).
      
      The improvement is clearly visible when the host is overcommitted; in this
      case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
      prevents preempted vCPUs from stealing precious execution time from the
      running ones.
      
      Testing was done on a 2-socket Xeon Gold 6142 at 2.6 GHz (32 cores,
      64 threads, so 64 pCPUs), with each VM configured with 64 vCPUs.
      
      ebizzy -M
                    vanilla    optimized     boost
      1VM            46799       48670         4%
      2VM            23962       42691        78%
      3VM            16152       37539       132%
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      858a43aa
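      A hedged sketch of the guest-side mechanism, modeled on the mainline
      kvm_flush_tlb_others(): skip vCPUs whose steal-time record marks them
      preempted and ask the host to flush for them instead. The helper and
      field names are reproduced from memory of the upstream series and
      should be treated as assumptions:

          static void kvm_flush_tlb_others(const struct cpumask *cpumask,
                                           const struct flush_tlb_info *info)
          {
              u8 state;
              int cpu;
              struct kvm_steal_time *src;
              struct cpumask *flushmask =
                  this_cpu_cpumask_var_ptr(__pv_tlb_mask);

              cpumask_copy(flushmask, cpumask);
              /*
               * Don't IPI preempted vCPUs: set KVM_VCPU_FLUSH_TLB in their
               * steal-time record and let KVM flush their TLB before the
               * vCPU next runs.
               */
              for_each_cpu(cpu, flushmask) {
                  src = &per_cpu(steal_time, cpu);
                  state = READ_ONCE(src->preempted);
                  if ((state & KVM_VCPU_PREEMPTED) &&
                      cmpxchg(&src->preempted, state,
                              state | KVM_VCPU_FLUSH_TLB) == state)
                      __cpumask_clear_cpu(cpu, flushmask);
              }

              native_flush_tlb_others(flushmask, info);
          }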
    • KVM: X86: Add KVM_VCPU_PREEMPTED · fa55eedd
      Committed by Wanpeng Li
      The next patch will add another bit to the preempted field in
      kvm_steal_time. Define a constant for bit 0, the only one that is
      currently used (see the snippet after this entry).
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      fa55eedd
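      For reference, the resulting uapi bit layout; a sketch of the
      definitions in arch/x86/include/uapi/asm/kvm_para.h (bit 1 is what
      the follow-up TLB-flush patch adds):

          #define KVM_VCPU_PREEMPTED   (1 << 0)  /* this patch: names bit 0 */
          #define KVM_VCPU_FLUSH_TLB   (1 << 1)  /* added by the next patch */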
  13. 10 Nov 2017, 3 commits
  14. 07 Nov 2017, 1 commit
  15. 05 Oct 2017, 1 commit
    • kvm/x86: Avoid async PF preempting the kernel incorrectly · a2b7861b
      Committed by Boqun Feng
      Currently, in a PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could
      call schedule() to reschedule in some cases. This could accidentally
      end the current RCU read-side critical section early, causing random
      memory corruption in the guest, or otherwise preempt the currently
      running task between preempt_disable() and preempt_enable().

      This is difficult to handle well because, with PREEMPT_COUNT=n, we
      don't know whether an async PF was delivered in a preemptible section
      or in an RCU read-side critical section, since preempt_disable()/
      enable() and rcu_read_lock()/unlock() are both no-ops in that case.

      To cure this, we treat any async PF interrupting a kernel context as
      one that cannot be preempted, preventing kvm_async_pf_task_wait() from
      choosing the schedule() path in that case.

      To do so, a second parameter for kvm_async_pf_task_wait() is
      introduced, so that we know whether it's called from a context
      interrupting the kernel, and the parameter is set properly in all the
      call sites (a sketch follows this entry).
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      a2b7861b
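      A hedged sketch of the resulting decision logic in
      kvm_async_pf_task_wait(), assuming the interrupt_kernel parameter this
      patch introduces; wait-queue setup and the wait loop are elided:

          void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
          {
              struct kvm_task_sleep_node n;
              /* ... hash-bucket lookup and wait-node setup elided ... */

              n.halted = is_idle_task(current) ||
                         (IS_ENABLED(CONFIG_PREEMPT_COUNT)
                          ? preempt_count() > 1 || rcu_preempt_depth()
                          /* can't trust the counters: never schedule() */
                          : interrupt_kernel);

              /* n.halted: busy-halt until the page is ready (safe in any
               * context); otherwise: sleep via schedule(). */
          }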
  16. 29 Sep 2017, 1 commit
    • kvm/x86: Handle async PF in RCU read-side critical sections · b862789a
      Committed by Boqun Feng
      Sasha Levin reported a WARNING:
      
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      ...
      | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      | 1.10.1-1ubuntu1 04/01/2014
      | Call Trace:
      ...
      | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
      | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
      | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
      | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
      | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
      | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
      | __schedule+0x201/0x2240 kernel/sched/core.c:3292
      | schedule+0x113/0x460 kernel/sched/core.c:3421
      | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
      | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
      | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
      | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
      | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
      | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
      | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
      | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
      | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
      | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
      | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
      | sprintf+0xbe/0xf0 lib/vsprintf.c:2386
      | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
      | get_link fs/namei.c:1047 [inline]
      | link_path_walk+0x1041/0x1490 fs/namei.c:2127
      ...
      
      This happened when the host hit a page fault and delivered it as an
      async page fault while the guest was in an RCU read-side critical
      section. The guest then tries to reschedule in
      kvm_async_pf_task_wait(), but rcu_preempt_note_context_switch() would
      treat the reschedule as a sleep in an RCU read-side critical section,
      which is not allowed (even in preemptible RCU). Thus the WARN.

      To cure this, make kvm_async_pf_task_wait() go to the halt path if the
      PF happens in an RCU read-side critical section (a sketch follows this
      entry).
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b862789a
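      A hedged sketch of the check this fix adds, assuming a preemptible
      kernel where rcu_preempt_depth() is meaningful (the PREEMPT_COUNT=n
      case remained unsolved here and was addressed later by a2b7861b
      above); the helper name is hypothetical:

          /* Decide whether kvm_async_pf_task_wait() may sleep via
           * schedule() or must busy-halt instead. */
          static bool async_pf_must_halt(void)
          {
              return is_idle_task(current) ||
                     preempt_count() > 1 ||
                     rcu_preempt_depth();  /* new: in RCU read-side section */
          }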
  17. 15 Sep 2017, 1 commit
  18. 29 Aug 2017, 2 commits
  19. 02 Aug 2017, 1 commit
    • KVM: async_pf: make rcu irq exit if not triggered from idle task · 337c017c
      Committed by Wanpeng Li
       WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0
       CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1
       RIP: 0010:rcu_note_context_switch+0x207/0x6b0
       Call Trace:
        __schedule+0xda/0xba0
        ? kvm_async_pf_task_wait+0x1b2/0x270
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
       RIP: 0010:__d_lookup_rcu+0x90/0x1e0
      
      I encountered this when trying to stress async page faults in an L1
      guest with L2 guests running.

      Commit 9b132fbe (Add rcu user eqs exception hooks for async page
      fault) added rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit
      the CPU idle extended quiescent state when needed, protecting the code
      that needs to use RCU. However, the pair must be called even if the
      function calls schedule(), as seen in the backtrace above.

      This patch fixes it by informing the RCU subsystem of the irq
      exit/entry towards/away from idle in both the n.halted and !n.halted
      cases (a sketch follows this entry).
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      337c017c
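      A hedged sketch of the wait loop after this fix, reconstructed from
      the commit description, so details are approximate: RCU is told the
      irq is ending before the CPU is given up on either path, and
      re-entered on wakeup.

          for (;;) {
              if (!n.halted)
                  prepare_to_swait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
              if (hlist_unhashed(&n.link))
                  break;

              rcu_irq_exit();          /* now done for BOTH paths */

              if (!n.halted) {
                  local_irq_enable();
                  schedule();          /* used to run without rcu_irq_exit() */
                  local_irq_disable();
              } else {
                  /* We cannot reschedule, so halt. */
                  native_safe_halt();
                  local_irq_disable();
              }

              rcu_irq_enter();         /* ...and re-entered after waking */
          }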
  20. 14 Jul 2017, 1 commit
  21. 06 Jun 2017, 1 commit
  22. 13 Apr 2017, 1 commit
  23. 21 Feb 2017, 2 commits
    • x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 · dd0fd8bc
      Committed by Waiman Long
      When running a fio sequential write test with an XFS ramdisk on a KVM
      guest running on a 2-socket x86-64 system, the %CPU times as reported
      by perf were as follows:
      
       69.75%  0.59%  fio  [k] down_write
       69.15%  0.01%  fio  [k] call_rwsem_down_write_failed
       67.12%  1.12%  fio  [k] rwsem_down_write_failed
       63.48% 52.77%  fio  [k] osq_lock
        9.46%  7.88%  fio  [k] __raw_callee_save___kvm_vcpu_is_preempt
        3.93%  3.93%  fio  [k] __kvm_vcpu_is_preempted
      
      Making vcpu_is_preempted() a callee-save function has a relatively
      high cost on x86-64, primarily due to at least one more cacheline of
      data access from the saving and restoring of registers (8 of them)
      to and from the stack, as well as one more level of function call.

      To reduce this performance overhead, an optimized assembly version of
      the __raw_callee_save___kvm_vcpu_is_preempt() function is provided
      for x86-64 (a sketch follows this entry).
      
      With this patch applied on a KVM guest on a 2-socket 16-core 32-thread
      system with 16 parallel jobs (8 on each socket), the aggregate
      bandwidth of the fio test on an XFS ramdisk was as follows:
      
         I/O Type      w/o patch    with patch
         --------      ---------    ----------
         random read   8141.2 MB/s  8497.1 MB/s
         seq read      8229.4 MB/s  8304.2 MB/s
         random write  1675.5 MB/s  1701.5 MB/s
         seq write     1681.3 MB/s  1699.9 MB/s
      
      The patch yields a small increase in aggregate bandwidth for every
      I/O type.

      The perf data with the patch applied became:
      
       70.78%  0.58%  fio  [k] down_write
       70.20%  0.01%  fio  [k] call_rwsem_down_write_failed
       69.70%  1.17%  fio  [k] rwsem_down_write_failed
       59.91% 55.42%  fio  [k] osq_lock
       10.14% 10.14%  fio  [k] __kvm_vcpu_is_preempted
      
      The assembly code was verified with a test kernel module that compares
      the output of the C __kvm_vcpu_is_preempted() with that of the
      assembly __raw_callee_save___kvm_vcpu_is_preempt() to ensure that they
      match.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dd0fd8bc
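      A hedged sketch of the hand-rolled assembly, modeled on the version
      added to arch/x86/kernel/kvm.c: it answers "is this vCPU preempted?"
      using only %rax, so the register save/restore thunk becomes
      unnecessary. The KVM_STEAL_TIME_preempted asm-offsets constant is an
      assumption from the mainline tree:

          #include <linux/stringify.h>
          #include <asm/asm-offsets.h>  /* KVM_STEAL_TIME_preempted */

          asm(
          ".pushsection .text;"
          ".global __raw_callee_save___kvm_vcpu_is_preempted;"
          ".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
          "__raw_callee_save___kvm_vcpu_is_preempted:"
          /* per-cpu base address of CPU %rdi, clobbering only %rax */
          "movq __per_cpu_offset(,%rdi,8), %rax;"
          "cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
          "setne %al;"                  /* return steal_time.preempted != 0 */
          "ret;"
          ".popsection");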
    • x86/paravirt: Change vcpu_is_preempted() arg type to long · 6c62985d
      Committed by Waiman Long
      The cpu argument in the function prototype of vcpu_is_preempted() is
      changed from int to long. That makes it easier to provide a better
      optimized assembly version of that function, since a long argument
      arrives in a full 64-bit register and can be used for addressing
      directly (see the snippet after this entry).

      For Xen, vcpu_is_preempted(long) calls xen_vcpu_stolen(int); the
      downcast from long to int is not a problem, as the vCPU number won't
      exceed 32 bits.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6c62985d
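      A minimal sketch of the signature change and why it helps the x86-64
      assembly above; prototypes abbreviated:

          /* Before: int cpu - the asm version would first have to widen
           * the 32-bit value (e.g. with a movslq) before indexing. */
          bool __kvm_vcpu_is_preempted(int cpu);

          /* After: long cpu - %rdi can feed __per_cpu_offset(,%rdi,8)
           * as-is. */
          bool __kvm_vcpu_is_preempted(long cpu);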
  24. 14 Jan 2017, 1 commit
  25. 10 Dec 2016, 1 commit
  26. 22 Nov 2016, 2 commits
    • x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() · 3cded417
      Committed by Peter Zijlstra
      Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted()
      when a paravirt-spinlock-enabled kernel is run on native hardware.

      Do this by patching out the CALL instruction with "XOR %RAX,%RAX",
      which has the same effect (a zero return value); a sketch follows
      this entry.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: boqun.feng@gmail.com
      Cc: borntraeger@de.ibm.com
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: jgross@suse.com
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3cded417
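      A hedged sketch of the patching mechanism, following the DEF_NATIVE
      pattern used by the x86 paravirt patcher; the surrounding plumbing in
      arch/x86/kernel/paravirt_patch_64.c is abbreviated:

          /* Native template: vcpu_is_preempted is always false, so the
           * indirect CALL at each call site can be patched into
           * "xor %rax, %rax" (return 0). */
          DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "xor %rax, %rax");

          unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
                                unsigned long addr, unsigned len)
          {
              switch (type) {
              case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
                  if (pv_is_native_vcpu_is_preempted())
                      return paravirt_patch_insns(ibuf, len,
                          start_pv_lock_ops_vcpu_is_preempted,
                          end_pv_lock_ops_vcpu_is_preempted);
                  break;
              /* ... other patch sites elided ... */
              }
              return paravirt_patch_default(type, clobbers, ibuf, addr, len);
          }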
    • x86/kvm: Support the vCPU preemption check · 1885aa70
      Committed by Pan Xinhui
      Support the vcpu_is_preempted() functionality under KVM. This will
      enhance lock performance on overcommitted hosts (more runnable vCPUs
      than physical CPUs in the system), as busy-waiting on preempted vCPUs
      hurts system performance far more than yielding early.

      struct kvm_steal_time::preempted indicates whether a vCPU is running
      or not, as of commit "x86, kvm/x86.c: support vCPU preempted check"
      (a sketch follows this entry).
      
       unix benchmark (UnixBench) results:
       host:  kernel 4.8.1, i5-4570, 4 cpus
       guest: kernel 4.8.1, 8 vcpus
      
               test-case                       after-patch       before-patch
       Execl Throughput                       |    18307.9 lps  |    11701.6 lps
       File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
       File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
       File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
       Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps
       Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps
       Process Creation                       |    29881.2 lps  |    28572.8 lps
       Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm
       Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm
       System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps
      Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: boqun.feng@gmail.com
      Cc: borntraeger@de.ibm.com
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: jgross@suse.com
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: rkrcmar@redhat.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: xen-devel-request@lists.xenproject.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1478077718-37424-10-git-send-email-xinhui.pan@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1885aa70
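      A hedged sketch of the KVM guest side, modeled on the code this patch
      adds to arch/x86/kernel/kvm.c (the argument was still int here; it was
      widened to long by 6c62985d above). Registration glue is abbreviated:

          __visible bool __kvm_vcpu_is_preempted(int cpu)
          {
              struct kvm_steal_time *src = &per_cpu(steal_time, cpu);

              /* The host sets src->preempted when it schedules the
               * vCPU out. */
              return !!src->preempted;
          }
          PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);

          /* in kvm_spinlock_init(): */
          if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME))
              pv_lock_ops.vcpu_is_preempted =
                  PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);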
  27. 18 Nov 2016, 1 commit