1. 16 1月, 2018 1 次提交
  2. 28 11月, 2017 2 次提交
  3. 03 11月, 2017 1 次提交
  4. 12 10月, 2017 7 次提交
  5. 15 9月, 2017 1 次提交
  6. 07 8月, 2017 1 次提交
  7. 27 7月, 2017 1 次提交
    • W
      KVM: LAPIC: Fix reentrancy issues with preempt notifiers · 1d518c68
      Wanpeng Li 提交于
      Preempt can occur in the preemption timer expiration handler:
      
                CPU0                    CPU1
      
        preemption timer vmexit
        handle_preemption_timer(vCPU0)
          kvm_lapic_expired_hv_timer
            hv_timer_is_use == true
        sched_out
                                 sched_in
                                 kvm_arch_vcpu_load
                                   kvm_lapic_restart_hv_timer
                                     restart_apic_timer
                                       start_hv_timer
                                         already-expired timer or sw timer triggerd in the window
                                       start_sw_timer
                                         cancel_hv_timer
                                 /* back in kvm_lapic_expired_hv_timer */
                                 cancel_hv_timer
                                   WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops
      
      This can be reproduced if CONFIG_PREEMPT is enabled.
      
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #16
       RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
      Call Trace:
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
       ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc2+ #16
       RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
      Call Trace:
        kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This patch fixes it by making the caller of cancel_hv_timer, start_hv_timer
      and start_sw_timer be in preemption-disabled regions, which trivially
      avoid any reentrancy issue with preempt notifier.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      [Add more WARNs. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1d518c68
  8. 30 6月, 2017 3 次提交
    • W
      KVM: LAPIC: Fix lapic timer injection delay · c8533544
      Wanpeng Li 提交于
      If the TSC deadline timer is programmed really close to the deadline or
      even in the past, the computation in vmx_set_hv_timer will program the
      absolute target tsc value to vmcs preemption timer field w/ delta == 0,
      then plays a vmentry and an upcoming vmx preemption timer fire vmexit
      dance, the lapic timer injection is delayed due to this duration. Actually
      the lapic timer which is emulated by hrtimer can handle this correctly.
      
      This patch fixes it by firing the lapic timer and injecting a timer interrupt
      immediately during the next vmentry if the TSC deadline timer is programmed
      really close to the deadline or even in the past. This saves ~300 cycles on
      the tsc_deadline_timer test of apic.flat.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c8533544
    • P
      KVM: lapic: reorganize restart_apic_timer · a749e247
      Paolo Bonzini 提交于
      Move the code to cancel the hv timer into the caller, just before
      it starts the hrtimer.  Check availability of the hv timer in
      start_hv_timer.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a749e247
    • P
      KVM: lapic: reorganize start_hv_timer · 35ee9e48
      Paolo Bonzini 提交于
      There are many cases in which the hv timer must be canceled.  Split out
      a new function to avoid duplication.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      35ee9e48
  9. 27 5月, 2017 1 次提交
    • J
      KVM: x86: Fix virtual wire mode · 52b54190
      Jan H. Schönherr 提交于
      Intel SDM says, that at most one LAPIC should be configured with ExtINT
      delivery. KVM configures all LAPICs this way. This causes pic_unlock()
      to kick the first available vCPU from the internal KVM data structures.
      If this vCPU is not the BSP, but some not-yet-booted AP, the BSP may
      never realize that there is an interrupt.
      
      Fix that by enabling ExtINT delivery only for the BSP.
      
      This allows booting a Linux guest without a TSC in the above situation.
      Otherwise the BSP gets stuck in calibrate_delay_converge().
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      52b54190
  10. 26 5月, 2017 1 次提交
    • W
      KVM: X86: Fix preempt the preemption timer cancel · 5acc1ca4
      Wanpeng Li 提交于
      Preemption can occur during cancel preemption timer, and there will be
      inconsistent status in lapic, vmx and vmcs field.
      
                CPU0                    CPU1
      
        preemption timer vmexit
        handle_preemption_timer(vCPU0)
          kvm_lapic_expired_hv_timer
            vmx_cancel_hv_timer
              vmx->hv_deadline_tsc = -1
              vmcs_clear_bits
              /* hv_timer_in_use still true */
        sched_out
                                 sched_in
                                 kvm_arch_vcpu_load
                                   vmx_set_hv_timer
                                     write vmx->hv_deadline_tsc
                                     vmcs_set_bits
                                 /* back in kvm_lapic_expired_hv_timer */
                                 hv_timer_in_use = false
                                 ...
                                 vmx_vcpu_run
                                   vmx_arm_hv_run
                                     write preemption timer deadline
                                   spurious preemption timer vmexit
                                     handle_preemption_timer(vCPU0)
                                       kvm_lapic_expired_hv_timer
                                         WARN_ON(!apic->lapic_timer.hv_timer_in_use);
      
      This can be reproduced sporadically during boot of L2 on a
      preemptible L1, causing a splat on L1.
      
       WARNING: CPU: 3 PID: 1952 at arch/x86/kvm/lapic.c:1529 kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
       CPU: 3 PID: 1952 Comm: qemu-system-x86 Not tainted 4.12.0-rc1+ #24 RIP: 0010:kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
        Call Trace:
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xc9/0x15f0 [kvm_intel]
        ? lock_acquire+0xdb/0x250
        ? lock_acquire+0xdb/0x250
        ? kvm_arch_vcpu_ioctl_run+0xdf3/0x1ce0 [kvm]
        kvm_arch_vcpu_ioctl_run+0xe55/0x1ce0 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x8f/0x750
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This patch fixes it by disabling preemption while cancelling
      preemption timer.  This way cancel_hv_timer is atomic with
      respect to kvm_arch_vcpu_load.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5acc1ca4
  11. 09 5月, 2017 1 次提交
    • M
      mm: introduce kv[mz]alloc helpers · a7c3e901
      Michal Hocko 提交于
      Patch series "kvmalloc", v5.
      
      There are many open coded kmalloc with vmalloc fallback instances in the
      tree.  Most of them are not careful enough or simply do not care about
      the underlying semantic of the kmalloc/page allocator which means that
      a) some vmalloc fallbacks are basically unreachable because the kmalloc
      part will keep retrying until it succeeds b) the page allocator can
      invoke a really disruptive steps like the OOM killer to move forward
      which doesn't sound appropriate when we consider that the vmalloc
      fallback is available.
      
      As it can be seen implementing kvmalloc requires quite an intimate
      knowledge if the page allocator and the memory reclaim internals which
      strongly suggests that a helper should be implemented in the memory
      subsystem proper.
      
      Most callers, I could find, have been converted to use the helper
      instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
      in the networking stack which I have converted as well and Eric Dumazet
      was not opposed [2] to convert them as well.
      
      [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
      
      This patch (of 9):
      
      Using kmalloc with the vmalloc fallback for larger allocations is a
      common pattern in the kernel code.  Yet we do not have any common helper
      for that and so users have invented their own helpers.  Some of them are
      really creative when doing so.  Let's just add kv[mz]alloc and make sure
      it is implemented properly.  This implementation makes sure to not make
      a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
      to not warn about allocation failures.  This also rules out the OOM
      killer as the vmalloc is a more approapriate fallback than a disruptive
      user visible action.
      
      This patch also changes some existing users and removes helpers which
      are specific for them.  In some cases this is not possible (e.g.
      ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
      require GFP_NO{FS,IO} context which is not vmalloc compatible in general
      (note that the page table allocation is GFP_KERNEL).  Those need to be
      fixed separately.
      
      While we are at it, document that __vmalloc{_node} about unsupported gfp
      mask because there seems to be a lot of confusion out there.
      kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
      superset) flags to catch new abusers.  Existing ones would have to die
      slowly.
      
      [sfr@canb.auug.org.au: f2fs fixup]
        Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>	[ext4 part]
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7c3e901
  12. 03 5月, 2017 1 次提交
    • P
      Revert "KVM: Support vCPU-based gfn->hva cache" · 4e335d9e
      Paolo Bonzini 提交于
      This reverts commit bbd64115.
      
      I've been sitting on this revert for too long and it unfortunately
      missed 4.11.  It's also the reason why I haven't merged ring-based
      dirty tracking for 4.12.
      
      Using kvm_vcpu_memslots in kvm_gfn_to_hva_cache_init and
      kvm_vcpu_write_guest_offset_cached means that the MSR value can
      now be used to access SMRAM, simply by making it point to an SMRAM
      physical address.  This is problematic because it lets the guest
      OS overwrite memory that it shouldn't be able to touch.
      
      Cc: stable@vger.kernel.org
      Fixes: bbd64115Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4e335d9e
  13. 17 2月, 2017 1 次提交
  14. 15 2月, 2017 5 次提交
    • P
      kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection · b95234c8
      Paolo Bonzini 提交于
      Since bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU
      is blocked", 2015-09-18) the posted interrupt descriptor is checked
      unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
      trigger the scan and, if NMIs or SMIs are not involved, we can avoid
      the complicated event injection path.
      
      Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
      there since APICv was introduced.
      
      However, without the KVM_REQ_EVENT safety net KVM needs to be much
      more careful about races between vmx_deliver_posted_interrupt and
      vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
      between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
      If that happens, kvm_trigger_posted_interrupt returns true, but
      smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
      entered with PIR.ON, but the posted interrupt IPI has not been sent
      and the interrupt is only delivered to the guest on the next vmentry
      (if any).  To fix this, disable interrupts before setting vcpu->mode.
      This ensures that the IPI is delayed until the guest enters non-root mode;
      it is then trapped by the processor causing the interrupt to be injected.
      
      Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
      and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
      but it (correctly) doesn't do anything because it sees vcpu->mode ==
      OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
      posted interrupt IPI is pending; this time, the fix for this is to move
      the RVI update after IN_GUEST_MODE.
      
      Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
      though the second could actually happen with VT-d posted interrupts.
      In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
      in another vmentry which would inject the interrupt.
      
      This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b95234c8
    • P
      KVM: x86: do not scan IRR twice on APICv vmentry · 76dfafd5
      Paolo Bonzini 提交于
      Calls to apic_find_highest_irr are scanning IRR twice, once
      in vmx_sync_pir_from_irr and once in apic_search_irr.  Change
      sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr;
      now that it does the computation, it can also do the RVI write.
      
      In order to avoid complications in svm.c, make the callback optional.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      76dfafd5
    • P
    • P
      KVM: x86: preparatory changes for APICv cleanups · 810e6def
      Paolo Bonzini 提交于
      Add return value to __kvm_apic_update_irr/kvm_apic_update_irr.
      Move vmx_sync_pir_to_irr around.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      810e6def
    • P
      KVM: vmx: clear pending interrupts on KVM_SET_LAPIC · 967235d3
      Paolo Bonzini 提交于
      Pending interrupts might be in the PI descriptor when the
      LAPIC is restored from an external state; we do not want
      them to be injected.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      967235d3
  15. 12 1月, 2017 1 次提交
  16. 09 1月, 2017 7 次提交
  17. 26 12月, 2016 1 次提交
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  18. 25 11月, 2016 1 次提交
    • R
      KVM: x86: fix out-of-bounds access in lapic · 444fdad8
      Radim Krčmář 提交于
      Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff.
      With enabled KVM_X2APIC_API_USE_32BIT_IDS in KVM_CAP_X2APIC_API, a
      userspace can send an interrupt with dest_id that results in
      out-of-bounds access.
      
      Found by syzkaller:
      
        BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750
        Read of size 8 by task syz-executor/22923
        CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
         [...] print_address_description mm/kasan/report.c:194
         [...] kasan_report_error mm/kasan/report.c:283
         [...] kasan_report+0x231/0x500 mm/kasan/report.c:303
         [...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329
         [...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824
         [...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72
         [...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157
         [...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74
         [...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: e45115b6 ("KVM: x86: use physical LAPIC array for logical x2APIC")
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      444fdad8
  19. 22 11月, 2016 1 次提交
  20. 03 11月, 2016 2 次提交
    • P
      kvm: x86: avoid atomic operations on APICv vmentry · ad361091
      Paolo Bonzini 提交于
      On some benchmarks (e.g. netperf with ioeventfd disabled), APICv
      posted interrupts turn out to be slower than interrupt injection via
      KVM_REQ_EVENT.
      
      This patch optimizes a bit the IRR update, avoiding expensive atomic
      operations in the common case where PI.ON=0 at vmentry or the PIR vector
      is mostly zero.  This saves at least 20 cycles (1%) per vmexit, as
      measured by kvm-unit-tests' inl_from_qemu test (20 runs):
      
                    | enable_apicv=1  |  enable_apicv=0
                    | mean     stdev  |  mean     stdev
          ----------|-----------------|------------------
          before    | 5826     32.65  |  5765     47.09
          after     | 5809     43.42  |  5777     77.02
      
      Of course, any change in the right column is just placebo effect. :)
      The savings are bigger if interrupts are frequent.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ad361091
    • P
      KVM: x86: use ktime_get instead of seeking the hrtimer_clock_base · 5587859f
      Paolo Bonzini 提交于
      The base clock for the LAPIC timer is always CLOCK_MONOTONIC.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5587859f