1. 13 5月, 2016 1 次提交
    • C
      KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger 提交于
      Some wakeups should not be considered a sucessful poll. For example on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transactional like workload, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups if they
      should be considered a good/bad wakeups in regard to polls.
      
      For s390 the implementation will fence of halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
      we prefer to not poll, unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3491caf2
  2. 12 5月, 2016 3 次提交
  3. 03 5月, 2016 2 次提交
  4. 06 4月, 2016 1 次提交
    • M
      KVM: arm/arm64: Handle forward time correction gracefully · 1c5631c7
      Marc Zyngier 提交于
      On a host that runs NTP, corrections can have a direct impact on
      the background timer that we program on the behalf of a vcpu.
      
      In particular, NTP performing a forward correction will result in
      a timer expiring sooner than expected from a guest point of view.
      Not a big deal, we kick the vcpu anyway.
      
      But on wake-up, the vcpu thread is going to perform a check to
      find out whether or not it should block. And at that point, the
      timer check is going to say "timer has not expired yet, go back
      to sleep". This results in the timer event being lost forever.
      
      There are multiple ways to handle this. One would be record that
      the timer has expired and let kvm_cpu_has_pending_timer return
      true in that case, but that would be fairly invasive. Another is
      to check for the "short sleep" condition in the hrtimer callback,
      and restart the timer for the remaining time when the condition
      is detected.
      
      This patch implements the latter, with a bit of refactoring in
      order to avoid too much code duplication.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NAlexander Graf <agraf@suse.de>
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      1c5631c7
  5. 01 4月, 2016 1 次提交
    • W
      arm64: KVM: Add braces to multi-line if statement in virtual PMU code · 7d4bd1d2
      Will Deacon 提交于
      The kernel is written in C, not python, so we need braces around
      multi-line if statements. GCC 6 actually warns about this, thanks to the
      fantastic new "-Wmisleading-indentation" flag:
      
       | virt/kvm/arm/pmu.c: In function ‘kvm_pmu_overflow_status’:
       | virt/kvm/arm/pmu.c:198:3: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
       |    reg &= vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
       |    ^~~
       | arch/arm64/kvm/../../../virt/kvm/arm/pmu.c:196:2: note: ...this ‘if’ clause, but it is not
       |   if ((vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E))
       |   ^~
      
      As it turns out, this particular case is harmless (we just do some &=
      operations with 0), but worth fixing nonetheless.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      7d4bd1d2
  6. 22 3月, 2016 3 次提交
    • L
      KVM: Replace smp_mb() with smp_load_acquire() in the kvm_flush_remote_tlbs() · 4ae3cb3a
      Lan Tianyu 提交于
      smp_load_acquire() is enough here and it's cheaper than smp_mb().
      Adding a comment about reusing memory barrier of kvm_make_all_cpus_request()
      here to keep order between modifications to the page tables and reading mode.
      Signed-off-by: NLan Tianyu <tianyu.lan@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4ae3cb3a
    • L
    • P
      KVM: fix spin_lock_init order on x86 · e9ad4ec8
      Paolo Bonzini 提交于
      Moving the initialization earlier is needed in 4.6 because
      kvm_arch_init_vm is now using mmu_lock, causing lockdep to
      complain:
      
      [  284.440294] INFO: trying to register non-static key.
      [  284.445259] the code is fine but needs lockdep annotation.
      [  284.450736] turning off the locking correctness validator.
      ...
      [  284.528318]  [<ffffffff810aecc3>] lock_acquire+0xd3/0x240
      [  284.533733]  [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
      [  284.541467]  [<ffffffff81715581>] _raw_spin_lock+0x41/0x80
      [  284.546960]  [<ffffffffa0305aa0>] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
      [  284.554707]  [<ffffffffa0305aa0>] kvm_page_track_register_notifier+0x20/0x60 [kvm]
      [  284.562281]  [<ffffffffa02ece70>] kvm_mmu_init_vm+0x20/0x30 [kvm]
      [  284.568381]  [<ffffffffa02dbf7a>] kvm_arch_init_vm+0x1ea/0x200 [kvm]
      [  284.574740]  [<ffffffffa02bff3f>] kvm_dev_ioctl+0xbf/0x4d0 [kvm]
      
      However, it also helps fixing a preexisting problem, which is why this
      patch is also good for stable kernels: kvm_create_vm was incrementing
      current->mm->mm_count but not decrementing it at the out_err label (in
      case kvm_init_mmu_notifier failed).  The new initialization order makes
      it possible to add the required mmdrop without adding a new error label.
      
      Cc: stable@vger.kernel.org
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e9ad4ec8
  7. 09 3月, 2016 10 次提交
  8. 04 3月, 2016 1 次提交
    • P
      KVM: ensure __gfn_to_pfn_memslot initializes *writable · b2740d35
      Paolo Bonzini 提交于
      For the kvm_is_error_hva, ubsan complains if the uninitialized writable
      is passed to __direct_map, even though the value itself is not used
      (__direct_map goes to mmu_set_spte->set_spte->set_mmio_spte but never
      looks at that argument).
      
      Ensuring that __gfn_to_pfn_memslot initializes *writable is cheap and
      avoids this kind of issue.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b2740d35
  9. 03 3月, 2016 1 次提交
  10. 01 3月, 2016 13 次提交
  11. 25 2月, 2016 1 次提交
    • M
      KVM: Use simple waitqueue for vcpu->wq · 8577370f
      Marcelo Tosatti 提交于
      The problem:
      
      On -rt, an emulated LAPIC timer instances has the following path:
      
      1) hard interrupt
      2) ksoftirqd is scheduled
      3) ksoftirqd wakes up vcpu thread
      4) vcpu thread is scheduled
      
      This extra context switch introduces unnecessary latency in the
      LAPIC path for a KVM guest.
      
      The solution:
      
      Allow waking up vcpu thread from hardirq context,
      thus avoiding the need for ksoftirqd to be scheduled.
      
      Normal waitqueues make use of spinlocks, which on -RT
      are sleepable locks. Therefore, waking up a waitqueue
      waiter involves locking a sleeping lock, which
      is not allowed from hard interrupt context.
      
      cyclictest command line:
      
      This patch reduces the average latency in my tests from 14us to 11us.
      
      Daniel writes:
      Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
      benchmark on mainline. The test was run 1000 times on
      tip/sched/core 4.4.0-rc8-01134-g0905f04e:
      
        ./x86-run x86/tscdeadline_latency.flat -cpu host
      
      with idle=poll.
      
      The test seems not to deliver really stable numbers though most of
      them are smaller. Paolo write:
      
      "Anything above ~10000 cycles means that the host went to C1 or
      lower---the number means more or less nothing in that case.
      
      The mean shows an improvement indeed."
      
      Before:
      
                     min             max         mean           std
      count  1000.000000     1000.000000  1000.000000   1000.000000
      mean   5162.596000  2019270.084000  5824.491541  20681.645558
      std      75.431231   622607.723969    89.575700   6492.272062
      min    4466.000000    23928.000000  5537.926500    585.864966
      25%    5163.000000  1613252.750000  5790.132275  16683.745433
      50%    5175.000000  2281919.000000  5834.654000  23151.990026
      75%    5190.000000  2382865.750000  5861.412950  24148.206168
      max    5228.000000  4175158.000000  6254.827300  46481.048691
      
      After
                     min            max         mean           std
      count  1000.000000     1000.00000  1000.000000   1000.000000
      mean   5143.511000  2076886.10300  5813.312474  21207.357565
      std      77.668322   610413.09583    86.541500   6331.915127
      min    4427.000000    25103.00000  5529.756600    559.187707
      25%    5148.000000  1691272.75000  5784.889825  17473.518244
      50%    5160.000000  2308328.50000  5832.025000  23464.837068
      75%    5172.000000  2393037.75000  5853.177675  24223.969976
      max    5222.000000  3922458.00000  6186.720500  42520.379830
      
      [Patch was originaly based on the swait implementation found in the -rt
       tree. Daniel ported it to mainline's version and gathered the
       benchmark numbers for tscdeadline_latency test.]
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8577370f
  12. 24 2月, 2016 2 次提交
    • C
      KVM: async_pf: do not warn on page allocation failures · d7444794
      Christian Borntraeger 提交于
      In async_pf we try to allocate with NOWAIT to get an element quickly
      or fail. This code also handle failures gracefully. Lets silence
      potential page allocation failures under load.
      
      qemu-system-s39: page allocation failure: order:0,mode:0x2200000
      [...]
      Call Trace:
      ([<00000000001146b8>] show_trace+0xf8/0x148)
      [<000000000011476a>] show_stack+0x62/0xe8
      [<00000000004a36b8>] dump_stack+0x70/0x98
      [<0000000000272c3a>] warn_alloc_failed+0xd2/0x148
      [<000000000027709e>] __alloc_pages_nodemask+0x94e/0xb38
      [<00000000002cd36a>] new_slab+0x382/0x400
      [<00000000002cf7ac>] ___slab_alloc.constprop.30+0x2dc/0x378
      [<00000000002d03d0>] kmem_cache_alloc+0x160/0x1d0
      [<0000000000133db4>] kvm_setup_async_pf+0x6c/0x198
      [<000000000013dee8>] kvm_arch_vcpu_ioctl_run+0xd48/0xd58
      [<000000000012fcaa>] kvm_vcpu_ioctl+0x372/0x690
      [<00000000002f66f6>] do_vfs_ioctl+0x3be/0x510
      [<00000000002f68ec>] SyS_ioctl+0xa4/0xb8
      [<0000000000781c5e>] system_call+0xd6/0x264
      [<000003ffa24fa06a>] 0x3ffa24fa06a
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d7444794
    • M
      KVM: arm/arm64: vgic: Ensure bitmaps are long enough · 236cf17c
      Mark Rutland 提交于
      When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of
      bits we need by 8 to figure out how many bytes to allocate. However,
      bitmap elements are always accessed as unsigned longs, and if we didn't
      happen to allocate a size such that size % sizeof(unsigned long) == 0,
      bitmap accesses may go past the end of the allocation.
      
      When using KASAN (which does byte-granular access checks), this results
      in a continuous stream of BUGs whenever these bitmaps are accessed:
      
      =============================================================================
      BUG kmalloc-128 (Tainted: G    B          ): kasan: bad access detected
      -----------------------------------------------------------------------------
      
      INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730
      INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080
      INFO: Object 0xffffffc935769500 @offset=1280 fp=0x          (null)
      
      Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G    B           4.4.0+ #17
      Hardware name: ARM Juno development board (r1) (DT)
      Call trace:
      [<ffffffc00008e770>] dump_backtrace+0x0/0x280
      [<ffffffc00008ea04>] show_stack+0x14/0x20
      [<ffffffc000726360>] dump_stack+0x100/0x188
      [<ffffffc00030d324>] print_trailer+0xfc/0x168
      [<ffffffc000312294>] object_err+0x3c/0x50
      [<ffffffc0003140fc>] kasan_report_error+0x244/0x558
      [<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50
      [<ffffffc000745688>] __bitmap_or+0xc0/0xc8
      [<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650
      [<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60
      [<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68
      [<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0
      [<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0
      [<ffffffc000086cb0>] el0_svc_naked+0x24/0x28
      Memory state around the buggy address:
       ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
       ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      
      Fix the issue by always allocating a multiple of sizeof(unsigned long),
      as we do elsewhere in the vgic code.
      
      Fixes: c1bfb577 ("arm/arm64: KVM: vgic: switch to dynamic allocation")
      Cc: stable@vger.kernel.org
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      236cf17c
  13. 23 2月, 2016 1 次提交