1. 09 Jan 2020: 2 commits
  2. 15 Nov 2019: 1 commit
    • KVM: x86: deliver KVM IOAPIC scan request to target vCPUs · 7ee30bc1
      Committed by Nitesh Narayan Lal
      In IOAPIC fixed delivery mode, instead of flushing the scan requests to
      all vCPUs, we should only send them to the vCPUs specified in the
      destination field.
      
      This patch introduces the kvm_get_dest_vcpus_mask() API, which
      retrieves an array of target vCPUs using kvm_apic_map_get_dest_lapic()
      and then sets the bit for each one's vcpus_idx in a bitmap.  If that
      lookup fails, kvm_get_dest_vcpus_mask() instead finds the target vCPUs
      by traversing all available vCPUs and sets the corresponding bits in
      the bitmap.
      
      If the previous request for the same redirection table entry targeted
      different vCPUs, the bits corresponding to those vCPUs are also set.
      This is done to keep ioapic_handled_vectors synchronized.
      
      This bitmap is then eventually passed on to
      kvm_make_vcpus_request_mask() to generate a masked request
      only for the target vCPUs.
      
      This reduces the latency overhead on isolated vCPUs caused by the IPI
      that is sent to process KVM_REQ_IOAPIC_SCAN.
      Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
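The fallback path described in this commit (walk every vCPU, set a bit for each one that matches the destination, and also keep the vCPUs of the previous request for the same redirection table entry) can be sketched as a simplified model. This is not KVM's code: the 64-bit mask, the `apic_ids` array, physical-destination-only matching, and the helper names `vcpu_matches_dest()`, `build_scan_mask()` and `mask_test()` are all hypothetical, and the fast path through kvm_apic_map_get_dest_lapic() is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: at most 64 vCPUs, one bit per vCPU index. */
typedef struct {
    uint64_t bits;
} vcpu_mask;

static void mask_set(vcpu_mask *m, unsigned idx)
{
    m->bits |= (uint64_t)1 << idx;
}

static bool mask_test(const vcpu_mask *m, unsigned idx)
{
    return (m->bits >> idx) & 1u;
}

/* Hypothetical destination match: physical destination mode only,
 * a vCPU matches when its APIC ID equals the destination ID. */
static bool vcpu_matches_dest(const uint8_t *apic_ids, unsigned idx, uint8_t dest_id)
{
    return apic_ids[idx] == dest_id;
}

/* Slow path: walk every vCPU and mark those targeted by the new
 * destination or by the previous request for the same redirection
 * table entry, so both sets rescan their handled vectors. */
static vcpu_mask build_scan_mask(const uint8_t *apic_ids, unsigned nr_vcpus,
                                 uint8_t new_dest, uint8_t old_dest)
{
    vcpu_mask m = { 0 };

    for (unsigned i = 0; i < nr_vcpus && i < 64; i++)
        if (vcpu_matches_dest(apic_ids, i, new_dest) ||
            vcpu_matches_dest(apic_ids, i, old_dest))
            mask_set(&m, i);
    return m;
}
```

Only vCPUs whose bit is set would then receive the scan request, rather than every vCPU in the VM.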
  3. 23 Oct 2019: 1 commit
  4. 24 Sep 2019: 1 commit
  5. 20 Jul 2019: 1 commit
    • KVM: LAPIC: Inject timer interrupt via posted interrupt · 0c5f81da
      Committed by Wanpeng Li
      Dedicated instances are currently disturbed by unnecessary jitter because
      the emulated LAPIC timers fire on the same pCPUs where the vCPUs reside.
      Unlike ARM, Intel has no hardware virtual timer for guests, so both
      programming the timer in the guest and the firing of the emulated timer
      incur VM exits.  This patch tries to avoid the VM exit when the emulated
      timer fires, at least in the dedicated-instance scenario when nohz_full
      is enabled.
      
      In that case, the emulated timers can be offloaded to the nearest busy
      housekeeping CPUs, since APICv has been available in server processors
      for several years.  The guest timer interrupt can then be injected via
      posted interrupts, which are delivered by the housekeeping CPU once the
      emulated timer fires.
      
      The host should be tuned so that vCPUs are placed on isolated physical
      processors, with several surplus pCPUs for busy housekeeping.  When
      mwait/hlt/pause VM exits are disabled to keep the vCPUs in non-root mode,
      a ~3% redis performance benefit can be observed on a Skylake server, and
      the number of external interrupt VM exits drops substantially.  Without
      the patch:
      
                  VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
      EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )
      
      With the patch:
      
                  VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
      EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
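To put a number on "drops substantially", the relative reduction in EXTERNAL_INTERRUPT samples can be computed from the two tables in this commit (42916 before, 6871 after), which works out to roughly 84%. The helper below is a trivial, hypothetical illustration of that arithmetic.

```c
/* Relative reduction, in percent, going from `before` to `after`. */
static double percent_reduction(double before, double after)
{
    if (before == 0.0)
        return 0.0;              /* avoid division by zero */
    return 100.0 * (before - after) / before;
}
```

For example, `percent_reduction(42916.0, 6871.0)` evaluates to just under 84.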
  6. 18 Jun 2019: 1 commit
  7. 05 Jun 2019: 2 commits
    • KVM: LAPIC: Optimize timer latency further · b6c4bc65
      Committed by Wanpeng Li
      Advancing the LAPIC timer tries to hide the hypervisor overhead between
      the moment the host's emulated timer fires and the moment the guest
      notices that it has fired.  However, it only hides the time between
      apic_timer_fn/handle_preemption_timer and wait_lapic_expire, rather than
      up to the real point of VM entry, which is what the original commit
      d0659d94 ("KVM: x86: add option to advance tscdeadline hrtimer
      expiration") describes.  There are 700+ CPU cycles between the end of
      wait_lapic_expire and the world switch on my Haswell desktop.
      
      This patch narrows that last gap (wait_lapic_expire -> world switch) by
      taking the real overhead between apic_timer_fn/handle_preemption_timer
      and the world switch into account when adaptively tuning the timer
      advancement.  It reduces latency by ~40% (from ~1600+ cycles to ~1000+
      cycles on a Haswell desktop) for kvm-unit-tests/tscdeadline_latency when
      testing busy waits.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
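Adaptive tuning of the advancement can be pictured as a feedback loop that nudges a per-vCPU value by a fraction of the observed error on each expiration. The sketch below only illustrates that idea: the step divisor, the cap, `struct timer_advance` and `adjust_advance()` are hypothetical and are not KVM's constants or code. With this commit's change, the error fed into such a loop would be measured up to the world switch rather than up to the end of wait_lapic_expire.

```c
#include <stdint.h>

/* Hypothetical tuning constants, for illustration only. */
#define ADVANCE_STEP_DIV 8        /* apply 1/8 of the observed error per step */
#define ADVANCE_MAX_NS   5000u    /* hypothetical hard cap on advancement */

/* Hypothetical per-vCPU advancement state. */
struct timer_advance {
    uint32_t advance_ns;
};

/* delta_ns = (time the guest observed the interrupt) - (programmed deadline).
 * Positive: the interrupt arrived late, so advance more.
 * Negative: it arrived early and KVM had to busy-wait, so advance less. */
static void adjust_advance(struct timer_advance *t, int64_t delta_ns)
{
    int64_t next = (int64_t)t->advance_ns + delta_ns / ADVANCE_STEP_DIV;

    if (next < 0)
        next = 0;
    if (next > (int64_t)ADVANCE_MAX_NS)
        next = ADVANCE_MAX_NS;
    t->advance_ns = (uint32_t)next;
}
```

Applying only a fraction of each error keeps a single noisy sample from swinging the advancement too far.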
    • KVM: LAPIC: Delay trace_kvm_wait_lapic_expire tracepoint to after vmexit · ec0671d5
      Committed by Wanpeng Li
      The wait_lapic_expire() call was moved above guest_enter_irqoff()
      because its tracepoint would otherwise violate the RCU extended
      quiescent state entered by guest_enter_irqoff() [1][2].  This patch
      simply moves the tracepoint below guest_exit_irqoff() in
      vcpu_enter_guest(): the delta is snapshotted before VM-Enter but traced
      after VM-Exit.  This will allow wait_lapic_expire() to be moved just
      before VM entry in a later patch.
      
      [1] Commit 8b89fe1f ("kvm: x86: move tracepoints outside extended quiescent state")
      [2] https://patchwork.kernel.org/patch/7821111/
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Track whether wait_lapic_expire was called, and do not invoke the tracepoint
       if not. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
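The "snapshot before VM-Enter, trace after VM-Exit" pattern, including Paolo's addition of tracking whether the wait actually ran, can be modeled with a small per-vCPU record. This is a sketch with hypothetical names (`struct deferred_trace`, `before_vmenter()`, `after_vmexit()`); `printf` stands in for the real tracepoint.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-vCPU bookkeeping for a deferred trace event. */
struct deferred_trace {
    bool    waited;   /* did the wait run before this VM-Enter? */
    int64_t delta;    /* value snapshotted before VM-Enter */
};

/* Before VM-Enter: record only, since tracing is not allowed here. */
static void before_vmenter(struct deferred_trace *t, bool timer_pending, int64_t delta)
{
    t->waited = timer_pending;
    if (timer_pending)
        t->delta = delta;
}

/* After VM-Exit: tracing is safe again; emit only if the wait ran.
 * Returns true if an event was emitted. */
static bool after_vmexit(struct deferred_trace *t)
{
    if (!t->waited)
        return false;
    printf("wait_lapic_expire: delta=%lld\n", (long long)t->delta);
    t->waited = false;   /* consume the snapshot */
    return true;
}
```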
  8. 19 Apr 2019: 2 commits
    • KVM: lapic: Allow user to disable adaptive tuning of timer advancement · c3941d9e
      Committed by Sean Christopherson
      The introduction of adaptive tuning of lapic timer advancement did not
      allow for the scenario where userspace would want to disable adaptive
      tuning but still employ timer advancement, e.g. for testing purposes or
      to handle a use case where adaptive tuning is unable to settle on a
      suitable time.  This is especially pertinent now that KVM places a hard
      threshold on the maximum advancement time.
      
      Rework the timer semantics to accept signed values, with a value of '-1'
      being interpreted as "use adaptive tuning with KVM's internal default",
      and any other value being used as an explicit advancement time, e.g. a
      time of '0' effectively disables advancement.
      
      Note, this does not completely restore the original behavior of
      lapic_timer_advance_ns.  Prior to tracking the advancement per vCPU,
      which is necessary to support autotuning, userspace could adjust
      lapic_timer_advance_ns for a *running* vCPU.  With per-vCPU tracking, the
      module params are snapshotted at vCPU creation, i.e. applying a new
      advancement effectively requires restarting a VM.
      
      Dynamically updating a running vCPU is possible, e.g. a helper could be
      added to retrieve the desired delay, choosing between the global module
      param and the per-VCPU value depending on whether or not auto-tuning is
      (globally) enabled, but it introduces a great deal of complexity.  The
      wrapper itself is not complex, but understanding and documenting the
      effects of dynamically toggling auto-tuning and/or adjusting the timer
      advancement is nigh impossible since the behavior would be dependent on
      KVM's implementation as well as compiler optimizations.  In other words,
      providing stable behavior would require extremely careful consideration
      now and in the future.
      
      Given that the expected use of a manually-tuned timer advancement is to
      "tune once, run many", use the vastly simpler approach of recognizing
      changes to the module params only when creating a new vCPU.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 3b8a5df6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
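The signed-parameter semantics described in this commit, together with snapshotting the value only when a vCPU is created, might look like the sketch below. `DEFAULT_ADVANCE_NS` (and its value), `struct vcpu_timer_cfg` and `snapshot_timer_cfg()` are hypothetical. Treating negative values other than -1 as 0 is also an assumption, since the commit message only defines -1 and explicit times.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal default used when adaptive tuning is requested. */
#define DEFAULT_ADVANCE_NS 1000u

struct vcpu_timer_cfg {
    uint32_t advance_ns;
    bool     adaptive;
};

/* Interpret the signed module parameter once, at vCPU creation:
 *   -1    -> adaptive tuning, starting from the internal default
 *   >= 0  -> fixed advancement of that many ns (0 disables it)
 * Later changes to the parameter do not affect existing vCPUs. */
static struct vcpu_timer_cfg snapshot_timer_cfg(int param)
{
    struct vcpu_timer_cfg cfg;

    if (param == -1) {
        cfg.adaptive = true;
        cfg.advance_ns = DEFAULT_ADVANCE_NS;
    } else {
        cfg.adaptive = false;
        cfg.advance_ns = param < 0 ? 0u : (uint32_t)param;  /* assumption */
    }
    return cfg;
}
```

Because each vCPU keeps its own copy, this "tune once, run many" model never has to reason about a parameter changing under a running vCPU.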
    • KVM: lapic: Track lapic timer advance per vCPU · 39497d76
      Committed by Sean Christopherson
      Automatically adjusting the globally-shared timer advancement could
      corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting
      the advancement value.  That could be partially fixed by using a local
      variable for the arithmetic, but it would still be susceptible to a
      race when setting timer_advance_adjust_done.
      
      And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the
      correct calibration for a given vCPU may not apply to all vCPUs.
      
      Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is
      effectively violated when finding a stable advancement takes an extended
      amount of time.
      
      Opportunistically change the definition of lapic_timer_advance_ns to
      a u32 so that it matches the style of struct kvm_timer.  Explicitly
      pass the param to kvm_create_lapic() so that it doesn't have to be
      exposed to lapic.c, thus reducing the probability of unintentionally
      using the global value instead of the per-vCPU value.
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 3b8a5df6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 17 Oct 2018: 1 commit
  10. 15 May 2018: 1 commit
  11. 29 Mar 2018: 1 commit
  12. 16 Jan 2018: 1 commit
  13. 02 Nov 2017: 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Committed by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier should be applied
      to a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note", otherwise it was "GPL-2.0".  The results were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were any new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with the SPDX license identifiers in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
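The last step of this commit, which emits the right tag in the comment style each file type expects, can be sketched as follows. The kernel's license rules use `// SPDX-License-Identifier: ...` on the first line of .c files and `/* SPDX-License-Identifier: ... */` in headers. The path-based default (uapi files get the Linux-syscall-note, other license-less files get GPL-2.0) follows the heuristic described in this commit message. The function names are hypothetical, and this is not the script Thomas and Greg actually used.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Default identifier for a file with no license text. */
static const char *default_spdx_id(const char *path)
{
    return strstr(path, "/uapi/") ? "GPL-2.0 WITH Linux-syscall-note"
                                  : "GPL-2.0";
}

static bool ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Write the tag line in the comment style the file type expects:
 * C++-style for .c sources, C-style block comment for .h headers.
 * Returns the snprintf result, or -1 for unsupported file types. */
static int format_spdx_line(char *buf, size_t len, const char *path)
{
    const char *id = default_spdx_id(path);

    if (ends_with(path, ".c"))
        return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
    if (ends_with(path, ".h"))
        return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
    return -1;
}
```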
  14. 07 Aug 2017: 1 commit
  15. 30 Jun 2017: 1 commit
  16. 15 Feb 2017: 1 commit
  17. 12 Jan 2017: 1 commit
  18. 09 Jan 2017: 2 commits
  19. 03 Nov 2016: 2 commits
    • KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support · 8003c9ae
      Committed by Wanpeng Li
      Most Windows guests still use APIC Timer periodic/oneshot mode instead
      of TSC-deadline mode, and the periodic/oneshot modes are still emulated
      by a high-overhead hrtimer on the host.  This patch converts the
      expected expiration time of periodic/oneshot mode to a guest deadline
      TSC in order to leverage the VMX preemption timer logic used for APIC
      Timer TSC-deadline mode.  After each preemption timer VM exit, the
      preemption timer is restarted to emulate the LVTT current-count register
      being automatically reloaded from the initial-count register when the
      count reaches 0.  This saves ~5600 cycles for each virtualized APIC
      Timer periodic mode operation.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      [Squashed with my fixes that were reviewed-by Paolo.]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
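Converting the APIC timer's period into a TSC deadline is simple arithmetic, and emulating the automatic reload amounts to re-arming from the previous deadline. The sketch below assumes a fixed TSC frequency given in kHz and ignores guest TSC scaling and the APIC timer divide configuration. The function names are hypothetical, and re-arming relative to the previous deadline (rather than "now") is one reasonable design choice, not a claim about this patch's exact code.

```c
#include <stdint.h>

/* cycles = ns * tsc_khz / 10^6, since a kHz rate is cycles per millisecond.
 * The product can overflow for very long periods (roughly > 6000 s at 3 GHz). */
static uint64_t ns_to_tsc_cycles(uint64_t ns, uint32_t tsc_khz)
{
    return ns * (uint64_t)tsc_khz / 1000000u;
}

/* Periodic mode: after each expiry, program the next deadline relative to
 * the previous one so that handling latency does not accumulate as drift. */
static uint64_t next_periodic_deadline(uint64_t prev_deadline_tsc, uint64_t period_tsc)
{
    return prev_deadline_tsc + period_tsc;
}
```

For example, a 1 ms period on a 2 GHz TSC (`tsc_khz` = 2000000) corresponds to 2,000,000 cycles.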
    • KVM: LAPIC: introduce kvm_get_lapic_target_expiration_tsc() · 498f8162
      Committed by Wanpeng Li
      Introduce kvm_get_lapic_target_expiration_tsc() to get the APIC Timer
      target deadline TSC.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  20. 14 Jul 2016: 2 commits
  21. 16 Jun 2016: 1 commit
    • KVM: x86: support using the vmx preemption timer for tsc deadline timer · ce7a058a
      Committed by Yunhong Jiang
      The VMX preemption timer can be used to virtualize the TSC deadline timer.
      The VMX preemption timer is armed when the vCPU is running, and a VMExit
      will happen if the virtual TSC deadline timer expires.
      
      When the vCPU thread is blocked because of HLT, KVM will switch to use
      an hrtimer, and then go back to the VMX preemption timer when the vCPU
      thread is unblocked.
      
      This solution avoids the complex OS hrtimer subsystem and the host
      timer interrupt handling cost, replacing them with a little math
      (for guest->host TSC and host TSC->preemption timer conversion)
      and a cheaper VMexit.  This benefits latency for isolated pCPUs.
      
      [A word about performance... Yunhong reported a 30% reduction in average
       latency from cyclictest.  I made a similar test with tscdeadline_latency
       from kvm-unit-tests, and measured
      
       - ~20 clock cycles loss (out of ~3200, so less than 1% but still
         statistically significant) in the worst case where the test halts
         just after programming the TSC deadline timer
      
       - ~800 clock cycles gain (25% reduction in latency) in the best case
         where the test busy waits.
      
       I removed the VMX bits from Yunhong's patch, to concentrate them in the
       next patch - Paolo]
      Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
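The "little math" mentioned above can be sketched as converting a guest TSC deadline into a VMX-preemption timer value. Per the Intel SDM, the preemption timer counts down once every 2^X TSC ticks, where X is reported in IA32_VMX_MISC bits 4:0, and the timer-value field is 32 bits wide. To keep it short, this sketch assumes no TSC scaling (guest_tsc = host_tsc + tsc_offset); the function name is hypothetical and this is not KVM's implementation.

```c
#include <stdint.h>

/* Assumes no TSC scaling: guest_tsc = host_tsc + tsc_offset.
 * rate_shift is X from IA32_VMX_MISC[4:0]: the timer counts down once
 * every 2^X TSC ticks. */
static uint32_t preemption_timer_value(uint64_t guest_deadline, int64_t tsc_offset,
                                       uint64_t host_now, unsigned rate_shift)
{
    /* Modular arithmetic handles negative offsets correctly. */
    uint64_t host_deadline = guest_deadline - (uint64_t)tsc_offset;

    if (host_deadline <= host_now)
        return 0;                       /* already expired: exit right away */

    uint64_t ticks = (host_deadline - host_now) >> rate_shift;
    return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;  /* 32-bit field */
}
```

If the value saturates at `UINT32_MAX`, the timer fires early and would simply be re-armed with the remaining time.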
  22. 19 May 2016: 3 commits
  23. 03 Mar 2016: 1 commit
  24. 09 Feb 2016: 2 commits
  25. 26 Nov 2015: 2 commits
    • kvm/x86: Hyper-V synthetic interrupt controller · 5c919412
      Committed by Andrey Smetanin
      SynIC (synthetic interrupt controller) is a lapic extension,
      which is controlled via MSRs and maintains for each vCPU
       - 16 synthetic interrupt "lines" (SINTs); each can be configured to
         trigger a specific interrupt vector optionally with auto-EOI
         semantics
       - a message page in the guest memory with 16 256-byte per-SINT message
         slots
       - an event flag page in the guest memory with 16 2048-bit per-SINT
         event flag areas
      
      The host triggers a SINT whenever it delivers a new message to the
      corresponding slot or flips an event flag bit in the corresponding area.
      The guest informs the host that it can try delivering a message by
      explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
      MSR.
      
      The userspace (qemu) triggers interrupts and receives EOM notifications
      via irqfd with resampler; for that, a GSI is allocated for each
      configured SINT, and irq_routing api is extended to support GSI-SINT
      mapping.
      
      Changes v4:
      * added activation of SynIC by vcpu KVM_ENABLE_CAP
      * added per SynIC active flag
      * added deactivation of APICv upon SynIC activation
      
      Changes v3:
      * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
      docs
      
      Changes v2:
      * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
      * add Hyper-V SynIC vectors into EOI exit bitmap
      * Hyper-V SynIC SINT msr write logic simplified
      Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
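The sizes in the SynIC description fit together neatly: 16 message slots of 256 bytes, and 16 event flag areas of 2048 bits (256 bytes) each, both fill exactly one 4 KiB page. The sketch computes per-SINT offsets and decodes a SINT MSR value. The bit layout used here (vector in bits 7:0, masked in bit 16, auto-EOI in bit 17) follows the Hyper-V TLFS as commonly used by Linux, but treat it as an assumption to check against the TLFS. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYNIC_SINT_COUNT     16u
#define SYNIC_MSG_SLOT_BYTES 256u    /* per-SINT message slot */
#define SYNIC_EVT_FLAG_BITS  2048u   /* per-SINT event flag area */

/* 16 * 256 bytes = 4096 bytes: the message page is one 4 KiB page. */
static uint32_t msg_slot_offset(unsigned sint)
{
    return sint * SYNIC_MSG_SLOT_BYTES;
}

/* 2048 bits = 256 bytes, so the 16 event flag areas also fill one page. */
static uint32_t evt_flags_offset(unsigned sint)
{
    return sint * (SYNIC_EVT_FLAG_BITS / 8u);
}

/* Assumed SINT register layout: bits 7:0 vector, bit 16 masked,
 * bit 17 auto-EOI. */
struct sint_cfg {
    uint8_t vector;
    bool    masked;
    bool    auto_eoi;
};

static struct sint_cfg decode_sint(uint64_t msr)
{
    struct sint_cfg c;

    c.vector   = (uint8_t)(msr & 0xffu);
    c.masked   = (msr >> 16) & 1u;
    c.auto_eoi = (msr >> 17) & 1u;
    return c;
}
```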
    • kvm/x86: per-vcpu apicv deactivation support · d62caabb
      Committed by Andrey Smetanin
      The decision on whether to use hardware APIC virtualization used to be
      taken globally, based on the availability of the feature in the CPU
      and the value of a module parameter.
      
      However, under certain circumstances we want to control it on a per-vCPU
      basis.  In particular, when userspace activates the Hyper-V synthetic
      interrupt controller (SynIC), APICv has to be disabled as it's
      incompatible with SynIC auto-EOI behavior.
      
      To achieve that, introduce 'apicv_active' flag on struct
      kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
      off.  The flag is initialized based on the module parameter and CPU
      capability, and consulted whenever an APICv-specific action is
      performed.
      Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
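The per-vCPU flag described above (initialized from the module parameter and CPU capability, cleared when SynIC is activated, and consulted before any APICv-specific action) can be modeled as below. `struct vcpu_arch_model` and the helper names are simplified, hypothetical stand-ins, not KVM's actual declarations.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct kvm_vcpu_arch. */
struct vcpu_arch_model {
    bool apicv_active;
};

/* Initialize from the module parameter and the CPU capability. */
static void vcpu_init_apicv(struct vcpu_arch_model *a,
                            bool enable_apicv_param, bool cpu_has_apicv)
{
    a->apicv_active = enable_apicv_param && cpu_has_apicv;
}

/* Called, e.g., when userspace activates SynIC, whose auto-EOI
 * behavior is incompatible with APICv. */
static void vcpu_deactivate_apicv(struct vcpu_arch_model *a)
{
    a->apicv_active = false;
}

/* Every APICv-specific path consults the per-vCPU flag. */
static bool use_posted_interrupt(const struct vcpu_arch_model *a)
{
    return a->apicv_active;
}
```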
  26. 01 Oct 2015: 3 commits
  27. 23 Jul 2015: 1 commit
  28. 04 Jul 2015: 1 commit