1. 03 Nov 2022 (8 commits)
  2. 02 Nov 2022 (1 commit)
  3. 27 Oct 2022 (2 commits)
  4. 08 Oct 2022 (8 commits)
  5. 29 Sep 2022 (2 commits)
  6. 21 Sep 2022 (3 commits)
    • KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog · 5898671f
      Committed by Like Xu
      mainline inclusion
      from mainline-v5.18
      commit 75189d1d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5RD6Y
      CVE: NA
      
      -------------
      
      The NMI watchdog is one of kernel developers' favorite features,
      but it does not work in an AMD guest even with vPMU enabled; worse,
      the system misrepresents this capability via /proc.
      
      This is a PMC emulation error: KVM does not pass the latest valid
      counter value to the perf_event in time while the guest NMI watchdog
      is running, so the perf_event backing the watchdog counter reverts
      to a stale state at some point after the first guest NMI injection,
      leaving the hardware register PMC0 constantly rewritten to
      0x800000000001.
      
      Meanwhile, the running counter should accurately reflect its new value
      based on the latest coordinated pmc->counter (from vPMC's point of view)
      rather than the value written directly by the guest.
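      
      The shape of the fix, as a minimal sketch (the helper and the
      get_sample_period() accessor follow the upstream patch; the exact
      call sites are an assumption here): after a guest WRMSR to a
      counter, recompute the perf_event sample period from the freshly
      written pmc->counter, so the backing event never keeps sampling
      with a stale period.
      
      /*
       * Minimal sketch, not the verbatim upstream diff: refresh the
       * perf_event sample period from the current pmc->counter after a
       * guest WRMSR, e.g. right after the set_msr handler has done
       * pmc->counter = data & pmc_bitmask(pmc);
       */
      static inline void pmc_update_sample_period(struct kvm_pmc *pmc)
      {
      	if (!pmc->perf_event || pmc->is_paused)
      		return;
      
      	perf_event_period(pmc->perf_event,
      			  get_sample_period(pmc, pmc->counter));
      }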
      
      Fixes: 168d918f ("KVM: x86: Adjust counter sample period after a wrmsr")
      Reported-by: Dongli Cao <caodongli@kingsoft.com>
      Signed-off-by: Like Xu <likexu@tencent.com>
      Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
      Tested-by: Yanan Wang <wangyanan55@huawei.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20220409015226.38619-1-likexu@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • KVM: x86/pmu: Introduce pmc->is_paused to reduce the call time of perf interfaces · e7815ebf
      Committed by Like Xu
      mainline inclusion
      from mainline-v5.14
      commit e79f49c3
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5RD6Y
      CVE: NA
      
      -------------
      
      Based on our observations, after any vm-exit associated with vPMU,
      at least two perf interfaces have to be called for guest counter
      emulation, such as perf_event_{pause, read_value, period}(), and
      each one will {lock, unlock} the same perf_event_ctx. The calls
      become even more frequent when the guest uses counters in a
      multiplexed manner.
      
      Holding a lock once and completing the KVM request operations in the perf
      context would introduce a set of impractical new interfaces. So we can
      further optimize the vPMU implementation by avoiding repeated calls to
      these interfaces in the KVM context for at least one pattern:
      
      After we call perf_event_pause() once, the event is disabled and its
      internal count is reset to 0, so there is no need to pause it again
      or read its value. Once the event is paused, its period will not be
      updated until the next time it is resumed or reprogrammed. There is
      also no need to call perf_event_period() twice for a non-running
      counter, since the perf_event for a running counter is never paused.
      
      Based on this implementation, for the following common usage of
      sampling 4 events with perf in a 4-vCPU/8GiB (4u8g) guest:
      
        echo 0 > /proc/sys/kernel/watchdog
        echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
        echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate
        echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
        for i in `seq 1 1 10`
        do
            taskset -c 0 perf record \
                -e cpu-cycles -e instructions -e branch-instructions -e cache-misses \
                /root/br_instr a
        done
      
      the average latency of the guest NMI handler is reduced from
      37646.7 ns to 32929.3 ns (~1.14x speedup) on an Intel ICX server.
      In addition to collecting more samples, no loss of sampling
      accuracy was observed compared to before the optimization.
      
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20210728120705.6855-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • x86/speculation: Add RSB VM Exit protections · 30b1e4cb
      Committed by Daniel Sneddon
      stable inclusion
      from stable-v5.10.136
      commit 509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5N1SO
      CVE: CVE-2022-26373
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=509c2c9fe75ea7493eebbb6bb2f711f37530ae19
      
      --------------------------------
      
      commit 2b129932 upstream.
      
      tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
      documented for RET instructions after VM exits. Mitigate it with a new
      one-entry RSB stuffing mechanism and a new LFENCE.
      
      == Background ==
      
      Indirect Branch Restricted Speculation (IBRS) was designed to help
      mitigate Branch Target Injection and Speculative Store Bypass, i.e.
      Spectre, attacks. IBRS prevents software run in less privileged modes
      from affecting branch prediction in more privileged modes. IBRS requires
      the MSR to be written on every privilege level change.
      
      To overcome some of the performance issues of IBRS, Enhanced IBRS was
      introduced.  eIBRS is an "always on" IBRS, in other words, just turn
      it on once instead of writing the MSR on every privilege level change.
      When eIBRS is enabled, more privileged modes should be protected from
      less privileged modes, including protecting VMMs from guests.
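      
      To make the difference concrete, a schematic sketch (the function
      names are hypothetical; MSR_IA32_SPEC_CTRL and SPEC_CTRL_IBRS are
      the kernel's real constants from asm/msr-index.h):
      
      #include <asm/msr.h>
      
      /* Legacy IBRS: the bit must be re-asserted on every transition
       * into a more privileged mode (kernel entry, VM exit, ...). */
      static void legacy_ibrs_on_entry(void)
      {
      	wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
      }
      
      /* eIBRS: a single write at boot; the protection then stays in
       * effect across privilege changes, with no per-transition writes. */
      static void eibrs_setup_once(void)
      {
      	wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
      }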
      
      == Problem ==
      
      Here's a simplification of how guests are run on Linux' KVM:
      
      void run_kvm_guest(void)
      {
      	// Prepare to run guest
      	VMRESUME();
      	// Clean up after guest runs
      }
      
      The execution flow for that would look something like this to the
      processor:
      
      1. Host-side: call run_kvm_guest()
      2. Host-side: VMRESUME
      3. Guest runs, does "CALL guest_function"
      4. VM exit, host runs again
      5. Host might make some "cleanup" function calls
      6. Host-side: RET from run_kvm_guest()
      
      Now, when back on the host, there are a couple of possible scenarios of
      post-guest activity the host needs to do before executing host code:
      
      * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
      touched and Linux has to do a 32-entry stuffing.
      
      * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
      IBRS=1 shortly after VM exit, has a documented side effect of flushing
      the RSB except in this PBRSB situation where the software needs to stuff
      the last RSB entry "by hand".
      
      IOW, with eIBRS supported, host RET instructions should no longer be
      influenced by guest behavior after the host retires a single CALL
      instruction.
      
      However, if the RET instructions are "unbalanced" with CALLs after a VM
      exit as is the RET in #6, it might speculatively use the address for the
      instruction after the CALL in #3 as an RSB prediction. This is a problem
      since the (untrusted) guest controls this address.
      
      Balanced CALL/RET instruction pairs such as in step #5 are not affected.
      
      == Solution ==
      
      The PBRSB issue affects a wide variety of Intel processors which
      support eIBRS. But not all of them need mitigation. Today,
      X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
      PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
      eIBRS systems which enable legacy IBRS explicitly.
      
      However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
      and most of them need a new mitigation.
      
      Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
      which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
      
      The lighter-weight mitigation performs a CALL instruction which is
      immediately followed by a speculative execution barrier (INT3). This
      steers speculative execution to the barrier -- just like a retpoline
      -- which ensures that speculation can never reach an unbalanced RET.
      Then, ensure this CALL is retired before continuing execution with an
      LFENCE.
      
      In other words, the window of exposure is opened at VM exit where RET
      behavior is troublesome. While the window is open, force RSB predictions
      sampling for RET targets to a dead end at the INT3. Close the window
      with the LFENCE.
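      
      A schematic of that sequence in GCC inline assembly for x86-64 (an
      illustration of the CALL/INT3/LFENCE pattern described above, not
      the kernel's actual FILL_RETURN_BUFFER macro):
      
      /*
       * The CALL pushes one benign entry onto the RSB; the INT3 is a
       * dead end for any RET that mispredicts through that entry; the
       * LFENCE ensures the CALL retires before execution continues.
       */
      static inline void rsb_vmexit_lite_sketch(void)
      {
      	asm volatile("call 1f\n\t"
      		     "int3\n"                /* speculation trap     */
      		     "1: add $8, %%rsp\n\t"  /* drop return address  */
      		     "lfence"                /* force CALL to retire */
      		     : : : "memory");
      }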
      
      There is a subset of eIBRS systems which are not vulnerable to PBRSB.
      Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
      Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
      
        [ bp: Massage, incorporate review comments from Andy Cooper. ]
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      conflict:
          arch/x86/include/asm/cpufeatures.h
      Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Reviewed-by: Liao Chang <liaochang1@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  7. 20 Sep 2022 (16 commits)