1. 15 Aug 2019, 4 commits
    • x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS · f5d27995
      Thomas Gleixner authored
      commit f36cf386e3fec258a341d446915862eded3e13d8 upstream.
      
      Intel provided the following information:
      
       On all current Atom processors, instructions that use a segment register
       value (e.g. a load or store) will not speculatively execute before the
       last writer of that segment retires. Thus they will not use a
       speculatively written segment value.
      
      That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
      entry paths can be excluded from the extra LFENCE if PTI is disabled.
      
      Create a separate bug flag for speculation through SWAPGS and mark all
      out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
      are excluded from the whole mitigation mess anyway.
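      A minimal sketch of how such a bug flag can gate the mitigation setup
      (the function name is hypothetical; the bug flag name follows the
      upstream tree and the fence feature names come from the next changelog):
      
      	/* Sketch: skip the SWAPGS fences entirely on CPUs without the bug. */
      	static void __init spectre_v1_swapgs_setup(void)
      	{
      		if (!boot_cpu_has_bug(X86_BUG_SWAPGS))
      			return;	/* out-of-order ATOMs, AMD, HYGON */
      
      		/*
      		 * With PTI enabled, the CR3 write on user entry already
      		 * serializes, so the extra user-entry LFENCE is only
      		 * needed when PTI is disabled.
      		 */
      		if (!boot_cpu_has(X86_FEATURE_PTI))
      			setup_force_cpu_cap(X86_FEATURE_FENCE_SWAPGS_USER);
      	}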
      Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • x86/speculation: Enable Spectre v1 swapgs mitigations · 12351a6c
      Josh Poimboeuf authored
      commit a2059825986a1c8143fd6698774fa9d83733bb11 upstream.
      
      The previous commit added macro calls in the entry code which mitigate the
      Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
      enabled.  Enable those features where applicable.
      
      The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
      
      There are different features which can affect the risk of attack:
      
      - When FSGSBASE is enabled, unprivileged users are able to place any
        value in GS, using the wrgsbase instruction.  This means they can
        write a GS value which points to any value in kernel space, which can
        be useful with the following gadget in an interrupt/exception/NMI
        handler:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	// dependent load or store based on the value of %reg1
      	// for example: mov (%reg1), %reg2
      
        If an interrupt is coming from user space, and the entry code
        speculatively skips the swapgs (due to user branch mistraining), it
        may speculatively execute the GS-based load and a subsequent dependent
        load or store, exposing the kernel data to an L1 side channel leak.
      
        Note that, on Intel, a similar attack exists in the above gadget when
        coming from kernel space, if the swapgs gets speculatively executed to
        switch back to the user GS.  On AMD, this variant isn't possible
        because swapgs is serializing with respect to future GS-based
        accesses.
      
        NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
      	doesn't exist quite yet.
      
      - When FSGSBASE is disabled, the issue is mitigated somewhat because
        unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
        restricts GS values to user space addresses only.  That means the
        gadget would need an additional step, since the target kernel address
        needs to be read from user space first.  Something like:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	mov (%reg1), %reg2
      	// dependent load or store based on the value of %reg2
      	// for example: mov (%reg2), %reg3
      
        It's difficult to audit for this gadget in all the handlers, so while
        there are no known instances of it, it's entirely possible that it
        exists somewhere (or could be introduced in the future).  Without
        tooling to analyze all such code paths, consider it vulnerable.
      
        Effects of SMAP on the !FSGSBASE case:
      
        - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
          susceptible to Meltdown), the kernel is prevented from speculatively
          reading user space memory, even L1 cached values.  This effectively
          disables the !FSGSBASE attack vector.
      
        - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
          still prevents the kernel from speculatively reading user space
          memory.  But it does *not* prevent the kernel from reading the
          user value from L1, if it has already been cached.  This is probably
          only a small hurdle for an attacker to overcome.
      
      Thanks to Dave Hansen for contributing the speculative_smap() function.
      
      Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
      is serializing on AMD.
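      A sketch of the decision helper described above, modeled on the
      speculative_smap() idea credited to Dave Hansen (treat the exact name
      and placement as illustrative):
      
      	/* SMAP only closes the !FSGSBASE hole if user data cannot be
      	 * speculatively read at all, i.e. the CPU reports RDCL_NO. */
      	static bool smap_works_speculatively(void)
      	{
      		if (!boot_cpu_has(X86_FEATURE_SMAP))
      			return false;
      
      		/*
      		 * On a Meltdown-affected CPU, SMAP still permits a
      		 * speculative read of an already L1-cached user value,
      		 * so it cannot be relied upon here.
      		 */
      		if (boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
      			return false;
      
      		return true;
      	}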
      
      [ tglx: Fixed the USER fence decision and polished the comment as suggested by Dave Hansen ]
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • x86/cpufeatures: Combine word 11 and 12 into a new scattered features word · 74f65670
      Fenghua Yu authored
      commit acec0ce081de0c36459eea91647faf99296445a3 upstream.
      
      It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
      whole feature words. To better utilize feature words, re-define
      word 11 to host scattered features and move the four X86_FEATURE_CQM_*
      features into Linux defined word 11. More scattered features can be
      added in word 11 in the future.
      
      Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
      Linux-defined leaf.
      
      Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
      name in the next patch when CPUID.7.1:EAX occupies word 12.
      
      Maximum number of RMID and cache occupancy scale are retrieved from
      CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
      code into a separate function.
      
      KVM doesn't support resctrl now. So it's safe to move the
      X86_FEATURE_CQM_* features to scattered features word 11 for KVM.
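      For illustration, a scattered feature is just a table entry mapping an
      X86_FEATURE_* bit to its CPUID leaf/register/bit. A sketch in the style
      of arch/x86/kernel/cpu/scattered.c (CQM bit positions are from CPUID
      leaf 0xf; verify against the actual patch):
      
      	static const struct cpuid_bit cpuid_bits[] = {
      		/* feature,                  reg,       bit, leaf,       sub-leaf */
      		{ X86_FEATURE_CQM_LLC,       CPUID_EDX,  1,  0x0000000f, 0 },
      		{ X86_FEATURE_CQM_OCCUP_LLC, CPUID_EDX,  0,  0x0000000f, 1 },
      		{ X86_FEATURE_CQM_MBM_TOTAL, CPUID_EDX,  1,  0x0000000f, 1 },
      		{ X86_FEATURE_CQM_MBM_LOCAL, CPUID_EDX,  2,  0x0000000f, 1 },
      	};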
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • x86/cpufeatures: Carve out CQM features retrieval · deaf49f3
      Borislav Petkov authored
      commit 45fc56e629caa451467e7664fbd4c797c434a6c4 upstream.
      
      ... into a separate function for better readability. Split out from a
      patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
      sole code movement separate for easy review.
      
      No functional changes.
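      A sketch of the carved-out helper (its shape follows the changelog:
      enumerate CQM, then read the max RMID and occupancy scale from
      CPUID.0xf; details illustrative):
      
      	static void init_cqm(struct cpuinfo_x86 *c)
      	{
      		if (!cpu_has(c, X86_FEATURE_CQM_LLC)) {
      			c->x86_cache_max_rmid  = -1;
      			c->x86_cache_occ_scale = -1;
      			return;
      		}
      
      		/* CPUID.0xf.0: maximum RMID */
      		c->x86_cache_max_rmid = cpuid_ebx(0xf);
      
      		if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) {
      			u32 eax, ebx, ecx, edx;
      
      			/* CPUID.0xf.1: occupancy scale and max monitoring RMID */
      			cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx);
      			c->x86_cache_max_rmid  = ecx;
      			c->x86_cache_occ_scale = ebx;
      		}
      	}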
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: x86@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  2. 10 Jul 2019, 1 commit
  3. 05 Jul 2019, 2 commits
    • Revert "x86/tsc: Prepare warp test for TSC adjustment" · f5f62304
      Jiufei Xue authored
      This reverts commit 76d3b851.
      
      The return value of check_tsc_warp() is now unused, so remove it.
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • Revert "x86/tsc: Try to adjust TSC if sync test fails" · 64a17a82
      Jiufei Xue authored
      This reverts commit cc4db268.
      
      When we hot-add and enable a vCPU, the time inside the VM jumps and
      then the VM gets stuck.
      dmesg shows the following:
      [   48.402948] CPU2 has been hot-added
      [   48.413774] smpboot: Booting Node 0 Processor 2 APIC 0x2
      [   48.415155] kvm-clock: cpu 2, msr 6b615081, secondary cpu clock
      [   48.453690] TSC ADJUST compensate: CPU2 observed 139318776350 warp.  Adjust: 139318776350
      [  102.060874] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
      [  102.060874] clocksource:                       'kvm-clock' wd_now: 1cb1cfc4bf8 wd_last: 1be9588f1fe mask: ffffffffffffffff
      [  102.060874] clocksource:                       'tsc' cs_now: 207d794f7e cs_last: 205a32697a mask: ffffffffffffffff
      [  102.060874] tsc: Marking TSC unstable due to clocksource watchdog
      [  102.070188] KVM setup async PF for cpu 2
      [  102.071461] kvm-stealtime: cpu 2, msr 13ba95000
      [  102.074530] Will online and init hotplugged CPU: 2
      
      This is because the TSC for the newly added vCPU is initialized to 0
      while the others are ahead. The guest then performs the TSC ADJUST
      compensation, which causes the time jump.
      
      Commit bd8fab39 ("KVM: x86: fix maintaining of kvm_clock stability
      on guest CPU hotplug") can fix this problem. However, the host kernel
      version may be older, so do not adjust the TSC if the sync test fails;
      just mark it unstable.
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  4. 03 Jul 2019, 3 commits
    • x86/resctrl: Prevent possible overrun during bitmap operations · 32746032
      Reinette Chatre authored
      commit 32f010deab575199df4ebe7b6aec20c17bb7eccd upstream.
      
      While the DOC at the beginning of lib/bitmap.c explicitly states that
      "The number of valid bits in a given bitmap does _not_ need to be an
      exact multiple of BITS_PER_LONG.", some of the bitmap operations do
      indeed access BITS_PER_LONG portions of the provided bitmap no matter
      the size of the provided bitmap.
      
      For example, if find_first_bit() is provided with an 8 bit bitmap the
      operation will access BITS_PER_LONG bits from the provided bitmap. While
      the operation ensures that these extra bits do not affect the result,
      the memory is still accessed.
      
      The capacity bitmasks (CBMs) are typically stored in u32 since they
      can never exceed 32 bits. A few instances exist where a bitmap_*
      operation is performed on a CBM by simply pointing the bitmap operation
      to the stored u32 value.
      
      The consequence of this pattern is that some bitmap_* operations will
      access out-of-bounds memory when interacting with the provided CBM.
      
      This same issue has previously been addressed with commit 49e00eee
      ("x86/intel_rdt: Fix out-of-bounds memory access in CBM tests")
      but at that time not all instances of the issue were fixed.
      
      Fix this by using an unsigned long to store the capacity bitmask data
      that is passed to bitmap functions.
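      The pattern and the fix, as a reduced sketch (function and variable
      names hypothetical):
      
      	/* bitmap helpers may read a full BITS_PER_LONG word, so pointing
      	 * them at a u32 can touch memory past the end of the value. */
      	static bool cbm_overlaps(u32 a, u32 b, unsigned int cbm_len)
      	{
      		/* Buggy: bitmap_intersects((unsigned long *)&a,
      		 *			    (unsigned long *)&b, cbm_len); */
      
      		unsigned long la = a, lb = b;	/* widen to a full word */
      
      		return bitmap_intersects(&la, &lb, cbm_len);
      	}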
      
      Fixes: e6519011 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details")
      Fixes: f4e80d67 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information")
      Fixes: 95f0b77e ("x86/intel_rdt: Initialize new resource group with sane defaults")
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/58c9b6081fd9bf599af0dfc01a6fdd335768efef.1560975645.git.reinette.chatre@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/microcode: Fix the microcode load on CPU hotplug for real · 1746dc52
      Thomas Gleixner authored
      commit 5423f5ce5ca410b3646f355279e4e937d452e622 upstream.
      
      A recent change moved the microcode loader hotplug callback into the early
      startup phase which is running with interrupts disabled. It missed that
      the callbacks invoke sysfs functions which might sleep causing nice 'might
      sleep' splats with proper debugging enabled.
      
      Split the callbacks and only load the microcode in the early startup phase
      and move the sysfs handling back into the later threaded and preemptible
      bringup phase where it was before.
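      Conceptually, the fix splits one callback into two, registered at
      different hotplug stages (a sketch; the callback names follow the
      changelog's description rather than the exact patch):
      
      	/* Early, IRQs-off startup phase: only load the microcode. */
      	static int mc_cpu_starting(unsigned int cpu)
      	{
      		microcode_update_cpu(cpu);
      		return 0;
      	}
      
      	/* Later, threaded and preemptible phase: sysfs may sleep here. */
      	static int mc_cpu_online(unsigned int cpu)
      	{
      		if (sysfs_create_group(&get_cpu_device(cpu)->kobj, &mc_attr_group))
      			pr_err("Failed to create group for CPU%d\n", cpu);
      		return 0;
      	}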
      
      Fixes: 78f4e932f776 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906182228350.1766@nanos.tec.linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/speculation: Allow guests to use SSBD even if host does not · 690049ed
      Alejandro Jimenez authored
      commit c1f7fec1eb6a2c86d01bc22afce772c743451d88 upstream.
      
      The bits set in x86_spec_ctrl_mask are used to calculate the guest's value
      of SPEC_CTRL that is written to the MSR before VMENTRY, and control which
      mitigations the guest can enable.  In the case of SSBD, unless the host has
      enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in
      the kernel parameters), the SSBD bit is not set in the mask and the guest
      can not properly enable the SSBD always on mitigation mode.
      
      This has been confirmed by running the SSBD PoC on a guest using the SSBD
      always on mitigation mode (booted with kernel parameter
      "spec_store_bypass_disable=on"), and verifying that the guest is vulnerable
      unless the host is also using SSBD always on mode. In addition, the guest
      OS incorrectly reports the SSB vulnerability as mitigated.
      
      Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports
      it, allowing the guest to use SSBD whether or not the host has chosen to
      enable the mitigation in any of its modes.
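      The change boils down to reflecting host CPU support, not the host's
      chosen mode, in the guest-visible mask (a fragment-level sketch of the
      spec_ctrl setup path):
      
      	/* Let guests toggle SSBD whenever the host CPU supports it. */
      	if (boot_cpu_has(X86_FEATURE_SSBD))
      		x86_spec_ctrl_mask |= SPEC_CTRL_SSBD;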
      
      Fixes: be6fcb54 ("x86/bugs: Rework spec_ctrl base and mask logic")
      Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: bp@alien8.de
      Cc: rkrcmar@redhat.com
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jimenez@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. 25 Jun 2019, 1 commit
  6. 22 Jun 2019, 1 commit
    • x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor · a35e7822
      Frank van der Linden authored
      [ Upstream commit 2ac44ab608705948564791ce1d15d43ba81a1e38 ]
      
      For F17h AMD CPUs, the CPB capability ('Core Performance Boost') is forcibly set,
      because some versions of that chip incorrectly report that they do not have it.
      
      However, a hypervisor may filter out the CPB capability, for good
      reasons. For example, KVM currently does not emulate setting the CPB
      bit in MSR_K7_HWCR, and unchecked MSR access errors will be thrown
      when trying to set it as a guest:
      
      	unchecked MSR access error: WRMSR to 0xc0010015 (tried to write 0x0000000001000011) at rIP: 0xffffffff890638f4 (native_write_msr+0x4/0x20)
      
      	Call Trace:
      	boost_set_msr+0x50/0x80 [acpi_cpufreq]
      	cpuhp_invoke_callback+0x86/0x560
      	sort_range+0x20/0x20
      	cpuhp_thread_fun+0xb0/0x110
      	smpboot_thread_fn+0xef/0x160
      	kthread+0x113/0x130
      	kthread_create_worker_on_cpu+0x70/0x70
      	ret_from_fork+0x35/0x40
      
      To avoid this issue, don't forcibly set the CPB capability for a CPU
      when running under a hypervisor.
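      The fix amounts to a guard in the F17h setup path, approximately
      (sketch):
      
      	static void init_amd_zn(struct cpuinfo_x86 *c)
      	{
      		/*
      		 * Some F17h parts misreport CPB. Force the bit on bare
      		 * metal only: a hypervisor may have filtered it out
      		 * deliberately and may not emulate MSR_K7_HWCR.
      		 */
      		if (!cpu_has(c, X86_FEATURE_HYPERVISOR))
      			set_cpu_cap(c, X86_FEATURE_CPB);
      	}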
      Signed-off-by: Frank van der Linden <fllinden@amazon.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bp@alien8.de
      Cc: jiaxun.yang@flygoat.com
      Fixes: 0237199186e7 ("x86/CPU/AMD: Set the CPB bit unconditionally on F17h")
      Link: http://lkml.kernel.org/r/20190522221745.GA15789@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com
      [ Minor edits to the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  7. 19 Jun 2019, 2 commits
    • x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled · e93ce57f
      Prarit Bhargava authored
      commit c7563e62a6d720aa3b068e26ddffab5f0df29263 upstream.
      
      Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and
      executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl" results in
      a NULL pointer dereference on systems which do not have local MBM support
      enabled:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000020
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 #2
      Workqueue: events mbm_handle_overflow
      RIP: 0010:mbm_handle_overflow+0x150/0x2b0
      
      Only enter the bandwidth update loop if the system has local MBM enabled.
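      A sketch of the guard, using resctrl's local-MBM predicate (the
      surrounding worker code is abbreviated; helper names illustrative):
      
      	/* In the mbm_handle_overflow() path, per monitoring group: */
      	if (is_mbm_local_enabled()) {
      		/* Local MBM state exists; the feedback loop may run. */
      		update_mba_bw(rdtgrp, d);
      	}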
      
      Fixes: de73f38f ("x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth")
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190610171544.13474-1-prarit@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback · ecec31ce
      Borislav Petkov authored
      commit 78f4e932f7760d965fb1569025d1576ab77557c5 upstream.
      
      Adric Blake reported the following warning during suspend-resume:
      
        Enabling non-boot CPUs ...
        x86: Booting SMP configuration:
        smpboot: Booting Node 0 Processor 1 APIC 0x2
        unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
         at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
        Call Trace:
         intel_set_tfa
         intel_pmu_cpu_starting
         ? x86_pmu_dead_cpu
         x86_pmu_starting_cpu
         cpuhp_invoke_callback
         ? _raw_spin_lock_irqsave
         notify_cpu_starting
         start_secondary
         secondary_startup_64
        microcode: sig=0x806ea, pf=0x80, revision=0x96
        microcode: updated to revision 0xb4, date = 2019-04-01
        CPU1 is up
      
      The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
      by microcode. The log above shows that the microcode loader callback
      happens after the PMU restoration, leading to the conjecture that
      because the microcode hasn't been updated yet, that MSR is not present
      yet, leading to the #GP.
      
      Add a microcode loader-specific hotplug vector which comes before
      the PERF vectors and thus executes earlier and makes sure the MSR is
      present.
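      Ordering between hotplug callbacks comes from the position of their
      state in the cpuhp_state enum, so the new loader state simply sits
      before the PERF startup states (sketch of include/linux/cpuhotplug.h):
      
      	enum cpuhp_state {
      		/* ... */
      		CPUHP_AP_MICROCODE_LOADER,	/* runs first: microcode present */
      		/* ... */
      		CPUHP_AP_PERF_X86_STARTING,	/* PMU init may now touch the MSR */
      		/* ... */
      	};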
      
      Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
      Reported-by: Adric Blake <promarbler14@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: x86@kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  8. 09 Jun 2019, 4 commits
  9. 04 Jun 2019, 1 commit
  10. 31 May 2019, 7 commits
    • x86/mce: Handle varying MCA bank counts · 75a96196
      Yazen Ghannam authored
      [ Upstream commit 006c077041dc73b9490fffc4c6af5befe0687110 ]
      
      Linux reads MCG_CAP[Count] to find the number of MCA banks visible to a
      CPU. Currently, this number is the same for all CPUs and a warning is
      shown if there is a difference. The number of banks is overwritten with
      the MCG_CAP[Count] value of each following CPU that boots.
      
      According to the Intel SDM and AMD APM, the MCG_CAP[Count] value gives
      the number of banks that are available to a "processor implementation".
      The AMD BKDGs/PPRs further clarify that this value is per core. This
      value has historically been the same for every core in the system, but
      that is not an architectural requirement.
      
      Future AMD systems may have different MCG_CAP[Count] values per core,
      so the assumption that all CPUs will have the same MCG_CAP[Count] value
      will no longer be valid.
      
      Also, the first CPU to boot will allocate the struct mce_banks[] array
      using the number of banks based on its MCG_CAP[Count] value. The machine
      check handler and other functions use the global number of banks to
      iterate and index into the mce_banks[] array. So it's possible to use an
      out-of-bounds index on an asymmetric system where a following CPU sees a
      MCG_CAP[Count] value greater than its predecessors.
      
      Thus, allocate the mce_banks[] array to the maximum number of banks.
      This will avoid the potential out-of-bounds index since the value of
      mca_cfg.banks is capped to MAX_NR_BANKS.
      
      Set the value of mca_cfg.banks equal to the max of the previous value
      and the value for the current CPU. This way mca_cfg.banks will always
      represent the max number of banks detected on any CPU in the system.
      
      This will ensure that all CPUs will access all the banks that are
      visible to them. A CPU that can access fewer than the max number of
      banks will find the registers of the extra banks to be read-as-zero.
      
      Furthermore, print the resulting number of MCA banks in use. Do this in
      mcheck_late_init() so that the final value is printed after all CPUs
      have been initialized.
      
      Finally, get the bank count from the target CPU when doing injection
      with the mce-inject module.
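      A reduced sketch of the per-CPU capability read (the MSR and mask names
      are the in-tree ones; treat the surrounding logic as illustrative):
      
      	u64 cap;
      	int nbanks;
      
      	rdmsrl(MSR_IA32_MCG_CAP, cap);
      	nbanks = cap & MCG_BANKCNT_MASK;
      
      	/* Track the maximum over all CPUs, capped at MAX_NR_BANKS. */
      	mca_cfg.banks = max(mca_cfg.banks, nbanks);
      	if (mca_cfg.banks > MAX_NR_BANKS) {
      		pr_warn("Using only %d machine check banks\n", MAX_NR_BANKS);
      		mca_cfg.banks = MAX_NR_BANKS;
      	}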
      
       [ bp: Remove out-of-bounds example, passify and cleanup commit message. ]
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20180727214009.78289-1-Yazen.Ghannam@amd.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/mce: Fix machine_check_poll() tests for error types · 3d036cba
      Tony Luck authored
      [ Upstream commit f19501aa07f18268ab14f458b51c1c6b7f72a134 ]
      
      There has been a lurking "TBD" in the machine check poll routine ever
      since it was first split out from the machine check handler. The
      potential issue is that the poll routine may have just begun a read from
      the STATUS register in a machine check bank when the hardware logs an
      error in that bank and signals a machine check.
      
      That race used to be pretty small back when machine checks were
      broadcast, but the addition of local machine check means that the poll
      code could continue running and clear the error from the bank before the
      local machine check handler on another CPU gets around to reading it.
      
      Fix the code to be sure to only process errors that need to be processed
      in the poll code, leaving other logged errors alone for the machine
      check handler to find and process.
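      Very roughly, the per-bank decision in the poll loop becomes (heavily
      simplified; the real logic also weighs the MCi_STATUS signalling bits
      and SER support):
      
      	if (!(status & MCI_STATUS_VAL))
      		continue;	/* nothing logged in this bank */
      
      	if (status & MCI_STATUS_UC)
      		continue;	/* leave it for the machine check handler */
      
      	/* Correctable error: the poller may log and clear it. */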
      
       [ bp: Massage a bit and flip the "== 0" check to the usual !(..) test. ]
      
      Fixes: b79109c3 ("x86, mce: separate correct machine check poller and fatal exception handler")
      Fixes: ed7290d0 ("x86, mce: implement new status bits")
      Reported-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Link: https://lkml.kernel.org/r/20190312170938.GA23035@agluck-desk
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/uaccess, signal: Fix AC=1 bloat · 4614b0bb
      Peter Zijlstra authored
      [ Upstream commit 88e4718275c1bddca6f61f300688b4553dc8584b ]
      
      Occasionally GCC is less aggressive with inlining and the following is
      observed:
      
        arch/x86/kernel/signal.o: warning: objtool: restore_sigcontext()+0x3cc: call to force_valid_ss.isra.5() with UACCESS enabled
        arch/x86/kernel/signal.o: warning: objtool: do_signal()+0x384: call to frame_uc_flags.isra.0() with UACCESS enabled
      
      Cure this by moving this code out of the AC=1 region, since it really
      isn't needed for the user access.
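      The shape of the fix (frame_uc_flags() is the helper named in the
      warnings above; the unsafe_*_user pattern is the usual one, details
      illustrative):
      
      	/* Compute with AC=0, before opening the user access window. */
      	unsigned long uc_flags = frame_uc_flags(regs);
      
      	if (!user_access_begin(frame, sizeof(*frame)))
      		return -EFAULT;
      	unsafe_put_user(uc_flags, &frame->uc.uc_flags, Efault);
      	/* ... further unsafe_put_user() calls, then ... */
      	user_access_end();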
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/microcode: Fix the ancient deprecated microcode loading method · a07de9b9
      Borislav Petkov authored
      [ Upstream commit 24613a04ad1c0588c10f4b5403ca60a73d164051 ]
      
      Commit
      
        2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      
      added the new define UCODE_NEW to denote that an update should happen
      only when newer microcode (than installed on the system) has been found.
      
      But it missed adjusting that for the old /dev/cpu/microcode loading
      interface. Fix it.
      
      Fixes: 2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jann Horn <jannh@google.com>
      Link: https://lkml.kernel.org/r/20190405133010.24249-3-bp@alien8.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/irq/64: Limit IST stack overflow check to #DB stack · f843f848
      Thomas Gleixner authored
      [ Upstream commit 7dbcf2b0b770eeb803a416ee8dcbef78e6389d40 ]
      
      Commit
      
        37fe6a42 ("x86: Check stack overflow in detail")
      
      added a broad check for the full exception stack area, i.e. it considers
      the full exception stack area as valid.
      
      That's wrong in two aspects:
      
       1) It does not check the individual areas one by one
      
       2) #DF, NMI and #MCE are not enabling interrupts which means that a
          regular device interrupt cannot happen in their context. In fact if a
          device interrupt hits one of those IST stacks that's a bug because some
          code path enabled interrupts while handling the exception.
      
      Limit the check to the #DB stack and consider all other IST stacks as
      'overflow' or invalid.
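      A sketch of the narrowed check (the constants are the in-tree #DB
      stack parameters; treat the details as illustrative):
      
      	/* Only the #DB IST stack may legitimately see a device interrupt. */
      	u64 estack_top    = (u64)__this_cpu_read(orig_ist.ist[DEBUG_STACK]);
      	u64 estack_bottom = estack_top - DEBUG_STKSZ + STACK_TOP_MARGIN;
      
      	if (regs->sp >= estack_bottom && regs->sp < estack_top)
      		return;		/* within the #DB stack: no overflow splat */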
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160143.682135110@linutronix.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/build: Move _etext to actual end of .text · 0fcb3cd5
      Kees Cook authored
      [ Upstream commit 392bef709659abea614abfe53cf228e7a59876a4 ]
      
      When building x86 with Clang LTO and CFI, CFI jump regions are
      automatically added to the end of the .text section late in linking. As a
      result, the _etext position was being labelled before the appended jump
      regions, causing confusion about where the boundaries of the executable
      region actually are in the running kernel, and broke at least the fault
      injection code. This moves the _etext mark to outside (and immediately
      after) the .text area, as is already the case on other architectures
      (e.g. arm64, arm).
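      In linker-script terms the move is tiny; a sketch of
      arch/x86/kernel/vmlinux.lds.S:
      
      	.text : AT(ADDR(.text) - LOAD_OFFSET) {
      		/* ... */
      		/* previously: _etext = .;  (inside .text, before appended regions) */
      	} :text = 0x9090
      
      	/* now after the section, so late-added CFI jump regions stay below it */
      	_etext = .;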
      Reported-and-tested-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190423183827.GA4012@beast
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • x86/modules: Avoid breaking W^X while loading modules · 8715ce03
      Nadav Amit authored
      [ Upstream commit f2c65fb3221adc6b73b0549fc7ba892022db9797 ]
      
      When modules and BPF filters are loaded, there is a time window in
      which some memory is both writable and executable. An attacker that has
      already found another vulnerability (e.g., a dangling pointer) might be
      able to exploit this behavior to overwrite kernel code. Prevent having
      writable executable PTEs in this stage.
      
      In addition, avoiding having W+X mappings can also slightly simplify the
      patching of modules code on initialization (e.g., by alternatives and
      static-key), as would be done in the next patch. This was actually the
      main motivation for this patch.
      
      To avoid having W+X mappings, set them initially as RW (NX) and after
      they are set as RO set them as X as well. Setting them as executable is
      done as a separate step to avoid one core in which the old PTE is cached
      (hence writable), and another which sees the updated PTE (executable),
      which would break the W^X protection.
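      A sketch of the two-step permission flip described above (the helper
      names follow kernel/module.c conventions; the ordering is the point):
      
      	static void module_enable_x(const struct module *mod)
      	{
      		frob_text(&mod->core_layout, set_memory_x);
      		frob_text(&mod->init_layout, set_memory_x);
      	}
      
      	/*
      	 * Conceptual caller ordering after relocations and alternatives:
      	 *	module_enable_ro(mod, false);	// drop W first ...
      	 *	module_enable_x(mod);		// ... then grant X
      	 */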
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  11. 26 May 2019, 1 commit
  12. 22 May 2019, 2 commits
  13. 17 May 2019, 3 commits
  14. 15 May 2019, 8 commits