1. 05 12月, 2019 31 次提交
  2. 01 12月, 2019 9 次提交
    • M
      KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel · 345712c9
      Michael Ellerman 提交于
      commit af2e8c68b9c5403f77096969c516f742f5bb29e0 upstream.
      
      On some systems that are vulnerable to Spectre v2, it is up to
      software to flush the link stack (return address stack), in order to
      protect against Spectre-RSB.
      
      When exiting from a guest we do some house keeping and then
      potentially exit to C code which is several stack frames deep in the
      host kernel. We will then execute a series of returns without
      preceeding calls, opening up the possiblity that the guest could have
      poisoned the link stack, and direct speculative execution of the host
      to a gadget of some sort.
      
      To prevent this we add a flush of the link stack on exit from a guest.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      [dja: straightforward backport to v4.19]
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      345712c9
    • M
      powerpc/book3s64: Fix link stack flush on context switch · 0a60d4bd
      Michael Ellerman 提交于
      commit 39e72bf96f5847ba87cc5bd7a3ce0fed813dc9ad upstream.
      
      In commit ee13cb24 ("powerpc/64s: Add support for software count
      cache flush"), I added support for software to flush the count
      cache (indirect branch cache) on context switch if firmware told us
      that was the required mitigation for Spectre v2.
      
      As part of that code we also added a software flush of the link
      stack (return address stack), which protects against Spectre-RSB
      between user processes.
      
      That is all correct for CPUs that activate that mitigation, which is
      currently Power9 Nimbus DD2.3.
      
      What I got wrong is that on older CPUs, where firmware has disabled
      the count cache, we also need to flush the link stack on context
      switch.
      
      To fix it we create a new feature bit which is not set by firmware,
      which tells us we need to flush the link stack. We set that when
      firmware tells us that either of the existing Spectre v2 mitigations
      are enabled.
      
      Then we adjust the patching code so that if we see that feature bit we
      enable the link stack flush. If we're also told to flush the count
      cache in software then we fall through and do that also.
      
      On the older CPUs we don't need to do do the software count cache
      flush, firmware has disabled it, so in that case we patch in an early
      return after the link stack flush.
      
      The naming of some of the functions is awkward after this patch,
      because they're called "count cache" but they also do link stack. But
      we'll fix that up in a later commit to ease backporting.
      
      This is the fix for CVE-2019-18660.
      Reported-by: NAnthony Steinhauser <asteinhauser@google.com>
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a60d4bd
    • C
      powerpc/64s: support nospectre_v2 cmdline option · 19d98b4d
      Christopher M. Riedl 提交于
      commit d8f0e0b073e1ec52a05f0c2a56318b47387d2f10 upstream.
      
      Add support for disabling the kernel implemented spectre v2 mitigation
      (count cache flush on context switch) via the nospectre_v2 and
      mitigations=off cmdline options.
      Suggested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NChristopher M. Riedl <cmr@informatik.wtf>
      Reviewed-by: NAndrew Donnellan <ajd@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190524024647.381-1-cmr@informatik.wtfSigned-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      19d98b4d
    • W
      x86/speculation: Fix redundant MDS mitigation message · ed7a3dde
      Waiman Long 提交于
      commit cd5a2aa89e847bdda7b62029d94e95488d73f6b2 upstream.
      
      Since MDS and TAA mitigations are inter-related for processors that are
      affected by both vulnerabilities, the followiing confusing messages can
      be printed in the kernel log:
      
        MDS: Vulnerable
        MDS: Mitigation: Clear CPU buffers
      
      To avoid the first incorrect message, defer the printing of MDS
      mitigation after the TAA mitigation selection has been done. However,
      that has the side effect of printing TAA mitigation first before MDS
      mitigation.
      
       [ bp: Check box is affected/mitigations are disabled first before
         printing and massage. ]
      Suggested-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mark Gross <mgross@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20191115161445.30809-3-longman@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed7a3dde
    • W
      x86/speculation: Fix incorrect MDS/TAA mitigation status · 0af5ae26
      Waiman Long 提交于
      commit 64870ed1b12e235cfca3f6c6da75b542c973ff78 upstream.
      
      For MDS vulnerable processors with TSX support, enabling either MDS or
      TAA mitigations will enable the use of VERW to flush internal processor
      buffers at the right code path. IOW, they are either both mitigated
      or both not. However, if the command line options are inconsistent,
      the vulnerabilites sysfs files may not report the mitigation status
      correctly.
      
      For example, with only the "mds=off" option:
      
        vulnerabilities/mds:Vulnerable; SMT vulnerable
        vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable
      
      The mds vulnerabilities file has wrong status in this case. Similarly,
      the taa vulnerability file will be wrong with mds mitigation on, but
      taa off.
      
      Change taa_select_mitigation() to sync up the two mitigation status
      and have them turned off if both "mds=off" and "tsx_async_abort=off"
      are present.
      
      Update documentation to emphasize the fact that both "mds=off" and
      "tsx_async_abort=off" have to be specified together for processors that
      are affected by both TAA and MDS to be effective.
      
       [ bp: Massage and add kernel-parameters.txt change too. ]
      
      Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort")
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Mark Gross <mgross@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0af5ae26
    • A
      x86/insn: Fix awk regexp warnings · ed731209
      Alexander Kapshuk 提交于
      commit 700c1018b86d0d4b3f1f2d459708c0cdf42b521d upstream.
      
      gawk 5.0.1 generates the following regexp warnings:
      
        GEN      /home/sasha/torvalds/tools/objtool/arch/x86/lib/inat-tables.c
        awk: ../arch/x86/tools/gen-insn-attr-x86.awk:260: warning: regexp escape sequence `\:' is not a known regexp operator
        awk: ../arch/x86/tools/gen-insn-attr-x86.awk:350: (FILENAME=../arch/x86/lib/x86-opcode-map.txt FNR=41) warning: regexp escape sequence `\&' is  not a known regexp operator
      
      Ealier versions of gawk are not known to generate these warnings. The
      gawk manual referenced below does not list characters ':' and '&' as
      needing escaping, so 'unescape' them. See
      
        https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html
      
      for more info.
      
      Running diff on the output generated by the script before and after
      applying the patch reported no differences.
      
       [ bp: Massage commit message. ]
      
      [ Caught the respective tools header discrepancy. ]
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NAlexander Kapshuk <alexander.kapshuk@gmail.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190924044659.3785-1-alexander.kapshuk@gmail.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed731209
    • A
      ARC: perf: Accommodate big-endian CPU · 99b933bb
      Alexey Brodkin 提交于
      commit 5effc09c4907901f0e71e68e5f2e14211d9a203f upstream.
      
      8-letter strings representing ARC perf events are stores in two
      32-bit registers as ASCII characters like that: "IJMP", "IALL", "IJMPTAK" etc.
      
      And the same order of bytes in the word is used regardless CPU endianness.
      
      Which means in case of big-endian CPU core we need to swap bytes to get
      the same order as if it was on little-endian CPU.
      
      Otherwise we're seeing the following error message on boot:
      ------------------------->8----------------------
      ARC perf        : 8 counters (32 bits), 40 conditions, [overflow IRQ support]
      sysfs: cannot create duplicate filename '/devices/arc_pct/events/pmji'
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
      Stack Trace:
        arc_unwind_core+0xd4/0xfc
        dump_stack+0x64/0x80
        sysfs_warn_dup+0x46/0x58
        sysfs_add_file_mode_ns+0xb2/0x168
        create_files+0x70/0x2a0
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1 at kernel/events/core.c:12144 perf_event_sysfs_init+0x70/0xa0
      Failed to register pmu: arc_pct, reason -17
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
      Stack Trace:
        arc_unwind_core+0xd4/0xfc
        dump_stack+0x64/0x80
        __warn+0x9c/0xd4
        warn_slowpath_fmt+0x22/0x2c
        perf_event_sysfs_init+0x70/0xa0
      ---[ end trace a75fb9a9837bd1ec ]---
      ------------------------->8----------------------
      
      What happens here we're trying to register more than one raw perf event
      with the same name "PMJI". Why? Because ARC perf events are 4 to 8 letters
      and encoded into two 32-bit words. In this particular case we deal with 2
      events:
       * "IJMP____" which counts all jump & branch instructions
       * "IJMPC___" which counts only conditional jumps & branches
      
      Those strings are split in two 32-bit words this way "IJMP" + "____" &
      "IJMP" + "C___" correspondingly. Now if we read them swapped due to CPU core
      being big-endian then we read "PMJI" + "____" & "PMJI" + "___C".
      
      And since we interpret read array of ASCII letters as a null-terminated string
      on big-endian CPU we end up with 2 events of the same name "PMJI".
      Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      
      99b933bb
    • C
      ARM: 8904/1: skip nomap memblocks while finding the lowmem/highmem boundary · e02f1448
      Chester Lin 提交于
      commit 1d31999cf04c21709f72ceb17e65b54a401330da upstream.
      
      adjust_lowmem_bounds() checks every memblocks in order to find the boundary
      between lowmem and highmem. However some memblocks could be marked as NOMAP
      so they are not used by kernel, which should be skipped while calculating
      the boundary.
      Signed-off-by: NChester Lin <clin@suse.com>
      Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NLee Jones <lee.jones@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e02f1448
    • S
      KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved · 4ae7392a
      Sean Christopherson 提交于
      commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream.
      
      Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
      instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
      like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
      pages, e.g. put pages grabbed via gup().  But for flows such as setting
      A/D bits or shifting refcounts for transparent huge pages, KVM needs to
      to avoid processing ZONE_DEVICE pages as the flows in question lack the
      underlying machinery for proper handling of ZONE_DEVICE pages.
      
      This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
      when running a KVM guest backed with /dev/dax memory, as KVM straight up
      doesn't put any references to ZONE_DEVICE pages acquired by gup().
      
      Note, Dan Williams proposed an alternative solution of doing put_page()
      on ZONE_DEVICE pages immediately after gup() in order to simplify the
      auditing needed to ensure is_zone_device_page() is called if and only if
      the backing device is pinned (via gup()).  But that approach would break
      kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
      unmap() when accessing guest memory, unlike KVM's secondary MMU, which
      coordinates with mmu_notifier invalidations to avoid creating stale
      page references, i.e. doesn't rely on pages being pinned.
      
      [*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.plReported-by: NAdam Borowski <kilobyte@angband.pl>
      Analyzed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Cc: stable@vger.kernel.org
      Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      [sean: backport to 4.x; resolve conflict in mmu.c]
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ae7392a