1. 15 1月, 2018 3 次提交
    • D
      x86/retpoline: Fill RSB on context switch for affected CPUs · c995efd5
      David Woodhouse 提交于
      On context switch from a shallow call stack to a deeper one, as the CPU
      does 'ret' up the deeper side it may encounter RSB entries (predictions for
      where the 'ret' goes to) which were populated in userspace.
      
      This is problematic if neither SMEP nor KPTI (the latter of which marks
      userspace pages as NX for the kernel) are active, as malicious code in
      userspace may then be executed speculatively.
      
      Overwrite the CPU's return prediction stack with calls which are predicted
      to return to an infinite loop, to "capture" speculation if this
      happens. This is required both for retpoline, and also in conjunction with
      IBRS for !SMEP && !KPTI.
      
      On Skylake+ the problem is slightly different, and an *underflow* of the
      RSB may cause errant branch predictions to occur. So there it's not so much
      overwrite, as *filling* the RSB to attempt to prevent it getting
      empty. This is only a partial solution for Skylake+ since there are many
      other conditions which may result in the RSB becoming empty. The full
      solution on Skylake+ is to use IBRS, which will prevent the problem even
      when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
      required on context switch.
      
      [ tglx: Added missing vendor check and slighty massaged comments and
        	changelog ]
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
      c995efd5
    • A
      x86/kasan: Panic if there is not enough memory to boot · 0d39e266
      Andrey Ryabinin 提交于
      Currently KASAN doesn't panic in case it don't have enough memory
      to boot. Instead, it crashes in some random place:
      
       kernel BUG at arch/x86/mm/physaddr.c:27!
      
       RIP: 0010:__phys_addr+0x268/0x276
       Call Trace:
        kasan_populate_shadow+0x3f2/0x497
        kasan_init+0x12e/0x2b2
        setup_arch+0x2825/0x2a2c
        start_kernel+0xc8/0x15f4
        x86_64_start_reservations+0x2a/0x2c
        x86_64_start_kernel+0x72/0x75
        secondary_startup_64+0xa5/0xb0
      
      Use memblock_virt_alloc_try_nid() for allocations without failure
      fallback. It will panic with an out of memory message.
      Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Cc: Alexander Potapenko <glider@google.com>
      Cc: lkp@01.org
      Link: https://lkml.kernel.org/r/20180110153602.18919-1-aryabinin@virtuozzo.com
      0d39e266
    • T
      x86/retpoline: Remove compile time warning · b8b9ce4b
      Thomas Gleixner 提交于
      Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
      does not have retpoline support. Linus rationale for this is:
      
        It's wrong because it will just make people turn off RETPOLINE, and the
        asm updates - and return stack clearing - that are independent of the
        compiler are likely the most important parts because they are likely the
        ones easiest to target.
      
        And it's annoying because most people won't be able to do anything about
        it. The number of people building their own compiler? Very small. So if
        their distro hasn't got a compiler yet (and pretty much nobody does), the
        warning is just annoying crap.
      
        It is already properly reported as part of the sysfs interface. The
        compile-time warning only encourages bad things.
      
      Fixes: 76b04384 ("x86/retpoline: Add initial retpoline support")
      Requested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
      b8b9ce4b
  2. 14 1月, 2018 3 次提交
    • P
      x86,perf: Disable intel_bts when PTI · 99a9dc98
      Peter Zijlstra 提交于
      The intel_bts driver does not use the 'normal' BTS buffer which is exposed
      through the cpu_entry_area but instead uses the memory allocated for the
      perf AUX buffer.
      
      This obviously comes apart when using PTI because then the kernel mapping;
      which includes that AUX buffer memory; disappears. Fixing this requires to
      expose a mapping which is visible in all context and that's not trivial.
      
      As a quick fix disable this driver when PTI is enabled to prevent
      malfunction.
      
      Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig")
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Reported-by: NRobert Święcki <robert@swiecki.net>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: greg@kroah.com
      Cc: hughd@google.com
      Cc: luto@amacapital.net
      Cc: Vince Weaver <vince@deater.net>
      Cc: torvalds@linux-foundation.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
      99a9dc98
    • W
      security/Kconfig: Correct the Documentation reference for PTI · a237f762
      W. Trevor King 提交于
      When the config option for PTI was added a reference to documentation was
      added as well. But the documentation did not exist at that point. The final
      documentation has a different file name.
      
      Fix it up to point to the proper file.
      
      Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig")
      Signed-off-by: NW. Trevor King <wking@tremily.us>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-security-module@vger.kernel.org
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us
      a237f762
    • T
      x86/pti: Fix !PCID and sanitize defines · f10ee3dc
      Thomas Gleixner 提交于
      The switch to the user space page tables in the low level ASM code sets
      unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
      address of the page directory to the user part, bit 11 is switching the
      PCID to the PCID associated with the user page tables.
      
      This fails on a machine which lacks PCID support because bit 11 is set in
      CR3. Bit 11 is reserved when PCID is inactive.
      
      While the Intel SDM claims that the reserved bits are ignored when PCID is
      disabled, the AMD APM states that they should be cleared.
      
      This went unnoticed as the AMD APM was not checked when the code was
      developed and reviewed and test systems with Intel CPUs never failed to
      boot. The report is against a Centos 6 host where the guest fails to boot,
      so it's not yet clear whether this is a virt issue or can happen on real
      hardware too, but thats irrelevant as the AMD APM clearly ask for clearing
      the reserved bits.
      
      Make sure that on non PCID machines bit 11 is not set by the page table
      switching code.
      
      Andy suggested to rename the related bits and masks so they are clearly
      describing what they should be used for, which is done as well for clarity.
      
      That split could have been done with alternatives but the macro hell is
      horrible and ugly. This can be done on top if someone cares to remove the
      extra orq. For now it's a straight forward fix.
      
      Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      Reported-by: NLaura Abbott <labbott@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
      f10ee3dc
  3. 13 1月, 2018 1 次提交
    • A
      selftests/x86: Add test_vsyscall · 352909b4
      Andy Lutomirski 提交于
      This tests that the vsyscall entries do what they're expected to do.
      It also confirms that attempts to read the vsyscall page behave as
      expected.
      
      If changes are made to the vsyscall code or its memory map handling,
      running this test in all three of vsyscall=none, vsyscall=emulate,
      and vsyscall=native are helpful.
      
      (Because it's easy, this also compares the vsyscall results to their
       vDSO equivalents.)
      
      Note to KAISER backporters: please test this under all three
      vsyscall modes.  Also, in the emulate and native modes, make sure
      that test_vsyscall_64 agrees with the command line or config
      option as to which mode you're in.  It's quite easy to mess up
      the kernel such that native mode accidentally emulates
      or vice versa.
      
      Greg, etc: please backport this to all your Meltdown-patched
      kernels.  It'll help make sure the patches didn't regress
      vsyscalls.
      CSigned-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      352909b4
  4. 12 1月, 2018 13 次提交
  5. 11 1月, 2018 1 次提交
  6. 09 1月, 2018 5 次提交
  7. 08 1月, 2018 2 次提交
  8. 07 1月, 2018 3 次提交
  9. 05 1月, 2018 7 次提交
    • T
      x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN · de791821
      Thomas Gleixner 提交于
      Use the name associated with the particular attack which needs page table
      isolation for mitigation.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jiri Koshina <jikos@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Lutomirski  <luto@amacapital.net>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Greg KH <gregkh@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
      de791821
    • D
      x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm · b9e705ef
      David Woodhouse 提交于
      Where an ALTERNATIVE is used in the middle of an inline asm block, this
      would otherwise lead to the following instruction being appended directly
      to the trailing ".popsection", and a failed compile.
      
      Fixes: 9cebed42 ("x86, alternative: Use .pushsection/.popsection")
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: ak@linux.intel.com
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk
      b9e705ef
    • T
      x86/tlb: Drop the _GPL from the cpu_tlbstate export · 1e547681
      Thomas Gleixner 提交于
      The recent changes for PTI touch cpu_tlbstate from various tlb_flush
      inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
      regression when building out of tree drivers for certain graphics cards.
      
      Aside of that the export was wrong since it was introduced as it should
      have been EXPORT_PER_CPU_SYMBOL_GPL().
      
      Use the correct PER_CPU export and drop the _GPL to restore the previous
      state which allows users to utilize the cards they payed for.
      
      As always I'm really thrilled to make this kind of change to support the
      #friends (or however the hot hashtag of today is spelled) from that closet
      sauce graphics corp.
      
      Fixes: 1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
      Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      1e547681
    • P
      x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers · 42f3bdc5
      Peter Zijlstra 提交于
      Thomas reported the following warning:
      
       BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
       caller is native_flush_tlb_single+0x57/0xc0
       native_flush_tlb_single+0x57/0xc0
       __set_pte_vaddr+0x2d/0x40
       set_pte_vaddr+0x2f/0x40
       cea_set_pte+0x30/0x40
       ds_update_cea.constprop.4+0x4d/0x70
       reserve_ds_buffers+0x159/0x410
       x86_reserve_hardware+0x150/0x160
       x86_pmu_event_init+0x3e/0x1f0
       perf_try_init_event+0x69/0x80
       perf_event_alloc+0x652/0x740
       SyS_perf_event_open+0x3f6/0xd60
       do_syscall_64+0x5c/0x190
      
      set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
      there are two problems with that:
      
       1) The resulting flush is not supposed to be called in preemptible context
      
       2) The cpu entry area is supposed to be per CPU, but the debug store
          buffers are mapped for all CPUs so these mappings need to be flushed
          globally.
      
      Add the necessary preemption protection across the mapping code and flush
      TLBs globally.
      
      Fixes: c1961a46 ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
      Reported-by: NThomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NThomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass.net
      42f3bdc5
    • T
      x86/kaslr: Fix the vaddr_end mess · 1dddd251
      Thomas Gleixner 提交于
      vaddr_end for KASLR is only documented in the KASLR code itself and is
      adjusted depending on config options. So it's not surprising that a change
      of the memory layout causes KASLR to have the wrong vaddr_end. This can map
      arbitrary stuff into other areas causing hard to understand problems.
      
      Remove the whole ifdef magic and define the start of the cpu_entry_area to
      be the end of the KASLR vaddr range.
      
      Add documentation to that effect.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: NBenjamin Gilbert <benjamin.gilbert@coreos.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NBenjamin Gilbert <benjamin.gilbert@coreos.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>,
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
      1dddd251
    • T
      x86/mm: Map cpu_entry_area at the same place on 4/5 level · f2078904
      Thomas Gleixner 提交于
      There is no reason for 4 and 5 level pagetables to have a different
      layout. It just makes determining vaddr_end for KASLR harder than
      necessary.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>,
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
      f2078904
    • A
      x86/mm: Set MODULES_END to 0xffffffffff000000 · f5a40711
      Andrey Ryabinin 提交于
      Since f06bdd40 ("x86/mm: Adapt MODULES_END based on fixmap section size")
      kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary.
      
      So passing page unaligned address to kasan_populate_zero_shadow() have two
      possible effects:
      
      1) It may leave one page hole in supposed to be populated area. After commit
        21506525 ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that
        hole happens to be in the shadow covering fixmap area and leads to crash:
      
       BUG: unable to handle kernel paging request at fffffbffffe8ee04
       RIP: 0010:check_memory_region+0x5c/0x190
      
       Call Trace:
        <NMI>
        memcpy+0x1f/0x50
        ghes_copy_tofrom_phys+0xab/0x180
        ghes_read_estatus+0xfb/0x280
        ghes_notify_nmi+0x2b2/0x410
        nmi_handle+0x115/0x2c0
        default_do_nmi+0x57/0x110
        do_nmi+0xf8/0x150
        end_repeat_nmi+0x1a/0x1e
      
      Note, the crash likely disappeared after commit 92a0f81d, which
      changed kasan_populate_zero_shadow() call the way it was before
      commit 21506525.
      
      2) Attempt to load module near MODULES_END will fail, because
         __vmalloc_node_range() called from kasan_module_alloc() will hit the
         WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error.
      
      To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
      which means that MODULES_END should be 8*PAGE_SIZE aligned.
      
      The whole point of commit f06bdd40 was to move MODULES_END down if
      NR_CPUS is big, so the cpu_entry_area takes a lot of space.
      But since 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      the cpu_entry_area is no longer in fixmap, so we could just set
      MODULES_END to a fixed 8*PAGE_SIZE aligned address.
      
      Fixes: f06bdd40 ("x86/mm: Adapt MODULES_END based on fixmap section size")
      Reported-by: NJakub Kicinski <kubakici@wp.pl>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com
      f5a40711
  10. 04 1月, 2018 2 次提交