1. 18 7月, 2017 5 次提交
    • T
      x86/mm: Simplify p[g4um]d_page() macros · fd7e3159
      Tom Lendacky 提交于
      Create a pgd_pfn() macro similar to the p[4um]d_pfn() macros and then
      use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros instead of
      duplicating the code.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/e61eb533a6d0aac941db2723d8aa63ef6b882dee.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fd7e3159
    • T
      x86/mm: Add support to enable SME in early boot processing · 5868f365
      Tom Lendacky 提交于
      Add support to the early boot code to use Secure Memory Encryption (SME).
      Since the kernel has been loaded into memory in a decrypted state, encrypt
      the kernel in place and update the early pagetables with the memory
      encryption mask so that new pagetable entries will use memory encryption.
      
      The routines to set the encryption mask and perform the encryption are
      stub routines for now with functionality to be added in a later patch.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/e52ad781f085224bf835b3caff9aa3aee6febccb.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5868f365
    • T
      x86/mm: Add Secure Memory Encryption (SME) support · 7744ccdb
      Tom Lendacky 提交于
      Add support for Secure Memory Encryption (SME). This initial support
      provides a Kconfig entry to build the SME support into the kernel and
      defines the memory encryption mask that will be used in subsequent
      patches to mark pages as encrypted.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/a6c34d16caaed3bc3e2d6f0987554275bd291554.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7744ccdb
    • T
      x86/cpu/AMD: Add the Secure Memory Encryption CPU feature · 872cbefd
      Tom Lendacky 提交于
      Update the CPU features to include identifying and reporting on the
      Secure Memory Encryption (SME) feature.  SME is identified by CPUID
      0x8000001f, but requires BIOS support to enable it (set bit 23 of
      MSR_K8_SYSCFG).  Only show the SME feature as available if reported by
      CPUID, enabled by BIOS and not configured as CONFIG_X86_32=y.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/85c17ff450721abccddc95e611ae8df3f4d9718b.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      872cbefd
    • T
      x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap() for RAM mappings · f7750a79
      Tom Lendacky 提交于
      The ioremap() function is intended for mapping MMIO. For RAM, the
      memremap() function should be used. Convert calls from ioremap() to
      memremap() when re-mapping RAM.
      
      This will be used later by SME to control how the encryption mask is
      applied to memory mappings, with certain memory locations being mapped
      decrypted vs encrypted.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/b13fccb9abbd547a7eef7b1fdfc223431b211c88.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f7750a79
  2. 14 7月, 2017 5 次提交
  3. 13 7月, 2017 9 次提交
  4. 11 7月, 2017 1 次提交
    • K
      binfmt_elf: use ELF_ET_DYN_BASE only for PIE · eab09532
      Kees Cook 提交于
      The ELF_ET_DYN_BASE position was originally intended to keep loaders
      away from ET_EXEC binaries.  (For example, running "/lib/ld-linux.so.2
      /bin/cat" might cause the subsequent load of /bin/cat into where the
      loader had been loaded.)
      
      With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
      ELF_ET_DYN_BASE continued to be used since the kernel was only looking
      at ET_DYN.  However, since ELF_ET_DYN_BASE is traditionally set at the
      top 1/3rd of the TASK_SIZE, a substantial portion of the address space
      is unused.
      
      For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
      loaded above the mmap region.  This means they can be made to collide
      (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
      pathological stack regions.
      
      Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
      region in all cases, and will now additionally avoid programs falling
      back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
      if it would have collided with the stack, now it will fail to load
      instead of falling back to the mmap region).
      
      To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
      are loaded into the mmap region, leaving space available for either an
      ET_EXEC binary with a fixed location or PIE being loaded into mmap by
      the loader.  Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
      which means architectures can now safely lower their values without risk
      of loaders colliding with their subsequently loaded programs.
      
      For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
      the entire 32-bit address space for 32-bit pointers.
      
      Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
      suggestions on how to implement this solution.
      
      Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR")
      Link: http://lkml.kernel.org/r/20170621173201.GA114489@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eab09532
  5. 07 7月, 2017 1 次提交
  6. 05 7月, 2017 9 次提交
    • A
      x86/mm: Enable CR4.PCIDE on supported systems · 660da7c9
      Andy Lutomirski 提交于
      We can use PCID if the CPU has PCID and PGE and we're not on Xen.
      
      By itself, this has no effect. A followup patch will start using PCID.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      660da7c9
    • A
      x86/mm: Disable PCID on 32-bit kernels · cba4671a
      Andy Lutomirski 提交于
      32-bit kernels on new hardware will see PCID in CPUID, but PCID can
      only be used in 64-bit mode.  Rather than making all PCID code
      conditional, just disable the feature on 32-bit builds.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cba4671a
    • A
      x86/mm: Stop calling leave_mm() in idle code · 43858b4f
      Andy Lutomirski 提交于
      Now that lazy TLB suppresses all flush IPIs (as opposed to all but
      the first), there's no need to leave_mm() when going idle.
      
      This means we can get rid of the rcuidle hack in
      switch_mm_irqs_off() and we can unexport leave_mm().
      
      This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
      since it has no callers any more.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      43858b4f
    • A
      x86/mm: Rework lazy TLB mode and TLB freshness tracking · 94b1b03b
      Andy Lutomirski 提交于
      x86's lazy TLB mode used to be fairly weak -- it would switch to
      init_mm the first time it tried to flush a lazy TLB.  This meant an
      unnecessary CR3 write and, if the flush was remote, an unnecessary
      IPI.
      
      Rewrite it entirely.  When we enter lazy mode, we simply remove the
      CPU from mm_cpumask.  This means that we need a way to figure out
      whether we've missed a flush when we switch back out of lazy mode.
      I use the tlb_gen machinery to track whether a context is up to
      date.
      
      Note to reviewers: this patch, my itself, looks a bit odd.  I'm
      using an array of length 1 containing (ctx_id, tlb_gen) rather than
      just storing tlb_gen, and making it at array isn't necessary yet.
      I'm doing this because the next few patches add PCID support, and,
      with PCID, we need ctx_id, and the array will end up with a length
      greater than 1.  Making it an array now means that there will be
      less churn and therefore less stress on your eyeballs.
      
      NB: This is dubious but, AFAICT, still correct on Xen and UV.
      xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
      patch changes the way that mm_cpumask() works.  This should be okay,
      since Xen *also* iterates all online CPUs to find all the CPUs it
      needs to twiddle.
      
      The UV tlbflush code is rather dated and should be changed.
      
      Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
      (turbo off, intel_pstate requesting max performance) under KVM with
      the guest using idle=poll (to avoid artifacts when bouncing between
      CPUs).  I haven't done any real statistics here -- I just ran them
      in a loop and picked the fastest results that didn't look like
      outliers.  Unpatched means commit a4eb8b99, so all the
      bookkeeping overhead is gone.
      
      MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
      an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
      This is intended to be a nearly worst-case test.
      
        patched:         13.4µs
        unpatched:       21.6µs
      
      Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
      nrounds = 100, 256M data
      
        patched:         1.1 seconds or so
        unpatched:       1.9 seconds or so
      
      The sleepup on Vitaly's test appearss to be because it spends a lot
      of time blocked on mmap_sem, and this patch avoids sending IPIs to
      blocked CPUs.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      94b1b03b
    • A
      x86/mm: Track the TLB's tlb_gen and update the flushing algorithm · b0579ade
      Andy Lutomirski 提交于
      There are two kernel features that would benefit from tracking
      how up-to-date each CPU's TLB is in the case where IPIs aren't keeping
      it up to date in real time:
      
       - Lazy mm switching currently works by switching to init_mm when
         it would otherwise flush.  This is wasteful: there isn't fundamentally
         any need to update CR3 at all when going lazy or when returning from
         lazy mode, nor is there any need to receive flush IPIs at all.  Instead,
         we should just stop trying to keep the TLB coherent when we go lazy and,
         when unlazying, check whether we missed any flushes.
      
       - PCID will let us keep recent user contexts alive in the TLB.  If we
         start doing this, we need a way to decide whether those contexts are
         up to date.
      
      On some paravirt systems, remote TLBs can be flushed without IPIs.
      This won't update the target CPUs' tlb_gens, which may cause
      unnecessary local flushes later on.  We can address this if it becomes
      a problem by carefully updating the target CPU's tlb_gen directly.
      
      By itself, this patch is a very minor optimization that avoids
      unnecessary flushes when multiple TLB flushes targetting the same CPU
      race.  The complexity in this patch would not be worth it on its own,
      but it will enable improved lazy TLB tracking and PCID.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1210fb244bc9cbe7677f7f0b72db4d359675f24b.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b0579ade
    • A
      x86/mm: Give each mm TLB flush generation a unique ID · f39681ed
      Andy Lutomirski 提交于
      This adds two new variables to mmu_context_t: ctx_id and tlb_gen.
      ctx_id uniquely identifies the mm_struct and will never be reused.
      For a given mm_struct (and hence ctx_id), tlb_gen is a monotonic
      count of the number of times that a TLB flush has been requested.
      The pair (ctx_id, tlb_gen) can be used as an identifier for TLB
      flush actions and will be used in subsequent patches to reliably
      determine whether all needed TLB flushes have occurred on a given
      CPU.
      
      This patch is split out for ease of review.  By itself, it has no
      real effect other than creating and updating the new variables.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/413a91c24dab3ed0caa5f4e4d017d87b0857f920.1498751203.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f39681ed
    • C
      x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table · 12df216c
      Chen Yu 提交于
      Add the real e820_tabel_firmware[] that will not be modified by the kernel
      or the EFI boot stub under any circumstance.
      
      In addition to that modify the code so that e820_table_firmwarep[] is
      exposed via sysfs to represent the real firmware memory layout,
      rather than exposing the e820_table_kexec[] table.
      
      This fixes a hibernation bug/warning, which uses e820_table_kexec[] to check
      RAM layout consistency across hibernation/resume:
      
        The suspend kernel:
        [    0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable
      
        The resume kernel:
        [    0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable
        ...
        [   15.752088] PM: Using 3 thread(s) for decompression.
        [   15.752088] PM: Loading and decompressing image data (471870 pages)...
        [   15.764971] Hibernate inconsistent memory map detected!
        [   15.770833] PM: Image mismatch: architecture specific data
      
      Actually it is safe to restore these pages because E820_TYPE_RAM and
      E820_TYPE_RESERVED_KERN are treated the same during hibernation, so
      the original e820 table provided by the bootloader is used for
      hibernation MD5 fingerprint checking.
      
      The side effect is that, this newly introduced variable might increase the
      kernel size at compile time.
      Suggested-by: NIngo Molnar <mingo@redhat.com>
      Signed-off-by: NChen Yu <yu.c.chen@intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      12df216c
    • C
      x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec · a09bae0f
      Chen Yu 提交于
      Currently the e820_table_firmware[] table is mainly used by the kexec,
      and it is not what it's supposed to be - despite its name it might be
      modified by the kernel.
      
      So change its name to e820_table_kexec[]. In the next patch we will
      introduce the real e820_table_firmware[] table.
      
      No functional change.
      Signed-off-by: NChen Yu <yu.c.chen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a09bae0f
    • M
      x86/mm/pat: Don't report PAT on CPUs that don't support it · 99c13b8c
      Mikulas Patocka 提交于
      The pat_enabled() logic is broken on CPUs which do not support PAT and
      where the initialization code fails to call pat_init(). Due to that the
      enabled flag stays true and pat_enabled() returns true wrongfully.
      
      As a consequence the mappings, e.g. for Xorg, are set up with the wrong
      caching mode and the required MTRR setups are omitted.
      
      To cure this the following changes are required:
      
        1) Make pat_enabled() return true only if PAT initialization was
           invoked and successful.
      
        2) Invoke init_cache_modes() unconditionally in setup_arch() and
           remove the extra callsites in pat_disable() and the pat disabled
           code path in pat_init().
      
      Also rename __pat_enabled to pat_disabled to reflect the real purpose of
      this variable.
      
      Fixes: 9cd25aac ("x86/mm/pat: Emulate PAT when it is disabled")
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bernhard Held <berny156@gmx.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
      99c13b8c
  7. 04 7月, 2017 1 次提交
  8. 03 7月, 2017 2 次提交
  9. 29 6月, 2017 2 次提交
  10. 28 6月, 2017 4 次提交
  11. 24 6月, 2017 1 次提交