1. 24 Dec 2017 (17 commits)
    • x86/mm: Use INVPCID for __native_flush_tlb_single() · 6cff64b8
      Dave Hansen committed
      This uses INVPCID to shoot down individual lines of the user mapping
      instead of marking the entire user map as invalid. This
      could/might/possibly be faster.
      
      This for sure needs tlb_single_page_flush_ceiling to be redetermined;
      esp. since INVPCID is _slow_.
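      
      As a rough illustration of the mechanism (a sketch modelled on the
      kernel's INVPCID helpers, not the literal patch), an INVPCID type-0
      operation invalidates one linear address in one PCID via a 16-byte
      descriptor:
      
        struct invpcid_desc { unsigned long long pcid, addr; };
      
        static inline void invpcid_flush_one(unsigned long pcid, unsigned long addr)
        {
                /* Descriptor: PCID in bits 0-11 of the first qword, the
                   linear address in the second qword. */
                struct invpcid_desc desc = { pcid, addr };
      
                /* Type 0 = invalidate a single address in the given PCID. */
                asm volatile("invpcid %[desc], %[type]"
                             : : [desc] "m" (desc), [type] "r" (0UL) : "memory");
        }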
      
      A detailed performance analysis is available here:
      
        https://lkml.kernel.org/r/3062e486-3539-8a1f-5724-16199420be71@intel.com
      
      [ Peterz: Split out from big combo patch ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6cff64b8
    • x86/mm: Use/Fix PCID to optimize user/kernel switches · 6fd166aa
      Peter Zijlstra committed
      We can use PCID to retain the TLBs across CR3 switches; including those now
      part of the user/kernel switch. This increases performance of kernel
      entry/exit at the cost of more expensive/complicated TLB flushing.
      
      Now that we have two address spaces, one for kernel and one for user space,
      we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID
      (just like we use the PFN LSB for the PGD). Since we do TLB invalidation
      from kernel space, the existing code will only invalidate the kernel PCID;
      we augment that by marking the corresponding user PCID invalid and, upon
      switching back to user space, using a flushing CR3 write for the switch.
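      
      A sketch of the bit layout this describes; the bit positions below are
      assumptions for illustration (the page-sized PGD bit and the top bit of
      the 12-bit PCID field), not necessarily the macros used by the patch:
      
        #define PTI_USER_PGTABLE_BIT   12   /* user PGD is one page above the kernel PGD */
        #define PTI_USER_PCID_BIT      11   /* top bit of the 12-bit PCID field          */
      
        static inline unsigned long user_cr3_from_kernel_cr3(unsigned long cr3)
        {
                /* Select the user half of the PGD pair and the user PCID. */
                return cr3 | (1UL << PTI_USER_PGTABLE_BIT)
                           | (1UL << PTI_USER_PCID_BIT);
        }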
      
      In order to access the user_pcid_flush_mask we use PER_CPU storage, which
      means the previously established SWAPGS vs CR3 ordering is now mandatory.
      
      Having to do this memory access does require additional registers. Most
      sites have a functioning stack, so we can spill one (RAX); sites without a
      functional stack need to provide the second scratch register some other way.
      
      Note: PCID is generally available on Intel Sandybridge and later CPUs.
      Note: Up until this point TLB flushing was broken in this series.
      
      Based-on-code-from: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6fd166aa
    • x86/mm: Abstract switching CR3 · 48e11198
      Dave Hansen committed
      In preparation to adding additional PCID flushing, abstract the
      loading of a new ASID into CR3.
      
      [ PeterZ: Split out from big combo patch ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      48e11198
    • x86/mm: Allow flushing for future ASID switches · 2ea907c4
      Dave Hansen committed
      If changing the page tables in such a way that an invalidation of all
      contexts (aka. PCIDs / ASIDs) is required, they can be actively invalidated
      by:
      
       1. INVPCID for each PCID (works for single pages too).
      
       2. Load CR3 with each PCID without the NOFLUSH bit set
      
       3. Load CR3 with the NOFLUSH bit set for each and do INVLPG for each address.
      
      But, none of these are really feasible since there are ~6 ASIDs (12 with
      PAGE_TABLE_ISOLATION) at the time that invalidation is required.
      Instead of actively invalidating them, invalidate the *current* context and
      also mark the cpu_tlbstate _quickly_ to indicate that a future invalidation
      is required.
      
      At the next context switch, look for this indicator
      ('invalidate_other' being set) and invalidate all of the
      cpu_tlbstate.ctxs[] entries.
      
      This ensures that any future context switches will do a full flush
      of the TLB, picking up the previous changes.
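      
      A simplified sketch of this "mark now, flush later" scheme; the names
      below are illustrative rather than the exact cpu_tlbstate fields:
      
        struct tlb_state_sketch {
                bool invalidate_other;          /* other ASIDs need a full flush */
                /* per-ASID context tracking elided */
        };
      
        static void mark_other_asids_stale(struct tlb_state_sketch *ts)
        {
                ts->invalidate_other = true;    /* cheap: no INVPCID/CR3 work here */
        }
      
        static void next_context_switch(struct tlb_state_sketch *ts)
        {
                if (ts->invalidate_other) {
                        ts->invalidate_other = false;
                        /* drop all cached cpu_tlbstate.ctxs[] entries so the
                           next use of any ASID does a full, flushing CR3 load */
                }
        }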
      
      [ tglx: Folded more fixups from Peter ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2ea907c4
    • x86/pti: Map the vsyscall page if needed · 85900ea5
      Andy Lutomirski committed
      Make VSYSCALLs work fully in PTI mode by mapping them properly to the user
      space visible page tables.
      
      [ tglx: Hide unused functions (Patch by Arnd Bergmann) ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      85900ea5
    • x86/pti: Put the LDT in its own PGD if PTI is on · f55f0501
      Andy Lutomirski committed
      With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
      The LDT is per process, i.e. per mm.
      
      An earlier approach mapped the LDT on context switch into a fixmap area,
      but that's a big overhead and exhausted the fixmap space when NR_CPUS got
      big.
      
      Take advantage of the fact that there is an address space hole which
      provides a completely unused pgd. Use this pgd to manage per-mm LDT
      mappings.
      
      This has a down side: the LDT isn't (currently) randomized, and an attack
      that can write the LDT is instant root due to call gates (thanks, AMD, for
      leaving call gates in AMD64 but designing them wrong so they're only useful
      for exploits).  This can be mitigated by making the LDT read-only or
      randomizing the mapping, either of which is straightforward on top of this
      patch.
      
      This will significantly slow down LDT users, but that shouldn't matter for
      important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
      old libc implementations.
      
      [ tglx: Cleaned it up. ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f55f0501
    • x86/cpu_entry_area: Add debugstore entries to cpu_entry_area · 10043e02
      Thomas Gleixner committed
      The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual
      addresses which must be visible in any execution context.
      
      So it is required to make these mappings visible to user space when kernel
      page table isolation is active.
      
      Provide enough room for the buffer mappings in the cpu_entry_area so the
      buffers are available in the user space visible page tables.
      
      At the point where the kernel side entry area is populated there is no
      buffer available yet, but the kernel PMD must be populated. To achieve this
      set the entries for these buffers to non present.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      10043e02
    • x86/mm/pti: Map ESPFIX into user space · 4b6bbe95
      Andy Lutomirski committed
      Map the ESPFIX pages into user space when PTI is enabled.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4b6bbe95
    • x86/mm/pti: Share entry text PMD · 6dc72c3c
      Thomas Gleixner committed
      Share the entry text PMD of the kernel mapping with the user space
      mapping. If large pages are enabled this is a single PMD entry and at the
      point where it is copied into the user page table the RW bit has not been
      cleared yet. Clear it right away so the user space visible map becomes RX.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6dc72c3c
    • x86/mm/pti: Share cpu_entry_area with user space page tables · f7cfbee9
      Andy Lutomirski committed
      Share the cpu entry area so the user space and kernel space page tables
      have the same P4D page.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f7cfbee9
    • x86/mm/pti: Add functions to clone kernel PMDs · 03f4424f
      Andy Lutomirski committed
      Provide infrastructure to:
      
       - find a kernel PMD for a mapping which must be visible to user space for
         the entry/exit code to work.
      
       - walk an address range and share the kernel PMD with it.
      
      This reuses a small part of the original KAISER patches to populate the
      user space page table.
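      
      A condensed sketch of what such a clone helper does; lookup_kernel_pmd()
      and walk_user_pagetable() are placeholders for the kernel and user page
      table walks, and 'clear' is the set of PTE bits to drop from the user copy:
      
        static void pti_clone_pmds_sketch(unsigned long start, unsigned long end,
                                          unsigned long clear)
        {
                unsigned long addr;
      
                for (addr = start; addr < end; addr += PMD_SIZE) {
                        pmd_t *kernel_pmd = lookup_kernel_pmd(addr);
                        pmd_t *user_pmd   = walk_user_pagetable(addr);
      
                        if (!kernel_pmd || !user_pmd)
                                continue;
      
                        /* Share the kernel PMD, minus the bits the caller wants
                           cleared in the user space visible entry. */
                        *user_pmd = __pmd(pmd_val(*kernel_pmd) & ~clear);
                }
        }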
      
      [ tglx: Made it universally usable so it can be used for any kind of shared
      	mapping. Add a mechanism to clear specific bits in the user space
      	visible PMD entry. Folded Andy's simplifications ]
      Originally-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      03f4424f
    • x86/mm/pti: Allocate a separate user PGD · d9e9a641
      Dave Hansen committed
      Kernel page table isolation requires to have two PGDs. One for the kernel,
      which contains the full kernel mapping plus the user space mapping and one
      for user space which contains the user space mappings and the minimal set
      of kernel mappings which are required by the architecture to be able to
      transition from and to user space.
      
      Add the necessary preliminaries.
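      
      A sketch of the preliminaries, assuming the layout the series ends up
      with: the PGD allocation is doubled to an order-1 page pair, with the
      kernel PGD in the first page and the user PGD in the second:
      
        #define PGD_ALLOCATION_ORDER    1       /* two pages instead of one */
      
        static pgd_t *alloc_pgd_pair(void)
        {
                /* Kernel PGD at offset 0, user PGD at offset PAGE_SIZE. */
                return (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                                 PGD_ALLOCATION_ORDER);
        }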
      
      [ tglx: Split out from the big kaiser dump. EFI fixup from Kirill ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d9e9a641
    • x86/mm/pti: Add mapping helper functions · 61e9b367
      Dave Hansen committed
      Add the pagetable helper functions to manage the separate user space page
      tables.
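      
      A sketch of the kind of helper this adds, under the assumption that the
      user PGD sits in the page directly above the kernel PGD, so toggling bit
      PAGE_SHIFT of the pointer switches between the two:
      
        static inline pgd_t *kernel_to_user_pgdp_sketch(pgd_t *pgdp)
        {
                return (pgd_t *)((unsigned long)pgdp | (1UL << PAGE_SHIFT));
        }
      
        static inline pgd_t *user_to_kernel_pgdp_sketch(pgd_t *pgdp)
        {
                return (pgd_t *)((unsigned long)pgdp & ~(1UL << PAGE_SHIFT));
        }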
      
      [ tglx: Split out from the big combo kaiser patch. Folded Andy's
      	simplification and made it out of line as Boris suggested ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      61e9b367
    • x86/pti: Add the pti= cmdline option and documentation · 41f4c20b
      Borislav Petkov committed
      Keep the "nopti" optional for traditional reasons.
      
      [ tglx: Don't allow force on when running on XEN PV and made 'on'
      	printout conditional ]
      Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171212133952.10177-1-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      41f4c20b
    • x86/mm/pti: Add infrastructure for page table isolation · aa8c6248
      Thomas Gleixner committed
      Add the initial files for kernel page table isolation, with a minimal init
      function and the boot time detection for this misfeature.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aa8c6248
    • x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y · c313ec66
      Dave Hansen committed
      Global pages stay in the TLB across context switches.  Since all contexts
      share the same kernel mapping, these mappings are marked as global pages
      so kernel entries in the TLB are not flushed out on a context switch.
      
      But, even having these entries in the TLB opens up something that an
      attacker can use, such as the double-page-fault attack:
      
         http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
      
      That means that even when PAGE_TABLE_ISOLATION switches page tables
      on return to user space the global pages would stay in the TLB cache.
      
      Disable global pages so that kernel TLB entries can be flushed before
      returning to user space. This way, all accesses to kernel addresses from
      userspace result in a TLB miss independent of the existence of a kernel
      mapping.
      
      Suppress global pages via the __supported_pte_mask. The user space
      mappings set PAGE_GLOBAL for the minimal kernel mappings which are
      required for entry/exit. These mappings are set up manually so the
      filtering does not take place.
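      
      The mechanism boils down to removing the global bit from the set of PTE
      bits the kernel will ever set; a minimal sketch (the function name is
      illustrative, the mask and bit are the usual x86 definitions):
      
        static void pti_disable_global_pages(void)
        {
                /* Every later PTE construction filters through this mask, so
                   kernel mappings lose _PAGE_GLOBAL and get flushed on the
                   CR3 switch back to user space. */
                if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
                        __supported_pte_mask &= ~_PAGE_GLOBAL;
        }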
      
      [ The __supported_pte_mask simplification was written by Thomas Gleixner. ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c313ec66
    • x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit · f6c4fd50
      Thomas Gleixner committed
      The loop which populates the CPU entry area PMDs can wrap around on 32bit
      machines when the number of CPUs is small.
      
      It worked wonderfully for NR_CPUS=64 for whatever reason and the moron who
      wrote that code did not bother to test it with !SMP.
      
      Check for the wraparound to fix it.
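      
      As a generic illustration of the bug class (not the literal fix), an
      end-address comparison misbehaves when 'start + size' wraps past the top
      of a 32-bit address space; driving the loop by an explicit count avoids
      the comparison entirely (populate_pmd() is a placeholder):
      
        unsigned long addr  = start;
        unsigned long npmds = size / PMD_SIZE;
      
        while (npmds--) {
                populate_pmd(addr);     /* no 'addr < start + size' check that
                                           could wrap around to 0 */
                addr += PMD_SIZE;
        }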
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: kernel test robot <fengguang.wu@intel.com>
      Signed-off-by: Thomas "Feels stupid" Gleixner <tglx@linutronix.de>
      Tested-by: Borislav Petkov <bp@alien8.de>
      f6c4fd50
  2. 23 Dec 2017 (6 commits)
    • x86/cpu_entry_area: Move it out of the fixmap · 92a0f81d
      Thomas Gleixner committed
      Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
      and 0-day already hit a case where the fixmap PTEs were cleared by
      cleanup_highmap().
      
      Aside from that, the fixmap API is a pain as it's all backwards.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      92a0f81d
    • x86/cpu_entry_area: Move it to a separate unit · ed1bbc40
      Thomas Gleixner committed
      Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ed1bbc40
    • x86/mm: Move the CR3 construction functions to tlbflush.h · 50fb83a6
      Dave Hansen committed
      For flushing the TLB, the ASID which has been programmed into the hardware
      must be known.  That differs from what is in 'cpu_tlbstate'.
      
      Add functions to transform the 'cpu_tlbstate' values into the one
      programmed into the hardware (CR3).
      
      It's not easy to include mmu_context.h into tlbflush.h, so just move the
      CR3 building over to tlbflush.h.
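      
      A sketch of the kind of CR3-construction helpers being moved; the names
      mirror the kernel's build_cr3()/build_cr3_noflush(), the PCID encoding
      shown (hardware PCID = ASID + 1, since PCID 0 is kept for the non-PCID
      case) is an assumption for illustration, and bit 63 is the architectural
      "do not flush" bit:
      
        #define CR3_NOFLUSH     (1UL << 63)
      
        static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
        {
                return __pa(pgd) | (asid + 1);
        }
      
        static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
        {
                return build_cr3(pgd, asid) | CR3_NOFLUSH;
        }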
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      50fb83a6
    • x86/mm: Use __flush_tlb_one() for kernel memory · a501686b
      Peter Zijlstra committed
      __flush_tlb_single() is for user mappings, __flush_tlb_one() for
      kernel mappings.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a501686b
    • x86/mm/dump_pagetables: Make the address hints correct and readable · 146122e2
      Thomas Gleixner committed
      The address hints are a trainwreck. The array entry numbers have to be kept
      magically in sync with the actual hints, which is doomed as some of the
      array members are initialized at runtime via the entry numbers.
      
      Designated initializers have been around before this code was
      implemented....
      
      Use the entry numbers to populate the address hints array and add the
      missing bits and pieces. Split 32 and 64 bit for readability's sake.
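      
      The change is essentially the switch from positional to designated array
      initializers, which keeps the entry numbers and their hints in sync by
      construction; a generic example of the idiom (the names are illustrative,
      not the dump_pagetables ones):
      
        enum address_markers_idx {
                LOW_KERNEL_NR,
                VMALLOC_START_NR,
                HIGH_KERNEL_NR,
        };
      
        static struct { unsigned long start; const char *name; } markers[] = {
                [LOW_KERNEL_NR]    = { 0, "Low Kernel Mapping"  },
                [VMALLOC_START_NR] = { 0, "vmalloc() Area"      }, /* set at runtime */
                [HIGH_KERNEL_NR]   = { 0, "High Kernel Mapping" },
        };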
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      146122e2
    • x86/mm/dump_pagetables: Check PAGE_PRESENT for real · c0534494
      Thomas Gleixner committed
      The check for a present page in printk_prot():
      
             if (!pgprot_val(prot)) {
                      /* Not present */
      
      is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and
      the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when
      analyzing mapping correctness. Check for the present bit to make an
      informed decision.
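      
      In other words, the test has to look at the present bit rather than at
      the whole pgprot value; roughly:
      
             if (!(pgprot_val(prot) & _PAGE_PRESENT)) {
                      /* Not present */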
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c0534494
  3. 17 Dec 2017 (2 commits)
    • x86/kasan/64: Teach KASAN about the cpu_entry_area · 21506525
      Andy Lutomirski committed
      The cpu_entry_area will contain stacks.  Make sure that KASAN has
      appropriate shadow mappings for them.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: kasan-dev@googlegroups.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.642806442@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      21506525
    • x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · 2aeb0736
      Andrey Ryabinin committed
      [ Note, this is a Git cherry-pick of the following commit:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 PTI code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2aeb0736
  4. 10 Nov 2017 (1 commit)
  5. 09 Nov 2017 (1 commit)
    • x86/mm: Unbreak modules that rely on external PAGE_KERNEL availability · 87df2617
      Jiri Kosina committed
      Commit 7744ccdb ("x86/mm: Add Secure Memory Encryption (SME)
      support") as a side-effect made PAGE_KERNEL all of a sudden unavailable
      to modules which can't make use of EXPORT_SYMBOL_GPL() symbols.
      
      This is because once SME is enabled, sme_me_mask (which is introduced as
      EXPORT_SYMBOL_GPL) makes its way to PAGE_KERNEL through _PAGE_ENC,
      causing imminent build failure for all the modules which make use of all
      the EXPORT-SYMBOL()-exported API (such as vmap(), __vmalloc(),
      remap_pfn_range(), ...).
      
      Exporting (as EXPORT_SYMBOL()) interfaces (and having done so for ages)
      that take a pgprot_t argument, while making it impossible to -- all of a
      sudden -- pass PAGE_KERNEL to them, feels rather inconsistent.
      
      Restore the original behavior and make it possible to pass PAGE_KERNEL
      to all its EXPORT_SYMBOL() consumers.
      
      [ This is all so not wonderful. We shouldn't need that "sme_me_mask"
        access at all in all those places that really don't care about that
        level of detail, and just want _PAGE_KERNEL or whatever.
      
        We have some similar issues with _PAGE_CACHE_WP and _PAGE_NOCACHE,
        both of which hide a "cachemode2protval()" call, and which also ends
        up using another EXPORT_SYMBOL(), but at least that only triggers for
        the much more rare cases.
      
        Maybe we could move these dynamic page table bits to be generated much
        deeper down in the VM layer, instead of hiding them in the macros that
        everybody uses.
      
        So this all would merit some cleanup. But not today.   - Linus ]
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Despised-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87df2617
  6. 04 Nov 2017 (1 commit)
    • Revert "x86/mm: Stop calling leave_mm() in idle code" · 67535736
      Andy Lutomirski committed
      This reverts commit 43858b4f.
      
      The reason I removed the leave_mm() calls in question is because the
      heuristic wasn't needed after that patch.  With the original version
      of my PCID series, we never flushed a "lazy cpu" (i.e. a CPU running a
      kernel thread) due to a flush on the loaded mm.
      
      Unfortunately, that caused architectural issues, so now I've
      reinstated these flushes on non-PCID systems in:
      
          commit b956575b ("x86/mm: Flush more aggressively in lazy TLB mode").
      
      That, in turn, gives us a power management and occasionally
      performance regression as compared to old kernels: a process that
      goes into a deep idle state on a given CPU and gets its mm flushed
      due to activity on a different CPU will wake the idle CPU.
      
      Reinstate the old ugly heuristic: a CPU that goes into ACPI C3 or an
      intel_idle state that is likely to cause a TLB flush gets its mm
      switched to init_mm before going idle.
      
      FWIW, this heuristic is lousy.  Whether we should change CR3 before
      idle isn't a good hint except insofar as the performance hit is a bit
      lower if the TLB is getting flushed by the idle code anyway.  What we
      really want to know is whether we anticipate being idle long enough
      that the mm is likely to be flushed before we wake up.  This is more a
      matter of the expected latency than the idle state that gets chosen.
      This heuristic also completely fails on systems that don't know
      whether the TLB will be flushed (e.g. AMD systems?).  OTOH it may be a
      bit obsolete anyway -- PCID systems don't presently benefit from this
      heuristic at all.
      
      We also shouldn't do this callback from the innermost bit of the idle code
      due to the RCU nastiness it causes.  All the information needed is
      available before rcu_idle_enter() needs to happen.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 43858b4f "x86/mm: Stop calling leave_mm() in idle code"
      Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      67535736
  7. 02 Nov 2017 (2 commits)
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
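      
      For reference, the added identifier is a single comment at the top of the
      file, using the comment style appropriate for the file type (see the
      header vs. source distinction described further down):
      
        /* in .c source files: */
        // SPDX-License-Identifier: GPL-2.0
      
        /* in headers and uapi headers: */
        /* SPDX-License-Identifier: GPL-2.0 */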
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
    • x86/mm: Relocate page fault error codes to traps.h · 1067f030
      Ricardo Neri committed
      Up to this point, only fault.c used the definitions of the page fault error
      codes. Thus, it made sense to keep them within that file. Other portions of
      code might be interested in those definitions too. For instance, the User-
      Mode Instruction Prevention emulation code will use such definitions to
      emulate a page fault when it is unable to successfully copy the results
      of the emulated instructions to user space.
      
      While relocating the error code enumeration, the prefix X86_ is used to
      make it consistent with the rest of the definitions in traps.h. Of course,
      code using the enumeration had to be updated as well. No functional changes
      were performed.
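      
      A sketch of the relocated enumeration; the bit values follow the
      architectural page-fault error-code layout:
      
        enum x86_pf_error_code {
                X86_PF_PROT  = 1 << 0,   /* fault on a present page            */
                X86_PF_WRITE = 1 << 1,   /* write access                       */
                X86_PF_USER  = 1 << 2,   /* fault originated in user mode      */
                X86_PF_RSVD  = 1 << 3,   /* reserved bit set in a paging entry */
                X86_PF_INSTR = 1 << 4,   /* instruction fetch                  */
                X86_PF_PK    = 1 << 5,   /* protection-keys violation          */
        };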
      Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com
      1067f030
  8. 01 Nov 2017 (1 commit)
    • x86/mm: fix use-after-free of vma during userfaultfd fault · cb0631fd
      Vlastimil Babka committed
      Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
      __do_page_fault() with the following reproducer:
      
        mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
        mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
        r0 = userfaultfd(0x0)
        ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
        ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
        r1 = gettid()
        syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
        tkill(r1, 0x7)
      
      The vma should be pinned by mmap_sem, but handle_userfault() might (in a
      return to userspace scenario) release it and then acquire it again, so when
      we return to __do_page_fault() (with other result than VM_FAULT_RETRY),
      the vma might be gone.
      
      Specifically, per Andrea the scenario is
       "A return to userland to repeat the page fault later with a
        VM_FAULT_NOPAGE retval (potentially after handling any pending signal
        during the return to userland). The return to userland is identified
        whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
        vmf->flags"
      
      However, since commit a3c4fb7c ("x86/mm: Fix fault error path using
      unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
      that point, which can thus become use-after-free.  Fix this by moving
      the read before calling handle_mm_fault().
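      
      Schematically, the fix snapshots what is needed from the vma before
      handle_mm_fault() can drop mmap_sem (a simplified sketch, not the
      literal diff):
      
                /* vma may be released and re-looked-up inside handle_mm_fault()
                   via userfaultfd, so read what we need from it up front. */
                pkey = vma_pkey(vma);
      
                fault = handle_mm_fault(vma, address, flags);
      
                /* from here on only the saved 'pkey' is used, never vma->vm_flags */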
      Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Fixes: a3c4fb7c ("x86/mm: Fix fault error path using unsafe vma pointer")
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb0631fd
  9. 30 Oct 2017 (1 commit)
  10. 27 Oct 2017 (1 commit)
    • Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses" · 90edaac6
      Ingo Molnar committed
      This reverts commit ce56a86e.
      
      There's unanticipated interaction with some boot parameters like 'mem=',
      which now cause the new checks via valid_mmap_phys_addr_range() to be too
      restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.
      
      So while the motivation of the change is still entirely valid, we
      need a few more rounds of testing to get it right - it's way too late
      after -rc6, so revert it for now.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Craig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90edaac6
  11. 20 Oct 2017 (2 commits)
    • x86/kasan: Use the same shadow offset for 4- and 5-level paging · 12a8cc7f
      Andrey Ryabinin committed
      We are going to support boot-time switching between 4- and 5-level
      paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET
      for different paging modes: the constant is passed to gcc to generate
      code and cannot be changed at runtime.
      
      This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset
      for both 4- and 5-level paging.
      
      For 5-level paging it means that shadow memory region is not aligned to
      PGD boundary anymore and we have to handle unaligned parts of the region
      properly.
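      
      The shadow lookup itself is the usual KASAN translation, now with a
      single offset for both paging modes (a sketch of the address arithmetic;
      the scale shift of 3 means one shadow byte per 8 bytes of memory):
      
        #define KASAN_SHADOW_OFFSET       0xdffffc0000000000UL
        #define KASAN_SHADOW_SCALE_SHIFT  3
      
        static inline unsigned long kasan_mem_to_shadow_addr(unsigned long addr)
        {
                return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
        }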
      
      In addition, we have to exclude paravirt code from KASAN instrumentation
      as we now use set_pgd() before KASAN is fully ready.
      
      [kirill.shutemov@linux.intel.com: cleanup, changelog message]
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      12a8cc7f
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Committed by Craig Bergstrom
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
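
      Conceptually, the check only has to refuse mappings whose physical
      range ends beyond what the CPU can decode. A hedged sketch of that
      idea (not necessarily the exact body this patch gives
      valid_mmap_phys_addr_range()):

        /* Sketch of the idea: reject /dev/mem mappings that extend past the
         * highest physical address the CPU can decode, so reserved PTE bits
         * can never be set.  x86_phys_bits is e.g. 48 on the system above. */
        static int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
        {
                u64 end = ((u64)pfn << PAGE_SHIFT) + count;

                return end <= (1ULL << boot_cpu_data.x86_phys_bits);
        }
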
      Signed-off-by: Craig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ce56a86e
  12. 18 October 2017, 3 commits
  13. 14 October 2017, 1 commit
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Committed by Andy Lutomirski
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
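
      A rough sketch of those two strategies (the per-CPU flag and the
      switch_to_init_mm() helper below are placeholders, not the real
      symbols in arch/x86/mm/tlb.c):

        static void switch_to_init_mm(void);    /* placeholder: load init_mm's CR3 */
        static DEFINE_PER_CPU(bool, tlb_is_lazy);

        static void enter_lazy_tlb_sketch(void)
        {
                if (static_cpu_has(X86_FEATURE_PCID))
                        switch_to_init_mm();    /* cheap: user ASID stays cached */
                else
                        this_cpu_write(tlb_is_lazy, true);      /* defer the switch */
        }

        static void flush_tlb_func_sketch(void)
        {
                if (this_cpu_read(tlb_is_lazy))
                        switch_to_init_mm();    /* first flush: just leave the old mm */
                else
                        local_flush_tlb();
        }
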
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: Adam Borowski <kilobyte@angband.pl>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b956575b
  14. 11 October 2017, 1 commit