1. 13 1月, 2018 1 次提交
    • E
      x86/mm/pkeys: Fix fill_sig_info_pkey · 90bc9fb1
      Eric W. Biederman 提交于
      SEGV_PKUERR is a signal specific si_code which happens to have the
      same numeric value as several others: BUS_MCEERR_AR, ILL_ILLTRP,
      FPE_FLTOVF, TRAP_HWBKPT, CLD_TRAPPED, POLL_ERR, SEGV_THREAD_ID,
      as such it is not safe to just test the si_code the signal number
      must also be tested to prevent a false positive in fill_sig_info_pkey.
      
      I found this error by inspection, and BUS_MCEERR_AR appears to
      be a real candidate for confusion.  So pass in si_signo and fix it.
      
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Fixes: 019132ff ("x86/mm/pkeys: Fill in pkey field in siginfo")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      90bc9fb1
  2. 24 12月, 2017 20 次提交
    • T
      x86/mm/dump_pagetables: Allow dumping current pagetables · a4b51ef6
      Thomas Gleixner 提交于
      Add two debugfs files which allow to dump the pagetable of the current
      task.
      
      current_kernel dumps the regular page table. This is the page table which
      is normally shared between kernel and user space. If kernel page table
      isolation is enabled this is the kernel space mapping.
      
      If kernel page table isolation is enabled the second file, current_user,
      dumps the user space page table.
      
      These files allow to verify the resulting page tables for page table
      isolation, but even in the normal case its useful to be able to inspect
      user space page tables of current for debugging purposes.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a4b51ef6
    • T
      x86/mm/dump_pagetables: Check user space page table for WX pages · b4bf4f92
      Thomas Gleixner 提交于
      ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages,
      but does not check the PAGE_TABLE_ISOLATION user space page table.
      
      Restructure the code so that dmesg output is selected by an explicit
      argument and not implicit via checking the pgd argument for !NULL.
      
      Add the check for the user space page table.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b4bf4f92
    • B
      x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy · 75298aa1
      Borislav Petkov 提交于
      The upcoming support for dumping the kernel and the user space page tables
      of the current process would create more random files in the top level
      debugfs directory.
      
      Add a page table directory and move the existing file to it.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      75298aa1
    • D
      x86/mm: Use INVPCID for __native_flush_tlb_single() · 6cff64b8
      Dave Hansen 提交于
      This uses INVPCID to shoot down individual lines of the user mapping
      instead of marking the entire user map as invalid. This
      could/might/possibly be faster.
      
      This for sure needs tlb_single_page_flush_ceiling to be redetermined;
      esp. since INVPCID is _slow_.
      
      A detailed performance analysis is available here:
      
        https://lkml.kernel.org/r/3062e486-3539-8a1f-5724-16199420be71@intel.com
      
      [ Peterz: Split out from big combo patch ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6cff64b8
    • P
      x86/mm: Use/Fix PCID to optimize user/kernel switches · 6fd166aa
      Peter Zijlstra 提交于
      We can use PCID to retain the TLBs across CR3 switches; including those now
      part of the user/kernel switch. This increases performance of kernel
      entry/exit at the cost of more expensive/complicated TLB flushing.
      
      Now that we have two address spaces, one for kernel and one for user space,
      we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID
      (just like we use the PFN LSB for the PGD). Since we do TLB invalidation
      from kernel space, the existing code will only invalidate the kernel PCID,
      we augment that by marking the corresponding user PCID invalid, and upon
      switching back to userspace, use a flushing CR3 write for the switch.
      
      In order to access the user_pcid_flush_mask we use PER_CPU storage, which
      means the previously established SWAPGS vs CR3 ordering is now mandatory
      and required.
      
      Having to do this memory access does require additional registers, most
      sites have a functioning stack and we can spill one (RAX), sites without
      functional stack need to otherwise provide the second scratch register.
      
      Note: PCID is generally available on Intel Sandybridge and later CPUs.
      Note: Up until this point TLB flushing was broken in this series.
      
      Based-on-code-from: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6fd166aa
    • D
      x86/mm: Abstract switching CR3 · 48e11198
      Dave Hansen 提交于
      In preparation to adding additional PCID flushing, abstract the
      loading of a new ASID into CR3.
      
      [ PeterZ: Split out from big combo patch ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      48e11198
    • D
      x86/mm: Allow flushing for future ASID switches · 2ea907c4
      Dave Hansen 提交于
      If changing the page tables in such a way that an invalidation of all
      contexts (aka. PCIDs / ASIDs) is required, they can be actively invalidated
      by:
      
       1. INVPCID for each PCID (works for single pages too).
      
       2. Load CR3 with each PCID without the NOFLUSH bit set
      
       3. Load CR3 with the NOFLUSH bit set for each and do INVLPG for each address.
      
      But, none of these are really feasible since there are ~6 ASIDs (12 with
      PAGE_TABLE_ISOLATION) at the time that invalidation is required.
      Instead of actively invalidating them, invalidate the *current* context and
      also mark the cpu_tlbstate _quickly_ to indicate future invalidation to be
      required.
      
      At the next context-switch, look for this indicator
      ('invalidate_other' being set) invalidate all of the
      cpu_tlbstate.ctxs[] entries.
      
      This ensures that any future context switches will do a full flush
      of the TLB, picking up the previous changes.
      
      [ tglx: Folded more fixups from Peter ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2ea907c4
    • A
      x86/pti: Map the vsyscall page if needed · 85900ea5
      Andy Lutomirski 提交于
      Make VSYSCALLs work fully in PTI mode by mapping them properly to the user
      space visible page tables.
      
      [ tglx: Hide unused functions (Patch by Arnd Bergmann) ]
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      85900ea5
    • A
      x86/pti: Put the LDT in its own PGD if PTI is on · f55f0501
      Andy Lutomirski 提交于
      With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
      The LDT is per process, i.e. per mm.
      
      An earlier approach mapped the LDT on context switch into a fixmap area,
      but that's a big overhead and exhausted the fixmap space when NR_CPUS got
      big.
      
      Take advantage of the fact that there is an address space hole which
      provides a completely unused pgd. Use this pgd to manage per-mm LDT
      mappings.
      
      This has a down side: the LDT isn't (currently) randomized, and an attack
      that can write the LDT is instant root due to call gates (thanks, AMD, for
      leaving call gates in AMD64 but designing them wrong so they're only useful
      for exploits).  This can be mitigated by making the LDT read-only or
      randomizing the mapping, either of which is strightforward on top of this
      patch.
      
      This will significantly slow down LDT users, but that shouldn't matter for
      important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
      old libc implementations.
      
      [ tglx: Cleaned it up. ]
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f55f0501
    • T
      x86/cpu_entry_area: Add debugstore entries to cpu_entry_area · 10043e02
      Thomas Gleixner 提交于
      The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual
      addresses which must be visible in any execution context.
      
      So it is required to make these mappings visible to user space when kernel
      page table isolation is active.
      
      Provide enough room for the buffer mappings in the cpu_entry_area so the
      buffers are available in the user space visible page tables.
      
      At the point where the kernel side entry area is populated there is no
      buffer available yet, but the kernel PMD must be populated. To achieve this
      set the entries for these buffers to non present.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      10043e02
    • A
      x86/mm/pti: Map ESPFIX into user space · 4b6bbe95
      Andy Lutomirski 提交于
      Map the ESPFIX pages into user space when PTI is enabled.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4b6bbe95
    • T
      x86/mm/pti: Share entry text PMD · 6dc72c3c
      Thomas Gleixner 提交于
      Share the entry text PMD of the kernel mapping with the user space
      mapping. If large pages are enabled this is a single PMD entry and at the
      point where it is copied into the user page table the RW bit has not been
      cleared yet. Clear it right away so the user space visible map becomes RX.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6dc72c3c
    • A
      x86/mm/pti: Share cpu_entry_area with user space page tables · f7cfbee9
      Andy Lutomirski 提交于
      Share the cpu entry area so the user space and kernel space page tables
      have the same P4D page.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f7cfbee9
    • A
      x86/mm/pti: Add functions to clone kernel PMDs · 03f4424f
      Andy Lutomirski 提交于
      Provide infrastructure to:
      
       - find a kernel PMD for a mapping which must be visible to user space for
         the entry/exit code to work.
      
       - walk an address range and share the kernel PMD with it.
      
      This reuses a small part of the original KAISER patches to populate the
      user space page table.
      
      [ tglx: Made it universally usable so it can be used for any kind of shared
      	mapping. Add a mechanism to clear specific bits in the user space
      	visible PMD entry. Folded Andys simplifactions ]
      Originally-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      03f4424f
    • D
      x86/mm/pti: Allocate a separate user PGD · d9e9a641
      Dave Hansen 提交于
      Kernel page table isolation requires to have two PGDs. One for the kernel,
      which contains the full kernel mapping plus the user space mapping and one
      for user space which contains the user space mappings and the minimal set
      of kernel mappings which are required by the architecture to be able to
      transition from and to user space.
      
      Add the necessary preliminaries.
      
      [ tglx: Split out from the big kaiser dump. EFI fixup from Kirill ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d9e9a641
    • D
      x86/mm/pti: Add mapping helper functions · 61e9b367
      Dave Hansen 提交于
      Add the pagetable helper functions do manage the separate user space page
      tables.
      
      [ tglx: Split out from the big combo kaiser patch. Folded Andys
      	simplification and made it out of line as Boris suggested ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      61e9b367
    • B
      x86/pti: Add the pti= cmdline option and documentation · 41f4c20b
      Borislav Petkov 提交于
      Keep the "nopti" optional for traditional reasons.
      
      [ tglx: Don't allow force on when running on XEN PV and made 'on'
      	printout conditional ]
      Requested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171212133952.10177-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      41f4c20b
    • T
      x86/mm/pti: Add infrastructure for page table isolation · aa8c6248
      Thomas Gleixner 提交于
      Add the initial files for kernel page table isolation, with a minimal init
      function and the boot time detection for this misfeature.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      aa8c6248
    • D
      x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y · c313ec66
      Dave Hansen 提交于
      Global pages stay in the TLB across context switches.  Since all contexts
      share the same kernel mapping, these mappings are marked as global pages
      so kernel entries in the TLB are not flushed out on a context switch.
      
      But, even having these entries in the TLB opens up something that an
      attacker can use, such as the double-page-fault attack:
      
         http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
      
      That means that even when PAGE_TABLE_ISOLATION switches page tables
      on return to user space the global pages would stay in the TLB cache.
      
      Disable global pages so that kernel TLB entries can be flushed before
      returning to user space. This way, all accesses to kernel addresses from
      userspace result in a TLB miss independent of the existence of a kernel
      mapping.
      
      Suppress global pages via the __supported_pte_mask. The user space
      mappings set PAGE_GLOBAL for the minimal kernel mappings which are
      required for entry/exit. These mappings are set up manually so the
      filtering does not take place.
      
      [ The __supported_pte_mask simplification was written by Thomas Gleixner. ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c313ec66
    • T
      x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit · f6c4fd50
      Thomas Gleixner 提交于
      The loop which populates the CPU entry area PMDs can wrap around on 32bit
      machines when the number of CPUs is small.
      
      It worked wonderful for NR_CPUS=64 for whatever reason and the moron who
      wrote that code did not bother to test it with !SMP.
      
      Check for the wraparound to fix it.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: Nkernel test robot <fengguang.wu@intel.com>
      Signed-off-by: NThomas "Feels stupid" Gleixner <tglx@linutronix.de>
      Tested-by: NBorislav Petkov <bp@alien8.de>
      f6c4fd50
  3. 23 12月, 2017 6 次提交
    • T
      x86/cpu_entry_area: Move it out of the fixmap · 92a0f81d
      Thomas Gleixner 提交于
      Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
      and 0-day already hit a case where the fixmap PTEs were cleared by
      cleanup_highmap().
      
      Aside of that the fixmap API is a pain as it's all backwards.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      92a0f81d
    • T
      x86/cpu_entry_area: Move it to a separate unit · ed1bbc40
      Thomas Gleixner 提交于
      Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ed1bbc40
    • D
      x86/mm: Move the CR3 construction functions to tlbflush.h · 50fb83a6
      Dave Hansen 提交于
      For flushing the TLB, the ASID which has been programmed into the hardware
      must be known.  That differs from what is in 'cpu_tlbstate'.
      
      Add functions to transform the 'cpu_tlbstate' values into to the one
      programmed into the hardware (CR3).
      
      It's not easy to include mmu_context.h into tlbflush.h, so just move the
      CR3 building over to tlbflush.h.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      50fb83a6
    • P
      x86/mm: Use __flush_tlb_one() for kernel memory · a501686b
      Peter Zijlstra 提交于
      __flush_tlb_single() is for user mappings, __flush_tlb_one() for
      kernel mappings.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a501686b
    • T
      x86/mm/dump_pagetables: Make the address hints correct and readable · 146122e2
      Thomas Gleixner 提交于
      The address hints are a trainwreck. The array entry numbers have to kept
      magically in sync with the actual hints, which is doomed as some of the
      array members are initialized at runtime via the entry numbers.
      
      Designated initializers have been around before this code was
      implemented....
      
      Use the entry numbers to populate the address hints array and add the
      missing bits and pieces. Split 32 and 64 bit for readability sake.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      146122e2
    • T
      x86/mm/dump_pagetables: Check PAGE_PRESENT for real · c0534494
      Thomas Gleixner 提交于
      The check for a present page in printk_prot():
      
             if (!pgprot_val(prot)) {
                      /* Not present */
      
      is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and
      the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when
      analyzing mapping correctness. Check for the present bit to make an
      informed decision.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c0534494
  4. 20 12月, 2017 1 次提交
  5. 18 12月, 2017 1 次提交
    • T
      x86/mm: Unbreak modules that use the DMA API · 9d5f38ba
      Tom Lendacky 提交于
      Commit d8aa7eea ("x86/mm: Add Secure Encrypted Virtualization (SEV)
      support") changed sme_active() from an inline function that referenced
      sme_me_mask to a non-inlined function in order to make the sev_enabled
      variable a static variable.  This function was marked EXPORT_SYMBOL_GPL
      because at the time the patch was submitted, sme_me_mask was marked
      EXPORT_SYMBOL_GPL.
      
      Commit 87df2617 ("x86/mm: Unbreak modules that rely on external
      PAGE_KERNEL availability") changed sme_me_mask variable from
      EXPORT_SYMBOL_GPL to EXPORT_SYMBOL, allowing external modules the ability
      to build with CONFIG_AMD_MEM_ENCRYPT=y.  Now, however, with sev_active()
      no longer an inline function and marked as EXPORT_SYMBOL_GPL, external
      modules that use the DMA API are once again broken in 4.15. Since the DMA
      API is meant to be used by external modules, this needs to be changed.
      
      Change the sme_active() and sev_active() functions from EXPORT_SYMBOL_GPL
      to EXPORT_SYMBOL.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Link: https://lkml.kernel.org/r/20171215162011.14125.7113.stgit@tlendack-t1.amdoffice.net
      9d5f38ba
  6. 17 12月, 2017 2 次提交
    • A
      x86/kasan/64: Teach KASAN about the cpu_entry_area · 21506525
      Andy Lutomirski 提交于
      The cpu_entry_area will contain stacks.  Make sure that KASAN has
      appropriate shadow mappings for them.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: kasan-dev@googlegroups.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.642806442@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      21506525
    • A
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · 2aeb0736
      Andrey Ryabinin 提交于
      [ Note, this is a Git cherry-pick of the following commit:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 PTI code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2aeb0736
  7. 11 12月, 2017 1 次提交
    • K
      x86/mm/kmmio: Fix mmiotrace for page unaligned addresses · 6d60ce38
      Karol Herbst 提交于
      If something calls ioremap() with an address not aligned to PAGE_SIZE, the
      returned address might be not aligned as well. This led to a probe
      registered on exactly the returned address, but the entire page was armed
      for mmiotracing.
      
      On calling iounmap() the address passed to unregister_kmmio_probe() was
      PAGE_SIZE aligned by the caller leading to a complete freeze of the
      machine.
      
      We should always page align addresses while (un)registerung mappings,
      because the mmiotracer works on top of pages, not mappings. We still keep
      track of the probes based on their real addresses and lengths though,
      because the mmiotrace still needs to know what are mapped memory regions.
      
      Also move the call to mmiotrace_iounmap() prior page aligning the address,
      so that all probes are unregistered properly, otherwise the kernel ends up
      failing memory allocations randomly after disabling the mmiotracer.
      Tested-by: NLyude <lyude@redhat.com>
      Signed-off-by: NKarol Herbst <kherbst@redhat.com>
      Acked-by: NPekka Paalanen <ppaalanen@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: nouveau@lists.freedesktop.org
      Link: http://lkml.kernel.org/r/20171127075139.4928-1-kherbst@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6d60ce38
  8. 09 12月, 2017 1 次提交
  9. 06 12月, 2017 2 次提交
  10. 28 11月, 2017 1 次提交
  11. 22 11月, 2017 1 次提交
    • A
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Andrey Ryabinin 提交于
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f68d62a5
  12. 16 11月, 2017 3 次提交
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · be62a320
      Craig Bergstrom 提交于
      One thing /dev/mem access APIs should verify is that there's no way
      that excessively large pfn's can leak into the high bits of the
      page table entry.
      
      In particular, if people can use "very large physical page addresses"
      through /dev/mem to set the bits past bit 58 - SOFTW4 and permission
      key bits and NX bit, that could *really* confuse the kernel.
      
      We had an earlier attempt:
      
        ce56a86e ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      
      ... which turned out to be too restrictive (breaking mem=... bootups for example) and
      had to be reverted in:
      
        90edaac6 ("Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses"")
      
      This v2 attempt modifies the original patch and makes sure that mmap(/dev/mem)
      limits the pfns so that it at least fits in the actual pteval_t architecturally:
      
       - Make sure mmap_mem() actually validates that the offset fits in phys_addr_t
      
          ( This may be indirectly true due to some other check, but it's not
            entirely obvious. )
      
       - Change valid_mmap_phys_addr_range() to just use phys_addr_valid()
         on the top byte
      
          ( Top byte is sufficient, because mmap_mem() has already checked that
            it cannot wrap. )
      
       - Add a few comments about what the valid_phys_addr_range() vs.
         valid_mmap_phys_addr_range() difference is.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      [ Fixed the checks and added comments. ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      [ Collected the discussion and patches into a commit. ]
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/CA+55aFyEcOMb657vWSmrM13OxmHxC-XxeBmNis=DwVvpJUOogQ@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      be62a320
    • K
      x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border · 1e0f25db
      Kirill A. Shutemov 提交于
      In case of 5-level paging, the kernel does not place any mapping above
      47-bit, unless userspace explicitly asks for it.
      
      Userspace can request an allocation from the full address space by
      specifying the mmap address hint above 47-bit.
      
      Nicholas noticed that the current implementation violates this interface:
      
        If user space requests a mapping at the end of the 47-bit address space
        with a length which causes the mapping to cross the 47-bit border
        (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
        below and above.
      
      Sanity check the mmap address hint so that start and end of the resulting
      vma are on the same side of the 47-bit border. If that's not the case fall
      back to the code path which ignores the address hint and allocate from the
      regular address space below 47-bit.
      
      To make the checks consistent, mask out the address hints lower bits
      (either PAGE_MASK or huge_page_mask()) instead of using ALIGN() which can
      push them up to the next boundary.
      
      [ tglx: Moved the address check to a function and massaged comment and
        	changelog ]
      Reported-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20171115143607.81541-1-kirill.shutemov@linux.intel.com
      1e0f25db
    • M
      mm, sparse: do not swamp log with huge vmemmap allocation failures · fcdaf842
      Michal Hocko 提交于
      While doing memory hotplug tests under heavy memory pressure we have
      noticed too many page allocation failures when allocating vmemmap memmap
      backed by huge page
      
        kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
        [...]
        Call Trace:
          dump_trace+0x59/0x310
          show_stack_log_lvl+0xea/0x170
          show_stack+0x21/0x40
          dump_stack+0x5c/0x7c
          warn_alloc_failed+0xe2/0x150
          __alloc_pages_nodemask+0x3ed/0xb20
          alloc_pages_current+0x7f/0x100
          vmemmap_alloc_block+0x79/0xb6
          __vmemmap_alloc_block_buf+0x136/0x145
          vmemmap_populate+0xd2/0x2b9
          sparse_mem_map_populate+0x23/0x30
          sparse_add_one_section+0x68/0x18e
          __add_pages+0x10a/0x1d0
          arch_add_memory+0x4a/0xc0
          add_memory_resource+0x89/0x160
          add_memory+0x6d/0xd0
          acpi_memory_device_add+0x181/0x251
          acpi_bus_attach+0xfd/0x19b
          acpi_bus_scan+0x59/0x69
          acpi_device_hotplug+0xd2/0x41f
          acpi_hotplug_work_fn+0x1a/0x23
          process_one_work+0x14e/0x410
          worker_thread+0x116/0x490
          kthread+0xbd/0xe0
          ret_from_fork+0x3f/0x70
      
      and we do see many of those because essentially every allocation fails
      for each memory section.  This is an excessive way to tell the user that
      there is nothing to really worry about because we do have a fallback
      mechanism to use base pages.  The only downside might be a performance
      degradation due to TLB pressure.
      
      This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn
      explicitly once on the first allocation failure.  This will reduce the
      noise in the kernel log considerably, while we still have an indication
      that a performance might be impacted.
      
      [mhocko@kernel.org: forgot to git add the follow up fix]
        Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fcdaf842