1. 23 12月, 2017 3 次提交
    • P
      x86/mm: Use __flush_tlb_one() for kernel memory · a501686b
      Peter Zijlstra 提交于
      __flush_tlb_single() is for user mappings, __flush_tlb_one() for
      kernel mappings.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a501686b
    • T
      x86/mm/dump_pagetables: Make the address hints correct and readable · 146122e2
      Thomas Gleixner 提交于
      The address hints are a trainwreck. The array entry numbers have to kept
      magically in sync with the actual hints, which is doomed as some of the
      array members are initialized at runtime via the entry numbers.
      
      Designated initializers have been around before this code was
      implemented....
      
      Use the entry numbers to populate the address hints array and add the
      missing bits and pieces. Split 32 and 64 bit for readability sake.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      146122e2
    • T
      x86/mm/dump_pagetables: Check PAGE_PRESENT for real · c0534494
      Thomas Gleixner 提交于
      The check for a present page in printk_prot():
      
             if (!pgprot_val(prot)) {
                      /* Not present */
      
      is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and
      the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when
      analyzing mapping correctness. Check for the present bit to make an
      informed decision.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c0534494
  2. 17 12月, 2017 2 次提交
    • A
      x86/kasan/64: Teach KASAN about the cpu_entry_area · 21506525
      Andy Lutomirski 提交于
      The cpu_entry_area will contain stacks.  Make sure that KASAN has
      appropriate shadow mappings for them.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: kasan-dev@googlegroups.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.642806442@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      21506525
    • A
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · 2aeb0736
      Andrey Ryabinin 提交于
      [ Note, this is a Git cherry-pick of the following commit:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 PTI code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2aeb0736
  3. 10 11月, 2017 1 次提交
  4. 09 11月, 2017 1 次提交
    • J
      x86/mm: Unbreak modules that rely on external PAGE_KERNEL availability · 87df2617
      Jiri Kosina 提交于
      Commit 7744ccdb ("x86/mm: Add Secure Memory Encryption (SME)
      support") as a side-effect made PAGE_KERNEL all of a sudden unavailable
      to modules which can't make use of EXPORT_SYMBOL_GPL() symbols.
      
      This is because once SME is enabled, sme_me_mask (which is introduced as
      EXPORT_SYMBOL_GPL) makes its way to PAGE_KERNEL through _PAGE_ENC,
      causing imminent build failure for all the modules which make use of all
      the EXPORT-SYMBOL()-exported API (such as vmap(), __vmalloc(),
      remap_pfn_range(), ...).
      
      Exporting (as EXPORT_SYMBOL()) interfaces (and having done so for ages)
      that take pgprot_t argument, while making it impossible to -- all of a
      sudden -- pass PAGE_KERNEL to it, feels rather incosistent.
      
      Restore the original behavior and make it possible to pass PAGE_KERNEL
      to all its EXPORT_SYMBOL() consumers.
      
      [ This is all so not wonderful. We shouldn't need that "sme_me_mask"
        access at all in all those places that really don't care about that
        level of detail, and just want _PAGE_KERNEL or whatever.
      
        We have some similar issues with _PAGE_CACHE_WP and _PAGE_NOCACHE,
        both of which hide a "cachemode2protval()" call, and which also ends
        up using another EXPORT_SYMBOL(), but at least that only triggers for
        the much more rare cases.
      
        Maybe we could move these dynamic page table bits to be generated much
        deeper down in the VM layer, instead of hiding them in the macros that
        everybody uses.
      
        So this all would merit some cleanup. But not today.   - Linus ]
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Despised-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87df2617
  5. 04 11月, 2017 1 次提交
    • A
      Revert "x86/mm: Stop calling leave_mm() in idle code" · 67535736
      Andy Lutomirski 提交于
      This reverts commit 43858b4f.
      
      The reason I removed the leave_mm() calls in question is because the
      heuristic wasn't needed after that patch.  With the original version
      of my PCID series, we never flushed a "lazy cpu" (i.e. a CPU running
      kernel thread) due a flush on the loaded mm.
      
      Unfortunately, that caused architectural issues, so now I've
      reinstated these flushes on non-PCID systems in:
      
          commit b956575b ("x86/mm: Flush more aggressively in lazy TLB mode").
      
      That, in turn, gives us a power management and occasionally
      performance regression as compared to old kernels: a process that
      goes into a deep idle state on a given CPU and gets its mm flushed
      due to activity on a different CPU will wake the idle CPU.
      
      Reinstate the old ugly heuristic: if a CPU goes into ACPI C3 or an
      intel_idle state that is likely to cause a TLB flush gets its mm
      switched to init_mm before going idle.
      
      FWIW, this heuristic is lousy.  Whether we should change CR3 before
      idle isn't a good hint except insofar as the performance hit is a bit
      lower if the TLB is getting flushed by the idle code anyway.  What we
      really want to know is whether we anticipate being idle long enough
      that the mm is likely to be flushed before we wake up.  This is more a
      matter of the expected latency than the idle state that gets chosen.
      This heuristic also completely fails on systems that don't know
      whether the TLB will be flushed (e.g. AMD systems?).  OTOH it may be a
      bit obsolete anyway -- PCID systems don't presently benefit from this
      heuristic at all.
      
      We also shouldn't do this callback from innermost bit of the idle code
      due to the RCU nastiness it causes.  All the information need is
      available before rcu_idle_enter() needs to happen.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 43858b4f "x86/mm: Stop calling leave_mm() in idle code"
      Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      67535736
  6. 02 11月, 2017 2 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
    • R
      x86/mm: Relocate page fault error codes to traps.h · 1067f030
      Ricardo Neri 提交于
      Up to this point, only fault.c used the definitions of the page fault error
      codes. Thus, it made sense to keep them within such file. Other portions of
      code might be interested in those definitions too. For instance, the User-
      Mode Instruction Prevention emulation code will use such definitions to
      emulate a page fault when it is unable to successfully copy the results
      of the emulated instructions to user space.
      
      While relocating the error code enumeration, the prefix X86_ is used to
      make it consistent with the rest of the definitions in traps.h. Of course,
      code using the enumeration had to be updated as well. No functional changes
      were performed.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com
      1067f030
  7. 01 11月, 2017 1 次提交
    • V
      x86/mm: fix use-after-free of vma during userfaultfd fault · cb0631fd
      Vlastimil Babka 提交于
      Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
      __do_page_fault() with the following reproducer:
      
        mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
        mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
        r0 = userfaultfd(0x0)
        ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
        ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
        r1 = gettid()
        syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
        tkill(r1, 0x7)
      
      The vma should be pinned by mmap_sem, but handle_userfault() might (in a
      return to userspace scenario) release it and then acquire again, so when
      we return to __do_page_fault() (with other result than VM_FAULT_RETRY),
      the vma might be gone.
      
      Specifically, per Andrea the scenario is
       "A return to userland to repeat the page fault later with a
        VM_FAULT_NOPAGE retval (potentially after handling any pending signal
        during the return to userland). The return to userland is identified
        whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
        vmf->flags"
      
      However, since commit a3c4fb7c ("x86/mm: Fix fault error path using
      unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
      that point, which can thus become use-after-free.  Fix this by moving
      the read before calling handle_mm_fault().
      Reported-by: Nsyzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Suggested-by: NKirill A. Shutemov <kirill@shutemov.name>
      Fixes: 3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer")
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb0631fd
  8. 30 10月, 2017 1 次提交
  9. 27 10月, 2017 1 次提交
    • I
      Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses" · 90edaac6
      Ingo Molnar 提交于
      This reverts commit ce56a86e.
      
      There's unanticipated interaction with some boot parameters like 'mem=',
      which now cause the new checks via valid_mmap_phys_addr_range() to be too
      restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.
      
      So while the motivation of the change is still entirely valid, we
      need a few more rounds of testing to get it right - it's way too late
      after -rc6, so revert it for now.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90edaac6
  10. 20 10月, 2017 2 次提交
    • A
      x86/kasan: Use the same shadow offset for 4- and 5-level paging · 12a8cc7f
      Andrey Ryabinin 提交于
      We are going to support boot-time switching between 4- and 5-level
      paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET
      for different paging modes: the constant is passed to gcc to generate
      code and cannot be changed at runtime.
      
      This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset
      for both 4- and 5-level paging.
      
      For 5-level paging it means that shadow memory region is not aligned to
      PGD boundary anymore and we have to handle unaligned parts of the region
      properly.
      
      In addition, we have to exclude paravirt code from KASAN instrumentation
      as we now use set_pgd() before KASAN is fully ready.
      
      [kirill.shutemov@linux.intel.com: clenaup, changelog message]
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      12a8cc7f
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Craig Bergstrom 提交于
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce56a86e
  11. 18 10月, 2017 3 次提交
  12. 14 10月, 2017 1 次提交
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Andy Lutomirski 提交于
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b956575b
  13. 11 10月, 2017 1 次提交
  14. 30 9月, 2017 2 次提交
  15. 26 9月, 2017 1 次提交
    • I
      x86/fpu: Rename fpu::fpstate_active to fpu::initialized · e4a81bfc
      Ingo Molnar 提交于
      The x86 FPU code used to have a complex state machine where both the FPU
      registers and the FPU state context could be 'active' (or inactive)
      independently of each other - which enabled features like lazy FPU restore.
      
      Much of this complexity is gone in the current code: now we basically can
      have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU
      state at all, plus full FPU users that save/restore directly with no laziness
      whatsoever.
      
      But the fpu::fpstate_active still carries bits of the old complexity - meanwhile
      this flag has become a simple flag that shows whether the FPU context saving
      area in the thread struct is initialized and used, or not.
      
      Rename it to fpu::initialized to express this simplicity in the name as well.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e4a81bfc
  16. 25 9月, 2017 2 次提交
    • L
      x86/mm: Fix fault error path using unsafe vma pointer · a3c4fb7c
      Laurent Dufour 提交于
      commit 7b2d0dba ("x86/mm/pkeys: Pass VMA down in to fault signal
      generation code") passes down a vma pointer to the error path, but that is
      done once the mmap_sem is released when calling mm_fault_error() from
      __do_page_fault().
      
      This is dangerous as the vma structure is no more safe to be used once the
      mmap_sem has been released. As only the protection key value is required in
      the error processing, we could just pass down this value.
      
      Fix it by passing a pointer to a protection key value down to the fault
      signal generation code. The use of a pointer allows to keep the check
      generating a warning message in fill_sig_info_pkey() when the vma was not
      known. If the pointer is valid, the protection value can be accessed by
      deferencing the pointer.
      
      [ tglx: Made *pkey u32 as that's the type which is passed in siginfo ]
      
      Fixes: 7b2d0dba ("x86/mm/pkeys: Pass VMA down in to fault signal generation code")
      Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1504513935-12742-1-git-send-email-ldufour@linux.vnet.ibm.com
      a3c4fb7c
    • E
      x86/fpu: Reinitialize FPU registers if restoring FPU state fails · d5c8028b
      Eric Biggers 提交于
      Userspace can change the FPU state of a task using the ptrace() or
      rt_sigreturn() system calls.  Because reserved bits in the FPU state can
      cause the XRSTOR instruction to fail, the kernel has to carefully
      validate that no reserved bits or other invalid values are being set.
      
      Unfortunately, there have been bugs in this validation code.  For
      example, we were not checking that the 'xcomp_bv' field in the
      xstate_header was 0.  As-is, such bugs are exploitable to read the FPU
      registers of other processes on the system.  To do so, an attacker can
      create a task, assign to it an invalid FPU state, then spin in a loop
      and monitor the values of the FPU registers.  Because the task's FPU
      registers are not being restored, sometimes the FPU registers will have
      the values from another process.
      
      This is likely to continue to be a problem in the future because the
      validation done by the CPU instructions like XRSTOR is not immediately
      visible to kernel developers.  Nor will invalid FPU states ever be
      encountered during ordinary use --- they will only be seen during
      fuzzing or exploits.  There can even be reserved bits outside the
      xstate_header which are easy to forget about.  For example, the MXCSR
      register contains reserved bits, which were not validated by the
      KVM_SET_XSAVE ioctl until commit a575813b ("KVM: x86: Fix load
      damaged SSEx MXCSR register").
      
      Therefore, mitigate this class of vulnerability by restoring the FPU
      registers from init_fpstate if restoring from the task's state fails.
      
      We actually used to do this, but it was (perhaps unwisely) removed by
      commit 9ccc27a5 ("x86/fpu: Remove error return values from
      copy_kernel_to_*regs() functions").  This new patch is also a bit
      different.  First, it only clears the registers, not also the bad
      in-memory state; this is simpler and makes it easier to make the
      mitigation cover all callers of __copy_kernel_to_fpregs().  Second, it
      does the register clearing in an exception handler so that no extra
      instructions are added to context switches.  In fact, we *remove*
      instructions, since previously we were always zeroing the register
      containing 'err' even if CONFIG_X86_DEBUG_FPU was disabled.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170922174156.16780-4-ebiggers3@gmail.com
      Link: http://lkml.kernel.org/r/20170923130016.21448-27-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d5c8028b
  17. 24 9月, 2017 2 次提交
    • I
      x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active · f1c8cd01
      Ingo Molnar 提交于
      We want to simplify the FPU state machine by eliminating fpu->fpregs_active,
      and we can do that because the two state flags (::fpregs_active and
      ::fpstate_active) are set essentially together.
      
      The old lazy FPU switching code used to make a distinction - but there's
      no lazy switching code anymore, we always switch in an 'eager' fashion.
      
      Do this by first changing all substantial uses of fpu->fpregs_active
      to fpu->fpstate_active and adding a few debug checks to double check
      our assumption is correct.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-19-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f1c8cd01
    • I
      x86/fpu: Simplify fpu->fpregs_active use · b3a16308
      Ingo Molnar 提交于
      The fpregs_active() inline function is pretty pointless - in almost
      all the callsites it can be replaced with a direct fpu->fpregs_active
      access.
      
      Do so and eliminate the extra layer of obfuscation.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3a16308
  18. 23 9月, 2017 1 次提交
    • J
      x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Josh Poimboeuf 提交于
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo()
        {
      	register void *__sp asm(_ASM_SP);
      	asm("call bar" : "+r" (__sp))
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f5caf621
  19. 18 9月, 2017 1 次提交
  20. 13 9月, 2017 3 次提交
    • J
      x86/paravirt: Remove no longer used paravirt functions · 87930019
      Juergen Gross 提交于
      With removal of lguest some of the paravirt functions are no longer
      needed:
      
      	->read_cr4()
      	->store_idt()
      	->set_pmd_at()
      	->set_pud_at()
      	->pte_update()
      
      Remove them.
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akataria@vmware.com
      Cc: boris.ostrovsky@oracle.com
      Cc: chrisw@sous-sol.org
      Cc: jeremy@goop.org
      Cc: rusty@rustcorp.com.au
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      87930019
    • A
      x86/mm/64: Initialize CR4.PCIDE early · c7ad5ad2
      Andy Lutomirski 提交于
      cpu_init() is weird: it's called rather late (after early
      identification and after most MMU state is initialized) on the boot
      CPU but is called extremely early (before identification) on secondary
      CPUs.  It's called just late enough on the boot CPU that its CR4 value
      isn't propagated to mmu_cr4_features.
      
      Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
      problems.  First, we'd crash in the trampoline code.  That's
      fixable, and I tried that.  It turns out that mmu_cr4_features is
      totally ignored by secondary_start_64(), though, so even with the
      trampoline code fixed, it wouldn't help.
      
      This means that we don't currently have CR4.PCIDE reliably initialized
      before we start playing with cpu_tlbstate.  This is very fragile and
      tends to cause boot failures if I make even small changes to the TLB
      handling code.
      
      Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
      and propagate it to secondary CPUs in start_secondary().
      
      ( Yes, this is ugly.  I think we should have improved mmu_cr4_features
        to actually control CR4 during secondary bootup, but that would be
        fairly intrusive at this stage. )
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reported-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Tested-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c7ad5ad2
    • A
      x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off() · a376e7f9
      Andy Lutomirski 提交于
      If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
      but we're very unlikely to get a useful call trace.
      
      Make it a warning instead.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a376e7f9
  21. 11 9月, 2017 1 次提交
  22. 09 9月, 2017 1 次提交
    • M
      mm/memory_hotplug: introduce add_pages · 3072e413
      Michal Hocko 提交于
      There are new users of memory hotplug emerging.  Some of them require
      different subset of arch_add_memory.  There are some which only require
      allocation of struct pages without mapping those pages to the kernel
      address space.  We currently have __add_pages for that purpose.  But this
      is rather lowlevel and not very suitable for the code outside of the
      memory hotplug.  E.g.  x86_64 wants to update max_pfn which should be done
      by the caller.  Introduce add_pages() which should care about those
      details if they are needed.  Each architecture should define its
      implementation and select CONFIG_ARCH_HAS_ADD_PAGES.  All others use the
      currently existing __add_pages.
      
      Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.comSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NJérôme Glisse <jglisse@redhat.com>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3072e413
  23. 07 9月, 2017 2 次提交
  24. 31 8月, 2017 2 次提交
  25. 29 8月, 2017 2 次提交