1. 15 May, 2019 1 commit
  2. 06 May, 2019 1 commit
  3. 30 Apr, 2019 3 commits
    • mm/hibernation: Make hibernation handle unmapped pages · d6332692
      Rick Edgecombe committed
      Make hibernate handle unmapped pages on the direct map when
      CONFIG_ARCH_HAS_SET_ALIAS=y is set. These functions allow for setting pages
      to invalid configurations, so now hibernate should check if the pages have
      valid mappings and handle if they are unmapped when doing a hibernate
      save operation.
      
      Previously this checking was already done when CONFIG_DEBUG_PAGEALLOC=y
      was configured. It does not appear to have a big hibernating performance
      impact. The speed of the saving operation before this change was measured
      as 819.02 MB/s, and after was measured at 813.32 MB/s.
      
      Before:
      [    4.670938] PM: Wrote 171996 kbytes in 0.21 seconds (819.02 MB/s)
      
      After:
      [    4.504714] PM: Wrote 178932 kbytes in 0.22 seconds (813.32 MB/s)
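As a rough check of the "no big impact" claim, the relative change between the two reported write speeds can be computed directly. This is an illustrative Python snippet, not kernel code; the function name `relative_change_percent` is made up for this example, and the two rates are the ones quoted in the log lines above.

```python
def relative_change_percent(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Write speeds in MB/s, copied from the "Before" and "After" log lines.
BEFORE_MB_S = 819.02
AFTER_MB_S = 813.32

if __name__ == "__main__":
    change = relative_change_percent(BEFORE_MB_S, AFTER_MB_S)
    print(f"write speed changed by {change:.2f}%")
```

For this single pair of runs the slowdown is below one percent.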
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190426001143.4983-16-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/cpa: Add set_direct_map_*() functions · d253ca0c
      Rick Edgecombe committed
      Add two new functions set_direct_map_default_noflush() and
      set_direct_map_invalid_noflush() for setting the direct map alias for the
      page to its default valid permissions and to an invalid state that cannot
      be cached in a TLB, respectively. These functions do not flush the TLB.
      
      Note, __kernel_map_pages() does something similar but flushes the TLB and
      doesn't reset the permission bits to default on all architectures.
      
      Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether
      these have an actual implementation or a default empty one.
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <akpm@linux-foundation.org>
      Cc: <ard.biesheuvel@linaro.org>
      Cc: <deneen.t.dock@intel.com>
      Cc: <kernel-hardening@lists.openwall.com>
      Cc: <kristen@linux.intel.com>
      Cc: <linux_dti@icloud.com>
      Cc: <will.deacon@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190426001143.4983-15-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/alternatives: Initialize temporary mm for patching · 4fc19708
      Nadav Amit committed
      To prevent improper use of the PTEs that are used for text patching, the
      next patches will use a temporary mm struct. Initialize it by copying
      the init mm.
      
      The address that will be used for patching is taken from the lower area
      that is usually used for the task memory. Doing so prevents the need to
      frequently synchronize the temporary-mm (e.g., when BPF programs are
      installed), since different PGDs are used for the task memory.
      
      Finally, randomize the address of the PTEs to harden against exploits
      that use these PTEs.
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: ard.biesheuvel@linaro.org
      Cc: deneen.t.dock@intel.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kristen@linux.intel.com
      Cc: linux_dti@icloud.com
      Cc: will.deacon@arm.com
      Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 26 Apr, 2019 1 commit
    • x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack · 3db6d5a5
      Nadav Amit committed
      Move flush_tlb_info variables off the stack. This allows flush_tlb_info
      to be aligned to a cache line, avoiding potentially unnecessary cache
      line movements. It also gives the variables a fixed virtual-to-physical
      translation, which reduces TLB misses.
      
      Use per-CPU struct for flush_tlb_mm_range() and
      flush_tlb_kernel_range(). Add debug assertions to ensure there are
      no nested TLB flushes that might overwrite the per-CPU data. For
      arch_tlbbatch_flush() use a const struct.
      
      Results when running a microbenchmark that performs 10^6 MADV_DONTNEED
      operations and touches a page, while 3 additional threads run a
      busy-wait loop (5 runs, PTI and retpolines turned off):
      
      			base		off-stack
      			----		---------
        avg (usec/op)		1.629		1.570	(-3%)
        stddev		0.014		0.009
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190425230143.7008-1-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 24 Apr, 2019 2 commits
    • x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault() · a65c88e1
      Jiri Kosina committed
      In-NMI warnings have been added to vmalloc_fault() via:
      
        ebc8827f ("x86: Barf when vmalloc and kmemcheck faults happen in NMI")
      
      back in the time when our NMI entry code could not cope with nested NMIs.
      
      These days, it's perfectly fine to take a fault in NMI context and we
      don't have to care about the fact that IRET from the fault handler might
      cause NMI nesting.
      
      This warning has already been removed from 32-bit implementation of
      vmalloc_fault() in:
      
        6863ea0c ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
      
      but the 64-bit version was omitted.
      
      Remove the bogus warning also from 64-bit implementation of vmalloc_fault().
      Reported-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6863ea0c ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
      Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1904240902280.9803@cbobk.fhfr.pm
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Fix a crash with kmemleak_scan() · 0d02113b
      Qian Cai committed
      The first kmemleak_scan() call after boot would trigger the crash below
      because this callpath:
      
        kernel_init
          free_initmem
            mem_encrypt_free_decrypted_mem
              free_init_pages
      
      unmaps memory inside the .bss when DEBUG_PAGEALLOC=y.
      
      kmemleak_init() will register the .data/.bss sections and then
      kmemleak_scan() will scan those addresses and dereference them looking
      for pointer references. If free_init_pages() frees and unmaps pages in
      those sections, kmemleak_scan() will crash if referencing one of those
      addresses:
      
        BUG: unable to handle kernel paging request at ffffffffbd402000
        CPU: 12 PID: 325 Comm: kmemleak Not tainted 5.1.0-rc4+ #4
        RIP: 0010:scan_block
        Call Trace:
         scan_gray_list
         kmemleak_scan
         kmemleak_scan_thread
         kthread
         ret_from_fork
      
      Since kmemleak_free_part() is tolerant to unknown objects (not tracked
      by kmemleak), it is fine to call it from free_init_pages() even if not
      all address ranges passed to this function are known to kmemleak.
      
       [ bp: Massage. ]
      
      Fixes: b3f0907c ("x86/mm: Add .bss..decrypted section to hold shared variables")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190423165811.36699-1-cai@lca.pw
  6. 22 Apr, 2019 1 commit
    • x86/fault: Make fault messages more succinct · ea2f8d60
      Borislav Petkov committed
      So we are going to be staring at those in the next years, let's make
      them more succinct. In particular:
      
       - change "address = " to "address: "
      
       - "-privileged" reads funny. It should be simply "kernel" or "user"
      
       - "from kernel code" reads funny too. "kernel mode" or "user mode" is
         more natural.
      
      An actual example says more than 1000 words, of course:
      
        [    0.248370] BUG: kernel NULL pointer dereference, address: 00000000000005b8
        [    0.249120] #PF: supervisor write access in kernel mode
        [    0.249717] #PF: error_code(0x0002) - not-present page
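The mapping from a raw error code to the two "#PF:" lines in this example can be modelled in a few lines. The following is a Python sketch, not the kernel's implementation. The bit positions follow the x86 page-fault error code layout. The function name `describe_page_fault` and the `fault_in_user_mode` parameter are made up for illustration, and the strings "read access", "protection keys violation" and "permissions violation" do not appear in the examples in this log and are assumptions.

```python
# Page-fault error code bits as defined by the x86 architecture.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # access was a write
PF_USER = 1 << 2   # access was made with user privilege
PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # access was an instruction fetch
PF_PK = 1 << 5     # protection-key violation


def describe_page_fault(error_code, fault_in_user_mode):
    """Return the two '#PF:' lines for a fault, in the succinct wording."""
    privilege = "user" if error_code & PF_USER else "supervisor"
    if error_code & PF_INSTR:
        access = "instruction fetch"
    elif error_code & PF_WRITE:
        access = "write access"
    else:
        access = "read access"  # assumed wording
    mode = "user" if fault_in_user_mode else "kernel"

    # Only the first matching reason is reported.
    if not (error_code & PF_PROT):
        reason = "not-present page"
    elif error_code & PF_RSVD:
        reason = "reserved bit violation"
    elif error_code & PF_PK:
        reason = "protection keys violation"  # assumed wording
    else:
        reason = "permissions violation"  # assumed wording

    return [
        f"#PF: {privilege} {access} in {mode} mode",
        f"#PF: error_code({error_code:#06x}) - {reason}",
    ]


if __name__ == "__main__":
    for line in describe_page_fault(0x0002, fault_in_user_mode=False):
        print(line)
```

Note that the privilege of the access (the USER bit) and the mode the CPU was running in are separate inputs, which is why the sketch takes both.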
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@linux.intel.com
      Cc: luto@kernel.org
      Cc: riel@surriel.com
      Cc: sean.j.christopherson@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20190421183524.GC6048@zn.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 20 Apr, 2019 2 commits
    • x86/fault: Decode and print #PF oops in human readable form · 18ea35c5
      Sean Christopherson committed
      Linus pointed out that deciphering the raw #PF error code and printing
      a more human readable message are two different things, and also that
      printing the negative cases is mostly just noise[1].  For example, the
      USER bit doesn't mean the fault originated in user code and stating
      that an oops wasn't due to a protection keys violation isn't interesting
      since an oops on a keys violation is a one-in-a-million scenario.
      
      Remove the per-bit decoding of the error code and instead print:
        - the raw error code
        - why the fault occurred
        - the effective privilege level of the access
        - the type of access
        - whether the fault originated in user code or kernel code
      
      This provides the user with the information needed to triage 99.9% of
      oopses without polluting the log with useless information or conflating
      the error_code with the CPL.
      
      Sample output:
      
          BUG: kernel NULL pointer dereference, address = 0000000000000008
          #PF: supervisor-privileged instruction fetch from kernel code
          #PF: error_code(0x0010) - not-present page
      
          BUG: unable to handle page fault for address = ffffbeef00000000
          #PF: supervisor-privileged instruction fetch from kernel code
          #PF: error_code(0x0010) - not-present page
      
          BUG: unable to handle page fault for address = ffffc90000230000
          #PF: supervisor-privileged write access from kernel code
          #PF: error_code(0x000b) - reserved bit violation
      
      [1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com

      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20181221213657.27628-3-sean.j.christopherson@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fault: Reword initial BUG message for unhandled page faults · f28b11a2
      Sean Christopherson committed
      Reword the NULL pointer dereference case to simply state that a NULL
      pointer was dereferenced, i.e. drop "unable to handle" as that implies
      that there are instances where the kernel actually does handle NULL
      pointer dereferences, which is not true barring funky exception fixup.
      
      For the non-NULL case, replace "kernel paging request" with "page fault"
      as the kernel can technically oops on faults that originated in user
      code.  Dropping "kernel" also allows future patches to provide detailed
      information on where the fault occurred, e.g. user vs. kernel, without
      conflicting with the initial BUG message.
      
      In both cases, replace "at address=" with wording more appropriate to
      the oops, as "at" may be interpreted as stating that the address is the
      RIP of the instruction that faulted.
      
      Last, and probably least, further qualify the NULL-pointer path by
      checking that the fault actually originated in kernel code.  It's
      technically possible for userspace to map address 0, and not printing
      a super specific message is the least of our worries if the kernel does
      manage to oops on an actual NULL pointer dereference from userspace.
      
      Before:
          BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000
          BUG: unable to handle kernel paging request at ffffbeef00000000
      
      After:
          BUG: kernel NULL pointer dereference, address = 0000000000000008
          BUG: unable to handle page fault for address = ffffbeef00000000
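The choice between the two "After" headlines can be sketched as follows. This is a Python model rather than kernel code. The function name `bug_headline` is hypothetical, and treating any address below one 4 KiB page as a NULL pointer dereference is an assumption that the text above does not spell out.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def bug_headline(address, fault_in_user_mode):
    """Pick the first BUG line for an unhandled page fault."""
    # The NULL-pointer wording is used only when the fault came from
    # kernel code and the address lies within the first page (assumed).
    if address < PAGE_SIZE and not fault_in_user_mode:
        return f"BUG: kernel NULL pointer dereference, address = {address:016x}"
    return f"BUG: unable to handle page fault for address = {address:016x}"


if __name__ == "__main__":
    print(bug_headline(0x8, fault_in_user_mode=False))
    print(bug_headline(0xFFFFBEEF00000000, fault_in_user_mode=False))
```

A low address reached from user code falls through to the generic message, which matches the "further qualify the NULL-pointer path" paragraph.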
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20181221213657.27628-2-sean.j.christopherson@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 18 Apr, 2019 2 commits
    • x86/mm/KASLR: Fix the size of the direct mapping section · ec393710
      Baoquan He committed
      kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
      the maximum amount of system RAM supported. The size of the direct
      mapping section is obtained from the smaller one of the below two
      values:
      
        (actual system RAM size + padding size) vs (max system RAM size supported)
      
      This calculation is wrong since commit
      
        b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").
      
      In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
      the kernel is using 4-level or 5-level page tables. Thus, it will always
      use 4 PB as the maximum amount of system RAM, even in 4-level paging
      mode where it should actually be 64 TB.
      
      Thus, the size of the direct mapping section will always
      be the sum of the actual system RAM size plus the padding size.
      
      Even when the amount of system RAM is 64 TB, the following layout will
      still be used. Obviously KASLR will be weakened significantly.
      
         |____|_______actual RAM_______|_padding_|______the rest_______|
         0            64TB                                            ~120TB
      
      Instead, it should be like this:
      
         |____|_______actual RAM_______|_________the rest______________|
         0            64TB                                            ~120TB
      
      The size of padding region is controlled by
      CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.
      
      The above issue only exists when
      CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
      which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
      using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.
      
      Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
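The effect of the two constants on the size calculation can be checked with a small model. This is a Python sketch of the min() described above, not the code of kernel_randomize_memory(). The function name `direct_map_size_tb` is made up, and the value 46 for MAX_PHYSMEM_BITS in 4-level paging mode is an assumption chosen to match the 64 TB figure in the text.

```python
TB_SHIFT = 40  # 1 TB = 2**40 bytes


def direct_map_size_tb(ram_tb, padding_tb, phys_bits):
    """Size of the direct mapping section in TB.

    It is the smaller of (actual RAM + padding) and the maximum amount
    of RAM addressable with `phys_bits` physical address bits.
    """
    max_supported_tb = 1 << (phys_bits - TB_SHIFT)
    return min(ram_tb + padding_tb, max_supported_tb)


if __name__ == "__main__":
    # 64 TB of RAM with the default 10 TB padding, 4-level paging.
    print(direct_map_size_tb(64, 10, phys_bits=52))  # __PHYSICAL_MASK_SHIFT
    print(direct_map_size_tb(64, 10, phys_bits=46))  # assumed MAX_PHYSMEM_BITS
```

With 52 bits the cap is 4 PB, so the sum always wins and the padding is kept; with 46 bits the cap is 64 TB and the padding no longer eats into the space left for randomization.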
      
       [ bp: Massage commit message. ]
      
      Fixes: b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Garnier <thgarnie@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: frank.ramsay@hpe.com
      Cc: herbert@gondor.apana.org.au
      Cc: kirill@shutemov.name
      Cc: mike.travis@hpe.com
      Cc: thgarnie@google.com
      Cc: x86-ml <x86@kernel.org>
      Cc: yamada.masahiro@socionext.com
      Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
    • x86/speculation: Support 'mitigations=' cmdline option · d68be4c4
      Josh Poimboeuf committed
      Configure x86 runtime CPU speculation bug mitigations in accordance with
      the 'mitigations=' cmdline option.  This affects Meltdown, Spectre v2,
      Speculative Store Bypass, and L1TF.
      
      The default behavior is unchanged.
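To illustrate how such an option might be read from the command line, here is a Python sketch. It is not the kernel's parser. The three values used (`off`, `auto`, `auto,nosmt`) come from the kernel's command-line parameter documentation rather than from this commit message, and the names `Mitigations` and `parse_mitigations`, as well as the choice to ignore unknown values, are assumptions made for the example.

```python
from enum import Enum


class Mitigations(Enum):
    OFF = "off"
    AUTO = "auto"
    AUTO_NOSMT = "auto,nosmt"


def parse_mitigations(cmdline):
    """Return the mitigation mode selected on a kernel command line."""
    selected = Mitigations.AUTO  # default when the option is absent
    for token in cmdline.split():
        if not token.startswith("mitigations="):
            continue
        value = token[len("mitigations="):]
        try:
            selected = Mitigations(value)
        except ValueError:
            # Assumption: an unrecognised value leaves the mode unchanged.
            pass
    return selected


if __name__ == "__main__":
    print(parse_mitigations("ro quiet mitigations=auto,nosmt"))
```

Defaulting to `AUTO` mirrors the statement that the default behavior is unchanged.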
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86)
      Reviewed-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-s390@vger.kernel.org
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-arch@vger.kernel.org
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Phil Auld <pauld@redhat.com>
      Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com
  9. 17 Apr, 2019 7 commits
    • x86/exceptions: Split debug IST stack · 2a594d4c
      Thomas Gleixner committed
      The debug IST stack is actually two separate debug stacks to handle #DB
      recursion. This is required because the CPU starts always at top of stack
      on exception entry, which means on #DB recursion the second #DB would
      overwrite the stack of the first.
      
      The low level entry code therefore adjusts the top of stack on entry so a
      secondary #DB starts from a different stack page. But the stack pages are
      adjacent without a guard page between them.
      
      Split the debug stack into 3 stacks which are separated by guard pages. The
      3rd stack is never mapped into the cpu_entry_area and is only there to
      catch triple #DB nesting:
      
            --- top of DB_stack	<- Initial stack
            --- end of DB_stack
            	  guard page
      
            --- top of DB1_stack	<- Top of stack after entering first #DB
            --- end of DB1_stack
            	  guard page
      
            --- top of DB2_stack	<- Top of stack after entering second #DB
            --- end of DB2_stack
            	  guard page
      
      If DB2 would not act as the final guard hole, a second #DB would point the
      top of #DB stack to the stack below #DB1 which would be valid and not catch
      the not so desired triple nesting.
      
      The backing store does not allocate any memory for DB2 and its guard page
      as it is not going to be mapped into the cpu_entry_area.
      
       - Adjust the low level entry code so it adjusts top of #DB with the offset
         between the stacks instead of exception stack size.
      
       - Make the dumpstack code aware of the new stacks.
      
       - Adjust the in_debug_stack() implementation and move it into the NMI code
         where it belongs. As this is NMI hotpath code, it just checks the full
         area between top of DB_stack and bottom of DB1_stack without checking
         for the guard page. That's correct because the NMI cannot hit a
         stackpointer pointing to the guard page between DB and DB1 stack.  Even
         if it would, then the NMI operation still is unaffected, but the resume
         of the debug exception on the topmost DB stack will crash by touching
         the guard page.
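The layout and the simplified range check can be modelled with plain address arithmetic. This is a Python sketch, not kernel code. The one-page stack and guard sizes, the half-open range in `in_debug_stack`, and the function names are assumptions made for illustration.

```python
PAGE_SIZE = 4096
STACK_SIZE = PAGE_SIZE   # assumption: one page per exception stack
GUARD_SIZE = PAGE_SIZE   # one guard page below each stack
DB_STACK_OFFSET = STACK_SIZE + GUARD_SIZE  # distance between stack tops


def debug_stack_tops(db_top):
    """Top-of-stack addresses of DB, DB1 and DB2, highest first."""
    names = ("DB", "DB1", "DB2")
    return {name: db_top - i * DB_STACK_OFFSET for i, name in enumerate(names)}


def in_debug_stack(sp, db_top):
    """Hot-path style check: anything from the bottom of DB1 to the top of DB.

    The guard page between DB and DB1 is deliberately not excluded.
    """
    db1_bottom = db_top - DB_STACK_OFFSET - STACK_SIZE
    return db1_bottom <= sp < db_top


if __name__ == "__main__":
    for name, top in debug_stack_tops(0x10000).items():
        print(f"{name}: top at {top:#x}")
```

On each nested #DB the entry code would move the top of stack down by `DB_STACK_OFFSET`, so a third nesting level lands on DB2, which is never mapped.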
      
        [ bp: Make exception_stack_names static const char * const ]
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de
    • x86/traps: Use cpu_entry_area instead of orig_ist · d876b673
      Thomas Gleixner committed
      The orig_ist[] array is a shadow copy of the IST array in the TSS. The
      reason why it exists is that older kernels used two TSS variants with
      different pointers into the debug stack. orig_ist[] contains the real
      starting points.
      
      There is no point anymore to do so because the same information can be
      retrieved using the base address of the cpu entry area mapping and the
      offsets of the various exception stacks.
      
      No functional change. Preparation for removing orig_ist.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.784487230@linutronix.de
    • x86/cpu_entry_area: Provide exception stack accessor · 7623f37e
      Thomas Gleixner committed
      Store a pointer to the per cpu entry area exception stack mappings to allow
      fast retrieval.
      
      Required for converting various places from using the shadow IST array to
      directly doing address calculations on the actual mapping address.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.680960459@linutronix.de
    • x86/cpu_entry_area: Prepare for IST guard pages · a4af767a
      Thomas Gleixner committed
      To allow guard pages between the IST stacks each stack needs to be
      mapped individually.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.592691557@linutronix.de
    • x86/exceptions: Add structs for exception stacks · 019b17b3
      Thomas Gleixner committed
      At the moment everything assumes a full linear mapping of the various
      exception stacks. Adding guard pages to the cpu entry area mapping of the
      exception stacks will break that assumption.
      
      As a preparatory step convert both the real storage and the effective
      mapping in the cpu entry area from character arrays to structures.
      
      To ensure that both arrays have the same ordering and the same size of the
      individual stacks fill the members with a macro. The guard size is the only
      difference between the two resulting structures. For now both have guard
      size 0 until the preparation of all usage sites is done.
      
      Provide a couple of helper macros which are used in the following
      conversions.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.506807893@linutronix.de
    • x86/cpu_entry_area: Cleanup setup functions · 881a463c
      Thomas Gleixner committed
      No point in retrieving the entry area pointer over and over. Do it once
      and use unsigned int for 'cpu' everywhere.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.419653165@linutronix.de
    • x86/exceptions: Make IST index zero based · 8f34c5b5
      Thomas Gleixner committed
      The defines for the exception stack (IST) array in the TSS are using the
      SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
      array indices related to IST. That's confusing at best and does not provide
      any value.
      
      Make the indices zero based and fixup the usage sites. The only code which
      needs to adjust the 0 based index is the interrupt descriptor setup which
      needs to add 1 now.
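The index convention can be illustrated with a short Python model. It is a sketch rather than kernel code. The names `idt_ist_field` and `IST_INDEX`, and the particular ordering of the four stacks, are hypothetical. What the model relies on is that the TSS holds seven IST slots (IST1 to IST7) and that an IST field of 0 in an interrupt descriptor means "do not switch stacks".

```python
N_IST_STACKS = 7  # the TSS provides IST1..IST7

# Hypothetical zero-based indices, used directly as array subscripts.
IST_INDEX = {"DF": 0, "NMI": 1, "DB": 2, "MCE": 3}


def idt_ist_field(ist_index):
    """Convert a zero-based IST index into the 1-based descriptor field.

    A field value of 0 means "no IST", so the descriptor setup is the
    one place that has to add 1.
    """
    if not 0 <= ist_index < N_IST_STACKS:
        raise ValueError(f"IST index out of range: {ist_index}")
    return ist_index + 1


if __name__ == "__main__":
    tss_ist = [0] * N_IST_STACKS           # stand-in for the TSS IST array
    tss_ist[IST_INDEX["DB"]] = 0xFFFF0000  # plain subscript, no "- 1"
    print(idt_ist_field(IST_INDEX["DB"]))
```

Every other user of the index can subscript the array directly, which is the simplification the commit describes.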
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.331772825@linutronix.de
  10. 16 Apr, 2019 2 commits
  11. 13 Apr, 2019 1 commit
    • S
      x86/pkeys: Add PKRU value to init_fpstate · a5eff725
      Sebastian Andrzej Siewior committed
      The task's initial PKRU value is set partly for fpu__clear()/
      copy_init_pkru_to_fpregs(). It is not part of init_fpstate.xsave and
      instead it is set explicitly.
      
      If the user removes the PKRU state from XSAVE in the signal handler then
      __fpu__restore_sig() will restore the missing bits from `init_fpstate'
      and initialize the PKRU value to 0.
      
      Add the `init_pkru_value' to `init_fpstate' so it is set to the init
      value in such a case.
      
      In theory copy_init_pkru_to_fpregs() could be removed because restoring
      the PKRU at return-to-userland should be enough.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190403164156.19645-28-bigeasy@linutronix.de
  12. 11 Apr, 2019 3 commits
    • R
      x86/fpu: Eager switch PKRU state · 0cecca9d
      Rik van Riel committed
      While most of a task's FPU state is only needed in user space, the
      protection keys need to be in place immediately after a context switch.
      
      The reason is that any access to userspace memory while running in
      kernel mode also needs to abide by the memory permissions specified in
      the protection keys.
      
      The "eager switch" is a preparation for loading the FPU state on return
      to userland. Instead of decoupling PKRU state from xstate, update PKRU
      within xstate on write operations by the kernel.
      
      For user tasks the PKRU should be always read from the xsave area and it
      should not change anything because the PKRU value was loaded as part of
      FPU restore.
      
      For kernel threads the default "init_pkru_value" will be written. Before
      this commit, the kernel thread would end up with a random value which it
      inherited from the previous user task.
      
       [ bigeasy: save pkru to xstate, no cache, don't use __raw_xsave_addr() ]
      
       [ bp: update commit message, sort headers properly in asm/fpu/xstate.h ]
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190403164156.19645-16-bigeasy@linutronix.de
    • S
      x86/pkeys: Don't check if PKRU is zero before writing it · 0556cbdc
      Sebastian Andrzej Siewior committed
      write_pkru() checks if the current value is the same as the expected
      value. So instead of only checking whether the current and new values are
      both zero (and skipping the write in that case), we can benefit from that
      more general check.
      
      Remove the zero check of PKRU; __write_pkru() provides such a check now.
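
The "skip the write if nothing changes" idea can be sketched in user-space C. Everything here is a stand-in: `fake_pkru`, `hw_writes` and the `sketch_*`/`fake_*` functions are hypothetical names that model a register and count how often it is actually written; they are not the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the real register and its write instruction. */
static uint32_t fake_pkru;
static unsigned int hw_writes;

static uint32_t fake_read_pkru(void)
{
    return fake_pkru;
}

static void fake_hw_write_pkru(uint32_t val)
{
    fake_pkru = val;
    hw_writes++;
}

/* Skip the write whenever the register already holds the wanted value. */
static void sketch_write_pkru(uint32_t val)
{
    if (fake_read_pkru() == val)
        return;
    fake_hw_write_pkru(val);
}

/* Usage: three requests, but only two reach the "hardware". */
static void pkru_demo(void)
{
    sketch_write_pkru(0x4);   /* differs from the initial 0: written */
    sketch_write_pkru(0x4);   /* same value: skipped */
    sketch_write_pkru(0x0);   /* differs again: written */
    assert(hw_writes == 2);
}
```

A separate "is it zero?" test in the caller is redundant once the comparison covers every value, zero included.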
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190403164156.19645-15-bigeasy@linutronix.de
    • S
      x86/fpu: Use a feature number instead of mask in two more helpers · abd16d68
      Sebastian Andrzej Siewior committed
      After changing the argument of __raw_xsave_addr() from a mask to a
      number, Dave suggested checking whether it makes sense to do the same for
      get_xsave_addr(). As it turns out, it does.
      
      Only get_xsave_addr() needs the mask to check if the requested feature
      is part of what is supported/saved and then uses the number again. The
      shift operation is cheaper compared to fls64() (find last bit set).
      Also, the feature number uses less opcode space compared to the mask. :)
      
      Make the get_xsave_addr() argument an xfeature number instead of a mask
      and fix up its callers.
      
      Furthermore, use xfeature_nr and xfeature_mask consistently.
      
      This results in the following changes to the kvm code:
      
        feature -> xfeature_mask
        index -> xfeature_nr
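
The cost argument above can be illustrated with a small sketch. The helper names are hypothetical, and the loop only stands in for a "find last set bit" operation such as fls64(); the point is that number to mask is a single shift, while mask to number needs a bit scan.

```c
#include <assert.h>
#include <stdint.h>

/* Number -> mask is a single shift. */
static uint64_t xfeature_mask_from_nr(unsigned int xfeature_nr)
{
    assert(xfeature_nr < 64);
    return UINT64_C(1) << xfeature_nr;
}

/*
 * Mask -> number needs a bit scan. This loop returns the zero-based
 * index of the highest set bit in the mask.
 */
static unsigned int xfeature_nr_from_mask(uint64_t xfeature_mask)
{
    unsigned int nr = 0;

    assert(xfeature_mask != 0);
    while ((xfeature_mask >>= 1) != 0)
        nr++;
    return nr;
}

/* Hypothetical check: is feature number `nr` part of the saved set? */
static int xfeature_is_saved(uint64_t saved_mask, unsigned int nr)
{
    return (saved_mask & xfeature_mask_from_nr(nr)) != 0;
}
```

With the number as the argument, a caller that needs the mask for a membership test derives it on the spot and keeps using the number for everything else.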
      Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190403164156.19645-12-bigeasy@linutronix.de
  13. 10 Apr, 2019 1 commit
    • S
      x86/fpu: Remove fpu->initialized · 2722146e
      Sebastian Andrzej Siewior committed
      The struct fpu.initialized member is always set to one for user tasks
      and zero for kernel tasks. This avoids saving/restoring the FPU
      registers for kernel threads.
      
      The ->initialized = 0 case for user tasks has been removed in previous
      changes, for instance, by doing an explicit unconditional init at fork()
      time for FPU-less systems which was otherwise delayed until the emulated
      opcode.
      
      The context switch code (switch_fpu_prepare() + switch_fpu_finish())
      can't unconditionally save/restore registers for kernel threads. Not
      only would it slow down the switch but also load a zeroed xcomp_bv for
      XSAVES.
      
      For kernel_fpu_begin() (+end) the situation is similar: EFI with runtime
      services uses this before alternatives_patched is true. Which means that
      this function is used too early and it wasn't the case before.
      
      For those two cases, use current->mm to distinguish between user and
      kernel thread. For kernel_fpu_begin() skip save/restore of the FPU
      registers.
      
      During the context switch into a kernel thread don't do anything. There
      is no reason to save the FPU state of a kernel thread.
      
      The reordering in __switch_to() is important because the current()
      pointer needs to be valid before switch_fpu_finish() is invoked, so that
      the ->mm of the new task is seen instead of the old one's.
      
      N.B.: fpu__save() doesn't need to check ->mm because it is called by
      user tasks only.
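
The "use ->mm to tell user tasks from kernel threads" rule can be sketched with toy types. `struct toy_task`, `struct toy_mm` and the `toy_*` functions are hypothetical stand-ins, not kernel structures; the sketch only models the decision, not the register handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mm { int unused; };

struct toy_task {
    struct toy_mm *mm;        /* NULL for kernel threads */
    unsigned int fpu_saves;   /* how often the FPU state was saved */
};

static bool toy_is_user_task(const struct toy_task *t)
{
    return t->mm != NULL;
}

/* On switch-out, only user tasks have FPU state worth saving. */
static void toy_switch_fpu_prepare(struct toy_task *prev)
{
    assert(prev != NULL);
    if (toy_is_user_task(prev))
        prev->fpu_saves++;
}
```

Switching away from a task whose `mm` is NULL leaves its counter untouched, which mirrors "during the context switch into a kernel thread don't do anything".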
      
       [ bp: Massage. ]
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: Babu Moger <Babu.Moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dmitry Safonov <dima@arista.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190403164156.19645-8-bigeasy@linutronix.de
  14. 09 Apr, 2019 1 commit
    • S
      treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively · d75f773c
      Sakari Ailus committed
      %pF and %pf are functionally equivalent to %pS and %ps conversion
      specifiers. The former are deprecated, therefore switch the current users
      to use the preferred variant.
      
      The changes have been produced by the following command:
      
      	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
      	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
      
      And verifying the result.
      
      Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: xen-devel@lists.xenproject.org
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-nvdimm@lists.01.org
      Cc: linux-pci@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: linux-mm@kvack.org
      Cc: ceph-devel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
      Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  15. 08 Apr, 2019 1 commit
  16. 06 Apr, 2019 2 commits
  17. 28 Mar, 2019 1 commit
    • R
      x86/mm: Don't exceed the valid physical address space · 92c77f7c
      Ralph Campbell committed
      valid_phys_addr_range() is used to sanity check the physical address range
      of an operation, e.g., access to /dev/mem. It uses __pa(high_memory)
      internally.
      
      If memory is populated at the end of the physical address space, then
      __pa(high_memory) is outside of the physical address space because:
      
         high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
      
      For the comparison in valid_phys_addr_range() this is not an issue, but if
      CONFIG_DEBUG_VIRTUAL is enabled, __pa() maps to __phys_addr(), which
      verifies that the resulting physical address is within the valid physical
      address space of the CPU. So in the case that memory is populated at the
      end of the physical address space, this is not true and triggers a
      VIRTUAL_BUG_ON().
      
      Use __pa(high_memory - 1) to prevent the conversion from going beyond
      the end of valid physical addresses.
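
The off-by-one can be reproduced in a toy model. All constants and `toy_*` names below are assumptions chosen for illustration (a 36-bit physical address space and a fixed direct-map offset); they are not taken from the patch. The range check is written in an overflow-safe form and translates only the last valid byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: every constant here is an assumption for illustration. */
#define TOY_PAGE_SIZE    UINT64_C(4096)
#define TOY_PHYS_BITS    36                           /* physical address width */
#define TOY_PAGE_OFFSET  UINT64_C(0xffff888000000000) /* start of the direct map */

static uint64_t toy_va(uint64_t pa) { return pa + TOY_PAGE_OFFSET; }
static uint64_t toy_pa(uint64_t va) { return va - TOY_PAGE_OFFSET; }

/* Same construction as in the commit message: one byte past the last one. */
static uint64_t toy_high_memory(uint64_t max_pfn)
{
    return toy_va(max_pfn * TOY_PAGE_SIZE - 1) + 1;
}

/* Models the debug check: is the translated address inside the physical space? */
static bool toy_pa_is_valid(uint64_t va)
{
    return (toy_pa(va) >> TOY_PHYS_BITS) == 0;
}

/* Range check that only ever translates the last valid byte. */
static bool toy_valid_phys_addr_range(uint64_t addr, uint64_t count,
                                      uint64_t high_memory)
{
    uint64_t last;

    assert(high_memory > TOY_PAGE_OFFSET);
    last = toy_pa(high_memory - 1);
    return count != 0 && addr <= last && count - 1 <= last - addr;
}
```

When memory is populated up to the very end of the toy physical space, translating `high_memory` itself fails the validity check, while translating `high_memory - 1` passes and still lets the final page be accepted.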
      
      Fixes: be62a320 ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Craig Bergstrom <craigb@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      
      Link: https://lkml.kernel.org/r/20190326001817.15413-2-rcampbell@nvidia.com
  18. 22 Mar, 2019 1 commit
    • V
      x86/mm/pti: Make local symbols static · 4fe64a62
      Valdis Kletnieks committed
      With 'make C=2 W=1', sparse and gcc both complain:
      
        CHECK   arch/x86/mm/pti.c
      arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static?
      arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static?
        CC      arch/x86/mm/pti.o
      arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes]
        605 | void pti_set_kernel_image_nonglobal(void)
            |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in
      drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated
      local (static) symbol.
      
      Make both static.
      Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
      
  19. 13 Mar, 2019 2 commits
    • M
      memblock: drop memblock_alloc_*_nopanic() variants · 26fb3dae
      Mike Rapoport committed
      As all the memblock allocation functions return NULL in case of error
      rather than panic(), the duplicates with _nopanic suffix can be removed.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>		[printk]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      memblock: drop __memblock_alloc_base() · 42b46aef
      Mike Rapoport committed
      The __memblock_alloc_base() function tries to allocate memory up to
      the limit specified by its max_addr parameter.  Depending on the value
      of this parameter, __memblock_alloc_base() can be replaced with the
      appropriate memblock_phys_alloc*() variant.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 08 Mar, 2019 1 commit
  21. 07 Mar, 2019 2 commits
  22. 06 Mar, 2019 1 commit
  23. 05 Mar, 2019 1 commit
    • L
      x86-64: add warning for non-canonical user access address dereferences · 00c42373
      Linus Torvalds committed
      This adds a warning (once) for any kernel dereference that has a user
      exception handler, but accesses a non-canonical address.  It basically
      is a simpler - and more limited - version of commit 9da3f2b7
      ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") that
      got reverted.
      
      Note that unlike that original commit, this only causes a warning,
      because there are real situations where we currently can do this
      (notably speculative argument fetching for uprobes etc).  Also, unlike
      that original commit, this _only_ triggers for #GP accesses, so the
      cases of valid kernel pointers that cross into a non-mapped page aren't
      affected.
      
      The intent of this is two-fold:
      
       - the uprobe/tracing accesses really do need to be more careful. In
         particular, from a portability standpoint it's just wrong to think
         that "a pointer is a pointer", and use the same logic for any random
         pointer value you find on the stack. It may _work_ on x86-64, but it
         doesn't necessarily work on other architectures (where the same
         pointer value can be either a kernel pointer _or_ a user pointer, and
         you really need to be much more careful in how you try to access it)
      
         The warning can hopefully end up being a reminder that just any
         random pointer access won't do.
      
       - Kees in particular wanted a way to actually report invalid uses of
         wild pointers to user space accessors, instead of just silently
         failing them. Automated fuzzers want a way to get reports if the
         kernel ever uses invalid values that the fuzzer fed it.
      
         The non-canonical address range is a fair chunk of the address space,
         and with this you can teach syzkaller to feed in invalid pointer
         values and find cases where we do not properly validate user
         addresses (possibly due to bad uses of "set_fs()").
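
For readers unfamiliar with the term, a canonicality test and a warn-once latch can be sketched as below. This is an illustration only: it assumes 48 implemented virtual address bits (4-level paging), and `is_canonical()`, `warn_once_if_noncanonical()` and `warned` are hypothetical names, not the code added by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual address bits (4-level paging). */
#define VADDR_BITS 48

static_assert(VADDR_BITS > 1 && VADDR_BITS < 64,
              "the shifts below must stay in range");

/* Canonical means bits 63..47 are all copies of bit 47. */
static bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (VADDR_BITS - 1);   /* bits 63..47 */

    return upper == 0 || upper == (UINT64_MAX >> (VADDR_BITS - 1));
}

static bool warned;

/* Returns true only the first time a non-canonical address is seen. */
static bool warn_once_if_noncanonical(uint64_t addr)
{
    if (is_canonical(addr) || warned)
        return false;
    warned = true;
    return true;
}
```

Under this assumption, everything between 0x0000800000000000 and 0xffff7fffffffffff is non-canonical, which is the "fair chunk of the address space" a fuzzer can draw invalid pointers from.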
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>