1. 30 4月, 2019 2 次提交
  2. 26 4月, 2019 1 次提交
    • N
      x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack · 3db6d5a5
      Nadav Amit 提交于
      Move flush_tlb_info variables off the stack. This allows to align
      flush_tlb_info to cache-line and avoid potentially unnecessary cache
      line movements. It also allows to have a fixed virtual-to-physical
      translation of the variables, which reduces TLB misses.
      
      Use per-CPU struct for flush_tlb_mm_range() and
      flush_tlb_kernel_range(). Add debug assertions to ensure there are
      no nested TLB flushes that might overwrite the per-CPU data. For
      arch_tlbbatch_flush() use a const struct.
      
      Results when running a microbenchmarks that performs 10^6 MADV_DONTEED
      operations and touching a page, in which 3 additional threads run a
      busy-wait loop (5 runs, PTI and retpolines are turned off):
      
      			base		off-stack
      			----		---------
        avg (usec/op)		1.629		1.570	(-3%)
        stddev		0.014		0.009
      Signed-off-by: NNadav Amit <namit@vmware.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190425230143.7008-1-namit@vmware.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3db6d5a5
  3. 24 4月, 2019 2 次提交
    • J
      x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault() · a65c88e1
      Jiri Kosina 提交于
      In-NMI warnings have been added to vmalloc_fault() via:
      
        ebc8827f ("x86: Barf when vmalloc and kmemcheck faults happen in NMI")
      
      back in the time when our NMI entry code could not cope with nested NMIs.
      
      These days, it's perfectly fine to take a fault in NMI context and we
      don't have to care about the fact that IRET from the fault handler might
      cause NMI nesting.
      
      This warning has already been removed from 32-bit implementation of
      vmalloc_fault() in:
      
        6863ea0c ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
      
      but the 64-bit version was omitted.
      
      Remove the bogus warning also from 64-bit implementation of vmalloc_fault().
      Reported-by: NNicolai Stange <nstange@suse.de>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 6863ea0c ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
      Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1904240902280.9803@cbobk.fhfr.pmSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a65c88e1
    • Q
      x86/mm: Fix a crash with kmemleak_scan() · 0d02113b
      Qian Cai 提交于
      The first kmemleak_scan() call after boot would trigger the crash below
      because this callpath:
      
        kernel_init
          free_initmem
            mem_encrypt_free_decrypted_mem
              free_init_pages
      
      unmaps memory inside the .bss when DEBUG_PAGEALLOC=y.
      
      kmemleak_init() will register the .data/.bss sections and then
      kmemleak_scan() will scan those addresses and dereference them looking
      for pointer references. If free_init_pages() frees and unmaps pages in
      those sections, kmemleak_scan() will crash if referencing one of those
      addresses:
      
        BUG: unable to handle kernel paging request at ffffffffbd402000
        CPU: 12 PID: 325 Comm: kmemleak Not tainted 5.1.0-rc4+ #4
        RIP: 0010:scan_block
        Call Trace:
         scan_gray_list
         kmemleak_scan
         kmemleak_scan_thread
         kthread
         ret_from_fork
      
      Since kmemleak_free_part() is tolerant to unknown objects (not tracked
      by kmemleak), it is fine to call it from free_init_pages() even if not
      all address ranges passed to this function are known to kmemleak.
      
       [ bp: Massage. ]
      
      Fixes: b3f0907c ("x86/mm: Add .bss..decrypted section to hold shared variables")
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190423165811.36699-1-cai@lca.pw
      0d02113b
  4. 22 4月, 2019 1 次提交
    • B
      x86/fault: Make fault messages more succinct · ea2f8d60
      Borislav Petkov 提交于
      So we are going to be staring at those in the next years, let's make
      them more succinct. In particular:
      
       - change "address = " to "address: "
      
       - "-privileged" reads funny. It should be simply "kernel" or "user"
      
       - "from kernel code" reads funny too. "kernel mode" or "user mode" is
         more natural.
      
      An actual example says more than 1000 words, of course:
      
        [    0.248370] BUG: kernel NULL pointer dereference, address: 00000000000005b8
        [    0.249120] #PF: supervisor write access in kernel mode
        [    0.249717] #PF: error_code(0x0002) - not-present page
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@linux.intel.com
      Cc: luto@kernel.org
      Cc: riel@surriel.com
      Cc: sean.j.christopherson@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20190421183524.GC6048@zn.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ea2f8d60
  5. 20 4月, 2019 2 次提交
    • S
      x86/fault: Decode and print #PF oops in human readable form · 18ea35c5
      Sean Christopherson 提交于
      Linus pointed out that deciphering the raw #PF error code and printing
      a more human readable message are two different things, and also that
      printing the negative cases is mostly just noise[1].  For example, the
      USER bit doesn't mean the fault originated in user code and stating
      that an oops wasn't due to a protection keys violation isn't interesting
      since an oops on a keys violation is a one-in-a-million scenario.
      
      Remove the per-bit decoding of the error code and instead print:
        - the raw error code
        - why the fault occurred
        - the effective privilege level of the access
        - the type of access
        - whether the fault originated in user code or kernel code
      
      This provides the user with the information needed to triage 99.9% of
      oopses without polluting the log with useless information or conflating
      the error_code with the CPL.
      
      Sample output:
      
          BUG: kernel NULL pointer dereference, address = 0000000000000008
          #PF: supervisor-privileged instruction fetch from kernel code
          #PF: error_code(0x0010) - not-present page
      
          BUG: unable to handle page fault for address = ffffbeef00000000
          #PF: supervisor-privileged instruction fetch from kernel code
          #PF: error_code(0x0010) - not-present page
      
          BUG: unable to handle page fault for address = ffffc90000230000
          #PF: supervisor-privileged write access from kernel code
          #PF: error_code(0x000b) - reserved bit violation
      
      [1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.comSuggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20181221213657.27628-3-sean.j.christopherson@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      18ea35c5
    • S
      x86/fault: Reword initial BUG message for unhandled page faults · f28b11a2
      Sean Christopherson 提交于
      Reword the NULL pointer dereference case to simply state that a NULL
      pointer was dereferenced, i.e. drop "unable to handle" as that implies
      that there are instances where the kernel actual does handle NULL
      pointer dereferences, which is not true barring funky exception fixup.
      
      For the non-NULL case, replace "kernel paging request" with "page fault"
      as the kernel can technically oops on faults that originated in user
      code.  Dropping "kernel" also allows future patches to provide detailed
      information on where the fault occurred, e.g. user vs. kernel, without
      conflicting with the initial BUG message.
      
      In both cases, replace "at address=" with wording more appropriate to
      the oops, as "at" may be interpreted as stating that the address is the
      RIP of the instruction that faulted.
      
      Last, and probably least, further qualify the NULL-pointer path by
      checking that the fault actually originated in kernel code.  It's
      technically possible for userspace to map address 0, and not printing
      a super specific message is the least of our worries if the kernel does
      manage to oops on an actual NULL pointer dereference from userspace.
      
      Before:
          BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000
          BUG: unable to handle kernel paging request at ffffbeef00000000
      
      After:
          BUG: kernel NULL pointer dereference, address = 0000000000000008
          BUG: unable to handle page fault for address = ffffbeef00000000
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20181221213657.27628-2-sean.j.christopherson@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f28b11a2
  6. 18 4月, 2019 1 次提交
    • B
      x86/mm/KASLR: Fix the size of the direct mapping section · ec393710
      Baoquan He 提交于
      kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
      the maximum amount of system RAM supported. The size of the direct
      mapping section is obtained from the smaller one of the below two
      values:
      
        (actual system RAM size + padding size) vs (max system RAM size supported)
      
      This calculation is wrong since commit
      
        b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").
      
      In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
      the kernel is using 4-level or 5-level page tables. Thus, it will always
      use 4 PB as the maximum amount of system RAM, even in 4-level paging
      mode where it should actually be 64 TB.
      
      Thus, the size of the direct mapping section will always
      be the sum of the actual system RAM size plus the padding size.
      
      Even when the amount of system RAM is 64 TB, the following layout will
      still be used. Obviously KALSR will be weakened significantly.
      
         |____|_______actual RAM_______|_padding_|______the rest_______|
         0            64TB                                            ~120TB
      
      Instead, it should be like this:
      
         |____|_______actual RAM_______|_________the rest______________|
         0            64TB                                            ~120TB
      
      The size of padding region is controlled by
      CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.
      
      The above issue only exists when
      CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
      which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
      using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.
      
      Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
      
       [ bp: Massage commit message. ]
      
      Fixes: b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Garnier <thgarnie@google.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: frank.ramsay@hpe.com
      Cc: herbert@gondor.apana.org.au
      Cc: kirill@shutemov.name
      Cc: mike.travis@hpe.com
      Cc: thgarnie@google.com
      Cc: x86-ml <x86@kernel.org>
      Cc: yamada.masahiro@socionext.com
      Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
      ec393710
  7. 16 4月, 2019 2 次提交
  8. 06 4月, 2019 2 次提交
  9. 28 3月, 2019 1 次提交
    • R
      x86/mm: Don't exceed the valid physical address space · 92c77f7c
      Ralph Campbell 提交于
      valid_phys_addr_range() is used to sanity check the physical address range
      of an operation, e.g., access to /dev/mem. It uses __pa(high_memory)
      internally.
      
      If memory is populated at the end of the physical address space, then
      __pa(high_memory) is outside of the physical address space because:
      
         high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
      
      For the comparison in valid_phys_addr_range() this is not an issue, but if
      CONFIG_DEBUG_VIRTUAL is enabled, __pa() maps to __phys_addr(), which
      verifies that the resulting physical address is within the valid physical
      address space of the CPU. So in the case that memory is populated at the
      end of the physical address space, this is not true and triggers a
      VIRTUAL_BUG_ON().
      
      Use __pa(high_memory - 1) to prevent the conversion from going beyond
      the end of valid physical addresses.
      
      Fixes: be62a320 ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Craig Bergstrom <craigb@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      
      Link: https://lkml.kernel.org/r/20190326001817.15413-2-rcampbell@nvidia.com
      92c77f7c
  10. 22 3月, 2019 1 次提交
    • V
      x86/mm/pti: Make local symbols static · 4fe64a62
      Valdis Kletnieks 提交于
      With 'make C=2 W=1', sparse and gcc both complain:
      
        CHECK   arch/x86/mm/pti.c
      arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static?
      arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static?
        CC      arch/x86/mm/pti.o
      arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes]
        605 | void pti_set_kernel_image_nonglobal(void)
            |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in
      drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated
      local (static) symbol.
      
      Make both static.
      Signed-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
      
      4fe64a62
  11. 13 3月, 2019 2 次提交
    • M
      memblock: drop memblock_alloc_*_nopanic() variants · 26fb3dae
      Mike Rapoport 提交于
      As all the memblock allocation functions return NULL in case of error
      rather than panic(), the duplicates with _nopanic suffix can be removed.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>		[printk]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26fb3dae
    • M
      memblock: drop __memblock_alloc_base() · 42b46aef
      Mike Rapoport 提交于
      The __memblock_alloc_base() function tries to allocate a memory up to
      the limit specified by its max_addr parameter.  Depending on the value
      of this parameter, the __memblock_alloc_base() can is replaced with the
      appropriate memblock_phys_alloc*() variant.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NRob Herring <robh@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42b46aef
  12. 08 3月, 2019 1 次提交
  13. 07 3月, 2019 2 次提交
  14. 06 3月, 2019 1 次提交
  15. 05 3月, 2019 1 次提交
    • L
      x86-64: add warning for non-canonical user access address dereferences · 00c42373
      Linus Torvalds 提交于
      This adds a warning (once) for any kernel dereference that has a user
      exception handler, but accesses a non-canonical address.  It basically
      is a simpler - and more limited - version of commit 9da3f2b7
      ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") that
      got reverted.
      
      Note that unlike that original commit, this only causes a warning,
      because there are real situations where we currently can do this
      (notably speculative argument fetching for uprobes etc).  Also, unlike
      that original commit, this _only_ triggers for #GP accesses, so the
      cases of valid kernel pointers that cross into a non-mapped page aren't
      affected.
      
      The intent of this is two-fold:
      
       - the uprobe/tracing accesses really do need to be more careful. In
         particular, from a portability standpoint it's just wrong to think
         that "a pointer is a pointer", and use the same logic for any random
         pointer value you find on the stack. It may _work_ on x86-64, but it
         doesn't necessarily work on other architectures (where the same
         pointer value can be either a kernel pointer _or_ a user pointer, and
         you really need to be much more careful in how you try to access it)
      
         The warning can hopefully end up being a reminder that just any
         random pointer access won't do.
      
       - Kees in particular wanted a way to actually report invalid uses of
         wild pointers to user space accessors, instead of just silently
         failing them. Automated fuzzers want a way to get reports if the
         kernel ever uses invalid values that the fuzzer fed it.
      
         The non-canonical address range is a fair chunk of the address space,
         and with this you can teach syzkaller to feed in invalid pointer
         values and find cases where we do not properly validate user
         addresses (possibly due to bad uses of "set_fs()").
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00c42373
  16. 26 2月, 2019 1 次提交
    • L
      Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses" · 53a41cb7
      Linus Torvalds 提交于
      This reverts commit 9da3f2b7.
      
      It was well-intentioned, but wrong.  Overriding the exception tables for
      instructions for random reasons is just wrong, and that is what the new
      code did.
      
      It caused problems for tracing, and it caused problems for strncpy_from_user(),
      because the new checks made perfectly valid use cases break, rather than
      catch things that did bad things.
      
      Unchecked user space accesses are a problem, but that's not a reason to
      add invalid checks that then people have to work around with silly flags
      (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
      odd way to say "this commit was wrong" and was sprinked into random
      places to hide the wrongness).
      
      The real fix to unchecked user space accesses is to get rid of the
      special "let's not check __get_user() and __put_user() at all" logic.
      Make __{get|put}_user() be just aliases to the regular {get|put}_user()
      functions, and make it impossible to access user space without having
      the proper checks in places.
      
      The raison d'être of the special double-underscore versions used to be
      that the range check was expensive, and if you did multiple user
      accesses, you'd do the range check up front (like the signal frame
      handling code, for example).  But SMAP (on x86) and PAN (on ARM) have
      made that optimization pointless, because the _real_ expense is the "set
      CPU flag to allow user space access".
      
      Do let's not break the valid cases to catch invalid cases that shouldn't
      even exist.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53a41cb7
  17. 19 2月, 2019 1 次提交
  18. 15 2月, 2019 1 次提交
  19. 08 2月, 2019 2 次提交
  20. 04 2月, 2019 1 次提交
    • A
      x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol · ce9084ba
      Ard Biesheuvel 提交于
      Turn ARCH_USE_MEMREMAP_PROT into a generic Kconfig symbol, and fix the
      dependency expression to reflect that AMD_MEM_ENCRYPT depends on it,
      instead of the other way around. This will permit ARCH_USE_MEMREMAP_PROT
      to be selected by other architectures.
      
      Note that the encryption related early memremap routines in
      arch/x86/mm/ioremap.c cannot be built for 32-bit x86 without triggering
      the following warning:
      
           arch/x86//mm/ioremap.c: In function 'early_memremap_encrypted':
        >> arch/x86/include/asm/pgtable_types.h:193:27: warning: conversion from
                           'long long unsigned int' to 'long unsigned int' changes
                           value from '9223372036854776163' to '355' [-Woverflow]
            #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
           arch/x86//mm/ioremap.c:713:46: note: in expansion of macro '__PAGE_KERNEL_ENC'
             return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
      
      which essentially means they are 64-bit only anyway. However, we cannot
      make them dependent on CONFIG_ARCH_HAS_MEM_ENCRYPT, since that is always
      defined, even for i386 (and changing that results in a slew of build errors)
      
      So instead, build those routines only if CONFIG_AMD_MEM_ENCRYPT is
      defined.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Jeffrey Hugo <jhugo@codeaurora.org>
      Cc: Lee Jones <lee.jones@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190202094119.13230-9-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce9084ba
  21. 30 1月, 2019 2 次提交
  22. 15 1月, 2019 1 次提交
  23. 05 1月, 2019 1 次提交
    • J
      mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Joel Fernandes (Google) 提交于
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There is concern that the extra
      'address' argument that mremap passes to pte_alloc may do something
      subtle architecture related in the future that may make the scheme not
      work.  Also we find that there is no point in passing the 'address' to
      pte_alloc since its unused.  This patch therefore removes this argument
      tree-wide resulting in a nice negative diff as well.  Also ensuring
      along the way that the enabled architectures do not do anything funky
      with the 'address' argument that goes unnoticed by the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script.
      (thanks Julia for answering all Coccinelle questions!).
      Following fix ups were done manually:
      * Removal of address argument from  pte_fragment_alloc
      * Removal of pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.comSigned-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: NKirill A. Shutemov <kirill@shutemov.name>
      Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4cf58924
  24. 04 1月, 2019 1 次提交
    • L
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds 提交于
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  25. 29 12月, 2018 5 次提交
  26. 18 12月, 2018 2 次提交