1. 22 Jul, 2019 1 commit
  2. 18 Jul, 2019 1 commit
  3. 17 Jul, 2019 1 commit
    • A
      mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault() · b98cca44
      Anshuman Khandual committed
      Architectures which support kprobes have very similar boilerplate around
      calling kprobe_fault_handler().  Use a helper function in kprobes.h to
      unify them, based on the x86 code.
      
      This changes the behaviour for other architectures when preemption is
      enabled.  Previously, they would have disabled preemption while calling
      the kprobe handler.  However, preemption would already be disabled if
      this fault was due to a kprobe, so when preemption is enabled we know
      the fault was not due to a kprobe handler and can simply return failure.
      
      This behaviour was introduced in commit a980c0ef ("x86/kprobes:
      Refactor kprobes_fault() like kprobe_exceptions_notify()")
      
      [anshuman.khandual@arm.com: export kprobe_fault_handler()]
        Link: http://lkml.kernel.org/r/1561133358-8876-1-git-send-email-anshuman.khandual@arm.com
      Link: http://lkml.kernel.org/r/1560420444-25737-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b98cca44
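The decision order described in this commit can be modelled in user space. The following is a minimal sketch, not the kernel helper itself: `struct fault_ctx` and `kprobe_page_fault_model()` are hypothetical names, the boolean fields stand in for kernel state, and the user-mode check is an assumption based on the x86 code the commit says it follows.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel state the real helper would consult. */
struct fault_ctx {
	bool kprobes_built_in;  /* kprobes support compiled in */
	bool user_mode;         /* fault was taken from user mode (assumed check) */
	bool preemptible;       /* preemption was enabled when the fault hit */
	bool kprobe_running;    /* a kprobe is currently active on this CPU */
	bool handler_result;    /* what the kprobe fault handler would return */
};

/* Returns true only when a running kprobe claims the fault. */
bool kprobe_page_fault_model(const struct fault_ctx *ctx)
{
	if (!ctx->kprobes_built_in)
		return false;
	if (ctx->user_mode)
		return false;
	/*
	 * A kprobe handler runs with preemption disabled, so a fault taken
	 * in a preemptible context cannot come from one: return failure
	 * instead of disabling preemption and calling the handler.
	 */
	if (ctx->preemptible)
		return false;
	if (!ctx->kprobe_running)
		return false;
	return ctx->handler_result;
}
```

The preemptible case is the behaviour change the commit message calls out for the non-x86 architectures.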
  4. 28 Jun, 2019 2 commits
  5. 03 Jun, 2019 1 commit
    • E
      signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus · 318759b4
      Eric W. Biederman committed
      Stephen Rothwell <sfr@canb.auug.org.au> reported:
      > After merging the userns tree, today's linux-next build (i386 defconfig)
      > produced this warning:
      >
      > arch/x86/mm/fault.c: In function 'do_sigbus':
      > arch/x86/mm/fault.c:1017:22: warning: unused variable 'tsk' [-Wunused-variable]
      >   struct task_struct *tsk = current;
      >                       ^~~
      >
      > Introduced by commit
      >
      >   351b6825 ("signal: Explicitly call force_sig_fault on current")
      >
      > The remaining used of "tsk" are protected by CONFIG_MEMORY_FAILURE.
      
      So do the obvious thing and move tsk inside of CONFIG_MEMORY_FAILURE
      to prevent introducing new warnings into the build.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      318759b4
  6. 29 May, 2019 2 commits
    • E
      signal: Remove the task parameter from force_sig_fault · 2e1661d2
      Eric W. Biederman committed
      As synchronous exceptions really only make sense against the current
      task (otherwise how are you synchronous) remove the task parameter
      from force_sig_fault to make it explicit that is what is going
      on.
      
      The two known exceptions that deliver a synchronous exception to a
      stopped ptraced task have already been changed to
      force_sig_fault_to_task.
      
      The callers have been changed with the following emacs regular expression
      (with obvious variations on the architectures that take more arguments)
      to avoid typos:
      
      force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
      ->
      force_sig_fault(\1,\2,\3)
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      2e1661d2
    • E
      signal: Explicitly call force_sig_fault on current · 351b6825
      Eric W. Biederman committed
      Update the calls of force_sig_fault that pass in a variable that is
      set to current earlier to explicitly use current.
      
      This is to make the next change that removes the task parameter
      from force_sig_fault easier to verify.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      351b6825
  7. 27 May, 2019 1 commit
  8. 24 Apr, 2019 1 commit
  9. 22 Apr, 2019 1 commit
    • B
      x86/fault: Make fault messages more succinct · ea2f8d60
      Borislav Petkov committed
      So we are going to be staring at those in the next years, let's make
      them more succinct. In particular:
      
       - change "address = " to "address: "
      
       - "-privileged" reads funny. It should be simply "kernel" or "user"
      
       - "from kernel code" reads funny too. "kernel mode" or "user mode" is
         more natural.
      
      An actual example says more than 1000 words, of course:
      
        [    0.248370] BUG: kernel NULL pointer dereference, address: 00000000000005b8
        [    0.249120] #PF: supervisor write access in kernel mode
        [    0.249717] #PF: error_code(0x0002) - not-present page
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@linux.intel.com
      Cc: luto@kernel.org
      Cc: riel@surriel.com
      Cc: sean.j.christopherson@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20190421183524.GC6048@zn.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ea2f8d60
  10. 20 Apr, 2019 2 commits
    • S
      x86/fault: Decode and print #PF oops in human readable form · 18ea35c5
      Sean Christopherson committed
      Linus pointed out that deciphering the raw #PF error code and printing
      a more human readable message are two different things, and also that
      printing the negative cases is mostly just noise[1].  For example, the
      USER bit doesn't mean the fault originated in user code and stating
      that an oops wasn't due to a protection keys violation isn't interesting
      since an oops on a keys violation is a one-in-a-million scenario.
      
      Remove the per-bit decoding of the error code and instead print:
        - the raw error code
        - why the fault occurred
        - the effective privilege level of the access
        - the type of access
        - whether the fault originated in user code or kernel code
      
      This provides the user with the information needed to triage 99.9% of
      oopses without polluting the log with useless information or conflating
      the error_code with the CPL.
      
      Sample output:
      
          BUG: kernel NULL pointer dereference, address = 0000000000000008
          #PF: supervisor-privileged instruction fetch from kernel code
          #PF: error_code(0x0010) - not-present page
      
          BUG: unable to handle page fault for address = ffffbeef00000000
          #PF: supervisor-privileged instruction fetch from kernel code
          #PF: error_code(0x0010) - not-present page
      
          BUG: unable to handle page fault for address = ffffc90000230000
          #PF: supervisor-privileged write access from kernel code
          #PF: error_code(0x000b) - reserved bit violation
      
      [1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20181221213657.27628-3-sean.j.christopherson@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      18ea35c5
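The decoding described in this commit can be sketched as three small lookups on the raw error code. This is an illustrative user-space model, not the kernel's code: the `PF_*` macros mirror the architectural #PF error-code bits, the function names are hypothetical, and the "protection keys violation" and "permissions violation" strings are assumed wording for cases the sample output does not show.

```c
/* Architectural x86 #PF error-code bits. */
#define PF_PROT   (1UL << 0)  /* 0: page not present, 1: protection fault */
#define PF_WRITE  (1UL << 1)  /* access was a write */
#define PF_USER   (1UL << 2)  /* access had user privilege */
#define PF_RSVD   (1UL << 3)  /* reserved bit set in a paging entry */
#define PF_INSTR  (1UL << 4)  /* access was an instruction fetch */
#define PF_PK     (1UL << 5)  /* protection-keys violation */

/* Why the fault occurred; only the first matching cause is reported. */
const char *pf_reason(unsigned long error_code)
{
	if (!(error_code & PF_PROT))
		return "not-present page";
	if (error_code & PF_RSVD)
		return "reserved bit violation";
	if (error_code & PF_PK)
		return "protection keys violation";  /* assumed wording */
	return "permissions violation";              /* assumed wording */
}

/* Effective privilege level of the access (not the CPL of the code). */
const char *pf_privilege(unsigned long error_code)
{
	return (error_code & PF_USER) ? "user" : "supervisor";
}

/* Type of access that faulted. */
const char *pf_access(unsigned long error_code)
{
	if (error_code & PF_INSTR)
		return "instruction fetch";
	if (error_code & PF_WRITE)
		return "write access";
	return "read access";
}
```

With the sample codes above, `0x0010` decodes to a supervisor instruction fetch on a not-present page, and `0x000b` to a supervisor write that hit a reserved bit violation. Whether the fault originated in user or kernel code comes from the saved registers rather than the error code, so it is not modelled here.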
    • S
      x86/fault: Reword initial BUG message for unhandled page faults · f28b11a2
      Sean Christopherson committed
      Reword the NULL pointer dereference case to simply state that a NULL
      pointer was dereferenced, i.e. drop "unable to handle" as that implies
      that there are instances where the kernel actually does handle NULL
      pointer dereferences, which is not true barring funky exception fixup.
      
      For the non-NULL case, replace "kernel paging request" with "page fault"
      as the kernel can technically oops on faults that originated in user
      code.  Dropping "kernel" also allows future patches to provide detailed
      information on where the fault occurred, e.g. user vs. kernel, without
      conflicting with the initial BUG message.
      
      In both cases, replace "at address=" with wording more appropriate to
      the oops, as "at" may be interpreted as stating that the address is the
      RIP of the instruction that faulted.
      
      Last, and probably least, further qualify the NULL-pointer path by
      checking that the fault actually originated in kernel code.  It's
      technically possible for userspace to map address 0, and not printing
      a super specific message is the least of our worries if the kernel does
      manage to oops on an actual NULL pointer dereference from userspace.
      
      Before:
          BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000
          BUG: unable to handle kernel paging request at ffffbeef00000000
      
      After:
          BUG: kernel NULL pointer dereference, address = 0000000000000008
          BUG: unable to handle page fault for address = ffffbeef00000000
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20181221213657.27628-2-sean.j.christopherson@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f28b11a2
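The choice between the two "After" messages can be sketched as a single condition. This is a hedged user-space model: `format_bug_line()` is a hypothetical helper, and treating any address inside the first page (assumed here to be 4096 bytes) as a NULL dereference is an assumption about how "NULL pointer" is detected.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed page size; addresses below it count as NULL dereferences. */
#define MODEL_PAGE_SIZE 4096ULL

/*
 * Writes the initial BUG line into buf and returns snprintf()'s result.
 * The NULL-pointer wording is used only when the fault originated in
 * kernel code, because userspace can legitimately map address 0.
 */
int format_bug_line(char *buf, size_t len,
		    unsigned long long address, bool user_mode)
{
	if (address < MODEL_PAGE_SIZE && !user_mode)
		return snprintf(buf, len,
				"BUG: kernel NULL pointer dereference, address = %016llx",
				address);
	return snprintf(buf, len,
			"BUG: unable to handle page fault for address = %016llx",
			address);
}
```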
  11. 17 Apr, 2019 2 commits
    • T
      x86/traps: Use cpu_entry_area instead of orig_ist · d876b673
      Thomas Gleixner committed
      The orig_ist[] array is a shadow copy of the IST array in the TSS. The
      reason why it exists is that older kernels used two TSS variants with
      different pointers into the debug stack. orig_ist[] contains the real
      starting points.
      
      There is no point anymore to do so because the same information can be
      retrieved using the base address of the cpu entry area mapping and the
      offsets of the various exception stacks.
      
      No functional change. Preparation for removing orig_ist.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.784487230@linutronix.de
      d876b673
    • T
      x86/exceptions: Make IST index zero based · 8f34c5b5
      Thomas Gleixner committed
      The defines for the exception stack (IST) array in the TSS are using the
      SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
      array indices related to IST. That's confusing at best and does not provide
      any value.
      
      Make the indices zero based and fixup the usage sites. The only code which
      needs to adjust the 0 based index is the interrupt descriptor setup which
      needs to add 1 now.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190414160144.331772825@linutronix.de
      8f34c5b5
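The indexing convention this commit describes might look like the sketch below. The enum and function names are illustrative, not the kernel's identifiers; the one hardware fact relied on is that the IST field of an interrupt gate is 1-based, with 0 meaning "do not switch stacks".

```c
/* Zero-based indices into the IST array (illustrative names). */
enum ist_index {
	IST_INDEX_DF = 0,   /* double fault */
	IST_INDEX_NMI,      /* non-maskable interrupt */
	IST_INDEX_DB,       /* debug */
	IST_INDEX_MCE,      /* machine check */
	N_IST_STACKS
};

/*
 * Interrupt descriptor setup is the only place that converts back to
 * the SDM's IST1..IST7 numbering, by adding 1.
 */
unsigned int idt_ist_field(enum ist_index idx)
{
	return (unsigned int)idx + 1;
}

/* Every other user indexes the array directly, with no "- 1". */
unsigned long ist_stack_top(const unsigned long ist[N_IST_STACKS],
			    enum ist_index idx)
{
	return ist[idx];
}
```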
  12. 08 Mar, 2019 1 commit
  13. 30 Jan, 2019 1 commit
    • C
      x86/fault: Fix sign-extend unintended sign extension · 5ccd3528
      Colin Ian King committed
      show_ldttss() shifts desc.base2 by 24 bit, but base2 is 8 bits of a
      bitfield in a u16.
      
      Due to the really great idea of integer promotion in C99 base2 is promoted
      to an int, because that's the standard defined behaviour when all values
      which can be represented by base2 fit into an int.
      
      Now if bit 7 is set in desc.base2 the result of the shift left by 24 makes
      the resulting integer negative and the following conversion to unsigned
      long legitimately sign extends first, causing the upper 32 bits to be
      set in the result.
      
      Fix this by casting desc.base2 to unsigned long before the shift.
      
      Detected by CoverityScan, CID#1475635 ("Unintended sign extension")
      
      [ tglx: Reworded the changelog a bit as I actually had to lookup
        	the standard (again) to decode the original one. ]
      
      Fixes: a1a371c4 ("x86/fault: Decode page fault OOPSes better")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: kernel-janitors@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181222191116.21831-1-colin.king@canonical.com
      5ccd3528
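The effect described above can be reproduced in user space. This is a sketch under stated assumptions: `struct desc_model` is a hypothetical stand-in that keeps only the 8-bit `base2` field, `uint64_t` stands in for a 64-bit `unsigned long`, and the buggy path spells out the two's-complement result explicitly because shifting a promoted `int` into its sign bit is undefined behaviour in standard C (the narrowing conversion used instead is implementation-defined but wraps on common compilers).

```c
#include <stdint.h>

/* Hypothetical model: only the 8-bit bitfield relevant to the bug. */
struct desc_model {
	unsigned int base2 : 8;
};

/*
 * Models the pre-fix expression: base2 is promoted to a 32-bit int,
 * shifted left by 24, and only then widened to 64 bits, so a set
 * bit 7 becomes the sign bit and is extended into the upper half.
 */
uint64_t base2_shift_buggy(struct desc_model d)
{
	int32_t as_int = (int32_t)((uint32_t)d.base2 << 24);

	return (uint64_t)(int64_t)as_int;
}

/* The fix: widen to the unsigned 64-bit type before shifting. */
uint64_t base2_shift_fixed(struct desc_model d)
{
	return (uint64_t)d.base2 << 24;
}
```

For `base2 = 0x80` the buggy variant yields `0xffffffff80000000` on typical two's-complement targets, while the fixed one yields `0x80000000`. Values with bit 7 clear give the same result either way, which is why the bug is easy to miss.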
  14. 22 Nov, 2018 4 commits
    • I
      x86/fault: Clean up the page fault oops decoder a bit · a2aa52ab
      Ingo Molnar committed
       - Make the oops messages a bit less scary (don't mention 'HW errors')
      
       - Turn 'PROT USER' (which is visually easily confused with PROT_USER)
         into individual bit descriptors: "[PROT] [USER]".
         This also makes "[normal kernel read fault]" more apparent.
      
       - De-abbreviate variables to make the code easier to read
      
       - Use vertical alignment where appropriate.
      
       - Add comment about string size limits and the helper function.
      
       - Remove unnecessary line breaks.
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a2aa52ab
    • A
      x86/fault: Decode page fault OOPSes better · a1a371c4
      Andy Lutomirski committed
      One of Linus' favorite hobbies seems to be looking at OOPSes and
      decoding the error code in his head.  This is not one of my favorite
      hobbies :)
      
      Teach the page fault OOPS handler to decode the error code.  If it's
      a !USER fault from user mode, print an explicit note to that effect
      and print out the addresses of various tables that might cause such
      an error.
      
      With this patch applied, if I intentionally point the LDT at 0x0 and
      run the x86 selftests, I get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        HW error: normal kernel read fault
        This was a system access from user code
        IDT: 0xfffffe0000000000 (limit=0xfff) GDT: 0xfffffe0000001000 (limit=0x7f)
        LDTR: 0x50 -- base=0x0 limit=0xfff7
        TR: 0x40 -- base=0xfffffe0000003000 limit=0x206f
        PGD 800000000456e067 P4D 800000000456e067 PUD 4623067 PMD 0
        SMP PTI
        CPU: 0 PID: 153 Comm: ldt_gdt_64 Not tainted 4.19.0+ #1317
        Hardware name: ...
        RIP: 0033:0x401454
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/11212acb25980cd1b3030875cd9502414fbb214d.1542841400.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a1a371c4
    • A
      x86/fault: Don't try to recover from an implicit supervisor access · ebb53e25
      Andy Lutomirski committed
      This avoids a situation in which we attempt to apply various fixups
      that are not intended to handle implicit supervisor accesses from
      user mode if we screw up in a way that causes this type of fault.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/9999f151d72ff352265f3274c5ab3a4105090f49.1542841400.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ebb53e25
    • A
      x86/fault: Remove sw_error_code · 0ed32f1a
      Andy Lutomirski committed
      All of the fault handling code now correctly checks user_mode(regs)
      as needed, and nothing depends on the X86_PF_USER bit being munged.
      Get rid of the sw_error code and use hw_error_code everywhere.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/078f5b8ae6e8c79ff8ee7345b5c476c45003e5ac.1542841400.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0ed32f1a
  15. 20 Nov, 2018 7 commits
    • A
      x86/fault: Don't set thread.cr2, etc before OOPSing · 1ad33f5a
      Andy Lutomirski committed
      The fault handling code sets the cr2, trap_nr, and error_code fields
      in thread_struct before OOPSing.  No one reads those fields during
      an OOPS, so remove the code to set them.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/d418022aa0fad9cb40467aa7acaf4e95be50ee96.1542667307.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1ad33f5a
    • A
      x86/fault: Make error_code sanitization more robust · e49d3cbe
      Andy Lutomirski committed
      The error code in a page fault on a kernel address indicates
      whether that address is mapped, which should not be revealed in a signal.
      
      The normal code path for a page fault on a kernel address sanitizes the bit,
      but the paths for vsyscall emulation and SIGBUS do not.  Both are
      harmless, but for subtle reasons.  SIGBUS is never sent for a kernel
      address, and vsyscall emulation will never fault on a kernel address
      per se because it will fail an access_ok() check instead.
      
      Make the code more robust by adding a helper that sets the relevant
      fields and sanitizing the error code in the helper.  This also
      cleans up the code -- we had three copies of roughly the same thing.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/b31159bd55bd0c4fa061a20dfd6c429c094bebaa.1542667307.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e49d3cbe
    • A
      x86/fault: Improve the condition for signalling vs OOPSing · 6ea59b07
      Andy Lutomirski committed
      __bad_area_nosemaphore() currently checks the X86_PF_USER bit in the
      error code to decide whether to send a signal or to treat the fault
      as a kernel error.  This can cause somewhat erratic behavior.  The
      straightforward cases where the CPL agrees with the hardware USER
      bit are all correct, but the other cases are confusing.
      
       - A user instruction accessing a kernel address with supervisor
         privilege (e.g. a descriptor table access failed).  The USER bit
         will be clear, and we OOPS.  This is correct, because it indicates
         a kernel bug, not a user error.
      
       - A user instruction accessing a user address with supervisor
         privilege (e.g. a descriptor table was incorrectly pointing at
         user memory).  __bad_area_nosemaphore() will be passed a modified
         error code with the user bit set, and we will send a signal.
         Sending the signal will work (because the regs and the entry
         frame genuinely come from user mode), but we really ought to
         OOPS, as this event indicates a severe kernel bug.
      
       - A kernel instruction with user privilege (i.e. WRUSS).  This
         should OOPS or get fixed up.  The current code would instead try
         to send a signal and malfunction.
      
      Change the logic: a signal should be sent if the faulting context is
      user mode *and* the access has user privilege.  Otherwise it's
      either a kernel mode fault or a failed implicit access, either of
      which should end up in no_context().
      
      Note to -stable maintainers: don't backport this unless you backport
      CET.  The bug it fixes is unobservable in current kernels unless
      something is extremely wrong.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/10e509c43893170e262e82027ea399130ae81159.1542667307.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6ea59b07
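The changed logic in this commit reduces to a two-input decision, which can be sketched as follows. The enum and function names are hypothetical; only the rule itself (signal when the faulting context is user mode and the access has user privilege, otherwise the no_context() path) comes from the commit message.

```c
#include <stdbool.h>

#define PF_USER (1UL << 2)  /* hardware USER bit of the #PF error code */

enum fault_action {
	FAULT_SEND_SIGNAL,  /* ordinary user error: deliver a signal */
	FAULT_NO_CONTEXT    /* kernel fault or failed implicit access */
};

/*
 * user_mode_regs: the faulting instruction ran in user mode (CPL 3).
 * error_code:     the raw hardware error code.
 */
enum fault_action bad_area_action(bool user_mode_regs,
				  unsigned long error_code)
{
	if (user_mode_regs && (error_code & PF_USER))
		return FAULT_SEND_SIGNAL;
	return FAULT_NO_CONTEXT;
}
```

The two mixed cases are the interesting ones: user mode with the USER bit clear is an implicit supervisor access such as a descriptor table read, and kernel mode with the USER bit set is the WRUSS case. Both end up on the no_context() path.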
    • A
      x86/fault: Fix SMAP #PF handling buglet for implicit supervisor accesses · e50928d7
      Andy Lutomirski committed
      Currently, if a user program somehow triggers an implicit supervisor
      access to a user address (e.g. if the kernel somehow sets LDTR to a
      user address), it will be incorrectly detected as a SMAP violation
      if AC is clear and SMAP is enabled.  This is incorrect -- the error
      has nothing to do with SMAP.  Fix the condition so that only
      accesses with the hardware USER bit set are diagnosed as SMAP
      violations.
      
      With the logic fixed, an implicit supervisor access to a user address
      will hit the code lower in the function that is intended to handle it
      even if SMAP is enabled.  That logic is still a bit buggy, and later
      patches will clean it up.
      
      I *think* this code is still correct for WRUSS, and I've added a
      comment to that effect.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/d1d1b2e66ef31f884dba172084486ea9423ddcdb.1542667307.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e50928d7
    • A
      x86/fault: Fold smap_violation() into do_user_addr_fault() · a15781b5
      Andy Lutomirski committed
      smap_violation() has a single caller, and the contents are a bit
      nonsensical.  I'm going to fix it, but first let's fold it into its
      caller for ease of comprehension.
      
      In this particular case, the user_mode(regs) check is incorrect --
      it will cause false positives in the case of a user-initiated
      kernel-privileged access.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/806c366f6ca861152398ce2c01744d59d9aceb6d.1542667307.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a15781b5
    • A
      x86/cpufeatures, x86/fault: Mark SMAP as disabled when configured out · dae0a105
      Andy Lutomirski committed
      Add X86_FEATURE_SMAP to the disabled features mask as appropriate
      and use cpu_feature_enabled() in the fault code.  This lets us get
      rid of a redundant IS_ENABLED(CONFIG_X86_SMAP).
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/fe93332eded3d702f0b0b4cf83928d6830739ba3.1542667307.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dae0a105
    • A
      x86/fault: Check user_mode(regs) when avoiding an mmap_sem deadlock · 6344be60
      Andy Lutomirski committed
      The fault-handling code that takes mmap_sem needs to avoid a
      deadlock that could occur if the kernel took a bad (OOPS-worthy)
      page fault on a user address while holding mmap_sem.  This can only
      happen if the faulting instruction was in the kernel
      (i.e. !user_mode(regs)).  Rather than checking the sw_error_code
      (which will have the USER bit set if the fault was a USER-permission
      access *or* if user_mode(regs)), just check user_mode(regs)
      directly.
      
      The old code would have malfunctioned if the kernel executed a bogus
      WRUSS instruction while holding mmap_sem.  Fortunately, that is
      extremely unlikely in current kernels, which don't use WRUSS.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/4b89b542e8ceba9bd6abde2f386afed6d99244a9.1542667307.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6344be60
  16. 12 Nov, 2018 1 commit
    • W
      x86/mm/fault: Allow stack access below %rsp · 1d8ca3be
      Waiman Long committed
      The current x86 page fault handler allows stack access below the stack
      pointer if it is no more than 64k+256 bytes. Any access beyond the 64k+
      limit will cause a segmentation fault.
      
      The gcc -fstack-check option generates code to probe the stack for
      large stack allocation to see if the stack is accessible. The newer gcc
      does that while updating the %rsp simultaneously. Older gcc versions
      like gcc4 don't do that. As a result, an application compiled with an old gcc
      and the -fstack-check option may fail to start at all:
      
        $ cat test.c
        #include <stdio.h>

        int main() {
        	char tmp[1024*128];
        	printf("### ok\n");
        	return 0;
        }
      
        $ gcc -fstack-check -g -o test test.c
      
        $ ./test
        Segmentation fault
      
      The old binary was working in older kernels where expand_stack() was
      somehow called before the check. But it is not working in newer kernels.
      Besides, the 64k+ limit check is kind of crude and will not catch many
      of the mistakes that misbehaving userspace applications may make anyway.
      I think the kernel isn't the right place for this kind of test. We
      should leave it to userspace instrumentation tools to perform them.
      
      The 64k+ limit check is now removed to just let expand_stack() decide
      if a segmentation fault should happen, when the RLIMIT_STACK limit is
      exceeded, for example.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1541535149-31963-1-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1d8ca3be
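The heuristic that this commit removes can be sketched as a plain comparison. This is a user-space model with a hypothetical function name; the 64k+256 slack is taken from the commit message, and 64-bit addresses are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slack allowed below the stack pointer by the old check: 64k + 256 bytes. */
#define OLD_STACK_SLACK (65536ULL + 256ULL)

/*
 * Returns true if the old check would have refused the access (and so
 * raised a segmentation fault) before expand_stack() was consulted.
 */
bool old_stack_check_rejects(uint64_t address, uint64_t sp)
{
	return address + OLD_STACK_SLACK < sp;
}
```

A probe 128 KiB below the stack pointer, as the stack-check code for the `tmp[1024*128]` array in the example would issue when %rsp has not been moved yet, falls outside the slack and is rejected. After the commit there is no such pre-check and expand_stack() alone decides.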
  17. 31 Oct, 2018 1 commit
  18. 21 Oct, 2018 1 commit
  19. 09 Oct, 2018 8 commits
  20. 26 Sep, 2018 1 commit
    • S
      efi/x86: Handle page faults occurring while running EFI runtime services · 3425d934
      Sai Praneeth committed
      Memory accesses performed by UEFI runtime services should be limited to:
      - reading/executing from EFI_RUNTIME_SERVICES_CODE memory regions
      - reading/writing from/to EFI_RUNTIME_SERVICES_DATA memory regions
      - reading/writing by-ref arguments
      - reading/writing from/to the stack.
      
      Accesses outside these regions may cause the kernel to hang because the
      memory region requested by the firmware isn't mapped in efi_pgd, which
      causes a page fault in ring 0 and the kernel fails to handle it, leading
      to die(). To save kernel from hanging, add an EFI specific page fault
      handler which recovers from such faults by
      1. If the efi runtime service is efi_reset_system(), reboot the machine
         through BIOS.
      2. If the efi runtime service is _not_ efi_reset_system(), then freeze
         efi_rts_wq and schedule a new process.
      
      The EFI page fault handler offers us two advantages:
      1. Avoid potential hangs caused by buggy firmware.
      2. Shout loud that the firmware is buggy and hence is not a kernel bug.
      Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
      Suggested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Based-on-code-from: Ricardo Neri <ricardo.neri@intel.com>
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      [ardb: clarify commit log]
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      3425d934