1. 17 July 2019 (1 commit)
    • mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault() · b98cca44
      Authored by Anshuman Khandual
      Architectures which support kprobes have very similar boilerplate around
      calling kprobe_fault_handler().  Use a helper function in kprobes.h to
      unify them, based on the x86 code.
      
      This changes the behaviour for other architectures when preemption is
      enabled.  Previously, they would have disabled preemption while calling
      the kprobe handler.  However, preemption would already be disabled if
      the fault were due to a kprobe, so when preemption is enabled we know
      the fault was not caused by a kprobe handler and can simply return
      failure.
      
      This behaviour was introduced in commit a980c0ef ("x86/kprobes:
      Refactor kprobes_fault() like kprobe_exceptions_notify()").
      
      [anshuman.khandual@arm.com: export kprobe_fault_handler()]
        Link: http://lkml.kernel.org/r/1561133358-8876-1-git-send-email-anshuman.khandual@arm.com
      Link: http://lkml.kernel.org/r/1560420444-25737-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b98cca44
  2. 31 May 2019 (1 commit)
  3. 29 May 2019 (1 commit)
    • signal: Remove the task parameter from force_sig_fault · 2e1661d2
      Authored by Eric W. Biederman
      As synchronous exceptions really only make sense against the current
      task (otherwise how are you synchronous), remove the task parameter
      from force_sig_fault to make it explicit that this is what is going
      on.
      
      The two known exceptions that deliver a synchronous exception to a
      stopped ptraced task have already been changed to
      force_sig_fault_to_task.
      
      The callers have been changed with the following emacs regular expression
      (with obvious variations on the architectures that take more arguments)
      to avoid typos:
      
      force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
      ->
      force_sig_fault(\1,\2,\3)
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      2e1661d2
  4. 27 May 2019 (1 commit)
  5. 21 April 2019 (3 commits)
    • powerpc/mm: Detect bad KUAP faults · 5e5be3ae
      Authored by Michael Ellerman
      When KUAP is enabled we have logic to detect page faults that occur
      outside of a valid user access region and are blocked by the AMR.
      
      What we don't have at the moment is logic to detect a fault *within* a
      valid user access region, that has been incorrectly blocked by AMR.
      This is not meant to ever happen, but it can if we incorrectly
      save/restore the AMR, or if the AMR was overwritten for some other
      reason.
      
      Currently if that happens we assume it's just a regular fault that
      will be corrected by handling the fault normally, so we just return.
      But there is nothing the fault handling code can do to fix it, so the
      fault just happens again and we spin forever, leading to soft lockups.
      
      So add some logic to detect that case and WARN() if we ever see it.
      Arguably it should be a BUG(), but it's more polite to fail the access
      and let the kernel continue, rather than taking down the box. There
      should be no data integrity issue with failing the fault rather than
      BUG'ing, as we're just going to disallow an access that should have
      been allowed.
      
      To make the code a little easier to follow, unroll the condition at
      the end of bad_kernel_fault() and comment each case, before adding the
      call to bad_kuap_fault().
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5e5be3ae
    • powerpc: Add a framework for Kernel Userspace Access Protection · de78a9c4
      Authored by Christophe Leroy
      This patch implements a framework for Kernel Userspace Access
      Protection.
      
      Subarches can then provide their own implementation by defining
      setup_kuap() and allow/prevent_user_access().
      
      Some platforms will need to know the area accessed and whether it is
      accessed for read, write or both. Therefore the source, destination
      and size are handed over to the two functions.
      
      mpe: Rename to allow/prevent rather than unlock/lock, and add
      read/write wrappers. Drop the 32-bit code for now until we have an
      implementation for it. Add kuap to pt_regs for 64-bit as well as
      32-bit. Don't split strings, use pr_crit_ratelimited().
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      de78a9c4
    • powerpc: Add skeleton for Kernel Userspace Execution Prevention · 0fb1c25a
      Authored by Christophe Leroy
      This patch adds a skeleton for Kernel Userspace Execution Prevention.
      
      Subarches implementing it then have to define CONFIG_PPC_HAVE_KUEP
      and provide a setup_kuep() function.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Don't split strings, use pr_crit_ratelimited()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0fb1c25a
  6. 04 January 2019 (1 commit)
    • Remove 'type' argument from access_ok() function · 96d4f267
      Authored by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going
      to move the old VERIFY_xyz interface to that model.  And it's best
      done at the end of the merge window when I've done most of my merges,
      so let's just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  7. 21 December 2018 (1 commit)
  8. 20 December 2018 (2 commits)
    • powerpc/mm: Make NULL pointer dereferences explicit on bad page faults · 49a502ea
      Authored by Christophe Leroy
      As in several other arches, including x86, this patch makes it
      explicit that a bad page fault is a NULL pointer dereference when
      the fault address is lower than PAGE_SIZE.

      At the same time, this patch makes all bad_page_fault() messages
      shorter so that they remain on a single line. And it prefixes them
      with "BUG: " so that they can easily be grepped.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Avoid pr_cont()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      49a502ea
    • powerpc/mm/hash: Handle user access of kernel address gracefully · 374f3f59
      Authored by Aneesh Kumar K.V
      In commit 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity
      check") we moved the protection fault access check before the vma
      lookup. That means we hit that WARN_ON when user space accesses a
      kernel address. Before that commit this was handled by find_vma()
      not finding a vma for the kernel address and treating that access
      as a bad area access.
      
      Avoid the confusing WARN_ON and convert that to a ratelimited printk.
      
      With the patch we now get:
      
      for load:
        a.out[5997]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[5997]: segfault (11) at c00000000000dea0 nip 1317c0798 lr 7fff80d6441c code 1 in a.out[1317c0000+10000]
        a.out[5997]: code: 60000000 60420000 3c4c0002 38427790 4bffff20 3c4c0002 38427784 fbe1fff8
        a.out[5997]: code: f821ffc1 7c3f0b78 60000000 e9228030 <89290000> 993f002f 60000000 383f0040
      
      for exec:
        a.out[6067]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[6067]: segfault (11) at c00000000000dea0 nip c00000000000dea0 lr 129d507b0 code 1
        a.out[6067]: Bad NIP, not dumping instructions.
      
      Fixes: 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Tested-by: Breno Leitao <leitao@debian.org>
      [mpe: Don't split printk() string across lines]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      374f3f59
  9. 17 December 2018 (1 commit)
    • KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2 · d7b45615
      Authored by Suraj Jitindar Singh
      The POWER9 radix mmu has the concept of quadrants. The quadrant number
      is the two high bits of the effective address and determines the fully
      qualified address to be used for the translation. The fully qualified
      address consists of the effective lpid, the effective pid and the
      effective address. This then gives four possible quadrants: 0, 1, 2,
      and 3.
      
      When accessing these quadrants the fully qualified address is obtained
      as follows:
      
      Quadrant		| Hypervisor		| Guest
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b00	| EA[0:1] = 0b00
      0			| effLPID = 0		| effLPID = LPIDR
      			| effPID  = PIDR	| effPID  = PIDR
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b01	|
      1			| effLPID = LPIDR	| Invalid Access
      			| effPID  = PIDR	|
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b10	|
      2			| effLPID = LPIDR	| Invalid Access
      			| effPID  = 0		|
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b11	| EA[0:1] = 0b11
      3			| effLPID = 0		| effLPID = LPIDR
      			| effPID  = 0		| effPID  = 0
      --------------------------------------------------------------------------
      
      In the guest:
      Quadrant 3 is normally used to address the operating system since this
      uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to
      be switched.
      Quadrant 0 is normally used to address user space since the effLPID and
      effPID are taken from the corresponding registers.
      
      In the host:
      Quadrant 0 and 3 are used as above, however the effLPID is always 0 to
      address the host.
      
      Quadrants 1 and 2 can be used by the host to address guest memory using
      a guest effective address. Since the effLPID comes from the LPID register,
      the host loads the LPID of the guest it would like to access (and the
      PID of the process) and can perform accesses to a guest effective
      address.
      
      This means quadrant 1 can be used to address the guest user space and
      quadrant 2 can be used to address the guest operating system from the
      hypervisor, using a guest effective address.
      
      Access to the quadrants can cause a Hypervisor Data Storage Interrupt
      (HDSI) due to being unable to perform partition scoped translation.
      Previously this could only be generated from a guest and so the code
      path expects us to take the KVM trampoline in the interrupt handler.
      This is no longer the case so we modify the handler to call
      bad_page_fault() to check if we were expecting this fault so we can
      handle it gracefully and just return with an error code. In the hash mmu
      case we still raise an unknown exception since quadrants aren't defined
      for the hash mmu.
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      d7b45615
  10. 26 November 2018 (1 commit)
  11. 21 September 2018 (6 commits)
  12. 18 August 2018 (1 commit)
    • mm: convert return type of handle_mm_fault() caller to vm_fault_t · 50a7ca3c
      Authored by Souptick Joarder
      Use new return type vm_fault_t for fault handler.  For now, this is just
      documenting that the function returns a VM_FAULT value rather than an
      errno.  Once all instances are converted, vm_fault_t will become a
      distinct type.
      
      Ref-> commit 1c8f4220 ("mm: change return type to vm_fault_t")
      
      In this patch, all callers of handle_mm_fault() are changed to use
      the vm_fault_t return type.
      
      Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50a7ca3c
  13. 30 July 2018 (1 commit)
  14. 24 May 2018 (2 commits)
    • powerpc/mm: Only read faulting instruction when necessary in do_page_fault() · 0e36b0d1
      Authored by Christophe Leroy
      Commit a7a9dcd8 ("powerpc: Avoid taking a data miss on every
      userspace instruction miss") has shown that limiting the read of the
      faulting instruction to likely cases improves performance.

      This patch goes further in this direction by limiting the read of
      the faulting instruction to the only cases where it is likely
      needed.
      
      On an MPC885, with the same benchmark app as in the commit referred
      above, we see a reduction of about 3900 dTLB misses (approx 3%):
      
      Before the patch:
       Performance counter stats for './fault 500' (10 runs):
      
               683033312      cpu-cycles                                                    ( +-  0.03% )
                  134538      dTLB-load-misses                                              ( +-  0.03% )
                   46099      iTLB-load-misses                                              ( +-  0.02% )
                   19681      faults                                                        ( +-  0.02% )
      
             5.389747878 seconds time elapsed                                          ( +-  0.06% )
      
      With the patch:
      
       Performance counter stats for './fault 500' (10 runs):
      
               682112862      cpu-cycles                                                    ( +-  0.03% )
                  130619      dTLB-load-misses                                              ( +-  0.03% )
                   46073      iTLB-load-misses                                              ( +-  0.05% )
                   19681      faults                                                        ( +-  0.01% )
      
             5.381342641 seconds time elapsed                                          ( +-  0.07% )
      
      Correct operation of the huge stack expansion was tested with the
      following app:
      
      int main(int argc, char **argv)
      {
      	char buf[1024 * 1025];
      
      	sprintf(buf, "Hello world !\n");
      	printf(buf);
      
      	exit(0);
      }
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Add include of pagemap.h to fix build errors]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0e36b0d1
    • powerpc/mm: Use instruction symbolic names in store_updates_sp() · 8a0b1120
      Authored by Christophe Leroy
      Use symbolic names defined in asm/ppc-opcode.h
      instead of hardcoded values.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8a0b1120
  15. 25 April 2018 (1 commit)
    • signal: Ensure every siginfo we send has all bits initialized · 3eb0f519
      Authored by Eric W. Biederman
      Call clear_siginfo to ensure every stack allocated siginfo is properly
      initialized before being passed to the signal sending functions.
      
      Note: It is not safe to depend on C initializers to initialize struct
      siginfo on the stack because C is allowed to skip holes when
      initializing a structure.
      
      The initialization of struct siginfo in tracehook_report_syscall_exit
      was moved from the helper user_single_step_siginfo into
      tracehook_report_syscall_exit itself, to make it clear that the local
      variable siginfo gets fully initialized.
      
      In a few cases the scope of struct siginfo has been reduced to make
      it clear that siginfo is not used on other paths in the function in
      which it is declared.
      
      Instances of using memset to initialize siginfo have been replaced
      with calls to clear_siginfo() for clarity.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      3eb0f519
  16. 04 April 2018 (1 commit)
  17. 20 January 2018 (2 commits)
  18. 16 January 2018 (1 commit)
  19. 02 January 2018 (1 commit)
    • powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR · ecb101ae
      Authored by John Sperbeck
      The recent refactoring of the powerpc page fault handler in commit
      c3350602 ("powerpc/mm: Make bad_area* helper functions") caused
      access to protected memory regions to indicate SEGV_MAPERR instead of
      the traditional SEGV_ACCERR in the si_code field of a user-space
      signal handler. This can confuse debug libraries that temporarily
      change the protection of memory regions, and expect to use SEGV_ACCERR
      as an indication to restore access to a region.
      
      This commit restores the previous behavior. The following program
      exhibits the issue:
      
          $ ./repro read  || echo "FAILED"
          $ ./repro write || echo "FAILED"
          $ ./repro exec  || echo "FAILED"
      
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
          #include <signal.h>
          #include <sys/mman.h>
          #include <assert.h>
      
          static void segv_handler(int n, siginfo_t *info, void *arg) {
                  _exit(info->si_code == SEGV_ACCERR ? 0 : 1);
          }
      
          int main(int argc, char **argv)
          {
                  void *p = NULL;
                  struct sigaction act = {
                          .sa_sigaction = segv_handler,
                          .sa_flags = SA_SIGINFO,
                  };
      
                  assert(argc == 2);
                  p = mmap(NULL, getpagesize(),
                          (strcmp(argv[1], "write") == 0) ? PROT_READ : 0,
                          MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
                  assert(p != MAP_FAILED);
      
                  assert(sigaction(SIGSEGV, &act, NULL) == 0);
                  if (strcmp(argv[1], "read") == 0)
                          printf("%c", *(unsigned char *)p);
                  else if (strcmp(argv[1], "write") == 0)
                          *(unsigned char *)p = 0;
                  else if (strcmp(argv[1], "exec") == 0)
                          ((void (*)(void))p)();
                  return 1;  /* failed to generate SEGV */
          }
      
      Fixes: c3350602 ("powerpc/mm: Make bad_area* helper functions")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: John Sperbeck <jsperbeck@google.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Add commit references in change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ecb101ae
  20. 10 August 2017 (2 commits)
  21. 03 August 2017 (9 commits)