1. 21 Apr, 2019 2 commits
  2. 04 Jan, 2019 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Authored by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
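A minimal user-space sketch of the interface shape after this change: a range check that takes only an address and a size, with no VERIFY_READ/VERIFY_WRITE argument. `range_ok()` and `USER_LIMIT` are hypothetical names invented for illustration, and the limit value is an assumption. This is not the kernel's access_ok() implementation.

```c
#include <stdint.h>

/* Hypothetical upper bound of the user address range (assumption). */
#define USER_LIMIT UINT64_C(0x0000800000000000)

/*
 * Hypothetical stand-in for a two-argument range check: returns 1 when
 * [addr, addr + size) lies entirely below USER_LIMIT, 0 otherwise.
 * The comparison is arranged so that addr + size is never computed and
 * therefore cannot overflow.
 */
static int range_ok(uint64_t addr, uint64_t size)
{
	if (size > USER_LIMIT)
		return 0;
	return addr <= USER_LIMIT - size;
}
```

At a call site, the trivial conversion described above turns `access_ok(VERIFY_WRITE, buf, len)` into `access_ok(buf, len)`.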
  3. 21 Dec, 2018 1 commit
  4. 20 Dec, 2018 2 commits
    • powerpc/mm: Make NULL pointer dereferences explicit on bad page faults. · 49a502ea
      Authored by Christophe Leroy
      As on several other arches, including x86, this patch makes it explicit
      that a bad page fault is a NULL pointer dereference when the fault
      address is lower than PAGE_SIZE.

      At the same time, this patch makes all bad_page_fault() messages
      shorter so that they remain on a single line. It also prefixes them
      with "BUG: " so that they are easy to grep.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: Avoid pr_cont()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
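The rule stated in this commit message (a fault address below PAGE_SIZE is reported as a NULL pointer dereference) can be sketched in user space as follows. `DEMO_PAGE_SIZE` and `is_null_deref_fault()` are hypothetical names, and the 4096-byte page size is an assumption. The real check lives in the kernel's bad_page_fault() path.

```c
#include <stdint.h>

/* Assumed page size for this sketch; the real value depends on the kernel config. */
#define DEMO_PAGE_SIZE UINT64_C(4096)

/*
 * Hypothetical helper: returns 1 when a bad kernel fault at 'addr' would be
 * labelled a NULL pointer dereference (address within the first page),
 * 0 when it would be reported as an ordinary bad access.
 */
static int is_null_deref_fault(uint64_t addr)
{
	return addr < DEMO_PAGE_SIZE;
}
```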
    • powerpc/mm/hash: Handle user access of kernel address gracefully · 374f3f59
      Authored by Aneesh Kumar K.V
      In commit 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity
      check") we moved the protection fault access check before the vma
      lookup. That means we hit that WARN_ON when user space accesses a
      kernel address. Before that commit this was handled by find_vma() not
      finding a vma for the kernel address, so the access was treated as a
      bad area access.
      
      Avoid the confusing WARN_ON and convert that to a ratelimited printk.
      
      With the patch we now get:
      
      for load:
        a.out[5997]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[5997]: segfault (11) at c00000000000dea0 nip 1317c0798 lr 7fff80d6441c code 1 in a.out[1317c0000+10000]
        a.out[5997]: code: 60000000 60420000 3c4c0002 38427790 4bffff20 3c4c0002 38427784 fbe1fff8
        a.out[5997]: code: f821ffc1 7c3f0b78 60000000 e9228030 <89290000> 993f002f 60000000 383f0040
      
      for exec:
        a.out[6067]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
        a.out[6067]: segfault (11) at c00000000000dea0 nip c00000000000dea0 lr 129d507b0 code 1
        a.out[6067]: Bad NIP, not dumping instructions.
      
      Fixes: 2865d08d ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Tested-by: Breno Leitao <leitao@debian.org>
      [mpe: Don't split printk() string across lines]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
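A rough sketch of the condition this commit reports on: a fault taken from user mode whose address lies in the kernel range. `DEMO_KERNEL_START` and the function name are hypothetical. The boundary is an assumption chosen to match the c000... addresses in the log output above, and the kernel's actual test may be written differently.

```c
#include <stdint.h>

/* Assumed start of the kernel address range (matches the c000... log addresses). */
#define DEMO_KERNEL_START UINT64_C(0xc000000000000000)

/*
 * Hypothetical predicate: returns 1 when the fault came from user mode and
 * targets a kernel address, i.e. the case that now gets a ratelimited
 * message instead of a WARN_ON.
 */
static int is_user_access_of_kernel_address(int from_user_mode, uint64_t addr)
{
	return from_user_mode && addr >= DEMO_KERNEL_START;
}
```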
  5. 17 Dec, 2018 1 commit
    • KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2 · d7b45615
      Authored by Suraj Jitindar Singh
      The POWER9 radix mmu has the concept of quadrants. The quadrant number
      is the two high bits of the effective address and determines the fully
      qualified address to be used for the translation. The fully qualified
      address consists of the effective lpid, the effective pid and the
      effective address. This gives 4 possible quadrants: 0, 1, 2, and 3.
      
      When accessing these quadrants the fully qualified address is obtained
      as follows:
      
      Quadrant		| Hypervisor		| Guest
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b00	| EA[0:1] = 0b00
      0			| effLPID = 0		| effLPID = LPIDR
      			| effPID  = PIDR	| effPID  = PIDR
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b01	|
      1			| effLPID = LPIDR	| Invalid Access
      			| effPID  = PIDR	|
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b10	|
      2			| effLPID = LPIDR	| Invalid Access
      			| effPID  = 0		|
      --------------------------------------------------------------------------
      			| EA[0:1] = 0b11	| EA[0:1] = 0b11
      3			| effLPID = 0		| effLPID = LPIDR
      			| effPID  = 0		| effPID  = 0
      --------------------------------------------------------------------------
      
      In the guest:
      Quadrant 3 is normally used to address the operating system since this
      uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to
      be switched.
      Quadrant 0 is normally used to address user space since the effLPID and
      effPID are taken from the corresponding registers.
      
      In the host:
      Quadrant 0 and 3 are used as above, however the effLPID is always 0 to
      address the host.
      
      Quadrants 1 and 2 can be used by the host to address guest memory using
      a guest effective address. Since the effLPID comes from the LPID register,
      the host loads the LPID of the guest it would like to access (and the
      PID of the process) and can perform accesses to a guest effective
      address.
      
      This means quadrant 1 can be used to address the guest user space and
      quadrant 2 can be used to address the guest operating system from the
      hypervisor, using a guest effective address.
      
      Access to the quadrants can cause a Hypervisor Data Storage Interrupt
      (HDSI) due to being unable to perform partition scoped translation.
      Previously this could only be generated from a guest and so the code
      path expects us to take the KVM trampoline in the interrupt handler.
      This is no longer the case so we modify the handler to call
      bad_page_fault() to check if we were expecting this fault so we can
      handle it gracefully and just return with an error code. In the hash mmu
      case we still raise an unknown exception since quadrants aren't defined
      for the hash mmu.
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
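The quadrant table above can be sketched as a small lookup. EA[0:1] uses IBM bit numbering, so bits 0 and 1 are the two most significant bits of a 64-bit effective address. All type and function names below are hypothetical and exist only for this illustration; the table mirrors the Hypervisor column of the commit message.

```c
#include <stdint.h>

/* Where an effective ID comes from in this sketch. */
enum id_source { ID_ZERO, ID_LPIDR, ID_PIDR };

struct quadrant_ids {
	enum id_source eff_lpid;
	enum id_source eff_pid;
};

/* The quadrant number is the top two bits of the effective address. */
static unsigned int quadrant_of(uint64_t ea)
{
	return (unsigned int)(ea >> 62);
}

/* Hypervisor column of the table: which register supplies effLPID and effPID. */
static struct quadrant_ids hypervisor_ids(unsigned int quadrant)
{
	static const struct quadrant_ids table[4] = {
		{ ID_ZERO,  ID_PIDR },  /* quadrant 0: host user space */
		{ ID_LPIDR, ID_PIDR },  /* quadrant 1: guest user space */
		{ ID_LPIDR, ID_ZERO },  /* quadrant 2: guest operating system */
		{ ID_ZERO,  ID_ZERO },  /* quadrant 3: host operating system */
	};

	return table[quadrant & 3];
}
```

For example, a host access to an effective address whose top bits are 0b01 falls in quadrant 1, so the translation uses the LPID and PID registers that the host loaded for the guest process.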
  6. 26 Nov, 2018 1 commit
  7. 21 Sep, 2018 6 commits
  8. 18 Aug, 2018 1 commit
    • mm: convert return type of handle_mm_fault() caller to vm_fault_t · 50a7ca3c
      Authored by Souptick Joarder
      Use new return type vm_fault_t for fault handler.  For now, this is just
      documenting that the function returns a VM_FAULT value rather than an
      errno.  Once all instances are converted, vm_fault_t will become a
      distinct type.
      
      Ref-> commit 1c8f4220 ("mm: change return type to vm_fault_t")
      
      In this patch, all callers of handle_mm_fault() are changed to return
      the vm_fault_t type.
      
      Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
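The point of a dedicated return type can be sketched in user space: the result is a bitmask of VM_FAULT-style flags that callers test with bit operations, not a negative errno. Everything prefixed `DEMO_` or `demo_` is hypothetical, and the flag values are illustrative assumptions, not taken from the kernel headers.

```c
/* Stand-in for the kernel typedef; here it only documents intent. */
typedef unsigned int vm_fault_t;

/* Illustrative flag values (assumptions for this sketch). */
#define DEMO_VM_FAULT_OOM    0x1u
#define DEMO_VM_FAULT_SIGBUS 0x2u
#define DEMO_VM_FAULT_MAJOR  0x4u
#define DEMO_VM_FAULT_ERROR  (DEMO_VM_FAULT_OOM | DEMO_VM_FAULT_SIGBUS)

/* Hypothetical fault handler: returns a flag bitmask, never an errno. */
static vm_fault_t demo_handle_fault(int out_of_memory)
{
	return out_of_memory ? DEMO_VM_FAULT_OOM : DEMO_VM_FAULT_MAJOR;
}

/* A caller stores the result in a vm_fault_t and tests the error bits. */
static int demo_fault_failed(vm_fault_t fault)
{
	return (fault & DEMO_VM_FAULT_ERROR) != 0;
}
```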
  9. 30 Jul, 2018 1 commit
  10. 24 May, 2018 2 commits
    • powerpc/mm: Only read faulting instruction when necessary in do_page_fault() · 0e36b0d1
      Authored by Christophe Leroy
      Commit a7a9dcd8 ("powerpc: Avoid taking a data miss on every
      userspace instruction miss") has shown that limiting the read of
      faulting instruction to likely cases improves performance.
      
      This patch goes further in this direction by limiting the read
      of the faulting instruction to the only cases where it is likely
      needed.

      On an MPC885, with the same benchmark app as in the commit referred
      to above, we see a reduction of about 3900 dTLB misses (approx 3%):
      
      Before the patch:
       Performance counter stats for './fault 500' (10 runs):
      
               683033312      cpu-cycles                                                    ( +-  0.03% )
                  134538      dTLB-load-misses                                              ( +-  0.03% )
                   46099      iTLB-load-misses                                              ( +-  0.02% )
                   19681      faults                                                        ( +-  0.02% )
      
             5.389747878 seconds time elapsed                                          ( +-  0.06% )
      
      With the patch:
      
       Performance counter stats for './fault 500' (10 runs):
      
               682112862      cpu-cycles                                                    ( +-  0.03% )
                  130619      dTLB-load-misses                                              ( +-  0.03% )
                   46073      iTLB-load-misses                                              ( +-  0.05% )
                   19681      faults                                                        ( +-  0.01% )
      
             5.381342641 seconds time elapsed                                          ( +-  0.07% )
      
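      Correct handling of large stack expansion was tested with the
      following app:

      #include <stdio.h>
      #include <stdlib.h>

      int main(int argc, char **argv)
      {
      	/* Just over 1 MiB on the stack, to force a large stack expansion */
      	char buf[1024 * 1025];

      	sprintf(buf, "Hello world !\n");
      	printf("%s", buf);

      	exit(0);
      }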
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Add include of pagemap.h to fix build errors]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Use instruction symbolic names in store_updates_sp() · 8a0b1120
      Authored by Christophe Leroy
      Use symbolic names defined in asm/ppc-opcode.h
      instead of hardcoded values.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
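To illustrate what replacing hardcoded values with symbolic names looks like, here is a user-space sketch that decodes a PowerPC instruction word and tests for one store-with-update form that writes through the stack pointer. The `DEMO_` names are hypothetical and do not reproduce the definitions in asm/ppc-opcode.h. The sketch covers only `stwu`, whereas the real store_updates_sp() handles more instruction forms.

```c
#include <stdint.h>

/* Hypothetical symbolic names for this sketch. */
#define DEMO_OP_STWU 37u  /* primary opcode of "stwu" */
#define DEMO_REG_SP  1u   /* r1 is the stack pointer */

/* The primary opcode is the top 6 bits of the 32-bit instruction word. */
static unsigned int primary_opcode(uint32_t inst)
{
	return inst >> 26;
}

/* The RA field (base register) is the 5 bits below the RS field. */
static unsigned int ra_field(uint32_t inst)
{
	return (inst >> 16) & 0x1f;
}

/* Returns 1 for "stwu rS, D(r1)", i.e. a store that also updates r1. */
static int is_stwu_to_sp(uint32_t inst)
{
	return primary_opcode(inst) == DEMO_OP_STWU &&
	       ra_field(inst) == DEMO_REG_SP;
}
```

The word 0x9421fff0 encodes `stwu r1,-16(r1)`, a common function prologue instruction, and is the kind of store this check recognizes.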
  11. 25 Apr, 2018 1 commit
    • signal: Ensure every siginfo we send has all bits initialized · 3eb0f519
      Authored by Eric W. Biederman
      Call clear_siginfo to ensure every stack allocated siginfo is properly
      initialized before being passed to the signal sending functions.
      
      Note: It is not safe to depend on C initializers to initialize struct
      siginfo on the stack because C is allowed to skip holes when
      initializing a structure.
      
      The initialization of struct siginfo in tracehook_report_syscall_exit
      was moved from the helper user_single_step_siginfo into
      tracehook_report_syscall_exit itself, to make it clear that the local
      variable siginfo gets fully initialized.
      
      In a few cases the scope of struct siginfo has been reduced to make it
      clear that siginfo is not used on other paths in the function
      in which it is declared.

      Instances of using memset to initialize siginfo have been replaced
      with calls to clear_siginfo for clarity.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
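The note about holes can be illustrated in user space. A struct with a small member followed by a pointer typically has padding bytes between them, and a C initializer is not required to write those bytes, whereas zeroing the whole object does. `struct demo_info` and the `demo_` functions are hypothetical stand-ins, not the kernel's struct siginfo or clear_siginfo().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; on typical 64-bit ABIs there is padding after 'code'. */
struct demo_info {
	char code;
	void *addr;
};

/* Stand-in for the clear_siginfo() idea: zero every byte, padding included. */
static void demo_clear_info(struct demo_info *info)
{
	memset(info, 0, sizeof(*info));
}

/* Returns 1 when every byte of the object, padding included, is zero. */
static int demo_all_bytes_zero(const struct demo_info *info)
{
	const unsigned char *p = (const unsigned char *)info;
	size_t i;

	for (i = 0; i < sizeof(*info); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

/* Fill the object with stale bytes, clear it, and check that nothing leaks. */
static int demo_clear_zeroes_padding(void)
{
	struct demo_info info;

	memset(&info, 0xff, sizeof(info));  /* simulate leftover stack contents */
	demo_clear_info(&info);
	return demo_all_bytes_zero(&info);
}
```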
  12. 04 Apr, 2018 1 commit
  13. 20 Jan, 2018 2 commits
  14. 16 Jan, 2018 1 commit
  15. 02 Jan, 2018 1 commit
    • powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR · ecb101ae
      Authored by John Sperbeck
      The recent refactoring of the powerpc page fault handler in commit
      c3350602 ("powerpc/mm: Make bad_area* helper functions") caused
      access to protected memory regions to indicate SEGV_MAPERR instead of
      the traditional SEGV_ACCERR in the si_code field of a user-space
      signal handler. This can confuse debug libraries that temporarily
      change the protection of memory regions, and expect to use SEGV_ACCERR
      as an indication to restore access to a region.
      
      This commit restores the previous behavior. The following program
      exhibits the issue:
      
          $ ./repro read  || echo "FAILED"
          $ ./repro write || echo "FAILED"
          $ ./repro exec  || echo "FAILED"
      
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
          #include <signal.h>
          #include <sys/mman.h>
          #include <assert.h>
      
          static void segv_handler(int n, siginfo_t *info, void *arg) {
                  _exit(info->si_code == SEGV_ACCERR ? 0 : 1);
          }
      
          int main(int argc, char **argv)
          {
                  void *p = NULL;
                  struct sigaction act = {
                          .sa_sigaction = segv_handler,
                          .sa_flags = SA_SIGINFO,
                  };
      
                  assert(argc == 2);
                  p = mmap(NULL, getpagesize(),
                          (strcmp(argv[1], "write") == 0) ? PROT_READ : 0,
                          MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
                  assert(p != MAP_FAILED);
      
                  assert(sigaction(SIGSEGV, &act, NULL) == 0);
                  if (strcmp(argv[1], "read") == 0)
                          printf("%c", *(unsigned char *)p);
                  else if (strcmp(argv[1], "write") == 0)
                          *(unsigned char *)p = 0;
                  else if (strcmp(argv[1], "exec") == 0)
                          ((void (*)(void))p)();
                  return 1;  /* failed to generate SEGV */
          }
      
      Fixes: c3350602 ("powerpc/mm: Make bad_area* helper functions")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: John Sperbeck <jsperbeck@google.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Add commit references in change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
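The distinction this fix restores can be summarized in a few lines: a fault on an address with no mapping reports SEGV_MAPERR, while a fault on a mapped region whose protection forbids the access reports SEGV_ACCERR. `demo_segv_code()` is a hypothetical helper for illustration and only models the outcome, not the powerpc fault handler itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask <signal.h> for the SEGV_* codes */
#endif
#include <signal.h>

/*
 * Hypothetical helper: pick the si_code for a SIGSEGV.
 * address_is_mapped != 0 means a mapping covers the address but its
 * protection forbids the attempted access.
 */
static int demo_segv_code(int address_is_mapped)
{
	return address_is_mapped ? SEGV_ACCERR : SEGV_MAPERR;
}
```

The repro program above exercises the mapped case: the page exists but was mapped without the needed permission, so a handler that checks `si_code == SEGV_ACCERR` should succeed once the fix is applied.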
  16. 10 Aug, 2017 2 commits
  17. 03 Aug, 2017 14 commits