1. 08 Aug, 2020 1 commit
    • mm: remove unneeded includes of <asm/pgalloc.h> · ca15ca40
      Mike Rapoport committed
      Patch series "mm: cleanup usage of <asm/pgalloc.h>"
      
      Most architectures have very similar versions of pXd_alloc_one() and
      pXd_free_one() for intermediate levels of page table.  These patches add
      generic versions of these functions in <asm-generic/pgalloc.h> and enable
      use of the generic functions where appropriate.
      
      In addition, functions declared and defined in <asm/pgalloc.h> headers are
      used mostly by core mm and early mm initialization in arch and there is no
      actual reason to have the <asm/pgalloc.h> included all over the place.
      The first patch in this series removes unneeded includes of
      <asm/pgalloc.h>.
      
      In the end it didn't work out as neatly as I hoped and moving
      pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
      unnecessary changes to arches that have custom page table allocations, so
      I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
      to mm/.
      
      This patch (of 8):
      
      In most cases the <asm/pgalloc.h> header is required only for allocations
      of page table memory.  Most of the .c files that include that header do
      not use symbols declared in <asm/pgalloc.h> and do not require that header.
      
      As for the other header files that used to include <asm/pgalloc.h>, it is
      possible to move that include into the .c file that actually uses symbols
      from <asm/pgalloc.h> and drop the include from the header file.
      
      The process was somewhat automated using
      
      	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                      $(grep -L -w -f /tmp/xx \
                              $(git grep -E -l '[<"]asm/pgalloc\.h'))
      
      where /tmp/xx contains all the symbols defined in
      arch/*/include/asm/pgalloc.h.
      
      [rppt@linux.ibm.com: fix powerpc warning]
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 29 Jul, 2020 2 commits
  3. 27 Jul, 2020 1 commit
    • powerpc/64s/hash: Fix hash_preload running with interrupts enabled · 909adfc6
      Nicholas Piggin committed
      Commit 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the
      caller") removed the local_irq_disable from hash_preload, but it was
      required for more than just the page table walk: the hash pte busy bit is
      effectively a lock which may be taken in interrupt context, and the local
      update flag test must not be preempted before it's used.
      
      This solves apparent lockups with perf interrupting __hash_page_64K. If
      get_perf_callchain then also takes a hash fault on the same page while it
      is already locked, it will loop forever taking hash faults, which looks like
      this:
      
        cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70]
            pc: c000000000072dc8: hash_page_mm+0x8/0x800
            lr: c00000000000c5a4: do_hash_page+0x24/0x38
            sp: c0002ac1cc69ac70
           msr: 8000000000081033
          current = 0xc0002ac1cc602e00
          paca    = 0xc00000001de1f280   irqmask: 0x03   irq_happened: 0x01
            pid   = 20118, comm = pread2_processe
        Linux version 5.8.0-rc6-00345-g1fad14f18bc6
        49e:mon> t
        [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable)
        --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac
        [link register   ] c000000000335d10 copy_from_user_nofault+0xf0/0x150
        [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable)
        [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0
        [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410
        [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40
        [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360
        [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0
        [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790
        [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0
        [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0
        [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750
        [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510
        [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60
        [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0
        --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160
        [link register   ] c0000000000846f0 __hash_page_64K+0x210/0x540
        [c0002ac1cc69ba50] 0000000000000000 (unreliable)
        [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0
        [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0
        [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60
        [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60
        [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90
        [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c
        --- Exception: 300 (Data Access) at 00007fff8c861fe8
        SP (7ffff6b19660) is in userspace
      
      Fixes: 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the caller")
      Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reported-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
  4. 20 Jul, 2020 1 commit
    • net: remove compat_sys_{get,set}sockopt · 55db9c0e
      Christoph Hellwig committed
      Now that the ->compat_{get,set}sockopt proto_ops methods are gone
      there is no good reason left to keep the compat syscalls separate.
      
      This fixes the odd use of unsigned int for the compat_setsockopt
      optlen and the missing sock_use_custom_sol_socket.
      
      It would also easily allow running the eBPF hooks for the compat
      syscalls, but such a large change in behavior does not belong in
      a consolidation patch like this one.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 19 Jul, 2020 2 commits
  6. 17 Jul, 2020 2 commits
  7. 15 Jul, 2020 1 commit
  8. 14 Jul, 2020 1 commit
  9. 13 Jul, 2020 1 commit
    • powerpc/book3s64/pkeys: Fix pkey_access_permitted() for execute disable pkey · 192b6a78
      Aneesh Kumar K.V committed
      Even if the IAMR value denies execute access, the current code returns
      true from pkey_access_permitted() for an execute permission check, if
      the AMR read pkey bit is cleared.
      
      This results in a repeated page fault loop with a test like the one below:
      
        #define _GNU_SOURCE
        #include <errno.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <signal.h>
        #include <inttypes.h>
      
        #include <assert.h>
        #include <malloc.h>
        #include <unistd.h>
        #include <pthread.h>
        #include <sys/mman.h>
      
        #ifdef SYS_pkey_mprotect
        #undef SYS_pkey_mprotect
        #endif
      
        #ifdef SYS_pkey_alloc
        #undef SYS_pkey_alloc
        #endif
      
        #ifdef SYS_pkey_free
        #undef SYS_pkey_free
        #endif
      
        #undef PKEY_DISABLE_EXECUTE
        #define PKEY_DISABLE_EXECUTE	0x4
      
        #define SYS_pkey_mprotect	386
        #define SYS_pkey_alloc		384
        #define SYS_pkey_free		385
      
        #define PPC_INST_NOP		0x60000000
        #define PPC_INST_BLR		0x4e800020
        #define PROT_RWX		(PROT_READ | PROT_WRITE | PROT_EXEC)
      
        static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
        {
        	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
        }
      
        static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights)
        {
        	return syscall(SYS_pkey_alloc, flags, access_rights);
        }
      
        static int sys_pkey_free(int pkey)
        {
        	return syscall(SYS_pkey_free, pkey);
        }
      
        static void do_execute(void *region)
        {
        	/* jump to region */
        	asm volatile(
        		"mtctr	%0;"
        		"bctrl"
        		: : "r"(region) : "ctr", "lr");
        }
      
        static void do_protect(void *region)
        {
        	size_t pgsize;
        	int i, pkey;
      
        	pgsize = getpagesize();
      
        	pkey = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
        	assert (pkey > 0);
      
        	/* perform mprotect */
        	assert(!sys_pkey_mprotect(region, pgsize, PROT_RWX, pkey));
        	do_execute(region);
      
        	/* free pkey */
        	assert(!sys_pkey_free(pkey));
      
        }
      
        int main(int argc, char **argv)
        {
        	size_t pgsize, numinsns;
        	unsigned int *region;
        	int i;
      
        	/* allocate memory region to protect */
        	pgsize = getpagesize();
        	region = memalign(pgsize, pgsize);
        	assert(region != NULL);
        	assert(!mprotect(region, pgsize, PROT_RWX));
      
        	/* fill page with NOPs with a BLR at the end */
        	numinsns = pgsize / sizeof(region[0]);
        	for (i = 0; i < numinsns - 1; i++)
        		region[i] = PPC_INST_NOP;
        	region[i] = PPC_INST_BLR;
      
        	do_protect(region);
      
        	return EXIT_SUCCESS;
        }
      
      The fix is to only check the IAMR for an execute check, the AMR value
      is not relevant.
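The difference between the behaviour described as buggy and the described fix can be modelled in user space. The sketch below is only an illustration under stated assumptions: the `model_` names, the two-bits-per-key layout, and the bit values are hypothetical stand-ins, and the register values are passed in as plain integers rather than read from the AMR and IAMR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: two bits per key, key 0 in the top bits. */
#define MODEL_BITS_PER_PKEY 2
#define MODEL_AMR_RD_BIT    0x1ULL /* set means read is denied */
#define MODEL_AMR_WR_BIT    0x2ULL /* set means write is denied */
#define MODEL_IAMR_EX_BIT   0x1ULL /* set means execute is denied */

static int model_pkeyshift(int pkey)
{
	return 64 - (pkey + 1) * MODEL_BITS_PER_PKEY;
}

/*
 * Behaviour as described for the buggy code: an execute check that the
 * IAMR denies still falls through to the AMR read check.
 */
static bool model_permitted_buggy(uint64_t amr, uint64_t iamr, int pkey,
				  bool write, bool execute)
{
	int shift = model_pkeyshift(pkey);

	if (execute && !(iamr & (MODEL_IAMR_EX_BIT << shift)))
		return true;

	return (!write && !(amr & (MODEL_AMR_RD_BIT << shift))) ||
	       (write && !(amr & (MODEL_AMR_WR_BIT << shift)));
}

/* Behaviour as described for the fix: execute checks consult only the IAMR. */
static bool model_permitted_fixed(uint64_t amr, uint64_t iamr, int pkey,
				  bool write, bool execute)
{
	int shift = model_pkeyshift(pkey);

	if (execute)
		return !(iamr & (MODEL_IAMR_EX_BIT << shift));
	if (write)
		return !(amr & (MODEL_AMR_WR_BIT << shift));
	return !(amr & (MODEL_AMR_RD_BIT << shift));
}
```

With key 1 marked execute-disabled in the modelled IAMR and a clear AMR, the first function grants an execute check while the second denies it, which is the case the test program above exercises.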
      
      Fixes: f2407ef3 ("powerpc: helper to validate key-access permissions of a pte")
      Cc: stable@vger.kernel.org # v4.16+
      Reported-by: Sandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      [mpe: Add detail to change log, tweak wording & formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200712132047.1038594-1-aneesh.kumar@linux.ibm.com
  10. 10 Jul, 2020 2 commits
  11. 08 Jul, 2020 1 commit
  12. 05 Jul, 2020 2 commits
  13. 02 Jul, 2020 1 commit
  14. 29 Jun, 2020 1 commit
    • powerpc/mm/pkeys: Make pkey access check work on execute_only_key · 19ab500e
      Aneesh Kumar K.V committed
      Jan reported that LTP mmap03 was getting stuck in a page fault loop
      after commit c46241a3 ("powerpc/pkeys: Check vma before returning
      key fault error to the user"), as well as a minimised reproducer:
      
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <sys/mman.h>
      
        int main(int ac, char **av)
        {
        	int page_sz = getpagesize();
        	int fildes;
        	char *addr;
      
        	fildes = open("tempfile", O_WRONLY | O_CREAT, 0666);
        	write(fildes, &fildes, sizeof(fildes));
        	close(fildes);
      
        	fildes = open("tempfile", O_RDONLY);
        	unlink("tempfile");
      
        	addr = mmap(0, page_sz, PROT_EXEC, MAP_FILE | MAP_PRIVATE, fildes, 0);
      
        	printf("%d\n", *addr);
        	return 0;
        }
      
      He also noticed that access_pkey_error() in the page fault handler now
      always seems to return false:
      
        __do_page_fault
          access_pkey_error(is_pkey: 1, is_exec: 0, is_write: 0)
            arch_vma_access_permitted
              pkey_access_permitted
                if (!is_pkey_enabled(pkey))
                  return true
            return false
      
      pkey_access_permitted() should not check if the pkey is available in
      UAMOR (using is_pkey_enabled()). The kernel needs to do that check
      only when allocating keys. This also makes sure the execute_only_key
      which is marked as non-manageable via UAMOR is handled correctly in
      pkey_access_permitted(), and fixes the bug.
      
      Fixes: c46241a3 ("powerpc/pkeys: Check vma before returning key fault error to the user")
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      [mpe: Include bug report details etc. in the change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200627070147.297535-1-aneesh.kumar@linux.ibm.com
  15. 22 Jun, 2020 2 commits
  16. 20 Jun, 2020 1 commit
    • powerpc/8xx: Provide ptep_get() with 16k pages · c0e1c8c2
      Christophe Leroy committed
      READ_ONCE() now enforces atomic read, which leads to:
      
        CC      mm/gup.o
      In file included from ./include/linux/kernel.h:11:0,
                       from mm/gup.c:2:
      In function 'gup_hugepte.constprop',
          inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8:
      ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                            ^
      ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
          prefix ## suffix();    \
          ^
      ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
      ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
        ^
      ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
        compiletime_assert_rwonce_type(x);    \
        ^
      mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
        pte = READ_ONCE(*ptep);
              ^
      In function 'gup_get_pte',
          inlined from 'gup_pte_range' at mm/gup.c:2228:9,
          inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
          inlined from 'gup_pud_range' at mm/gup.c:2641:15,
          inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
          inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
          inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3:
      ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                            ^
      ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
          prefix ## suffix();    \
          ^
      ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
      ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
        ^
      ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
        compiletime_assert_rwonce_type(x);    \
        ^
      mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE'
        return READ_ONCE(*ptep);
               ^
      make[2]: *** [mm/gup.o] Error 1
      
      Define ptep_get() on 8xx when using 16k pages.
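Why a dedicated accessor avoids the build error can be sketched in user space. Everything below is a hypothetical model, not the kernel's definition: the four-word `model_pte_t`, the `MODEL_` macros, and the choice to read only the first word are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed stand-in for a PTE that spans four words. */
typedef struct {
	unsigned long pte, pte1, pte2, pte3;
} model_pte_t;

/* Size rule in the spirit of the failing assertion: one native access only. */
#define MODEL_IS_ONCE_SIZE(t) \
	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long) || \
	 sizeof(t) == sizeof(long long))

/* Volatile read of a single object (uses the GNU __typeof__ extension). */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

_Static_assert(MODEL_IS_ONCE_SIZE(unsigned long),
	       "a single word can be read in one access");
_Static_assert(!MODEL_IS_ONCE_SIZE(model_pte_t),
	       "the four-word PTE cannot be read in one access");

/*
 * Accessor that applies the single-access read to one word only. This
 * sketch assumes the remaining words carry no independent information.
 */
static inline model_pte_t model_ptep_get(model_pte_t *ptep)
{
	model_pte_t pte = { MODEL_READ_ONCE(ptep->pte), 0, 0, 0 };

	return pte;
}
```

Generic code that calls such an accessor instead of applying a whole-object read to the structure never trips the size check.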
      
      Fixes: 9e343b46 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/341688399c1b102756046d19ea6ce39db1ae4742.1592225558.git.christophe.leroy@csgroup.eu
  17. 19 Jun, 2020 2 commits
    • maccess: make get_kernel_nofault() check for minimal type compatibility · 0c389d89
      Linus Torvalds committed
      Now that we've renamed probe_kernel_address() to get_kernel_nofault()
      and made it look and behave more in line with get_user(), some of the
      subtle type behavior differences end up being more obvious and possibly
      dangerous.
      
      When you do
      
              get_user(val, user_ptr);
      
      the type of the access comes from the "user_ptr" part, and the above
      basically acts as
      
              val = *user_ptr;
      
      by design (except, of course, for the fact that the actual dereference
      is done with a user access).
      
      Note how in the above case, the type of the end result comes from the
      pointer argument, and then the value is cast to the type of 'val' as
      part of the assignment.
      
      So the type of the pointer is ultimately the more important type for
      the access itself.
      
      But 'get_kernel_nofault()' may now _look_ similar, but it behaves very
      differently.  When you do
      
              get_kernel_nofault(val, kernel_ptr);
      
      it behaves like
      
              val = *(typeof(val) *)kernel_ptr;
      
      except, of course, for the fact that the actual dereference is done with
      exception handling so that a faulting access is suppressed and returned
      as the error code.
      
      But note how different the casting behavior of the two superficially
      similar accesses is: one does the actual access in the size of the type
      the pointer points to, while the other does the access in the size of
      the target, and ignores the pointer type entirely.
      
      Actually changing get_kernel_nofault() to act like get_user() is almost
      certainly the right thing to do eventually, but in the meantime this
      patch adds logic to at least verify that the pointer type is compatible
      with the type of the result.
      
      In many cases, this involves just casting the pointer to 'void *' to
      make it obvious that the type of the pointer is not the important part.
      It's not how 'get_user()' acts, but at least the behavioral difference
      is now obvious and explicit.
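The two access-size rules described above can be mimicked in user space. The macros below are hypothetical models, not the kernel helpers: they do no fault handling, they use memcpy() in place of a real dereference, and the first one relies on the GNU `__typeof__` extension and a non-const pointee.

```c
#include <string.h>

/*
 * Model of the get_user() rule: the access size comes from the type the
 * pointer points to, and the result is then converted to the type of val.
 */
#define MODEL_GET_BY_POINTER_TYPE(val, ptr)			\
	do {							\
		__typeof__(*(ptr)) tmp_;			\
		memcpy(&tmp_, (ptr), sizeof(tmp_));		\
		(val) = tmp_;					\
	} while (0)

/*
 * Model of the get_kernel_nofault() rule as described above: the access
 * size comes from val, and the pointer type is ignored entirely.
 */
#define MODEL_GET_BY_TARGET_TYPE(val, ptr)			\
	memcpy(&(val), (const void *)(ptr), sizeof(val))
```

Given a four-byte array starting with 0x11 and an `unsigned int` target, the first macro reads one byte and yields 0x11, while the second reads `sizeof(unsigned int)` bytes starting at the same address. That silent widening is the kind of mismatch a pointer type check is meant to surface.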
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • maccess: rename probe_kernel_address to get_kernel_nofault · 25f12ae4
      Christoph Hellwig committed
      Better describe what this helper does, and match the naming of
      copy_from_kernel_nofault.
      
      Also switch the argument order around, so that it acts and looks
      like get_user().
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 18 Jun, 2020 2 commits
  19. 17 Jun, 2020 3 commits
  20. 16 Jun, 2020 5 commits
  21. 15 Jun, 2020 1 commit
  22. 14 Jun, 2020 1 commit
    • treewide: replace '---help---' in Kconfig files with 'help' · a7f7f624
      Masahiro Yamada committed
      Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over
      '---help---'"), the number of '---help---' has been gradually
      decreasing, but there are still more than 2400 instances.
      
      This commit finishes the conversion. While I touched the lines,
      I also fixed the indentation.
      
      There are a variety of indentation styles found.
      
        a) 4 spaces + '---help---'
        b) 7 spaces + '---help---'
        c) 8 spaces + '---help---'
        d) 1 space + 1 tab + '---help---'
        e) 1 tab + '---help---'    (correct indentation)
        f) 1 tab + 1 space + '---help---'
        g) 1 tab + 2 spaces + '---help---'
      
      In order to convert all of them to 1 tab + 'help', I ran the
      following command:
      
        $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
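For readers who want to see what the substitution does to a single line, here is a rough C equivalent. It is a sketch only: `model_fix_help_line` is a hypothetical helper, it handles one line at a time without a trailing newline, and it uses isspace() as an approximation of `[[:space:]]`.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * If the line is optional leading whitespace followed by "---help---",
 * write a tab plus "help" plus whatever followed the marker into out.
 * Otherwise copy the line unchanged.
 */
static void model_fix_help_line(const char *in, char *out, size_t outsz)
{
	static const char marker[] = "---help---";
	const char *p = in;

	/* Skip every kind of leading whitespace, whatever the mix. */
	while (*p != '\0' && isspace((unsigned char)*p))
		p++;

	if (strncmp(p, marker, sizeof(marker) - 1) == 0)
		snprintf(out, outsz, "\thelp%s", p + sizeof(marker) - 1);
	else
		snprintf(out, outsz, "%s", in);
}
```

All seven indentation styles listed above collapse to the same output, a single tab followed by `help`, because the leading whitespace is discarded before the marker is matched.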
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  23. 12 Jun, 2020 1 commit
  24. 11 Jun, 2020 2 commits
  25. 10 Jun, 2020 1 commit