1. 04 Feb, 2020 1 commit
    • powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case · 12e4d53f
      Aneesh Kumar K.V committed
      Patch series "Fixup page directory freeing", v4.
      
      This is a repost of patch series from Peter with the arch specific changes
      except ppc64 dropped.  ppc64 changes are added here because we are redoing
      the patch series on top of ppc64 changes.  This makes it easy to backport
      these changes.  Only the first 2 patches need to be backported to stable.
      
      The thing is, on anything SMP, freeing page directories should observe the
      exact same order as normal page freeing:
      
       1) unhook page/directory
       2) TLB invalidate
       3) free page/directory
      
      Without this, any concurrent page-table walk could end up with a
      use-after-free.  This is especially easy to hit on anything that has
      software page-table walkers (HAVE_FAST_GUP / software TLB fill) or whose
      hardware caches partial page walks (i.e. caches page directories).
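
The ordering above can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: every name (`toy_mmu`, `toy_unhook`, `toy_tlb_invalidate`, and so on) is hypothetical and none of it is kernel code. It models a single directory slot and a single cached partial walk, and reports whether a walker could still reach a table after it was freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: one page-table page that can be freed. */
struct toy_table {
    bool freed;
};

/* Toy model of what walkers can reach. */
struct toy_mmu {
    struct toy_table *dir_slot;   /* what a fresh page-table walk finds */
    struct toy_table *walk_cache; /* what a cached partial walk finds */
};

/* Step 1: unhook the table so that new walks can no longer reach it. */
static struct toy_table *toy_unhook(struct toy_mmu *mmu)
{
    struct toy_table *table = mmu->dir_slot;

    mmu->dir_slot = NULL;
    return table;
}

/* Step 2: invalidate, dropping the cached partial walk. */
static void toy_tlb_invalidate(struct toy_mmu *mmu)
{
    mmu->walk_cache = NULL;
}

/* Step 3: free the table (only marked as freed in this toy). */
static void toy_free(struct toy_table *table)
{
    assert(table != NULL);
    table->freed = true;
}

/* True if some walker could still reach a table that was already freed. */
static bool toy_stale_walk_possible(const struct toy_mmu *mmu)
{
    return (mmu->dir_slot && mmu->dir_slot->freed) ||
           (mmu->walk_cache && mmu->walk_cache->freed);
}

/* Correct order: unhook, invalidate, free. Returns whether a window existed. */
static bool toy_teardown_in_order(struct toy_mmu *mmu)
{
    struct toy_table *table = toy_unhook(mmu);

    toy_tlb_invalidate(mmu);
    toy_free(table);
    return toy_stale_walk_possible(mmu);
}

/* Wrong order: free before invalidate. Returns whether a window existed. */
static bool toy_teardown_free_first(struct toy_mmu *mmu)
{
    struct toy_table *table = toy_unhook(mmu);
    bool window;

    toy_free(table);
    window = toy_stale_walk_possible(mmu); /* cached walk still points here */
    toy_tlb_invalidate(mmu);
    return window;
}
```

Starting from a `toy_mmu` whose `dir_slot` and `walk_cache` both point at the same table, `toy_teardown_in_order()` never exposes a freed table, while `toy_teardown_free_first()` does, because the cached walk outlives the free.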
      
      Even on UP this might give issues since mmu_gather is preemptible these
      days.  An interrupt or preempted task accessing user pages might stumble
      into the free page if the hardware caches page directories.
      
      This patch series fixes ppc64 and adds generic MMU_GATHER changes to
      support the conversion of other architectures.  I haven't added patches
      for other architectures because they are yet to be acked.
      
      This patch (of 9):
      
      A followup patch is going to make sure we correctly invalidate the page
      walk cache before we free page table pages.  In order to keep things
      simple, enable RCU_TABLE_FREE even for !SMP so that we don't have to fix
      up the !SMP case differently in the followup patch.
      
      The !SMP case is currently broken for radix translation w.r.t. page walk
      cache flush.  We can get interrupted in the middle of a page table free,
      which would leave page walk cache entries pointing to tables that have
      already been freed.  Michael said "both our platforms that run on
      Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a
      problem for anyone in practice, unless they've hacked their kernel to
      build it !SMP."
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 05 Sep, 2019 5 commits
  3. 30 Aug, 2019 2 commits
  4. 27 Aug, 2019 1 commit
  5. 04 Jul, 2019 1 commit
  6. 19 Jun, 2019 1 commit
  7. 07 Jun, 2019 1 commit
    • powerpc/64s: Fix THP PMD collapse serialisation · 33258a1d
      Nicholas Piggin committed
      Commit 1b2443a5 ("powerpc/book3s64: Avoid multiple endian
      conversion in pte helpers") changed the actual bitwise tests in
      pte_access_permitted by using pte_write() and pte_present() helpers
      rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits.
      
      The pte_present() change now returns true for PTEs which are
      !_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by
      pmdp_invalidate() to synchronize access from lock-free lookups.
      pte_access_permitted() is used by pmd_access_permitted(), so allowing
      GUP lock free access to proceed with such PTEs breaks this
      synchronisation.
      
      This bug has been observed on a host using the hash page table MMU,
      with random crashes and corruption in guests, usually together with
      bad PMD messages in the host.
      
      Fix this by adding an explicit check in pmd_access_permitted(), and
      documenting the condition explicitly.
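
The failure mode and the shape of the fix can be sketched in plain C. This is a simplified illustration, not the actual patch: the bit positions and all `toy_*` names are hypothetical, and the real helpers deal with endianness and more flags than are shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for this sketch only. */
#define TOY_PAGE_PRESENT (UINT64_C(1) << 63)
#define TOY_PAGE_INVALID (UINT64_C(1) << 61)
#define TOY_PAGE_WRITE   (UINT64_C(1) << 1)

/* Present in the software sense: also true for an invalidated entry. */
static bool toy_pte_present(uint64_t pte)
{
    return (pte & (TOY_PAGE_PRESENT | TOY_PAGE_INVALID)) != 0;
}

static bool toy_pte_write(uint64_t pte)
{
    return (pte & TOY_PAGE_WRITE) != 0;
}

/* Helper-based permission test, as described for the regressing change. */
static bool toy_pte_access_permitted(uint64_t pte, bool write)
{
    if (!toy_pte_present(pte))
        return false;
    if (write && !toy_pte_write(pte))
        return false;
    return true;
}

/* !PRESENT together with INVALID marks a PMD that is being serialised. */
static bool toy_pmd_is_serializing(uint64_t pmd)
{
    return (pmd & (TOY_PAGE_PRESENT | TOY_PAGE_INVALID)) == TOY_PAGE_INVALID;
}

/* Explicit check first, so lock-free lookups back off from such PMDs. */
static bool toy_pmd_access_permitted(uint64_t pmd, bool write)
{
    if (toy_pmd_is_serializing(pmd))
        return false;
    return toy_pte_access_permitted(pmd, write);
}
```

With an entry that has only `TOY_PAGE_INVALID | TOY_PAGE_WRITE` set, `toy_pte_access_permitted()` still returns true, which is the hole described above, whereas `toy_pmd_access_permitted()` rejects it.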
      
      The pte_write() change should be okay, and would prevent GUP from
      falling back to the slow path when encountering savedwrite PTEs, which
      matches what x86 (that does not implement savedwrite) does.
      
      Fixes: 1b2443a5 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 31 May, 2019 1 commit
  9. 02 May, 2019 2 commits
  10. 13 Mar, 2019 1 commit
    • treewide: add checks for the return value of memblock_alloc*() · 8a7f97b9
      Mike Rapoport committed
      Add checks for the return value of memblock_alloc*() functions and call
      panic() in case of error.  The panic message repeats the one used by the
      panicking memblock allocators, with the parameters adjusted to include
      only the relevant ones.
      
      The replacement was mostly automated with semantic patches like the one
      below with manual massaging of format strings.
      
        @@
        expression ptr, size, align;
        @@
        ptr = memblock_alloc(size, align);
        + if (!ptr)
        + 	panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);
      
      [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
        Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
      [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
        Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
      [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
        Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
      [akpm@linux-foundation.org: fix xtensa printk warning]
      Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Reviewed-by: Guo Ren <ren_guo@c-sky.com>		[c-sky]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390]
      Reviewed-by: Juergen Gross <jgross@suse.com>		[Xen]
      Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>		[xtensa]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 08 Mar, 2019 1 commit
    • powerpc: prefer memblock APIs returning virtual address · f806714f
      Mike Rapoport committed
      Patch series "memblock: simplify several early memory allocation", v4.
      
      These patches simplify some of the early memory allocations by replacing
      usage of older memblock APIs with newer and shinier ones.
      
      Quite a few places in the arch/ code allocated memory using a memblock
      API that returns a physical address of the allocated area, then
      converted this physical address to a virtual one and then used memset(0)
      to clear the allocated range.
      
      More recent memblock APIs do all the three steps in one call and their
      usage simplifies the code.
      
      It's important to note that regardless of API used, the core allocation
      is nearly identical for any set of memblock allocators: first it tries
      to find a free memory with all the constraints specified by the caller
      and then falls back to the allocation with some or all constraints
      disabled.
      
      The first three patches perform the conversion of call sites that have
      exact requirements for the node and the possible memory range.
      
      The fourth patch is a bit one-off as it simplifies openrisc's
      implementation of pte_alloc_one_kernel(), and not only the memblock
      usage.
      
      The fifth patch takes care of simpler cases when the allocation can be
      satisfied with a simple call to memblock_alloc().
      
      The sixth patch removes one-liner wrappers for memblock_alloc on arm and
      unicore32, as suggested by Christoph.
      
      This patch (of 6):
      
      There are several places that allocate memory using memblock APIs that
      return a physical address, convert the returned address to a virtual
      address and frequently also memset(0) the allocated range.
      
      Update these places to use memblock allocators already returning a
      virtual address.  Use memblock functions that clear the allocated memory
      instead of calling memset(0) where appropriate.
      
      The calls to memblock_alloc_base() that were not followed by memset(0)
      are replaced with memblock_alloc_try_nid_raw().  Since the latter does
      not panic() when the allocation fails, the appropriate panic() calls are
      added to the call sites.
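
The before and after shape of such a call site can be sketched in userspace C. Everything below is a hypothetical stand-in (`toy_alloc_phys`, `toy_va`, `toy_alloc` and the static arena are not memblock APIs); it only shows how three steps collapse into one call that returns a zeroed virtual address or NULL. In the kernel the caller would panic() on NULL, which this sketch leaves out so that it stays testable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ARENA_SIZE 4096

/* Toy "physical memory"; offsets into it play the role of physical addresses. */
static unsigned char toy_arena[TOY_ARENA_SIZE];
static size_t toy_next = 64; /* offset 0 is never handed out, so 0 means failure */

/* Older style: returns a "physical address" (arena offset), or 0 on failure. */
static uintptr_t toy_alloc_phys(size_t size, size_t align)
{
    size_t start;

    assert(align != 0);
    start = (toy_next + align - 1) / align * align; /* align the offset up */
    if (size > TOY_ARENA_SIZE || start > TOY_ARENA_SIZE - size)
        return 0;
    toy_next = start + size;
    return start;
}

/* Physical-to-virtual conversion for the toy arena. */
static void *toy_va(uintptr_t phys)
{
    return toy_arena + phys;
}

/* Newer style: one call returning a zeroed virtual address, or NULL. */
static void *toy_alloc(size_t size, size_t align)
{
    uintptr_t phys = toy_alloc_phys(size, align);

    if (!phys)
        return NULL;
    return memset(toy_va(phys), 0, size);
}

/* Before: allocate, convert and clear at every call site. */
static void *toy_old_call_site(size_t size)
{
    uintptr_t phys = toy_alloc_phys(size, 8);
    void *ptr;

    if (!phys)
        return NULL;
    ptr = toy_va(phys);
    memset(ptr, 0, size);
    return ptr;
}

/* After: a single call; the caller only has to check for NULL. */
static void *toy_new_call_site(size_t size)
{
    return toy_alloc(size, 8);
}
```

Both call sites return zeroed memory on success; the second one simply has less to get wrong, which is the point of the conversion.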
      
      Link: http://lkml.kernel.org/r/1546248566-14910-2-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 06 Mar, 2019 1 commit
  13. 31 Jan, 2019 1 commit
    • powerpc/radix: Fix kernel crash with mremap() · 579b9239
      Aneesh Kumar K.V committed
      With support for split pmd lock, we use the pmd page's pmd_huge_pte
      pointer to store the deposited page table. In those configs, when we move
      page tables we need to make sure we move the deposited page table to the
      correct pmd page. Otherwise this can result in a crash when we withdraw
      the deposited page table, because we can find pmd_huge_pte NULL.
      
      eg:
      
        __split_huge_pmd+0x1070/0x1940
        __split_huge_pmd+0xe34/0x1940 (unreliable)
        vma_adjust_trans_huge+0x110/0x1c0
        __vma_adjust+0x2b4/0x9b0
        __split_vma+0x1b8/0x280
        __do_munmap+0x13c/0x550
        sys_mremap+0x220/0x7e0
        system_call+0x5c/0x70
      
      Fixes: 675d9952 ("powerpc/book3s64: Enable split pmd ptlock.")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 04 Dec, 2018 2 commits
  15. 20 Oct, 2018 1 commit
    • powerpc/mm: Fix WARN_ON with THP NUMA migration · dd0e144a
      Aneesh Kumar K.V committed
      WARNING: CPU: 12 PID: 4322 at /arch/powerpc/mm/pgtable-book3s64.c:76 set_pmd_at+0x4c/0x2b0
       Modules linked in:
       CPU: 12 PID: 4322 Comm: qemu-system-ppc Tainted: G        W         4.19.0-rc3-00758-g8f0c636b0542 #36
       NIP:  c0000000000872fc LR: c000000000484eec CTR: 0000000000000000
       REGS: c000003fba876fe0 TRAP: 0700   Tainted: G        W          (4.19.0-rc3-00758-g8f0c636b0542)
       MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24282884  XER: 00000000
       CFAR: c000000000484ee8 IRQMASK: 0
       GPR00: c000000000484eec c000003fba877268 c000000001f0ec00 c000003fbd229f80
       GPR04: 00007c8fe8e00000 c000003f864c5a38 860300853e0000c0 0000000000000080
       GPR08: 0000000080000000 0000000000000001 0401000000000080 0000000000000001
       GPR12: 0000000000002000 c000003fffff5400 c000003fce292000 00007c9024570000
       GPR16: 0000000000000000 0000000000ffffff 0000000000000001 c000000001885950
       GPR20: 0000000000000000 001ffffc0004807c 0000000000000008 c000000001f49d05
       GPR24: 00007c8fe8e00000 c0000000020f2468 ffffffffffffffff c000003fcd33b090
       GPR28: 00007c8fe8e00000 c000003fbd229f80 c000003f864c5a38 860300853e0000c0
       NIP [c0000000000872fc] set_pmd_at+0x4c/0x2b0
       LR [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       Call Trace:
       [c000003fba877268] [c00000000045931c] mpol_misplaced+0x1bc/0x230 (unreliable)
       [c000003fba8772c8] [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       [c000003fba877398] [c00000000040d344] __handle_mm_fault+0x5e4/0x2300
       [c000003fba8774d8] [c00000000040f400] handle_mm_fault+0x3a0/0x420
       [c000003fba877528] [c0000000003ff6f4] __get_user_pages+0x2e4/0x560
       [c000003fba877628] [c000000000400314] get_user_pages_unlocked+0x104/0x2a0
       [c000003fba8776c8] [c000000000118f44] __gfn_to_pfn_memslot+0x284/0x6a0
       [c000003fba877748] [c0000000001463a0] kvmppc_book3s_radix_page_fault+0x360/0x12d0
       [c000003fba877838] [c000000000142228] kvmppc_book3s_hv_page_fault+0x48/0x1300
       [c000003fba877988] [c00000000013dc08] kvmppc_vcpu_run_hv+0x1808/0x1b50
       [c000003fba877af8] [c000000000126b44] kvmppc_vcpu_run+0x34/0x50
       [c000003fba877b18] [c000000000123268] kvm_arch_vcpu_ioctl_run+0x288/0x2d0
       [c000003fba877b98] [c00000000011253c] kvm_vcpu_ioctl+0x1fc/0x8c0
       [c000003fba877d08] [c0000000004e9b24] do_vfs_ioctl+0xa44/0xae0
       [c000003fba877db8] [c0000000004e9c44] ksys_ioctl+0x84/0xf0
       [c000003fba877e08] [c0000000004e9cd8] sys_ioctl+0x28/0x80
      
      We removed the pte_protnone check earlier with the understanding that we
      mark the pte invalid before the set_pte/set_pmd usage. But the huge pmd
      autonuma path still uses set_pmd_at directly. This is OK because a
      protnone pte won't have a translation cached in the TLB.
      
      Fixes: da7ad366 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 03 Oct, 2018 2 commits
    • powerpc/mm/book3s: Check for pmd_large instead of pmd_trans_huge · ae28f17b
      Aneesh Kumar K.V committed
      Update a few code paths to check for pmd_large.
      
      set_pmd_at:
      We want to use this to store a swap pte at the pmd level. For swap ptes we
      don't want to set H_PAGE_THP_HUGE. Hence check for pmd_large in set_pmd_at.
      This removes the false WARN_ON when using this with a swap pmd entry.
      
      pmd_page:
      We don't really use this on pmd migration entries. But it can also work with
      migration entries and we don't differentiate at the pte level. Hence update
      pmd_page to work with pmd migration entries too.
      
      __find_linux_pte:
      The lockless page table walk needs to handle pmd migration entries. The
      pmd_trans_huge check will return false on them. We don't set thp = 1 for such
      entries, but update hpage_shift correctly. Without this we would walk pmd
      migration entries as a pte page pointer, which is wrong.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit · da7ad366
      Aneesh Kumar K.V committed
      With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid
      pgd/pud/pmd entry. We also switch the p**_present() to look at this bit.
      
      With pmd_present, we have a special case. We need to make sure we consider a
      pmd marked invalid during THP split as present. Right now we clear the
      _PAGE_PRESENT bit during a pmdp_invalidate. In order to handle this special
      case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is
      only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit
      for this special case. pmd_present is also updated to look at _PAGE_INVALID.
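
The encoding can be sketched as follows. The `_PAGE_PRESENT` value is the one quoted above; the value used for `_PAGE_INVALID` is only an assumed stand-in for `_RPAGE_SW0`, and the `sketch_*` helpers are hypothetical simplifications that ignore endianness and every other PTE bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the commit message. */
#define SKETCH_PAGE_PRESENT 0x8000000000000000ULL
/* Assumed stand-in for _RPAGE_SW0; any otherwise unused software bit would do. */
#define SKETCH_PAGE_INVALID 0x2000000000000000ULL

/* Software view: an entry invalidated during THP split still counts as present. */
static bool sketch_pmd_present(uint64_t pmd)
{
    return (pmd & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_INVALID)) != 0;
}

/* Valid-entry view: only _PAGE_PRESENT marks the entry as valid. */
static bool sketch_pmd_valid(uint64_t pmd)
{
    return (pmd & SKETCH_PAGE_PRESENT) != 0;
}

/* Invalidate: clear _PAGE_PRESENT and mark the entry with _PAGE_INVALID. */
static uint64_t sketch_pmdp_invalidate(uint64_t pmd)
{
    assert(sketch_pmd_valid(pmd));
    return (pmd & ~SKETCH_PAGE_PRESENT) | SKETCH_PAGE_INVALID;
}
```

After `sketch_pmdp_invalidate()`, `sketch_pmd_valid()` is false while `sketch_pmd_present()` remains true, which is the special case described above. Because the invalid bit is only ever set when the present bit is clear, the two never need to be distinguished on a valid entry.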
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  17. 13 Aug, 2018 1 commit
  18. 07 Aug, 2018 1 commit
  19. 20 Jun, 2018 1 commit
  20. 03 Jun, 2018 3 commits
  21. 15 May, 2018 7 commits
  22. 30 Mar, 2018 1 commit
  23. 27 Mar, 2018 1 commit
    • powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping() · bde709a7
      Mauricio Faria de Oliveira committed
      Fix the warning messages for stop_machine_change_mapping(), and a number
      of other affected functions in its call chain.
      
      All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit
      is okay (keeps them / does not discard them).
      
      Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.
      
          $ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
          ...
            MODPOST vmlinux.o
          WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
          The function stop_machine_change_mapping() references
          the function __meminit create_physical_mapping().
          This is often because stop_machine_change_mapping lacks a __meminit
          annotation or the annotation of create_physical_mapping is wrong.
      
          WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
          The function stop_machine_change_mapping() references
          the function __meminit create_physical_mapping().
          This is often because stop_machine_change_mapping lacks a __meminit
          annotation or the annotation of create_physical_mapping is wrong.
          ...
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  24. 01 Feb, 2018 1 commit