1. 16 1月, 2016 2 次提交
    • M
      arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP · d5d6a443
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5d6a443
    • K
      powerpc, thp: remove infrastructure for handling splitting PMDs · 7aa9a23c
      Kirill A. Shutemov 提交于
      With new refcounting we don't need to mark PMDs splitting.  Let's drop
      code to handle this.
      
      pmdp_splitting_flush() is not needed too: on splitting PMD we will do
      pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
      needed for fast_gup.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7aa9a23c
  2. 12 1月, 2016 2 次提交
    • H
      powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff · 2f10f1a7
      Hugh Dickins 提交于
      Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
      but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
      _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
      discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
      recognized.
      
      (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
      CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
      attempt to solve this problem.)
      
      It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
      and CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
      should be made more robust; but nevertheless, fix up the powerpc ifdefs
      as x86_64 and s390 (which met the same problem) have them, defining the
      bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
      
      Fixes: 7207f436 ("powerpc/mm: Add page soft dirty tracking")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2f10f1a7
    • A
      powerpc/mm: Fix _PAGE_PTE breaking swapoff · 44734f23
      Aneesh Kumar K.V 提交于
      Core kernel expects swp_entry_t to consist of only swap type and swap
      offset. We should not leak pte bits into swp_entry_t. This breaks
      swapoff which use the swap type and offset to build a swp_entry_t and
      later compare that to the swp_entry_t obtained from linux page table
      pte. Leaking pte bits into swp_entry_t breaks that comparison and
      results in us looping in try_to_unuse.
      
      The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c,
      since swapoff is circling around and around that function, reading from
      each used swap block into a page, then trying to find where that page
      belongs, looking at every non-file pte of every mm that ever swapped.
      
      Fixes: 6a119eae ("powerpc/mm: Add a _PAGE_PTE bit")
      Reported-by: NHugh Dickins <hughd@google.com>
      Suggested-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      44734f23
  3. 17 12月, 2015 1 次提交
    • L
      powerpc/mm: Add page soft dirty tracking · 7207f436
      Laurent Dufour 提交于
      User space checkpoint and restart tool (CRIU) needs the page's change
      to be soft tracked. This allows to do a pre checkpoint and then dump
      only touched pages.
      
      This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
      the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when
      the page is swapped out.
      
      To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k
      and hash 64k pte, the bits already defined in hash-*4k.h should be
      shifted left by one.
      
      The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in
      the swap pte. A check is added to ensure that the bit is not
      overwritten by _PAGE_HPTEFLAGS.
      Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7207f436
  4. 14 12月, 2015 10 次提交
  5. 12 10月, 2015 1 次提交
    • A
      powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6
      Aneesh Kumar K.V 提交于
      We need to properly identify whether a hugepage is an explicit or
      a transparent hugepage in follow_huge_addr(). We used to depend
      on hugepage shift argument to do that. But in some case that can
      result in wrong results. For ex:
      
      On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
      But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
      We do prevent reusing the pfn page via the usage of
      kick_all_cpus_sync(). But that happens after we updated the pte to 0.
      Hence in follow_huge_addr() we can find hugepage shift set, but transparent
      huge page check fail for a thp pte.
      
      NOTE: We fixed a variant of this race against thp split in commit
      691e95fd
      ("powerpc/mm/thp: Make page table walk safe against thp split/collapse")
      
      Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
      follow_page_mask occasionally.
      
      In the long term, we may want to switch ppc64 64k page size config to
      enable CONFIG_ARCH_WANT_GENERAL_HUGETLB
      Reported-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      891121e6
  6. 18 8月, 2015 2 次提交
    • M
      powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index() · 95300577
      Michael Ellerman 提交于
      Now that support for 64k pages with a 4K kernel is removed, this code is
      unreachable.
      
      CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
      also true.
      
      But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
      includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
      crucially __real_pte, which means this definition can never be used.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      95300577
    • M
      powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · 74b5037b
      Michael Ellerman 提交于
      The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
      PAGE_SIZE.
      
      However when built with a 4K PAGE_SIZE there is an additional config
      option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
      also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
      
      This is used in one obscure configuration, to support 64K pages for SPU
      local store on the Cell processor when the rest of the kernel is using
      4K pages.
      
      In this configuration, pte_pagesize_index() is defined to just pass
      through its arguments to get_slice_psize(). However pte_pagesize_index()
      is called for both user and kernel addresses, whereas get_slice_psize()
      only knows how to handle user addresses.
      
      This has been broken forever, however until recently it happened to
      work. That was because in get_slice_psize() the large kernel address
      would cause the right shift of the slice mask to return zero.
      
      However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
      64TB"), the get_slice_psize() code was changed so that instead of a
      right shift we do an array lookup based on the address. When passed a
      kernel address this means we index way off the end of the slice array
      and return random junk.
      
      That is only fatal if we happen to hit something non-zero, but when we
      do return a non-zero value we confuse the MMU code and eventually cause
      a check stop.
      
      This fix is ugly, but simple. When we're called for a kernel address we
      return 4K, which is always correct in this configuration, otherwise we
      use the slice mask.
      
      Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
      Reported-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      74b5037b
  7. 25 6月, 2015 3 次提交
  8. 19 6月, 2015 1 次提交
  9. 11 5月, 2015 1 次提交
  10. 17 2月, 2015 1 次提交
  11. 12 2月, 2015 1 次提交
  12. 11 12月, 2014 1 次提交
  13. 14 11月, 2014 2 次提交
  14. 02 10月, 2014 1 次提交
  15. 13 8月, 2014 1 次提交
    • A
      powerpc/thp: Handle combo pages in invalidate · fc047955
      Aneesh Kumar K.V 提交于
      If we changed base page size of the segment, either via sub_page_protect
      or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
      table entries. We do a lazy hash page table flush for all mapped pages
      in the demoted segment. This happens when we handle hash page fault for
      these pages.
      
      We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
      pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
      that implies that we could possibly have older 64K hash pte entries in
      the hash page table and we need to invalidate those entries.
      
      Use _PAGE_COMBO to determine the page size with which we should
      invalidate the hash table entries on unmap.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      fc047955
  16. 17 2月, 2014 1 次提交
  17. 29 1月, 2014 1 次提交
  18. 15 1月, 2014 1 次提交
    • A
      powerpc/thp: Fix crash on mremap · b3084f4d
      Aneesh Kumar K.V 提交于
      This patch fix the below crash
      
      NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
      LR [c0000000000439ac] .hash_page+0x18c/0x5e0
      ...
      Call Trace:
      [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
      [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
      [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
      
      On ppc64 we use the pgtable for storing the hpte slot information and
      store address to the pgtable at a constant offset (PTRS_PER_PMD) from
      pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
      the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
      from new pmd.
      
      We also want to move the withdraw and deposit before the set_pmd so
      that, when page fault find the pmd as trans huge we can be sure that
      pgtable can be located at the offset.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b3084f4d
  19. 11 10月, 2013 1 次提交
    • A
      powerpc: Prepare to support kernel handling of IOMMU map/unmap · 8e0861fa
      Alexey Kardashevskiy 提交于
      The current VFIO-on-POWER implementation supports only user mode
      driven mapping, i.e. QEMU is sending requests to map/unmap pages.
      However this approach is really slow, so we want to move that to KVM.
      Since H_PUT_TCE can be extremely performance sensitive (especially with
      network adapters where each packet needs to be mapped/unmapped) we chose
      to implement that as a "fast" hypercall directly in "real
      mode" (processor still in the guest context but MMU off).
      
      To be able to do that, we need to provide some facilities to
      access the struct page count within that real mode environment as things
      like the sparsemem vmemmap mappings aren't accessible.
      
      This adds an API function realmode_pfn_to_page() to get page struct when
      MMU is off.
      
      This adds to MM a new function put_page_unless_one() which drops a page
      if counter is bigger than 1. It is going to be used when MMU is off
      (for example, real mode on PPC64) and we want to make sure that page
      release will not happen in real mode as it may crash the kernel in
      a horrible way.
      
      CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.
      
      Cc: linux-mm@kvack.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8e0861fa
  20. 21 6月, 2013 5 次提交
  21. 30 4月, 2013 1 次提交