提交 · d5d6a443b24304711fe83b312d29ff26cfa03f0c · openeuler / Kernel

16 1月, 2016 2 次提交

arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP · d5d6a443

由 Minchan Kim 提交于 1月 15, 2016

MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
of the contents since MADV_FREE syscall is called for THP page.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d5d6a443

powerpc, thp: remove infrastructure for handling splitting PMDs · 7aa9a23c

由 Kirill A. Shutemov 提交于 1月 15, 2016

With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7aa9a23c

12 1月, 2016 2 次提交

powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff · 2f10f1a7

由 Hugh Dickins 提交于 1月 09, 2016

Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
_PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
recognized.

(I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
attempt to solve this problem.)

It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
and CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
should be made more robust; but nevertheless, fix up the powerpc ifdefs
as x86_64 and s390 (which met the same problem) have them, defining the
bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.

Fixes: 7207f436 ("powerpc/mm: Add page soft dirty tracking")
Signed-off-by: NHugh Dickins <hughd@google.com>
Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2f10f1a7

powerpc/mm: Fix _PAGE_PTE breaking swapoff · 44734f23

由 Aneesh Kumar K.V 提交于 1月 11, 2016

Core kernel expects swp_entry_t to consist of only swap type and swap
offset. We should not leak pte bits into swp_entry_t. This breaks
swapoff which use the swap type and offset to build a swp_entry_t and
later compare that to the swp_entry_t obtained from linux page table
pte. Leaking pte bits into swp_entry_t breaks that comparison and
results in us looping in try_to_unuse.

The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c,
since swapoff is circling around and around that function, reading from
each used swap block into a page, then trying to find where that page
belongs, looking at every non-file pte of every mm that ever swapped.

Fixes: 6a119eae ("powerpc/mm: Add a _PAGE_PTE bit")
Reported-by: NHugh Dickins <hughd@google.com>
Suggested-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

44734f23

17 12月, 2015 1 次提交

powerpc/mm: Add page soft dirty tracking · 7207f436

由 Laurent Dufour 提交于 12月 03, 2015

User space checkpoint and restart tool (CRIU) needs the page's change
to be soft tracked. This allows to do a pre checkpoint and then dump
only touched pages.

This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when
the page is swapped out.

To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k
and hash 64k pte, the bits already defined in hash-*4k.h should be
shifted left by one.

The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in
the swap pte. A check is added to ensure that the bit is not
overwritten by _PAGE_HPTEFLAGS.
Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7207f436

14 12月, 2015 10 次提交

powerpc/mm: Don't hardcode the hash pte slot shift · 4d9057c3

由 Aneesh Kumar K.V 提交于 12月 01, 2015

Use the #define instead of open-coding the same
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4d9057c3

powerpc/mm: Add a _PAGE_PTE bit · 6a119eae

由 Aneesh Kumar K.V 提交于 12月 01, 2015

For a pte entry we will have _PAGE_PTE set. Our pte page
address have a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1.
We use the lower 7 bits to indicate hugepd. ie.

For pmd and pgd we can find:
1) _PAGE_PTE set pte -> indicate PTE
2) bits [2..6] non zero -> indicate hugepd.
   They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) othewise pointer to next table.
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6a119eae

powerpc/mm: Move PTE bits from generic functions to hash64 functions. · 1ca72129

由 Aneesh Kumar K.V 提交于 12月 01, 2015

functions which operate on pte bits are moved to hash*.h and other
generic functions are moved to pgtable.h
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1ca72129

powerpc/mm: Move hash64 PTE bits from book3s/64/pgtable.h to hash.h · 371352ca

由 Aneesh Kumar K.V 提交于 12月 01, 2015

This enables us to keep hash64 related bits together, and makes it easy
to follow.
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

371352ca

powerpc/mm: Don't use pmd_val, pud_val and pgd_val as lvalue · f281b5d5

由 Aneesh Kumar K.V 提交于 12月 01, 2015

We convert them static inline function here as we did with pte_val in
the previous patch
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f281b5d5

powerpc/mm: Drop pte-common.h from BOOK3S 64 · b0412ea9

由 Aneesh Kumar K.V 提交于 12月 01, 2015

We copy only needed PTE bits define from pte-common.h to respective
hash related header. This should greatly simply later patches in which
we are going to change the pte format for hash config
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b0412ea9

powerpc/mm: Delete booke bits from book3s · cbbb8683

由 Aneesh Kumar K.V 提交于 12月 01, 2015

We also move __ASSEMBLY__ towards the end of header. This avoid
having #ifndef __ASSEMBLY___ all over the header
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cbbb8683

powerpc/mm: Move hash specific pte width and other defines to book3s · ab537dca

由 Aneesh Kumar K.V 提交于 12月 01, 2015

This further make a copy of pte defines to book3s/64/hash*.h. This
remove the dependency on pgtable-ppc64-4k.h and pgtable-ppc64-64k.h
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ab537dca

powerpc/mm: make a separate copy for book3s · 3dfcb315

由 Aneesh Kumar K.V 提交于 12月 01, 2015

In this patch we do:
cp pgtable-ppc32.h book3s/32/pgtable.h
cp pgtable-ppc64.h book3s/64/pgtable.h

This enable us to do further changes to hash specific config.
We will change the page table format for 64bit hash in later patches.
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3dfcb315

powerpc/mm: move pte headers to book3s directory · 26b6a3d9

由 Aneesh Kumar K.V 提交于 12月 01, 2015

Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

26b6a3d9

12 10月, 2015 1 次提交

powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6

由 Aneesh Kumar K.V 提交于 10月 09, 2015

We need to properly identify whether a hugepage is an explicit or
a transparent hugepage in follow_huge_addr(). We used to depend
on hugepage shift argument to do that. But in some case that can
result in wrong results. For ex:

On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
We do prevent reusing the pfn page via the usage of
kick_all_cpus_sync(). But that happens after we updated the pte to 0.
Hence in follow_huge_addr() we can find hugepage shift set, but transparent
huge page check fail for a thp pte.

NOTE: We fixed a variant of this race against thp split in commit
691e95fd
("powerpc/mm/thp: Make page table walk safe against thp split/collapse")

Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
follow_page_mask occasionally.

In the long term, we may want to switch ppc64 64k page size config to
enable CONFIG_ARCH_WANT_GENERAL_HUGETLB
Reported-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

891121e6

18 8月, 2015 2 次提交

powerpc/mm: Drop the 64K on 4K version of pte_pagesize_index() · 95300577

由 Michael Ellerman 提交于 8月 07, 2015

Now that support for 64k pages with a 4K kernel is removed, this code is
unreachable.

CONFIG_PPC_HAS_HASH_64K can only be true when CONFIG_PPC_64K_PAGES is
also true.

But when CONFIG_PPC_64K_PAGES is true we include pte-hash64.h which
includes pte-hash64-64k.h, which defines both pte_pagesize_index() and
crucially __real_pte, which means this definition can never be used.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

95300577

powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · 74b5037b

由 Michael Ellerman 提交于 8月 07, 2015

The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

74b5037b

25 6月, 2015 3 次提交

mm: clarify that the function operates on hugepage pte · 8809aa2d

由 Aneesh Kumar K.V 提交于 6月 24, 2015

We have confusing functions to clear pmd, pmd_clear_* and pmd_clear.  Add
_huge_ to pmdp_clear functions so that we are clear that they operate on
hugepage pte.

We don't bother about other functions like pmdp_set_wrprotect,
pmdp_clear_flush_young, because they operate on PTE bits and hence
indicate they are operating on hugepage ptes
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8809aa2d

powerpc/mm: use generic version of pmdp_clear_flush() · f28b6ff8

由 Aneesh Kumar K.V 提交于 6月 24, 2015

Also move the pmd_trans_huge check to generic code.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f28b6ff8

mm/thp: split out pmd collapse flush into separate functions · 15a25b2e

由 Aneesh Kumar K.V 提交于 6月 24, 2015

Architectures like ppc64 [1] need to do special things while clearing pmd
before a collapse. For them this operation is largely different from a
normal hugepage pte clear. Hence add a separate function to clear pmd
before collapse. After this patch pmdp_* functions operate only on
hugepage pte, and not on regular pmd_t values pointing to page table.

[1] ppc64 needs to invalidate all the normal page pte mappings we already
have inserted in the hardware hash page table. But before doing that we
need to make sure there are no parallel hash page table insert going on.
So we need to do a kick_all_cpus_sync() before flushing the older hash
table entries. By moving this to a separate function we capture these
details and mention how it is different from a hugepage pte clear.

This patch is a cleanup and only does code movement for clarity. There
should not be any change in functionality.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

15a25b2e

19 6月, 2015 1 次提交

powerpc/mm: Change the swap encoding in pte. · e1483158

由 Aneesh Kumar K.V 提交于 6月 17, 2015

Current swap encoding in pte can't support large pfns
above 4TB. Change the swap encoding such that we put
the swap type in the PTE bits. Also add build checks
to make sure we don't overlap with HPTEFLAGS.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e1483158

11 5月, 2015 1 次提交

powerpc: Make STRICT_MM_TYPECHECKS a config option · f1e7c202

由 Michael Ellerman 提交于 3月 25, 2015

The STRICT_MM_TYPECHECKS code has bit-rotted over the years. To make it
possible to easily build test it, make it a CONFIG option.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f1e7c202

17 2月, 2015 1 次提交

powerpc: drop _PAGE_FILE and pte_file()-related helpers · 780fc564

由 Kirill A. Shutemov 提交于 2月 16, 2015

We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

780fc564

12 2月, 2015 1 次提交

mm: make FIRST_USER_ADDRESS unsigned long on all archs · d016bf7e

由 Kirill A. Shutemov 提交于 2月 11, 2015

LKP has triggered a compiler warning after my recent patch "mm: account
pmd page tables to the process":

    mm/mmap.c: In function 'exit_mmap':
 >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]

The code:

 > 2857                WARN_ON(mm_nr_pmds(mm) >
   2858                                round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);

In this, on tile, we have FIRST_USER_ADDRESS defined as 0.  round_up() has
the same type -- int.  PUD_SHIFT.

I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
long.  On every arch for consistency.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d016bf7e

11 12月, 2014 1 次提交

mm: fix huge zero page accounting in smaps report · c164e038

由 Kirill A. Shutemov 提交于 12月 10, 2014

As a small zero page, huge zero page should not be accounted in smaps
report as normal page.

For small pages we rely on vm_normal_page() to filter out zero page, but
vm_normal_page() is not designed to handle pmds.  We only get here due
hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not
necessary compatible on each and every architecture.

Let's add separate codepath to handle pmds.  follow_trans_huge_pmd() will
detect huge zero page for us.

We would need pmd_dirty() helper to do this properly.  The patch adds it
to THP-enabled architectures which don't yet have one.

[akpm@linux-foundation.org: use do_div to fix 32-bit build]
Signed-off-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Tested-by: NFengwei Yin <yfw.kernel@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c164e038

14 11月, 2014 2 次提交

powerpc/mm: Switch to generic RCU get_user_pages_fast · b30e7590

由 Aneesh Kumar K.V 提交于 11月 05, 2014

This patch switch the ppc arch to use the generic RCU based
gup implementation.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b30e7590

powerpc/mm: Add missing pmd accessors · 06743521

由 Aneesh Kumar K.V 提交于 11月 05, 2014

This patch add documentation and missing accessors.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

06743521

02 10月, 2014 1 次提交

powerpc: Add printk levels to powerpc code · a7696b36

由 Anton Blanchard 提交于 9月 17, 2014

Add printk levels to some places in the powerpc port.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a7696b36

13 8月, 2014 1 次提交

powerpc/thp: Handle combo pages in invalidate · fc047955

由 Aneesh Kumar K.V 提交于 8月 13, 2014

If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault for
these pages.

We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.

Use _PAGE_COMBO to determine the page size with which we should
invalidate the hash table entries on unmap.

CC: <stable@vger.kernel.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fc047955

17 2月, 2014 1 次提交

powerpc/mm: Add new "set" flag argument to pte/pmd update function · 88247e8d

由 Aneesh Kumar K.V 提交于 2月 12, 2014

pte_update() is a powerpc-ism used to change the bits of a PTE
when the access permission is being restricted (a flush is
potentially needed).

It uses atomic operations on when needed and handles the hash
synchronization on hash based processors.

It is currently only used to clear PTE bits and so the current
implementation doesn't provide a way to also set PTE bits.

The new _PAGE_NUMA bit, when set, is actually restricting access
so it must use that function too, so this change adds the ability
for pte_update() to also set bits.

We will use this later to set the _PAGE_NUMA bit.
Acked-by: NMel Gorman <mgorman@suse.de>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

88247e8d

29 1月, 2014 1 次提交

powerpc/mm: Fix compile error of pgtable-ppc64.h · fd120dc2

由 Li Zhong 提交于 1月 28, 2014

It seems that forward declaration couldn't work well with typedef, use
struct spinlock directly to avoiding following build errors:

In file included from include/linux/spinlock.h:81,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:17,
                 from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fd120dc2

15 1月, 2014 1 次提交

powerpc/thp: Fix crash on mremap · b3084f4d

由 Aneesh Kumar K.V 提交于 1月 13, 2014

This patch fix the below crash

NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
LR [c0000000000439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
[437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58

On ppc64 we use the pgtable for storing the hpte slot information and
store address to the pgtable at a constant offset (PTRS_PER_PMD) from
pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
from new pmd.

We also want to move the withdraw and deposit before the set_pmd so
that, when page fault find the pmd as trans huge we can be sure that
pgtable can be located at the offset.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b3084f4d

11 10月, 2013 1 次提交

powerpc: Prepare to support kernel handling of IOMMU map/unmap · 8e0861fa

由 Alexey Kardashevskiy 提交于 8月 28, 2013

The current VFIO-on-POWER implementation supports only user mode
driven mapping, i.e. QEMU is sending requests to map/unmap pages.
However this approach is really slow, so we want to move that to KVM.
Since H_PUT_TCE can be extremely performance sensitive (especially with
network adapters where each packet needs to be mapped/unmapped) we chose
to implement that as a "fast" hypercall directly in "real
mode" (processor still in the guest context but MMU off).

To be able to do that, we need to provide some facilities to
access the struct page count within that real mode environment as things
like the sparsemem vmemmap mappings aren't accessible.

This adds an API function realmode_pfn_to_page() to get page struct when
MMU is off.

This adds to MM a new function put_page_unless_one() which drops a page
if counter is bigger than 1. It is going to be used when MMU is off
(for example, real mode on PPC64) and we want to make sure that page
release will not happen in real mode as it may crash the kernel in
a horrible way.

CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.

Cc: linux-mm@kvack.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8e0861fa

21 6月, 2013 5 次提交

powerpc/THP: Enable THP on PPC64 · 437d4964

由 Aneesh Kumar K.V 提交于 6月 20, 2013

We enable only if the we support 16MB page size.
Reviewed-by: NDavid Gibson <dwg@au1.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

437d4964

powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte · 12bc9f6f

由 Aneesh Kumar K.V 提交于 6月 20, 2013

Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
document why we don't need to handle transparent hugepages at callsites.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

12bc9f6f

powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code · 29409997

由 Aneesh Kumar K.V 提交于 6月 20, 2013

We will use this in the later patch for handling THP pages
Reviewed-by: NDavid Gibson <dwg@au1.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

29409997

powerpc/THP: Implement transparent hugepages for ppc64 · 074c2eae

由 Aneesh Kumar K.V 提交于 6月 20, 2013

We now have pmd entries covering 16MB range and the PMD table double its original size.
We use the second half of the PMD table to deposit the pgtable (PTE page).
The depoisted PTE page is further used to track the HPTE information. The information
include [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry.
With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need
4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk
the PTE page and invalidate all valid HPTEs.

This patch implements necessary arch specific functions for THP support and also
hugepage invalidate logic. These PMD related functions are intentionally kept
similar to their PTE counter-part.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

074c2eae

powerpc/THP: Double the PMD table size for THP · f940f528

由 Aneesh Kumar K.V 提交于 6月 20, 2013

THP code does PTE page allocation along with large page request and deposit them
for later use. This is to ensure that we won't have any failures when we split
hugepages to regular pages.

On powerpc we want to use the deposited PTE page for storing hash pte slot and
secondary bit information for the HPTEs. We use the second half
of the pmd table to save the deposted PTE page.
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f940f528

30 4月, 2013 1 次提交

powerpc: Don't truncate pgd_index wrongly · 0e5f35d0

由 Aneesh Kumar K.V 提交于 4月 28, 2013

With PGD_INDEX_SIZE set to 12 the existing macro doesn't work. Fix it to
use PTRS_PER_PGD

The idea originally was to have one more bit in the result of
pgd_index() than PGD_INDEX_SIZE, so that if one had an address
corresponding to the last PGD entry, and then incremented that address
by PGD_SIZE, and took pgd_index() of that, you wouldn't end up with
zero.  The commit that introduced that dates back to 2002, and the
code that was sensitive to that edge case has long since been
refactored (several times), so there is no need for it these days.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0e5f35d0

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功