1. 14 Oct 2018, 9 commits
  2. 03 Oct 2018, 5 commits
  3. 19 Sep 2018, 4 commits
    • powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup · 2e162674
      Nicholas Piggin authored
      This will be used by the SLB code in the next patch, but for now this
      sets the slb_addr_limit to the correct size for 32-bit tasks.
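
      As a rough sketch of what such a hook ends up doing (illustrative only;
      is_32bit_task() and DEFAULT_MAP_WINDOW are existing powerpc symbols, but
      the body below is a simplified assumption, not the exact patch):

      void arch_setup_new_exec(void)
      {
              if (!is_32bit_task())
                      return;

              /* for 32-bit tasks, DEFAULT_MAP_WINDOW is the 4GB limit */
              current->mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW;
      }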
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: remove user SLB data from the paca · 8fed04d0
      Nicholas Piggin authored
      User SLB mapping data is copied into the PACA from the mm->context so
      it can be accessed by the SLB miss handlers.
      
      After the C conversion, SLB miss handlers now run with relocation on,
      and user SLB misses are able to take recursive kernel SLB misses, so
      the user SLB mapping data can be removed from the paca and accessed
      directly.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/hash: remove the vmalloc segment from the bolted SLB · 85376e2a
      Nicholas Piggin authored
      Remove the vmalloc segment from the bolted SLBEs. It is not required
      to be bolted, and it seems it was added to help pre-load the SLB on
      context switch. However there are now other segments like the vmemmap
      segment and non-zero node memory that often take misses after a context
      switch, so it is better to solve this in a more general way.
      
      A subsequent change will track free SLB entries and use those rather
      than round-robin overwriting valid entries, which makes it far less
      likely for kernel SLBEs to be evicted after they are installed.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Dump the SLB contents on SLB MCE errors. · c6d15258
      Mahesh Salgaonkar authored
      If we get a machine check exception due to SLB errors, dump the
      current SLB contents, which will be very helpful in debugging the
      root cause of the SLB errors. Introduce an exclusive per-cpu buffer to
      hold the faulty SLB entries. In real mode, the MCE handler saves the
      old SLB contents into this buffer, accessible through the paca, and
      prints them out later in virtual mode.
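
      A minimal sketch of the real-mode save step described above (the struct
      and function names here are illustrative assumptions; slbmfee/slbmfev and
      mmu_slb_size are the actual instructions/symbol used to read SLB entries):

      struct slb_entry {
              u64     esid;
              u64     vsid;
      };

      /* real mode: copy the live SLB into the per-cpu buffer in the paca */
      static void mce_save_slb_contents(struct slb_entry *slb)
      {
              int i;

              for (i = 0; i < mmu_slb_size; i++) {
                      asm volatile("slbmfee %0,%1" : "=r" (slb[i].esid) : "r" (i));
                      asm volatile("slbmfev %0,%1" : "=r" (slb[i].vsid) : "r" (i));
              }
      }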
      
      With this patch the console will log SLB contents like below on SLB MCE
      errors:
      
      [  507.297236] SLB contents of cpu 0x1
      [  507.297237] Last SLB entry inserted at slot 16
      [  507.297238] 00 c000000008000000 400ea1b217000500
      [  507.297239]   1T  ESID=   c00000  VSID=      ea1b217 LLP:100
      [  507.297240] 01 d000000008000000 400d43642f000510
      [  507.297242]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297243] 11 f000000008000000 400a86c85f000500
      [  507.297244]   1T  ESID=   f00000  VSID=      a86c85f LLP:100
      [  507.297245] 12 00007f0008000000 4008119624000d90
      [  507.297246]   1T  ESID=       7f  VSID=      8119624 LLP:110
      [  507.297247] 13 0000000018000000 00092885f5150d90
      [  507.297247]  256M ESID=        1  VSID=   92885f5150 LLP:110
      [  507.297248] 14 0000010008000000 4009e7cb50000d90
      [  507.297249]   1T  ESID=        1  VSID=      9e7cb50 LLP:110
      [  507.297250] 15 d000000008000000 400d43642f000510
      [  507.297251]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297252] 16 d000000008000000 400d43642f000510
      [  507.297253]   1T  ESID=   d00000  VSID=      d43642f LLP:110
      [  507.297253] ----------------------------------
      [  507.297254] SLB cache ptr value = 3
      [  507.297254] Valid SLB cache entries:
      [  507.297255] 00 EA[0-35]=    7f000
      [  507.297256] 01 EA[0-35]=        1
      [  507.297257] 02 EA[0-35]=     1000
      [  507.297257] Rest of SLB cache entries:
      [  507.297258] 03 EA[0-35]=    7f000
      [  507.297258] 04 EA[0-35]=        1
      [  507.297259] 05 EA[0-35]=     1000
      [  507.297260] 06 EA[0-35]=       12
      [  507.297260] 07 EA[0-35]=    7f000
      Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 23 Aug 2018, 1 commit
  5. 13 Aug 2018, 1 commit
  6. 10 Aug 2018, 2 commits
  7. 30 Jul 2018, 2 commits
    • powerpc/mm: Don't report PUDs as memory leaks when using kmemleak · a984506c
      Michael Ellerman authored
      Paul Menzel reported that kmemleak was producing reports such as:
      
        unreferenced object 0xc0000000f8b80000 (size 16384):
          comm "init", pid 1, jiffies 4294937416 (age 312.240s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<00000000d997deb7>] __pud_alloc+0x80/0x190
            [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
            [<00000000091e51c2>] shift_arg_pages+0xc0/0x210
            [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
            [<0000000060871529>] load_elf_binary+0x41c/0x1648
            [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
            [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
            [<000000005f953a6e>] sys_execve+0x58/0x70
            [<000000009700a858>] system_call+0x5c/0x70
      
      Indicating that a PUD was being leaked.
      
      However what's really happening is that kmemleak is not able to
      recognise the references from the PGD to the PUD, because they are not
      fully qualified pointers.
      
      We can confirm that in xmon, eg:
      
      Find the task struct for pid 1 "init":
        0:mon> P
             task_struct     ->thread.ksp    PID   PPID S  P CMD
        c0000001fe7c0000 c0000001fe803960      1      0 S 13 systemd
      
      Dump virtual address 0 to find the PGD:
        0:mon> dv 0 c0000001fe7c0000
        pgd  @ 0xc0000000f8b01000
      
      Dump the memory of the PGD:
        0:mon> d c0000000f8b01000
        c0000000f8b01000 00000000f8b90000 0000000000000000  |................|
        c0000000f8b01010 0000000000000000 0000000000000000  |................|
        c0000000f8b01020 0000000000000000 0000000000000000  |................|
        c0000000f8b01030 0000000000000000 00000000f8b80000  |................|
                                          ^^^^^^^^^^^^^^^^
      
      There we can see the reference to our supposedly leaked PUD. But
      because it's missing the leading 0xc, kmemleak won't recognise it.
      
      We can confirm it's still in use by translating an address that is
      mapped via it:
        0:mon> dv 7fff94000000 c0000001fe7c0000
        pgd  @ 0xc0000000f8b01000
        pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
        pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
        pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
        ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
        Maps physical address = 0x00000001d5ce0000
        Flags = Accessed Dirty Read Write
      
      The fix is fairly simple. We need to tell kmemleak to ignore PUD
      allocations and never report them as leaks. We can also tell it not to
      scan the PGD, because it will never find pointers in there. However it
      will still notice if we allocate a PGD and then leak it.
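
      A sketch of the kind of fix described (kmemleak_ignore() and
      kmemleak_no_scan() are the standard kmemleak hints; the allocation
      details below are simplified assumptions):

      static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
      {
              pud_t *pud = kmem_cache_alloc(PGT_CACHE(PUD_CACHE_INDEX),
                                            pgtable_gfp_flags(mm, GFP_KERNEL));

              /*
               * The PGD refers to PUDs by physical address, which kmemleak
               * cannot recognise as a pointer, so tell it never to scan a PUD
               * or report it as leaked. (kmemleak_no_scan() on the PGD stops
               * the pointless scan of the PGD itself.)
               */
              kmemleak_ignore(pud);
              return pud;
      }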
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: move ASM_CONST and stringify_in_c() into asm-const.h · ec0c464c
      Christophe Leroy authored
      This patch moves ASM_CONST() and stringify_in_c() into a
      dedicated asm-const.h, then cleans up all related inclusions.
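
      For reference, a sketch of what the new header holds (the familiar
      definitions of these macros; exact layout in the patch may differ):

      /* arch/powerpc/include/asm/asm-const.h (sketch) */
      #ifdef __ASSEMBLY__
      #  define ASM_CONST(x)          x
      #  define stringify_in_c(...)   __VA_ARGS__
      #else
      /* In C, constants need a UL suffix and asm templates need strings */
      #  define __ASM_CONST(x)        x##UL
      #  define ASM_CONST(x)          __ASM_CONST(x)
      #  define __stringify_in_c(...) #__VA_ARGS__
      #  define stringify_in_c(...)   __stringify_in_c(__VA_ARGS__) " "
      #endif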
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: asm-compat.h should include asm-const.h]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 24 Jul 2018, 1 commit
  9. 16 Jul 2018, 1 commit
  10. 20 Jun 2018, 1 commit
  11. 08 Jun 2018, 1 commit
    • mm: introduce ARCH_HAS_PTE_SPECIAL · 3010a5ea
      Laurent Dufour authored
      Currently, PTE special support is turned on in per-architecture header
      files.  Most of the time, it is defined in
      arch/*/include/asm/pgtable.h, depending (or not) on some other
      per-architecture static definition.

      This patch introduces a new configuration variable to manage this
      directly in the Kconfig files.  It will later replace
      __HAVE_ARCH_PTE_SPECIAL.
      
      Here are notes for some architectures where the definition of
      __HAVE_ARCH_PTE_SPECIAL is not obvious:
      
      arm
      __HAVE_ARCH_PTE_SPECIAL is currently defined in
      arch/arm/include/asm/pgtable-3level.h, which is included by
      arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
      So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.
      
      powerpc
      __HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
       - arch/powerpc/include/asm/book3s/64/pgtable.h
       - arch/powerpc/include/asm/pte-common.h
      The first one is included if (PPC_BOOK3S & PPC64) while the second is
      included in all the other cases.
      So select ARCH_HAS_PTE_SPECIAL all the time.
      
      sparc:
      __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
      defined(__arch64__) which are defined through the compiler in
      sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
      So select ARCH_HAS_PTE_SPECIAL if SPARC64
      
      There is no functional change introduced by this patch.
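
      A compressed sketch of the Kconfig plumbing being described (only the
      new lines are shown; not the full patch):

      # mm/Kconfig: the new symbol, to be selected by architectures
      config ARCH_HAS_PTE_SPECIAL
              bool

      # arch/powerpc/Kconfig: powerpc selects it unconditionally
      config PPC
              select ARCH_HAS_PTE_SPECIAL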
      
      Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: Jerome Glisse <jglisse@redhat.com>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <albert@sifive.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Christophe LEROY <christophe.leroy@c-s.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 03 Jun 2018, 6 commits
    • powerpc/64s/radix: optimise pte_update · 85bcfaf6
      Nicholas Piggin authored
      Implementing pte_update with pte_xchg (which uses cmpxchg) is
      inefficient. A single larx/stcx. works fine, no need for the less
      efficient cmpxchg sequence.
      
      Then remove the memory barriers from the operation. There is a
      requirement for TLB flushing to load mm_cpumask after the store
      that reduces pte permissions, which is moved into the TLB flush
      code.
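
      A sketch of the kind of sequence this enables (the operand layout is
      illustrative; the real helper differs in detail):

      static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
                                                     unsigned long set)
      {
              unsigned long old_pte, new_pte;

              __asm__ __volatile__(
              "1:     ldarx   %0,0,%3         # load pte with reservation\n"
              "       andc    %1,%0,%4        # clear the requested bits\n"
              "       or      %1,%1,%5        # set the requested bits\n"
              "       stdcx.  %1,0,%3         # store iff still reserved\n"
              "       bne-    1b"
              : "=&r" (old_pte), "=&r" (new_pte), "=m" (*ptep)
              : "r" (ptep), "r" (clr), "r" (set)
              : "cc");

              return old_pte;
      }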
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags · f1cb8f9b
      Nicholas Piggin authored
      The ISA suggests ptesync after setting a pte, to prevent a table walk
      initiated by a subsequent access from missing that store and causing a
      spurious fault. This is an architectural allowance that allows an
      implementation's page table walker to be incoherent with the store
      queue.
      
      However there is no correctness problem in taking a spurious fault in
      userspace -- the kernel copes with these at any time, so the updated
      pte will be found eventually. Spurious kernel faults on vmap memory
      must be avoided, so a ptesync is put into flush_cache_vmap.
      
      On POWER9 so far I have not found a measurable window where this can
      result in more minor faults, so as an optimisation, remove the costly
      ptesync from pte updates. If an implementation benefits from ptesync,
      it would be better to add it back in update_mmu_cache, so it's not
      done for things like fork(2).
      
      fork --fork --exec benchmark improved 5.2% (12400->13100).
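
      A sketch of the vmap side of this, per the description above (close to
      what the patch adds for 64-bit Book3S; treat it as illustrative):

      static inline void flush_cache_vmap(unsigned long start, unsigned long end)
      {
              /*
               * Spurious kernel faults on vmap space must be avoided, so
               * order the new ptes against the page table walker here rather
               * than with a ptesync after every pte set.
               */
              asm volatile("ptesync" ::: "memory");
      }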
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case · f569bd94
      Nicholas Piggin authored
      This matches other architectures: when we know there will be no
      further accesses to the address (e.g., for teardown), page table
      entries can be cleared non-atomically.
      
      The comments about NMMU are bogus: all MMU notifiers (including NMMU)
      are released at this point, with their TLBs flushed. An NMMU access at
      this point would be a bug.
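
      Roughly, the helper gains a non-atomic path for the full (teardown)
      case; a sketch under the reasoning above (the radix__pte_update
      argument list is an assumption):

      static inline pte_t radix__ptep_get_and_clear_full(struct mm_struct *mm,
                                                         unsigned long addr,
                                                         pte_t *ptep, int full)
      {
              unsigned long old_pte;

              if (full) {
                      /* teardown: no concurrent users, plain load/store is enough */
                      old_pte = pte_val(*ptep);
                      *ptep = __pte(0);
              } else {
                      old_pte = radix__pte_update(mm, addr, ptep, ~0ul, 0, 0);
              }

              return __pte(old_pte);
      }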
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: do not flush TLB on spurious fault · 6d8278c4
      Nicholas Piggin authored
      In the case of a spurious fault (which can happen due to a race with
      another thread that changes the page table), the default Linux mm code
      calls flush_tlb_page for that address. This is not required because
      the pte will be re-fetched. Hash does not wire this up to a hardware
      TLB flush for this reason. This patch avoids the flush for radix.
      
      From Power ISA v3.0B, p.1090:
      
          Setting a Reference or Change Bit or Upgrading Access Authority
          (PTE Subject to Atomic Hardware Updates)
      
          If the only change being made to a valid PTE that is subject to
          atomic hardware updates is to set the Reference or Change bit to
          1 or to add access authorities, a simpler sequence suffices
          because the translation hardware will refetch the PTE if an access
          is attempted for which the only problems were reference and/or
          change bits needing to be set or insufficient access authority.
      
      The nest MMU on POWER9 does not re-fetch the PTE after such an access
      attempt before faulting, so address spaces with a coprocessor
      attached will continue to flush in these cases.
      
      This reduces tlbies for a kernel compile workload from 0.95M to 0.90M.
      
      fork --fork --exec benchmark improved 0.5% (12300->12400).
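
      A sketch of the resulting hook (hook name per the generic mm interface;
      the copros check reflects the nest MMU caveat above):

      static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
                                                      unsigned long address)
      {
              /*
               * The hardware walker re-fetches the pte on retry, so no flush
               * is needed -- unless a coprocessor (nest MMU) is attached,
               * since it does not re-fetch before faulting.
               */
              if (atomic_read(&vma->vm_mm->context.copros) > 0)
                      flush_tlb_page(vma, address);
      }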
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Change function prototype · e4c1112c
      Aneesh Kumar K.V authored
      In a later patch, we use the vma and psize to do the TLB flush. Do the
      prototype update in a separate patch to make the review easy.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Move function from radix.h to pgtable-radix.c · 044003b5
      Aneesh Kumar K.V authored
      In a later patch we will update these functions, which requires them to
      be moved to pgtable-radix.c. Keeping the function in radix.h results in
      compile errors like those below.
      
      ./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
      ./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
        struct mm_struct *mm = vma->vm_mm;
                                  ^~
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
            atomic_read(&mm->context.copros) > 0) {
            ^~~~~~~~~~~
            __atomic_load
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
            atomic_read(&mm->context.copros) > 0) {
      
      Instead of fixing the header dependencies, we move the function to
      pgtable-radix.c. Also, the function is now too large to be a static
      inline. Doing the move in a separate patch helps the review.
      
      No functional change in this patch. Only code movement.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  13. 17 May 2018, 1 commit
  14. 15 May 2018, 4 commits
  15. 04 Apr 2018, 1 commit
    • powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix · fb4e5dbd
      Aneesh Kumar K.V authored
      With the split PTL (page table lock) config, we allocate the level
      4 (leaf) page table using the pte fragment framework instead of a slab
      cache like the other levels. This was done to enable a split page table
      lock at level 4 of the page table. We use the page->ptl backing all the
      level 4 pte fragments for the lock.
      
      Currently with Radix, we use only 16 fragments out of the allocated
      page. In radix, each fragment is 256 bytes, which means we use only 4K
      out of the allocated 64K page, wasting 60K of the allocated memory.
      This was done earlier to keep it closer to hash.
      
      This patch updates the pte fragment count to 256, thereby using the
      full 64K page and reducing memory usage. Performance tests show
      really low impact even with THP disabled. With THP enabled we will be
      contending even less on the level 4 ptl, and hence the impact should be
      even lower.
      
        256 threads:
          without patch (10 runs of ./ebizzy  -m -n 1000 -s 131072 -S 100)
            median = 15678.5
            stdev = 42.1209
      
          with patch:
            median = 15354
            stdev = 194.743
      
      This is with THP disabled. With THP enabled the impact of the patch
      will be less.
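
      Spelling out the arithmetic (macro names here are illustrative; the
      values come from the text above):

      #define RADIX_PTE_FRAG_SIZE     256     /* bytes per level-4 fragment */
      /* before: 16 fragments used  ->  16 * 256 =  4K of the 64K page   */
      /* after: 256 fragments used  -> 256 * 256 = 64K, the whole page   */
      #define RADIX_PTE_FRAG_NR       (PAGE_SIZE >> 8)  /* 64K / 256 = 256 */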
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>