1. 03 June 2018: 6 commits
    • powerpc/64s/radix: optimise pte_update · 85bcfaf6
      Authored by Nicholas Piggin
      Implementing pte_update with pte_xchg (which uses cmpxchg) is
      inefficient. A single larx/stcx. works fine, no need for the less
      efficient cmpxchg sequence.
      
      Then remove the memory barriers from the operation. There is a
      requirement for TLB flushing to load mm_cpumask after the store
      that reduces pte permissions, which is moved into the TLB flush
      code.
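      
      A minimal sketch of the single larx/stcx. sequence this describes,
      assuming a simplified native-endian PTE and a hypothetical helper
      name (the kernel's real routine differs in detail); note there are
      no barriers here, ordering is left to the TLB flush path:
      
        static inline unsigned long pte_update_sketch(unsigned long *ptep,
                                                      unsigned long clr,
                                                      unsigned long set)
        {
                unsigned long old, tmp;
      
                __asm__ __volatile__(
                "1:     ldarx   %0,0,%3  \n"    /* load PTE, take reservation */
                "       andc    %1,%0,%4 \n"    /* clear the requested bits */
                "       or      %1,%1,%5 \n"    /* set the requested bits */
                "       stdcx.  %1,0,%3  \n"    /* store iff reservation held */
                "       bne-    1b       \n"    /* reservation lost: retry */
                : "=&r" (old), "=&r" (tmp), "=m" (*ptep)
                : "r" (ptep), "r" (clr), "r" (set), "m" (*ptep)
                : "cc");
      
                return old;
        }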
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags · f1cb8f9b
      Authored by Nicholas Piggin
      The ISA suggests ptesync after setting a pte, to prevent a table walk
      initiated by a subsequent access from missing that store and causing a
      spurious fault. This is an architectural allowance that allows an
      implementation's page table walker to be incoherent with the store
      queue.
      
      However there is no correctness problem in taking a spurious fault in
      userspace -- the kernel copes with these at any time, so the updated
      pte will be found eventually. Spurious kernel faults on vmap memory
      must be avoided, so a ptesync is put into flush_cache_vmap.
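      
      A minimal sketch of that placement, assuming the generic
      flush_cache_vmap() hook is the interception point:
      
        /* Make the new vmap PTEs visible to the table walker before any
         * kernel access, so no spurious kernel fault can be taken. */
        #define flush_cache_vmap(start, end)  asm volatile("ptesync" ::: "memory")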
      
      On POWER9 so far I have not found a measurable window where this can
      result in more minor faults, so as an optimisation, remove the costly
      ptesync from pte updates. If an implementation benefits from ptesync,
      it would be better to add it back in update_mmu_cache, so it's not
      done for things like fork(2).
      
      fork --fork --exec benchmark improved 5.2% (12400->13100).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case · f569bd94
      Authored by Nicholas Piggin
      This matches other architectures: when we know there will be no
      further accesses to the address (e.g., for teardown), page table
      entries can be cleared non-atomically.
      
      The comments about NMMU are bogus: all MMU notifiers (including NMMU)
      are released at this point, with their TLBs flushed. An NMMU access at
      this point would be a bug.
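      
      A sketch of the resulting shape, with a hypothetical helper name and
      pte_update() standing in for the architecture's atomic primitive:
      
        static inline pte_t get_and_clear_full_sketch(struct mm_struct *mm,
                                                      unsigned long addr,
                                                      pte_t *ptep, int full)
        {
                unsigned long old_pte;
      
                if (full) {
                        /* Teardown: no further accesses, plain load/store is safe */
                        old_pte = pte_val(*ptep);
                        *ptep = __pte(0);
                } else {
                        /* Otherwise the atomic update is still required */
                        old_pte = pte_update(mm, addr, ptep, ~0UL, 0, 0);
                }
                return __pte(old_pte);
        }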
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: do not flush TLB on spurious fault · 6d8278c4
      Authored by Nicholas Piggin
      In the case of a spurious fault (which can happen due to a race with
      another thread that changes the page table), the default Linux mm code
      calls flush_tlb_page for that address. This is not required because
      the pte will be re-fetched. Hash does not wire this up to a hardware
      TLB flush for this reason. This patch avoids the flush for radix.
      
      From Power ISA v3.0B, p.1090:
      
          Setting a Reference or Change Bit or Upgrading Access Authority
          (PTE Subject to Atomic Hardware Updates)
      
          If the only change being made to a valid PTE that is subject to
          atomic hardware updates is to set the Reference or Change bit to
          1 or to add access authorities, a simpler sequence suffices
          because the translation hardware will refetch the PTE if an access
          is attempted for which the only problems were reference and/or
          change bits needing to be set or insufficient access authority.
      
      The nest MMU on POWER9 does not re-fetch the PTE after such an access
      attempt before faulting, so address spaces with a coprocessor
      attached will continue to flush in these cases.
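      
      A sketch of the resulting policy (macro body illustrative, following
      the description above):
      
        /* Skip the flush on a spurious fault unless a coprocessor (nest MMU)
         * is attached, since the NMMU will not re-fetch the PTE. */
        #define flush_tlb_fix_spurious_fault(vma, address)                   \
                do {                                                         \
                        if (atomic_read(&(vma)->vm_mm->context.copros) > 0)  \
                                flush_tlb_page(vma, address);                \
                } while (0)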
      
      This reduces tlbies for a kernel compile workload from 0.95M to 0.90M.
      
      fork --fork --exec benchmark improved 0.5% (12300->12400).
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Change function prototype · e4c1112c
      Authored by Aneesh Kumar K.V
      In a later patch, we use the vma and psize to do the TLB flush. The
      prototype update is done in a separate patch to make the review easy.
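      
      Roughly the shape of the change (signatures illustrative, not the
      exact diff):
      
        /* Before: only the mm reaches the flush path */
        extern void radix__ptep_set_access_flags(struct mm_struct *mm, pte_t *ptep,
                                                 pte_t entry, unsigned long address);
      
        /* After: the vma (and with it the psize) travels with the call */
        extern void radix__ptep_set_access_flags(struct vm_area_struct *vma,
                                                 pte_t *ptep, pte_t entry,
                                                 unsigned long address, int psize);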
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Move function from radix.h to pgtable-radix.c · 044003b5
      Authored by Aneesh Kumar K.V
      In a later patch we will update this function, which requires moving
      it to pgtable-radix.c. Keeping the function in radix.h results in
      compile errors as below.
      
      ./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
      ./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
        struct mm_struct *mm = vma->vm_mm;
                                  ^~
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
            atomic_read(&mm->context.copros) > 0) {
            ^~~~~~~~~~~
            __atomic_load
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
            atomic_read(&mm->context.copros) > 0) {
      
      Instead of fixing the header dependencies, we move the function to
      pgtable-radix.c. The function has also grown too large to be a static
      inline. Doing the move in a separate patch helps with review.
      
      No functional change in this patch. Only code movement.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 17 May 2018: 1 commit
  3. 15 May 2018: 4 commits
  4. 04 April 2018: 1 commit
    • powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix · fb4e5dbd
      Authored by Aneesh Kumar K.V
      With the split PTL (page table lock) config, we allocate the level
      4 (leaf) page table using the pte fragment framework instead of a
      slab cache as for the other levels. This was done to enable a split
      page table lock at level 4 of the page table. We use the page->ptl
      of the backing page as the lock for all its level 4 pte fragments.
      
      Currently with Radix, we use only 16 fragments out of the allocated
      page. In radix each fragment is 256 bytes, which means we use only 4K
      of the allocated 64K page, wasting 60K of the allocated memory.
      This was done earlier to keep it closer to hash.
      
      This patch updates the pte fragment count to 256, thereby using the
      full 64K page and reducing the memory usage. Performance tests show
      really low impact even with THP disabled. With THP enabled we will be
      contending even less on the level 4 ptl, and hence the impact should
      be lower still.
      
        256 threads:
          without patch (10 runs of ./ebizzy  -m -n 1000 -s 131072 -S 100)
            median = 15678.5
            stdev = 42.1209
      
          with patch:
            median = 15354
            stdev = 194.743
      
      This is with THP disabled. With THP enabled the impact of the patch
      will be less.
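      
      A sketch of the sizing arithmetic behind the change (macro names
      illustrative):
      
        /* A radix leaf table is 2^RADIX_PTE_INDEX_SIZE entries of 8 bytes,
         * i.e. 256 bytes per fragment on a 64K-page kernel. */
        #define RADIX_PTE_FRAG_SIZE_SHIFT  (RADIX_PTE_INDEX_SIZE + 3)
        /* Fragments per page: 65536 / 256 = 256, up from 16 */
        #define RADIX_PTE_FRAG_NR          (PAGE_SIZE >> RADIX_PTE_FRAG_SIZE_SHIFT)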
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 30 March 2018: 5 commits
  6. 23 March 2018: 2 commits
  7. 13 March 2018: 1 commit
    • powerpc/mm/slice: implement a slice mask cache · 5709f7cf
      Authored by Nicholas Piggin
      Calculating the slice mask can become a significant overhead for
      get_unmapped_area. This patch adds a struct slice_mask for
      each page size in the mm_context, and keeps these in sync with
      the slices psize arrays and slb_addr_limit.
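      
      A sketch of the cached mask, with field names following the
      description (the low slices fit a u64; the high slices need a
      bitmap, assuming the usual SLICE_NUM_HIGH constant):
      
        struct slice_mask {
                u64 low_slices;                              /* one bit per low slice */
                DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH); /* the rest */
        };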
      
      On Book3S/64 this adds 288 bytes to the mm_context_t for the
      slice mask caches.
      
      On POWER8, this increases vfork+exec+exit performance by 9.9%
      and reduces time to mmap+munmap a 64kB page by 28%.
      
      Reduces time to mmap+munmap by about 10% on 8xx.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 06 March 2018: 2 commits
  9. 13 February 2018: 4 commits
  10. 01 February 2018: 2 commits
  11. 30 January 2018: 1 commit
    • powerpc/mm/radix: Fix build error when RADIX_MMU=n · 015eb1b8
      Authored by Michael Ellerman
      The recent TLB flush rework broke the build when the Radix MMU is
      disabled at build time, eg:
      
        (.text+0x264): undefined reference to `.radix__tlbiel_all'
      
      We could add an empty version, but if we ever called it by accident
      that would indicate a bad bug, so add a stub that just WARNs if we do.
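      
      A sketch of such a stub (config guard and exact signature assumed):
      
        #ifdef CONFIG_PPC_RADIX_MMU
        extern void radix__tlbiel_all(unsigned int action);
        #else
        static inline void radix__tlbiel_all(unsigned int action)
        {
                WARN_ONCE(1, "radix__tlbiel_all() called with RADIX_MMU=n");
        }
        #endif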
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 20 January 2018: 6 commits
  13. 19 January 2018: 1 commit
    • powerpc/64s: Fix ps3 build error due to tlbiel_all() · 7a074fc0
      Authored by Michael Ellerman
      The recent changes to TLB handling broke the PS3 build:
      
        arch/powerpc/include/asm/book3s/64/tlbflush.h:30: undefined reference to `.hash__tlbiel_all'
      
      Fix it by adding a fallback version of tlbiel_all() for non-native
      builds. It should never be called, due to checks in its callers, so
      it just calls BUG(). We should probably clean it up further but this
      will suffice for now.
      
      Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 17 January 2018: 1 commit
    • powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 · d4748276
      Authored by Nicholas Piggin
      There are several cases outside the normal address space management
      where a CPU's entire local TLB is to be flushed:
      
        1. Booting the kernel, in case something has left stale entries in
           the TLB (e.g., kexec).
      
        2. Machine check, to clean corrupted TLB entries.
      
      One other place where the TLB is flushed is waking from deep idle
      states. That flush is a side-effect of calling ->cpu_restore with the
      intention of re-setting various SPRs. The flush itself is unnecessary
      there because, unlike in the first case above, the TLB should not
      acquire new corrupted entries as part of sleep/wake (though they may
      be lost).
      
      This type of TLB flush is coded inflexibly, several times for each CPU
      type, and they have a number of problems with ISA v3.0B:
      
      - The current radix mode of the MMU is not taken into account: it is
        always done as a hash flush. For IS=2 (LPID-matching flush from host)
        and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
        the R field does not match the current radix mode.
      
      - ISA v3.0B hash must flush the partition and process table caches as
        well.
      
      - ISA v3.0B radix must flush partition and process scoped translations,
        partition and process table caches, and also the page walk cache.
      
      So consolidate the flushing code and implement it in C and inline asm
      under the mm/ directory with the rest of the flush code. Add ISA v3.0B
      cases for radix and hash, and use the radix flush in radix environment.
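      
      A sketch of the consolidated entry point, assuming an
      early_radix_enabled()-style mode test and a flush-scope argument:
      
        /* Flush this CPU's entire local TLB with the flavour matching the
         * MMU mode the kernel is actually running in. */
        static inline void tlbiel_all(void)
        {
                if (early_radix_enabled())
                        radix__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
                else
                        hash__tlbiel_all(TLB_INVAL_SCOPE_GLOBAL);
        }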
      
      Provide a way for IS=2 (LPID flush) to specify the radix mode of the
      partition. Have KVM pass in the radix mode of the guest.
      
      Take the flushes out of the early cputable/dt_cpu_ftrs detection
      hooks, and move them later in the boot process, after the MMU
      registers are set up and before relocation is first turned on.
      
      The TLB flush is no longer called when restoring from deep idle
      states. This could not be done as a separate step because booting
      secondaries uses the same cpu_restore as idle restore, which needs
      the TLB flush.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 16 January 2018: 2 commits
    • powerpc/mm: Introduce _PAGE_NA · 35175033
      Authored by Christophe Leroy
      Today, PAGE_NONE is defined as a page not having _PAGE_USER.
      In some circumstances, when the CPU supports it, it might be
      better to be able to flag a page with NO ACCESS.
      
      In a following patch, the 8xx will switch to flagging user access in
      the PMD, so it will no longer be possible to use _PAGE_USER to flag
      a page with no access.
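      
      A sketch of the intended use, assuming _PAGE_NA defaults to 0 on
      CPUs without a no-access bit:
      
        #ifndef _PAGE_NA
        #define _PAGE_NA  0     /* CPUs without a no-access bit */
        #endif
      
        /* PAGE_NONE can now carry an explicit no-access flag instead of
         * relying solely on the absence of _PAGE_USER. */
        #define PAGE_NONE  __pgprot(_PAGE_BASE | _PAGE_NA)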
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: extend _PAGE_PRIVILEGED to all CPUs · 812fadcb
      Authored by Christophe Leroy
      commit ac29c640 ("powerpc/mm: Replace _PAGE_USER with
      _PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64.
      
      This patch generalises _PAGE_PRIVILEGED to all CPUs, allowing a
      platform to have either _PAGE_PRIVILEGED or _PAGE_USER or both.
      
      PPC_8xx has a _PAGE_SHARED flag which is set for, and only for, all
      non-user pages. Let's rename it _PAGE_PRIVILEGED to remove the
      confusion, as it has nothing to do with Linux shared pages.
      
      On BookE, there is a _PAGE_BAP_SR bit which has to be set for kernel
      pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR makes this generic.
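      
      A sketch of how the per-platform definitions could line up (bit
      values illustrative):
      
        #if defined(CONFIG_PPC_8xx)
        #define _PAGE_PRIVILEGED  0x0004        /* the former _PAGE_SHARED bit */
        #elif defined(CONFIG_PPC_BOOK3E_64)
        #define _PAGE_PRIVILEGED  _PAGE_BAP_SR  /* supervisor-read permission */
        #else
        #define _PAGE_PRIVILEGED  0             /* no such bit on this CPU */
        #endif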
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 22 December 2017: 1 commit