提交 · 84d3e2480564949a26870b6750a9168eb11f4c17 · openeuler / raspberrypi-kernel

12 3月, 2016 8 次提交

powerpc: Simplify test in __dma_sync() · 8478d7f0

由 Christophe Leroy 提交于 2月 09, 2016

This simplification helps the compiler. We now have only one test
instead of two, so it reduces the number of branches.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

8478d7f0

powerpc/8xx: rewrite flush_instruction_cache() in C · 766d45cb

由 Christophe Leroy 提交于 2月 09, 2016

On PPC8xx, flushing instruction cache is performed by writing
in register SPRN_IC_CST. This registers suffers CPU6 ERRATA.
The patch rewrites the fonction in C so that CPU6 ERRATA will
be handled transparently
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

766d45cb

powerpc/8xx: rewrite set_context() in C · a7761fe4

由 Christophe Leroy 提交于 2月 09, 2016

There is no real need to have set_context() in assembly.
Now that we have mtspr() handling CPU6 ERRATA directly, we
can rewrite set_context() in C language for easier maintenance.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

a7761fe4

powerpc32: remove ioremap_base · e974cd4b

由 Christophe Leroy 提交于 2月 09, 2016

ioremap_base is not initialised and is nowhere used so remove it
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

e974cd4b

powerpc32: Remove useless/wrong MMU:setio progress message · c562eb06

由 Christophe Leroy 提交于 2月 09, 2016

Commit 77116849 ("[POWERPC] Remove unused machine call outs")
removed the call to setup_io_mappings(), so remove the associated
progress line message
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

c562eb06

powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together · 3084cdb7

由 Christophe Leroy 提交于 2月 09, 2016

x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
purpose, and are never defined at the same time.
So rename them x_block_mapped() and define them in the relevant
places
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

3084cdb7

powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c · 516d9189

由 Christophe Leroy 提交于 2月 09, 2016

Now we have a 8xx specific .c file for that so put it in there
as other powerpc variants do
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

516d9189

powerpc/8xx: Map linear kernel RAM with 8M pages · a372acfa

由 Christophe Leroy 提交于 2月 09, 2016

On a live running system (VoIP gateway for Air Trafic Control), over
a 10 minutes period (with 277s idle), we get 87 millions DTLB misses
and approximatly 35 secondes are spent in DTLB handler.
This represents 5.8% of the overall time and even 10.8% of the
non-idle time.
Among those 87 millions DTLB misses, 15% are on user addresses and
85% are on kernel addresses. And within the kernel addresses, 93%
are on addresses from the linear address space and only 7% are on
addresses from the virtual address space.

MPC8xx has no BATs but it has 8Mb page size. This patch implements
mapping of kernel RAM using 8Mb pages, on the same model as what is
done on the 40x.

In 4k pages mode, each PGD entry maps a 4Mb area: we map every two
entries to the same 8Mb physical page. In each second entry, we add
4Mb to the page physical address to ease life of the FixupDAR
routine. This is just ignored by HW.

In 16k pages mode, each PGD entry maps a 64Mb area: each PGD entry
will point to the first page of the area. The DTLB handler adds
the 3 bits from EPN to map the correct page.

With this patch applied, we now get only 13 millions TLB misses
during the 10 minutes period. The idle time has increased to 313s
and the overall time spent in DTLB miss handler is 6.3s, which
represents 1% of the overall time and 2.2% of non-idle time.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

a372acfa

05 3月, 2016 1 次提交

powerpc/mm: any thread in one core can be the first to setup TLB1 · ebb9d30a

由 chenhui zhao 提交于 12月 24, 2015

On e6500, in the case of cpu hotplug, either thread in one core
may be the first thread initilzing the TLB1. The subsequent threads
must not setup it again.

The code is derived from the comment of Scott Wood.
Signed-off-by: NChenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: NScott Wood <oss@buserror.net>

ebb9d30a

03 3月, 2016 4 次提交

powerpc/mm: add _PAGE_HASHPTE similar to 4K hash · c367a441

由 Aneesh Kumar K.V 提交于 3月 01, 2016

We don't need to update linux page table entry with _PAGE_HASHPTE early
in hash pte fault. A parallel pte update will loop via _PAGE_BUSY
and look at _PAGE_HASHPTE for a required hpte flush only if
_PAGE_BUSY is cleared. That ensures a pte update will wait for a
parallel hpte insert to finish before looking at _PAGE_HASHPTE bit.

To avoid further confusion drop setting _PAGE_HASHPTE in cmpxchg in __hash_page_4K.

commit 41743a4e ("powerpc: Free a PTE bit on ppc64 with 64K pages")
did similar change for 64K config
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c367a441

powerp/mm: Update code comments · e9a68147

由 Aneesh Kumar K.V 提交于 3月 01, 2016

We are updating pte in those functions.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e9a68147

mm: Some arch may want to use HPAGE_PMD related values as variables · ff20c2e0

由 Kirill A. Shutemov 提交于 3月 01, 2016

With next generation power processor, we are having a new mmu model
[1] that require us to maintain a different linux page table format.

Inorder to support both current and future ppc64 systems with a single
kernel we need to make sure kernel can select between different page
table format at runtime. With the new MMU (radix MMU) added, we will
have two different pmd hugepage size 16MB for hash model and 2MB for
Radix model. Hence make HPAGE_PMD related values as a variable.

Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a
followup patch.

[1] http://ibm.biz/power-isa3 (Needs registration).
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ff20c2e0

powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table · 368ced78

由 Aneesh Kumar K.V 提交于 3月 01, 2016

This is needed so that we can support both hash and radix page table
using single kernel. Radix kernel uses a 4 level table.

We now use physical address in upper page table tree levels. Even though
they are aligned to their size, for the masked bits we use the
bit positions as per PowerISA 3.0.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

368ced78

02 3月, 2016 1 次提交

powerpc/mm: Split hash page table sizing heuristic into a helper · 5c3c7ede

由 David Gibson 提交于 2月 09, 2016

htab_get_table_size() either retrieve the size of the hash page table (HPT)
from the device tree - if the HPT size is determined by firmware - or
uses a heuristic to determine a good size based on RAM size if the kernel
is responsible for allocating the HPT.

To support a PAPR extension allowing resizing of the HPT, we're going to
want the memory size -> HPT size logic elsewhere, so split it out into a
helper function.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5c3c7ede

01 3月, 2016 4 次提交

powerpc/mm: Clean up memory hotplug failure paths · 1dace6c6

由 David Gibson 提交于 2月 09, 2016

This makes a number of cleanups to handling of mapping failures during
memory hotplug on Power:

For errors creating the linear mapping for the hot-added region:
  * This is now reported with EFAULT which is more appropriate than the
    previous EINVAL (the failure is unlikely to be related to the
    function's parameters)
  * An error in this path now prints a warning message, rather than just
    silently failing to add the extra memory.
  * Previously a failure here could result in the region being partially
    mapped.  We now clean up any partial mapping before failing.

For errors creating the vmemmap for the hot-added region:
   * This is now reported with EFAULT instead of causing a BUG() - this
     could happen for external reason (e.g. full hash table) so it's better
     to handle this non-fatally
   * An error message is also printed, so the failure won't be silent
   * As above a failure could cause a partially mapped region, we now
     clean this up. [mpe: move htab_remove_mapping() out of #ifdef
     CONFIG_MEMORY_HOTPLUG to enable this]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NPaul Mackerras <paulus@samba.org>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1dace6c6

powerpc/mm: Handle removing maybe-present bolted HPTEs · 27828f98

由 David Gibson 提交于 2月 09, 2016

At the moment the hpte_removebolted callback in ppc_md returns void and
will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
place.  This is awkward for the case of cleaning up a mapping which was
partially made before failing.

So, we add a return value to hpte_removebolted, and have it return ENOENT
in the case that the HPTE to remove didn't exist in the first place.

In the (sole) caller, we propagate errors in hpte_removebolted to its
caller to handle.  However, we handle ENOENT specially, continuing to
complete the unmapping over the specified range before returning the error
to the caller.

This means that htab_remove_mapping() will work sanely on a partially
present mapping, removing any HPTEs which are present, while also returning
ENOENT to its caller in case it's important there.

There are two callers of htab_remove_mapping():
   - In remove_section_mapping() we already WARN_ON() any error return,
     which is reasonable - in this case the mapping should be fully
     present
   - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
     just a WARN_ON() in the case of ENOENT, since failing to remove a
     mapping that wasn't there in the first place probably shouldn't be
     fatal.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

27828f98

powerpc/mm: Clean up error handling for htab_remove_mapping · abd0a0e7

由 David Gibson 提交于 2月 09, 2016

Currently, the only error that htab_remove_mapping() can report is -EINVAL,
if removal of bolted HPTEs isn't implemeted for this platform.  We make
a few clean ups to the handling of this:

 * EINVAL isn't really the right code - there's nothing wrong with the
   function's arguments - use ENODEV instead
 * We were also printing a warning message, but that's a decision better
   left up to the callers, so remove it
 * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
   error, making the warning message redundant, so no change is needed
   there.
 * The other caller is remove_section_mapping().  This is called in the
   memory hot remove path at a point after vmemmap_remove_mapping() so
   if hpte_removebolted isn't implemented, we'd expect to have already
   BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk() since this
   really shouldn't be happening.
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

abd0a0e7

powerpc: Fix misspellings in comments. · 446957ba

由 Adam Buchbinder 提交于 2月 24, 2016

Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

446957ba

29 2月, 2016 3 次提交

powerpc/mm/book3s-64: Move HPTE-related bits in PTE to upper end · a9d4996d

由 Paul Mackerras 提交于 2月 22, 2016

This moves the _PAGE_HASHPTE, _PAGE_F_GIX and _PAGE_F_SECOND fields in
the Linux PTE on 64-bit Book 3S systems to the most significant byte.
Of the 5 bits, one is a software-use bit and the other four are
reserved bit positions in the PowerISA v3.0 radix PTE format.
Using these bits is OK because these bits are all to do with tracking
the HPTE(s) associated with the Linux PTE, and therefore won't be
needed in radix mode.  This frees up bit positions in the lower two
bytes.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a9d4996d

powerpc/mm/book3s-64: Move _PAGE_PRESENT to the most significant bit · 849f86a6

由 Paul Mackerras 提交于 2月 22, 2016

This changes _PAGE_PRESENT for 64-bit Book 3S processors from 0x2 to
0x8000_0000_0000_0000, because that is where PowerISA v3.0 CPUs in
radix mode will expect to find it.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

849f86a6

powerpc/mm/book3s-64: Use physical addresses in upper page table tree levels · c61a8843

由 Paul Mackerras 提交于 2月 23, 2016

This changes the Linux page tables to store physical addresses
rather than kernel virtual addresses in the upper levels of the
tree (pgd, pud and pmd) for 64-bit Book 3S machines.

This also changes the hugepd pointers used to implement hugepages
when the base page size is 4k to store physical addresses rather than
virtual addresses (again just for 64-bit Book3S machines).

This frees up some high order bits, and will be needed with
PowerISA v3.0 machines which read the page table tree in hardware
in radix mode.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c61a8843

27 2月, 2016 2 次提交

powerpc/mm/book3s-64: Free up 7 high-order bits in the Linux PTE · f1a9ae03

由 Paul Mackerras 提交于 2月 22, 2016

This frees up bits 57-63 in the Linux PTE on 64-bit Book 3S machines.
In the 4k page case, this is done just by reducing the size of the
RPN field to 39 bits, giving 51-bit real addresses.  In the 64k page
case, we had 10 unused bits in the middle of the PTE, so this moves
the RPN field down 10 bits to make use of those unused bits.  This
means the RPN field is now 3 bits larger at 37 bits, giving 53-bit
real addresses in the normal case, or 49-bit real addresses for the
special 4k PFN case.

We are doing this in order to be able to move some other PTE bits
into the positions where PowerISA V3.0 processors will expect to
find them in radix-tree mode.  Ultimately we will be able to move
the RPN field to lower bit positions and make it larger.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f1a9ae03

P
powerpc/mm/book3s-64: Clean up some obsolete or misleading comments · 1ec3f937
由 Paul Mackerras 提交于 2月 22, 2016
```
No code changes.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
```
1ec3f937

22 2月, 2016 1 次提交

powerpc/mm/hash: Clear the invalid slot information correctly · 9ab3ac23

由 Aneesh Kumar K.V 提交于 2月 20, 2016

We can get a hash pte fault with 4k base page size and find the pte
already inserted with 64K base page size. In that case we need to clear
the existing slot information from the old pte. Fix this correctly

With THP, we also clear the slot information with respect to all
the 64K hash pte mapping that 16MB page. They are all invalid
now. This make sure we don't find the slot valid when we fault with
4k base page size. Finding the slot valid should not result in any wrong
behavior because we do check again in hash page table for the validity.
But we can avoid that check completely.

Fixes: a43c0eb8 ("powerpc/mm: Convert 4k hash insert to C")
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9ab3ac23

15 2月, 2016 1 次提交

powerpc/mm: Fix Multi hit ERAT cause by recent THP update · c777e2a8

由 Aneesh Kumar K.V 提交于 2月 09, 2016

With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.

Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.

Consider following race:

		CPU0				CPU1
shrink_page_list()
  add_to_swap()
    split_huge_page_to_list()
      __split_huge_pmd_locked()
        pmdp_huge_clear_flush_notify()
	// pmd_none() == true
					exit_mmap()
					  unmap_vmas()
					    zap_pmd_range()
					      // no action on pmd since pmd_none() == true
	pmd_populate()

As result the THP will not be freed. The leak is detected by check_mm():

	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512

The above required us to not mark pmd none during a pmd split.

The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)

Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.

Fixes: eef1b3ba ("thp: implement split_huge_pmd()")
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c777e2a8

25 1月, 2016 1 次提交

powerpc/mm: Allow user space to map rtas_rmo_buf · e256caa7

由 Vasant Hegde 提交于 1月 21, 2016

With commit 90a545e9 (restrict /dev/mem to idle io memory ranges) mapping
rtas_rmo_buf from user space is failing. Hence we are not able to make
RTAS syscall.

This patch calls page_is_rtas_user_buf before calling iomem_is_exclusive
in  devmem_is_allowed(). This will allow user space to map rtas_rmo_buf
and we are able to make RTAS syscall.
Reported-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e256caa7

16 1月, 2016 3 次提交

powerpc, thp: remove infrastructure for handling splitting PMDs · 7aa9a23c

由 Kirill A. Shutemov 提交于 1月 15, 2016

With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7aa9a23c

mm: drop tail page refcounting · ddc58f27

由 Kirill A. Shutemov 提交于 1月 15, 2016

Tail page refcounting is utterly complicated and painful to support.

It uses ->_mapcount on tail pages to store how many times this page is
pinned.  get_page() bumps ->_mapcount on tail page in addition to
->_count on head.  This information is required by split_huge_page() to
be able to distribute pins from head of compound page to tails during
the split.

We will need ->_mapcount to account PTE mappings of subpages of the
compound page.  We eliminate need in current meaning of ->_mapcount in
tail pages by forbidding split entirely if the page is pinned.

The only user of tail page refcounting is THP which is marked BROKEN for
now.

Let's drop all this mess.  It makes get_page() and put_page() much
simpler.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ddc58f27

thp: rename split_huge_page_pmd() to split_huge_pmd() · 78ddc534

由 Kirill A. Shutemov 提交于 1月 15, 2016

We are going to decouple splitting THP PMD from splitting underlying
compound page.

This patch renames split_huge_page_pmd*() functions to split_huge_pmd*()
to reflect the fact that it doesn't imply page splitting, only PMD.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NJerome Marchand <jmarchan@redhat.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

78ddc534

09 1月, 2016 1 次提交

powerpc: Fix build break due to paca mm_context_t changes · c33e54fa

由 Michael Ellerman 提交于 1月 09, 2016

Commit 2fc251a8 ("powerpc: Copy only required pieces of the
mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y
and CONFIG_PPC_MM_SLICES=n.

That only happens for a kernel built with 4K pages and HUGETLB disabled,
which is why we missed it.

Fix it by adding a mm_ctx_user_psize member to the paca and populating
it in the appropriate places.

Fixes: 2fc251a8 ("powerpc: Copy only required pieces of the mm_context_t to the paca")
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c33e54fa

27 12月, 2015 1 次提交

powerpc: Copy only required pieces of the mm_context_t to the paca · 2fc251a8

由 Michael Neuling 提交于 12月 11, 2015

Currently we copy the whole mm_context_t to the paca but only access a
few bits of it.  This is wasteful of space paca and also takes quite
some time in the hot path of context switching.

This patch pulls in only the required bits from the mm_context_t to
the paca and on context switch, copies only those.

Benchmarking this (On top of Anton's recent MSR context switching
changes [1]) using processes and yield shows an improvement of almost
3% on POWER8:

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield --process 0 0

1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.htmlSigned-off-by: NMichael Neuling <mikey@neuling.org>
[mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2fc251a8

23 12月, 2015 1 次提交

powerpc/e6500: add locking to hugetlb · 2749539b

由 Scott Wood 提交于 12月 04, 2015

e6500 has threads but does not have TLB write conditional. Thus,
the hugetlb code needs to take the same lock that the normal TLB miss
handlers take, to ensure that the tlbsx and tlbwe are atomic.
Signed-off-by: NScott Wood <scottwood@freescale.com>

2749539b

19 12月, 2015 1 次提交

powerpc: Add function to copy mm_context_t to the paca · c395465d

由 Michael Neuling 提交于 10月 28, 2015

This adds a function to copy the mm->context to the paca.  This is
only a basic conversion for now but will be used more extensively in
the next patch.

This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's
not used elsewhere.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c395465d

14 12月, 2015 7 次提交

powerpc/mm: Don't open code pgtable_t size · 4dcbd88e

由 Aneesh Kumar K.V 提交于 12月 01, 2015

The slot information of base page size hash pte is stored in the
pgtable_t w.r.t transparent hugepage. We need to make sure we don't
index beyond pgtable_t size.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4dcbd88e

powerpc/mm: Don't hardcode page table size · 62607bc6

由 Aneesh Kumar K.V 提交于 12月 01, 2015

pte and pmd table size are dependent on config items. Don't
hard code the same. This make sure we use the right value
when masking pmd entries and also while checking pmd_bad
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

62607bc6

powerpc/mm: Add a _PAGE_PTE bit · 6a119eae

由 Aneesh Kumar K.V 提交于 12月 01, 2015

For a pte entry we will have _PAGE_PTE set. Our pte page
address have a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1.
We use the lower 7 bits to indicate hugepd. ie.

For pmd and pgd we can find:
1) _PAGE_PTE set pte -> indicate PTE
2) bits [2..6] non zero -> indicate hugepd.
   They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) othewise pointer to next table.
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6a119eae

powerpc/mm: Move THP headers around · e34aa03c

由 Aneesh Kumar K.V 提交于 12月 01, 2015

We support THP only with book3s_64 and 64K page size. Move
THP details to hash64-64k.h to clarify the same.
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e34aa03c

powerpc/mm: Move hugetlb related headers · 26a344ae

由 Aneesh Kumar K.V 提交于 12月 01, 2015

W.r.t hugetlb, we support two format for pmd. With book3s_64 and
64K linux page size, we can have pte at the pmd level. Hence we
don't need to support hugepd there. For everything else hugepd
is supported and pmd_huge is (0).
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

26a344ae

powerpc/mm: Move WIMG update to helper. · 40e8550a

由 Aneesh Kumar K.V 提交于 12月 01, 2015

Only difference here is, we apply the WIMG mapping early, so rflags
passed to updatepp will also be changed.
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

40e8550a

powerpc/mm: Add helper for converting pte bit to hpte bits · c6a3c495

由 Aneesh Kumar K.V 提交于 12月 01, 2015

Instead of open coding it in multiple code paths, export the helper
and add more documentation. Also make sure we don't make assumption
regarding pte bit position
Acked-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c6a3c495