- 17 Apr 2015, 2 commits
-
-
By Aneesh Kumar K.V:
For a THP that is marked trans splitting, we return the pte. This requires the callers to handle the pmd_trans_splitting scenario, if they care. All the current callers are either looking at pfn or write_ok, hence we don't need to update them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Aneesh Kumar K.V:
We can disable a THP split or a hugepage collapse by disabling irq. We send an IPI to all the cpus in the early part of split/collapse, and disabling local irq ensures we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use the returned pte_t pointer outside the irq-disabled region. W.r.t. a THP split, the pfn remains the same, but a hugepage collapse will result in a pfn change. There are a few steps we can take to avoid a hugepage collapse. One way is to take a page reference inside the irq-disabled region. Another option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking the pmd_lock. Another method, used by the kvm subsystem, is to check whether we had an mmu_notifier update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
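As a rough illustration of the caller pattern this describes (a hypothetical helper, assuming the find_linux_pte_or_hugepte() signature of this era; not code from the patch):

    /* Extract a pfn while irqs are off, so a THP split/collapse cannot complete. */
    static unsigned long read_pfn_example(struct mm_struct *mm, unsigned long ea)
    {
            unsigned long flags, pfn = 0;
            unsigned int shift;
            pte_t *ptep;

            local_irq_save(flags);          /* holds off the split/collapse IPIs */
            ptep = find_linux_pte_or_hugepte(mm->pgd, ea, &shift);
            if (ptep && pte_present(*ptep))
                    pfn = pte_pfn(*ptep);   /* only dereference inside this region */
            local_irq_restore(flags);       /* pfn may become stale after this */

            return pfn;
    }

The pfn is only trustworthy while interrupts stay disabled; a collapse after local_irq_restore() can change it, which is exactly the caveat the changelog spells out.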
-
- 16 Apr 2015, 1 commit
-
-
By Scott Wood:
Commit dc6c9a35 ("mm: account pmd page tables to the process") added a counter that is incremented whenever a PMD is allocated and decremented whenever a PMD is freed. For hugepages on PPC, common code is used to allocate PMDs, but arch-specific code is used to free PMDs. This results in kernel output such as "BUG: non-zero nr_pmds on freeing mm: 1" when using hugepages. Update the PPC hugepage PMD freeing code to decrement the count, just as the above commit did for free_pmd_range(). Fixes: dc6c9a35 ("mm: account pmd page tables to the process") Signed-off-by: Scott Wood <scottwood@freescale.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # 4.0.x
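A minimal sketch of the kind of fix described, assuming a hypothetical free helper (the function name and cache index stand in for the arch specifics; the real change lands in the powerpc hugepage table-freeing path):

    static void hugepd_free_pmd_example(struct mmu_gather *tlb, void *pmd_table)
    {
            pgtable_free_tlb(tlb, pmd_table, PMD_CACHE_INDEX);  /* existing arch free */
            mm_dec_nr_pmds(tlb->mm);  /* keep nr_pmds balanced with the common-code alloc */
    }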
-
- 15 Apr 2015, 2 commits
-
-
By Kees Cook:
When an architecture fully supports randomizing the ELF load location, a per-arch mmap_rnd() function is used to find a randomized mmap base. In preparation for randomizing the location of ET_DYN binaries separately from mmap, this renames and exports these functions as arch_mmap_rnd(). It additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing this feature on architectures that support it (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already supports a separated ET_DYN ASLR from mmap ASLR without the ARCH_BINFMT_ELF_RANDOMIZE_PIE logic). Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Möller <dl9pf@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
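For illustration, a generic arch_mmap_rnd() under this scheme might look like the sketch below (the entropy width is a placeholder, not what any particular architecture uses):

    #include <linux/random.h>
    #include <linux/mm.h>

    unsigned long arch_mmap_rnd(void)
    {
            unsigned long rnd;

            /* Placeholder entropy width; each arch picks its own number of bits. */
            rnd = (unsigned long)get_random_int() & ((1UL << 18) - 1);

            return rnd << PAGE_SHIFT;   /* randomize in whole pages */
    }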
-
By Kees Cook:
In preparation for splitting out ET_DYN ASLR, this refactors the use of mmap_rnd() to be used similarly to arm and x86. (Can mmap ASLR be safely enabled in the legacy mmap case here? Other archs use "mm->mmap_base = TASK_UNMAPPED_BASE + random_factor".) Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
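A sketch of the resulting arch_pick_mmap_layout() shape (mmap_is_legacy() and mmap_base() stand in for arch-internal helpers; this is an illustration, not the exact patch):

    void arch_pick_mmap_layout(struct mm_struct *mm)
    {
            unsigned long random_factor = 0UL;

            if (current->flags & PF_RANDOMIZE)
                    random_factor = arch_mmap_rnd();

            if (mmap_is_legacy()) {
                    /* Legacy layout: whether to add random_factor here is the
                     * open question raised in the changelog above. */
                    mm->mmap_base = TASK_UNMAPPED_BASE;
                    mm->get_unmapped_area = arch_get_unmapped_area;
            } else {
                    mm->mmap_base = mmap_base(random_factor);
                    mm->get_unmapped_area = arch_get_unmapped_area_topdown;
            }
    }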
-
- 10 Apr 2015, 2 commits
-
-
By Michael Ellerman:
We have a powerpc-specific global called mem_init_done which is "set on boot once kmalloc can be called". But that's not *quite* true. We set it at the bottom of mem_init(), and rely on the fact that mm_init() calls kmem_cache_init() immediately after that, and nothing is running in parallel. So replace it with the generic and 100% correct slab_is_available(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
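The pattern in question, sketched with a made-up allocation site:

    static void *early_or_slab_alloc(unsigned long size)
    {
            if (slab_is_available())              /* generic and exact, unlike mem_init_done */
                    return kzalloc(size, GFP_KERNEL);
            return memblock_virt_alloc(size, 0);  /* pre-slab boot-time path */
    }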
-
By Michael Ellerman:
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix the 32-bit code also] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 07 Apr 2015, 2 commits
-
-
By Michael Ellerman:
The callers of setbat() are actually passing a pgprot_t for the flags parameter. This doesn't matter unless STRICT_MM_TYPECHECKS is enabled. So that we can turn that on without breaking the build, change setbat() to take a pgprot_t and have it convert it to an unsigned long internally. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Michael Ellerman:
This is already declared in mmu_decl.h, so we don't need a second version in the C file. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 26 Mar 2015, 1 commit
-
-
By Yanjiang Jin:
kmem_cache_create()->kmem_cache_create_memcg()->kstrdup() allocates new space and copies the name's content, so it is safe to free the name memory after calling kmem_cache_create(). Else kmemleak will report the below warning:
    unreferenced object 0xc0000000f9002160 (size 16):
      comm "swapper/0", pid 0, jiffies 4294892296 (age 1386.640s)
      hex dump (first 16 bytes):
        70 67 74 61 62 6c 65 2d 32 5e 39 00 de ad be ef  pgtable-2^9.....
      backtrace:
        [<c0000000004e03ec>] .kvasprintf+0x5c/0xa0
        [<c0000000004e045c>] .kasprintf+0x2c/0x50
        [<c00000000002e36c>] .pgtable_cache_add+0xac/0x100
        [<c00000000002e3e4>] .pgtable_cache_init+0x24/0x80
        [<c000000000c6c67c>] .start_kernel+0x228/0x4c8
        [<c000000000000594>] .start_here_common+0x24/0x90
Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
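The corrected pattern, sketched (the function and sizes are illustrative, loosely following the pgtable_cache_add() trace above):

    static void pgtable_cache_add_example(int shift, void (*ctor)(void *))
    {
            unsigned long table_size = sizeof(void *) << shift;   /* illustrative size */
            char *name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
            struct kmem_cache *new;

            if (!name)
                    return;
            new = kmem_cache_create(name, table_size, table_size, 0, ctor);
            kfree(name);   /* safe: kmem_cache_create() kstrdup()s the name internally */
            if (!new)
                    pr_err("Could not allocate pgtable cache for order %d\n", shift);
    }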
-
- 23 Mar 2015, 2 commits
-
-
By Scott Wood:
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Even on other architectures, refrain from setting a bad example that people copy. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Nishanth Aravamudan:
Raghu noticed an issue with excessive memory allocation on power with a simple cgroup test, specifically in mem_cgroup_css_alloc -> for_each_node -> alloc_mem_cgroup_per_zone_info(), which ends up blowing up the kmalloc-2048 slab (to the order of 200MB for 400 cgroup directories). The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes possible), which defines node_possible_map, which in turn defines the value of nr_node_ids in setup_nr_node_ids and the iteration of for_each_node. In practice, we never see a system with 256 NUMA nodes, and in fact, we do not support node hotplug on power in the first place, so the nodes that are online when we come up are the nodes that will be present for the lifetime of this kernel. So let's, at least, drop the NUMA possible map down to the online map at runtime. This is similar to what x86 does in its initialization routines. mem_cgroup_css_alloc should also be fixed to only iterate over memory-populated nodes and handle hotplug, but that is a separate change. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 18 Mar 2015, 4 commits
-
-
By Greg Kurz:
The current VPHN parsing logic has some flaws that this patch aims to fix: 1) when the value 0xffff is read, the value 0xffffffff gets added to the output list and its element count isn't incremented. This is wrong. According to PAPR+ the domain identifiers are packed into a sequence terminated by the "reserved value of all ones". This means that 0xffff is a stream terminator. 2) the combination of byteswaps and casts makes the code hardly readable. Let's parse the stream one 16-bit field at a time instead. 3) it is assumed that the hypercall returns 12 32-bit values packed into 6 64-bit registers. According to PAPR+, the domain identifiers may be streamed as 16-bit values. Let's increase the number of expected numbers to 24. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
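A simplified sketch of parsing the packed stream as 16-bit fields with 0xffff as the terminator (it ignores the 32-bit identifier encoding PAPR+ also allows, so it is not the complete parser):

    #define VPHN_FIELD_UNUSED 0xffffU   /* "reserved value of all ones" terminator */

    static int vphn_unpack_ids_example(const __be16 *field, int nr_fields, u32 *out)
    {
            int i, nr_ids = 0;

            for (i = 0; i < nr_fields; i++) {
                    u16 id = be16_to_cpu(field[i]);

                    if (id == VPHN_FIELD_UNUSED)    /* terminator, not a domain id */
                            break;
                    out[nr_ids++] = id;
            }
            return nr_ids;                          /* element count of the output list */
    }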
-
By Greg Kurz:
The goal behind this patch is to be able to write userland tests for the VPHN parsing code. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Greg Kurz:
The first argument to vphn_unpack_associativity() is a const long *, but the parsing code actually expects __be64 values. Let's move the endian fixing down for consistency. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
By Greg Kurz:
The number of values returned by the H_HOME_NODE_ASSOCIATIVITY h_call deserves to be explicitly defined, for a better understanding of the code. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 17 Mar 2015, 1 commit
-
-
By Kyle Moffett:
These functions are only used from one place each. If the cacheable_* versions really are more efficient, then those changes should be migrated into the common code instead. NOTE: The old routines are just flat buggy on kernels that support hardware with different cacheline sizes. Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 17 Feb 2015, 1 commit
-
-
By Kirill A. Shutemov:
We've replaced the remap_file_pages(2) implementation with emulation. Nobody creates non-linear mappings anymore. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Feb 2015, 2 commits
-
-
By Mel Gorman:
ppc64 should not be depending on DSISR_PROTFAULT, and it is unexpected if these faults are triggered. This patch adds warnings just in case they are being accidentally depended upon. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Mel Gorman:
Convert existing users of pte_numa and friends to the new helper. Note that the kernel is broken after this patch is applied until the other page table modifiers are also altered. This patch layout is to make review easier. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 Feb 2015, 2 commits
-
-
By Naoya Horiguchi:
We don't have to use mm_walk->private to pass the vma to the callback function because of mm_walk->vma. And walk_page_vma() is useful if we walk over a single vma. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
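A sketch of the resulting call pattern, assuming the mm_walk API of this era (the callback and what it counts are invented for illustration):

    static int count_present_pte(pte_t *pte, unsigned long addr,
                                 unsigned long next, struct mm_walk *walk)
    {
            unsigned long *count = walk->private;

            /* walk->vma is filled in by the core walker; no need to pass it by hand. */
            if (pte_present(*pte))
                    (*count)++;
            return 0;
    }

    static unsigned long count_vma_ptes(struct vm_area_struct *vma)
    {
            unsigned long count = 0;
            struct mm_walk walk = {
                    .pte_entry = count_present_pte,
                    .mm = vma->vm_mm,
                    .private = &count,
            };

            walk_page_vma(vma, &walk);   /* walks exactly this one vma */
            return count;
    }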
-
By Naoya Horiguchi:
Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove them. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as the default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc), it's never called (the callsite is optimized away) no matter how it is implemented. So in such architectures, we don't need an arch-specific implementation. In some architectures (like mips, s390 and tile), their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architectures use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need an arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepages, so follow_huge_pmd() can/should return some relevant value), but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs to use PMD_MASK, but it's OK because they are identical in both archs. In s390, both HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
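The weak-default idea for follow_huge_addr(), as described, looks roughly like this (a sketch of the approach, not the exact hunk):

    /* Default in mm/hugetlb.c; an arch with a real implementation overrides it. */
    struct page * __weak
    follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
    {
            return ERR_PTR(-EINVAL);
    }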
-
- 04 Feb 2015, 1 commit
-
-
By Arseny Solokha:
Function __flush_tlb_page() must only be called for user contexts, so put in extra hardening to warn on calling it for a kernel context. Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 31 Jan 2015, 1 commit
-
-
By Arseny Solokha:
MMU_NO_CONTEXT is conditionally defined as 0 or (unsigned int)-1. However, in __flush_tlb_page() the corresponding variable is only tested for an open-coded 0, which can cause a NULL pointer dereference if the `mm' argument was legitimately passed as such. Bail out early in case the first argument is NULL, thus eliminating confusion between the different values of MMU_NO_CONTEXT and avoiding disabling and then re-enabling preemption unnecessarily. Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru> Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 30 Jan 2015, 5 commits
-
-
By LEROY Christophe:
When pages are not 4K, the PGDIR table is allocated with kmalloc(). In order to optimise the TLB handlers, aligned memory is needed. kmalloc() doesn't provide aligned memory blocks, so let's use a kmem_cache pool instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>
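A sketch of the aligned-pool approach (the cache name, size macro and init hook are placeholders, not the patch's code):

    static struct kmem_cache *pgd_cache;

    void __init pgd_cache_init_example(void)
    {
            /* kmalloc() gives no alignment promise beyond the slab minimum, so use
             * a cache whose objects are aligned to their own size; the TLB
             * handlers can then mask the pointer cheaply. */
            pgd_cache = kmem_cache_create("pgd_cache", PGD_TABLE_SIZE,
                                          PGD_TABLE_SIZE, 0, NULL);
    }

    static pgd_t *pgd_alloc_example(struct mm_struct *mm)
    {
            return kmem_cache_alloc(pgd_cache, GFP_KERNEL);
    }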
-
By LEROY Christophe:
For nohash powerpc, when we run out of contexts, contexts are freed by stealing used contexts in turn. When a victim has been selected, the associated TLB entries are freed using _tlbil_pid(). Unfortunately, on the PPC 8xx, _tlbil_pid() does a tlbia, hence it flushes ALL TLB entries and not only the ones linked to the stolen context. Therefore, as implemented today, at each task switch requiring a new context, all entries are flushed. This patch modifies the implementation so that when running out of contexts, all contexts get freed at once, hence dividing the number of calls to tlbia by 16. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>
-
By LEROY Christophe:
Some powerpc variants like the 8xx don't have a RW bit in the PTE bits but a RO (Read Only) bit. This patch implements the handling of a _PAGE_RO flag to be used in place of _PAGE_RW. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood@freescale.com: fix whitespace] Signed-off-by: Scott Wood <scottwood@freescale.com>
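One way to express the idea, sketched (these are not the patch's exact definitions; how _PAGE_RO and _PAGE_RW are combined varies per platform):

    /* Platforms with a real RW bit can define _PAGE_RO as 0, and vice versa, so
     * generic helpers can manipulate both unconditionally. */
    #ifndef _PAGE_RO
    #define _PAGE_RO 0
    #endif

    static inline pte_t pte_wrprotect_example(pte_t pte)
    {
            return __pte((pte_val(pte) & ~_PAGE_RW) | _PAGE_RO);
    }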
-
By Emil Medve:
They seem to be leftovers from '14cf11af powerpc: Merge enough to start building in arch/powerpc'. Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
-
By Linus Torvalds:
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
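The arch-side hookup is mechanical; a sketch of how a fault handler maps the new result onto a signal (simplified, not any specific architecture's code):

    /* Translate a handle_mm_fault() error bit into the signal the arch sends. */
    static int fault_error_to_signal(unsigned int fault)
    {
            if (fault & VM_FAULT_OOM)
                    return 0;               /* caller takes the OOM path instead */
            if (fault & VM_FAULT_SIGSEGV)   /* the new case this patch adds */
                    return SIGSEGV;
            if (fault & VM_FAULT_SIGBUS)
                    return SIGBUS;
            return 0;
    }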
-
- 28 Jan 2015, 1 commit
-
-
By Michael Ellerman:
Remove slice_set_psize(), which is not used. It was added in 3a8247cc "powerpc: Only demote individual slices rather than whole process" but was never used. Remove vsx_assist_exception(), which is not used. It was added in ce48b210 "powerpc: Add VSX context save/restore, ptrace and signal support" but was never used. Remove generic_mach_cpu_die(), which is not used. Its last caller was removed in 375f561a "powerpc/powernv: Always go into nap mode when CPU is offline". Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt(), which are unused. These were introduced in c5d56332 "[POWERPC] Add general support for mpc7448hpc2 (Taiga) platform" but were never used. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> [mpe: Update changelog with details on when/why they are unused] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 19 Jan 2015, 1 commit
-
-
By Christian Borntraeger:
ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145). Change the ppc/hugetlbfs code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
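The shape of the change, sketched on a pte value (illustrative, not the patch's exact lines):

    static inline pte_t snapshot_pte(pte_t *ptep)
    {
            /* pte_t may be a struct; ACCESS_ONCE() on an aggregate can lose its
             * volatile with gcc 4.6/4.7, while READ_ONCE() copies it safely. */
            return READ_ONCE(*ptep);
    }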
-
- 14 Dec 2014, 1 commit
-
-
By Joonsoo Kim:
Now that we have prepared to avoid using debug-pagealloc at boot time, introduce a new kernel parameter to disable debug-pagealloc at boot time, and make the related functions be disabled in this case. The only non-intuitive part is the change to the guard page functions. Because guard pages are effective only if debug-pagealloc is enabled, turning them off along with debug-pagealloc is a reasonable thing to do. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 Dec 2014, 1 commit
-
-
By Aneesh Kumar K.V:
updatepp can get called for a nohpte fault when we find from the linux page table that the translation was hashed before. In that case we are sure that there is no existing translation, hence we could avoid doing tlbie. We could possibly race with a parallel fault filling the TLB. But that should be ok because updatepp is only ever relaxing permissions. We also look at the linux pte permission bits when filling hash pte permission bits. We also hold the linux pte busy bits while inserting/updating a hashpte entry, hence a parallel update of the linux pte is not possible. On the other hand mprotect involves ptep_modify_prot_start which causes a hpte invalidate and not updatepp. Performance numbers: we use random_access_bench written by Anton. Kernel with THP disabled and smaller hash page table size:
    86.60%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_updatepp
     2.10%  random_access_b  random_access_bench  [.] doit
     1.99%  random_access_b  [kernel.kallsyms]    [k] .do_raw_spin_lock
     1.85%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_insert
     1.26%  random_access_b  [kernel.kallsyms]    [k] .native_flush_hash_range
     1.18%  random_access_b  [kernel.kallsyms]    [k] .__delay
     0.69%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_remove
     0.37%  random_access_b  [kernel.kallsyms]    [k] .clear_user_page
     0.34%  random_access_b  [kernel.kallsyms]    [k] .__hash_page_64K
     0.32%  random_access_b  [kernel.kallsyms]    [k] fast_exception_return
     0.30%  random_access_b  [kernel.kallsyms]    [k] .hash_page_mm
With fix:
    27.54%  random_access_b  random_access_bench  [.] doit
    22.90%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_insert
     5.76%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_remove
     5.20%  random_access_b  [kernel.kallsyms]    [k] fast_exception_return
     5.12%  random_access_b  [kernel.kallsyms]    [k] .__hash_page_64K
     4.80%  random_access_b  [kernel.kallsyms]    [k] .hash_page_mm
     3.31%  random_access_b  [kernel.kallsyms]    [k] data_access_common
     1.84%  random_access_b  [kernel.kallsyms]    [k] .trace_hardirqs_on_caller
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 02 Dec 2014, 4 commits
-
-
By Aneesh Kumar K.V:
If we know that the user address space has never executed on other cpus, we could use tlbiel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Aneesh Kumar K.V:
Rename invalidate_old_hpte to flush_hash_hugepage and use that in other places. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By James Yang:
Limit the number of gigantic hugepages specified by the hugepages= parameter to MAX_NUMBER_GPAGES. Signed-off-by: James Yang <James.Yang@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
By Aneesh Kumar K.V:
With a smaller hash page table config, we would end up in a situation where we would be replacing hash page table slots frequently. In such a config, we will find the hpte to be not matching, and we can do that check without holding the hpte lock. We need to recheck the hpte again after taking the lock. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
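The check/recheck pattern, sketched with a plain spinlock standing in for the real per-HPTE lock bit (illustrative only):

    static long hpte_updatepp_sketch(unsigned long *slot, unsigned long want_v,
                                     spinlock_t *lock)
    {
            /* Cheap unlocked check: with a small hash table most slots will have
             * been recycled, so filter the mismatches before taking the lock. */
            if (READ_ONCE(*slot) != want_v)
                    return -1;

            spin_lock(lock);
            if (*slot != want_v) {          /* recheck: it may have changed meanwhile */
                    spin_unlock(lock);
                    return -1;
            }
            /* ...apply the permission update while the entry is held... */
            spin_unlock(lock);
            return 0;
    }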
-
- 25 Nov 2014, 1 commit
-
-
By Grant Likely:
The OF_RECONFIG notifier callback uses a different structure depending on whether it is a node change or a property change. This is silly, and not very safe. Rework the code to use the same data structure regardless of the type of notifier. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com> Cc: <linuxppc-dev@lists.ozlabs.org>
-
- 19 Nov 2014, 1 commit
-
-
By Michael Ellerman:
Although we are now selecting NO_BOOTMEM, we still have some traces of bootmem lying around. That is because even with NO_BOOTMEM there is still a shim that converts bootmem calls into memblock calls, but ultimately we want to remove all traces of bootmem. Most of the patch is conversions from alloc_bootmem() to memblock_virt_alloc(). In general a call such as: p = (struct foo *)alloc_bootmem(x); becomes: p = memblock_virt_alloc(x, 0); We don't need the cast because memblock_virt_alloc() returns a void *. The alignment value of zero tells memblock to use the default alignment, which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses. We remove a number of NULL checks on the result of memblock_virt_alloc(). That is because memblock_virt_alloc() will panic if it can't allocate, in exactly the same way as alloc_bootmem(), so the NULL checks are and always have been redundant. The memory returned by memblock_virt_alloc() is already zeroed, so we remove several memsets of the result of memblock_virt_alloc(). Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS) to just plain memblock_virt_alloc(). We don't use memblock_alloc_base() because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation to that is pointless, 16XB ought to be enough for anyone. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 17 Nov 2014, 1 commit
-
-
By Will Deacon:
On architectures with hardware broadcasting of TLB invalidation messages, it makes sense to reduce the range of the mmu_gather structure when unmapping page ranges based on the dirty address information passed to tlb_remove_tlb_entry. arm64 already does this by directly manipulating the start/end fields of the gather structure, but this confuses the generic code, which does not expect these fields to change and can end up calculating invalid, negative ranges when forcing a flush in zap_pte_range. This patch moves the minimal range calculation out of the arm64 code and into the generic implementation, simplifying zap_pte_range in the process (which no longer needs to care about start/end, since they will point to the appropriate ranges already). With the range being tracked by core code, the need_flush flag is dropped in favour of checking that the end of the range has actually been set. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-