1. 11 February 2009, 4 commits
    • powerpc/mm: Rework I$/D$ coherency (v3) · 8d30c14c
      Committed by Benjamin Herrenschmidt
      This patch reworks the way we do I and D cache coherency on PowerPC.
      
      The "old" way was split in 3 different parts depending on the processor type:
      
         - Hash with per-page exec support (64-bit and >= POWER4 only) does it
      at hashing time, by preventing exec on unclean pages and cleaning pages
      on exec faults.
      
         - Everything without per-page exec support (32-bit hash, 8xx, and
      64-bit < POWER4) does it for all pages going to user space in update_mmu_cache().
      
         - Embedded with per-page exec support does it from do_page_fault() on
      exec faults, in a way similar to what the hash code does.
      
      That leads to confusion and bugs. For example, the method using update_mmu_cache()
      is racy on SMP: another processor can see the new PTE and hash it in before
      we have cleaned the cache, and then blow up trying to execute. This is hard to hit,
      but I think it has bitten us in the past.
      
      Also, it's inefficient for embedded where we always end up having to do at least
      one more page fault.
      
      This reworks the whole thing by moving the cache sync into two main call sites,
      though we keep different behaviours depending on the HW capability. The call
      sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
      which joins the former in pgtable.c
      
      The base idea for Embedded with per-page exec support is that we now do the
      flush at set_pte_at() time when coming from an exec fault, which allows us
      to avoid the double fault problem completely (we can even improve the situation
      more by implementing TLB preload in update_mmu_cache() but that's for later).
      
      If for some reason we didn't do it there and we then try to execute, we'll hit
      the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
      to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed; we now make
      that path also perform the I/D cache sync for exec faults. This second path
      is the catch-all for things that weren't cleaned at set_pte_at() time.
      
      For cpus without per-page exec support, we always do the sync at set_pte_at(),
      thus guaranteeing that when the PTE is visible to other processors, the cache
      is clean (a minimal sketch of this idea follows this entry).
      
      For the 64-bit hash with per-page exec support case, we keep the old mechanism
      for now. I'll look into changing it later, once I've reworked a bit how we
      use _PAGE_EXEC.
      
      This is also a first step toward adding _PAGE_EXEC support for embedded platforms.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8d30c14c
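      For illustration, here is a minimal sketch of the set_pte_at() idea for the
      "no per-page exec support" case. It assumes the existing powerpc helper
      flush_dcache_icache_page() and the PG_arch_1 page flag used to remember that
      a page's cache is already clean; this is not the actual arch/powerpc/mm/pgtable.c
      code, only the shape of it:

      #include <linux/mm.h>
      #include <asm/cacheflush.h>

      void set_pte_at(struct mm_struct *mm, unsigned long addr,
                      pte_t *ptep, pte_t pte)
      {
              /* Only pages backed by real RAM need the I$/D$ sync. */
              if (pte_present(pte) && pfn_valid(pte_pfn(pte))) {
                      struct page *pg = pfn_to_page(pte_pfn(pte));

                      /* PG_arch_1 remembers that this page is already coherent. */
                      if (!test_bit(PG_arch_1, &pg->flags)) {
                              flush_dcache_icache_page(pg); /* writeback D$, invalidate I$ */
                              set_bit(PG_arch_1, &pg->flags);
                      }
              }
              *ptep = pte;    /* the real code uses the arch PTE store helpers and barriers */
      }

      Because the sync happens before the PTE store, by the time another processor
      can see (and hash in) the new PTE, the cache is already clean.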
    • powerpc/mm: Move 64-bit unmapped_area to top of address space · 91b0f5ec
      Committed by Anton Blanchard
      We currently place mmaps just below the stack on 32bit, but leave them
      in the middle of the address space on 64bit:
      
      00100000-00120000 r-xp 00100000 00:00 0                    [vdso]
      10000000-10010000 r-xp 00000000 08:06 179534               /tmp/sleep
      10010000-10020000 rw-p 00000000 08:06 179534               /tmp/sleep
      10020000-10130000 rw-p 10020000 00:00 0                    [heap]
      40000000000-40000030000 r-xp 00000000 08:06 440743         /lib64/ld-2.9.so
      40000030000-40000040000 rw-p 00020000 08:06 440743         /lib64/ld-2.9.so
      40000050000-400001f0000 r-xp 00000000 08:06 440671         /lib64/libc-2.9.so
      400001f0000-40000200000 r--p 00190000 08:06 440671         /lib64/libc-2.9.so
      40000200000-40000220000 rw-p 001a0000 08:06 440671         /lib64/libc-2.9.so
      40000220000-40008230000 rw-p 40000220000 00:00 0
      fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0           [stack]
      
      Right now it isn't an issue, but at some stage we will run into mmap or
      hugetlb allocation issues. Using the same layout as 32bit gives us
      some breathing room. This matches what x86-64 is doing too.
      
      00100000-00103000 r-xp 00100000 00:00 0                    [vdso]
      10000000-10001000 r-xp 00000000 08:06 554894               /tmp/test
      10010000-10011000 r--p 00000000 08:06 554894               /tmp/test
      10011000-10012000 rw-p 00001000 08:06 554894               /tmp/test
      10012000-10113000 rw-p 10012000 00:00 0                    [heap]
      fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0
      ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591         /lib64/libc-2.9.so
      ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591         /lib64/libc-2.9.so
      ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591         /lib64/libc-2.9.so
      ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591         /lib64/libc-2.9.so
      ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0
      ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663         /lib64/ld-2.9.so
      ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0
      ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0
      ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663         /lib64/ld-2.9.so
      ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663         /lib64/ld-2.9.so
      ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0
      fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0           [stack]
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      91b0f5ec
    • powerpc/numa: Remove redundant find_cpu_node() · 8b16cd23
      Committed by Milton Miller
      Use of_get_cpu_node(), which is a superset of numa.c's find_cpu_node() and
      lives in a less restrictive section (text vs cpuinit).
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8b16cd23
    • powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth() · 20fcefe5
      Committed by Milton Miller
      find_min_common_depth() was checking the property length incorrectly:
      the length is reported in bytes, not cells, and the code reads the second
      entry (a sketch of the corrected check follows this entry).
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      20fcefe5
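      As an illustration of the kind of check involved (an assumption-laden sketch,
      not the actual diff): of_get_property() reports the property length in bytes,
      so safely reading the second cell needs at least two cells' worth of bytes.
      The property name matches what numa.c reads; the surrounding function is
      illustrative only.

      #include <linux/of.h>

      static int example_min_common_depth(struct device_node *root)
      {
              int len;
              const unsigned int *refs =
                      of_get_property(root, "ibm,associativity-reference-points", &len);

              /* len is in bytes, not cells: two cells need 2 * sizeof(unsigned int) */
              if (!refs || len < (int)(2 * sizeof(unsigned int)))
                      return -1;

              return refs[1]; /* the depth comes from the second entry */
      }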
  2. 10 February 2009, 1 commit
  3. 29 January 2009, 3 commits
    • powerpc/fsl-booke: Make CAM entries used for lowmem configurable · 96051465
      Committed by Trent Piepho
      On booke processors, the code that maps low memory only uses up to three
      CAM entries, even though there are sixteen and nothing else uses them.
      
      Make this number configurable in the advanced options menu along with max
      low memory size.  If one wants 1 GB of lowmem, then it's typically
      necessary to have four CAM entries.
      Signed-off-by: Trent Piepho <tpiepho@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      96051465
    • powerpc/fsl-booke: Allow larger CAM sizes than 256 MB · c8f3570b
      Committed by Trent Piepho
      The code that maps kernel low memory would only use page sizes up to 256
      MB.  On E500v2 pages up to 4 GB are supported.
      
      However, a page must be aligned to a multiple of the page's size.  I.e.
      256 MB pages must be aligned to a 256 MB boundary.  This was enforced by a
      requirement that the physical and virtual addresses of the start of lowmem
      be aligned to 256 MB.  Clearly, requiring 1 GB or 4 GB alignment just to allow
      pages of that size isn't acceptable.
      
      To solve this, I simply have adjust_total_lowmem() take alignment into
      account when it decides what size pages to use (a sketch of this sizing
      logic follows this entry).  Give it PAGE_OFFSET = 0x7000_0000, PHYSICAL_START
      = 0x3000_0000, and 2GB of RAM, and it will map pages like this:
      PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
      PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
      PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
      PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
      PA 0xA000_0000 VA 0xE000_0000 Size 256 MB
      
      Because the lowmem mapping code now takes alignment into account,
      PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB.  Even lower might be
      possible.  The lowmem code will work down to 4 kB but it's possible some of
      the boot code will fail before then.  Poor alignment will force small pages
      to be used, which, combined with the limited number of TLB1 pages available,
      will result in very little memory getting mapped.  So alignments less than
      64 MB probably aren't very useful anyway.
      Signed-off-by: Trent Piepho <tpiepho@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      c8f3570b
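      A minimal sketch of the alignment-aware sizing idea (an illustrative helper,
      not the actual adjust_total_lowmem() code): each CAM entry must be naturally
      aligned, so the usable size is bounded by the lowest set bit of the current
      virtual and physical addresses, by the remaining memory, and by the hardware
      maximum, rounded down to a supported power-of-4 size.

      #include <linux/kernel.h>
      #include <linux/log2.h>

      /* Assumes remaining > 0; max_cam is the largest size the hardware supports. */
      static unsigned long cam_size_for(unsigned long virt, unsigned long phys,
                                        unsigned long remaining, unsigned long max_cam)
      {
              unsigned long combined = virt | phys;
              /* natural alignment of the current position = its lowest set bit */
              unsigned long align = combined ? (combined & -combined) : max_cam;
              unsigned long size = min(remaining, min(align, max_cam));

              /* e500 CAM entries come in power-of-4 sizes (4 kB ... 4 GB) */
              return 1UL << (ilog2(size) & ~1);
      }

      With the example above (VA 0x7000_0000, PA 0x3000_0000, 2 GB of RAM) and a
      1 GB per-entry cap, walking the memory with this helper yields the same
      256 MB, 1 GB, 256 MB, 256 MB, 256 MB sequence.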
    • powerpc/fsl-booke: Remove code duplication in lowmem mapping · f88747e7
      Committed by Trent Piepho
      The code to map lowmem uses three CAM (aka TLB[1]) entries to cover it.  The
      size of each is stored in three globals named __cam0, __cam1, and __cam2.
      All the code that uses them is duplicated three times for each of the three
      variables.
      
      We have these things called arrays and loops....
      
      Once converted to use an array, it will be easier to make the number of
      CAMs configurable (a sketch of the resulting shape follows this entry).
      Signed-off-by: Trent Piepho <tpiepho@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      f88747e7
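      A sketch of the shape this refactor takes (identifiers such as
      map_cam_entry() are placeholders, not the real fsl_booke_mmu.c names):

      #include <linux/types.h>

      #define NUM_CAMS 3      /* made configurable by a later patch */

      static unsigned long cam[NUM_CAMS];     /* replaces __cam0, __cam1, __cam2 */

      /* hypothetical helper that programs one CAM/TLB1 entry */
      static void map_cam_entry(int index, unsigned long virt, phys_addr_t phys,
                                unsigned long size);

      static unsigned long map_lowmem_cams(unsigned long virt, phys_addr_t phys)
      {
              unsigned long amount_mapped = 0;
              int i;

              /* one loop instead of three copy-pasted blocks */
              for (i = 0; i < NUM_CAMS && cam[i]; i++) {
                      map_cam_entry(i, virt + amount_mapped,
                                    phys + amount_mapped, cam[i]);
                      amount_mapped += cam[i];
              }
              return amount_mapped;
      }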
  4. 28 January 2009, 1 commit
  5. 16 January 2009, 1 commit
  6. 13 January 2009, 1 commit
  7. 08 January 2009, 9 commits
  8. 07 January 2009, 2 commits
    • mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Committed by Gary Hade
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1 (a sketch of how such a
      link is created follows this entry).
      
      Also revises documentation to cover this change as well as updating
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add was tested on a 2-node x86_64 system.
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c04fc586
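      A sketch of how such a symlink is typically created (the helper name below
      is an assumption; the real patch adds functions along the lines of
      register_mem_sect_under_node() in drivers/base/node.c):

      #include <linux/kobject.h>
      #include <linux/sysfs.h>

      /* creates /sys/devices/system/node/nodeX/memoryY -> ../../memory/memoryY */
      static int link_mem_section_to_node(struct kobject *node_kobj,
                                          struct kobject *mem_kobj)
      {
              return sysfs_create_link(node_kobj, mem_kobj, kobject_name(mem_kobj));
      }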
    • mm: report the MMU pagesize in /proc/pid/smaps · 3340289d
      Committed by Mel Gorman
      The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
      kernel to back a VMA.  This matches the size used by the MMU in the
      majority of cases.  However, one counter-example occurs on PPC64 kernels,
      whereby a kernel using 64K as a base pagesize may still use 4K pages for
      the MMU on older processors.  To distinguish the two, this patch reports
      MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps (a sketch
      of the new hook follows this entry).
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3340289d
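      A sketch of the generic side of the change, as I understand it: the new
      vma_mmu_pagesize() hook gets a weak default that falls back to the kernel
      page size, and arch code (ppc64 here) overrides it with the size the MMU
      really uses.

      #include <linux/mm.h>
      #include <linux/hugetlb.h>

      /* Generic fallback: report the same size the kernel uses to back the VMA. */
      unsigned long __attribute__((weak)) vma_mmu_pagesize(struct vm_area_struct *vma)
      {
              return vma_kernel_pagesize(vma);
      }

      /proc/pid/smaps then prints this value as MMUPageSize next to KernelPageSize.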
  9. 29 December 2008, 1 commit
  10. 23 December 2008, 2 commits
  11. 21 December 2008, 9 commits
  12. 16 December 2008, 4 commits
    • powerpc/mm: Remove flush_HPTE() · f63837f0
      Committed by Benjamin Herrenschmidt
      The function flush_HPTE() is used in only one place, the implementation
      of DEBUG_PAGEALLOC on ppc32.
      
      It's actually a dup of flush_tlb_page(), though it's -slightly- more
      efficient on hash-based processors.  We remove it and replace it with a
      direct call to the hash flush code on those processors, and with
      flush_tlb_page() for everybody else.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      f63837f0
    • powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c · e41e811a
      Committed by Benjamin Herrenschmidt
      This renames the files to clarify the fact that they are used by
      the hash based family of CPUs (the 603 being an exception in that
      family but is still handled by that code).
      
      This paves the way for the new tlb_nohash.c coming via a subsequent
      commit.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      e41e811a
    • powerpc: Fix bootmem reservation on uninitialized node · a4c74ddd
      Committed by Dave Hansen
      careful_allocation() was calling into the bootmem allocator for
      nodes which had not been fully initialized, which caused a previous
      bug:  http://patchwork.ozlabs.org/patch/10528/  So, I merged a
      few broken-out loops in do_init_bootmem() to fix it.  That changed
      the code ordering.
      
      I think this bug is triggered by having reserved areas for a node
      which are spanned by another node's contents.  In the
      mark_reserved_regions_for_nid() code, we attempt to reserve the
      area for a node before we have allocated the NODE_DATA() for that
      nid.  We do this since I reordered that loop.  I suck.
      
      This is causing crashes at bootup on some systems, as reported
      by Jon Tollefson.
      
      This may only show up on some systems that have 16GB pages
      reserved.  But it can probably happen on any system that is
      trying to reserve large swaths of memory that happen to span other
      nodes' contents.
      
      This commit ensures that we do not touch bootmem for any node which
      has not been initialized, and also removes a compile warning about
      an unused variable.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      a4c74ddd
    • powerpc: Check for valid hugepage size in hugetlb_get_unmapped_area · 48f797de
      Committed by Brian King
      It looks like most of the hugetlb code is doing the correct thing if
      hugepages are not supported, but the mmap code is not.  If we get into
      the mmap code when hugepages are not supported, such as in an LPAR
      which is running Active Memory Sharing, we can oops the kernel.  This
      fixes the oops being seen in this path (a sketch of the guard follows this entry).
      
      oops: Kernel access of bad area, sig: 11 [#1]
      SMP NR_CPUS=1024 NUMA pSeries
      Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N)
      dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N)
      scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N)
      Supported: No
      NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0
      REGS: c000000077e7b830 TRAP: 0300   Tainted: G
      (2.6.27.5-bz50170-2-ppc64)
      MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44000448  XER: 20000001
      DAR: c000002000af90a8, DSISR: 0000000040000000
      TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6
      GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000
      GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001
      GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000
      GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000
      GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001
      GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5
      GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000
      NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0
      LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
      Call Trace:
      [c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
      [c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8
      [c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420
      [c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140
      [c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40
      Instruction dump:
      fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0
      fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      48f797de
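      A sketch of the kind of guard involved (illustrative only; the exact check
      and how the result feeds into the slice allocator in the real patch may
      differ): verify the huge page size maps to something the powerpc MMU code
      actually knows about before going any further.

      #include <linux/fs.h>
      #include <linux/hugetlb.h>
      #include <asm/mmu.h>    /* powerpc: shift_to_mmu_psize() */

      /* Returns true when the file's huge page size maps to a valid MMU page size. */
      static bool hugepage_size_supported(struct file *file)
      {
              struct hstate *h = hstate_file(file);

              /* shift_to_mmu_psize() returns a negative value for unknown sizes */
              return shift_to_mmu_psize(huge_page_shift(h)) >= 0;
      }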
  13. 03 December 2008, 2 commits