提交 · f3b3f28493d93232a37d5fbb5cb5ad168ede0e1a · gsplhtlxg / clone-Linux

11 4月, 2017 12 次提交

powerpc/powernv/idle: Don't override default/deepest directly in kernel · f3b3f284

由 Gautham R. Shenoy 提交于 3月 22, 2017

Currently during idle-init on power9, if we don't find suitable stop
states in the device tree that can be used as the
default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default
stop state psscr to be used by power9_idle and deepest stop state
which is used by CPU-Hotplug.

However, if the platform firmware has not configured or enabled a stop
state, the kernel should not make any assumptions and fallback to a
default choice.

If the kernel uses a stop state that is not configured by the platform
firmware, it may lead to further failures which should be avoided.

In this patch, we modify the init code to ensure that the kernel uses
only the stop states exposed by the firmware through the device
tree. When a suitable default stop state isn't found, we disable
ppc_md.power_save for power9. Similarly, when a suitable
deepest_stop_state is not found in the device tree exported by the
firmware, fall back to the default busy-wait loop in the CPU-Hotplug
code.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f3b3f284

powerpc/powernv/smp: Add busy-wait loop as fall back for CPU-Hotplug · 90061231

由 Gautham R. Shenoy 提交于 3月 22, 2017

Currently, the powernv cpu-offline function assumes that platform idle
states such as stop on POWER9, winkle/sleep/nap on POWER8 are always
available. On POWER8, it picks nap as the default state if other deep
idle states like sleep/winkle are not available and enabled in the
platform.

On POWER9, nap is not available and all idle states are managed by
STOP instruction.  The parameters to the idle state are passed through
processor stop status control register (PSSCR).  Hence as such
executing STOP would take parameters from current PSSCR. We do not
want to make any assumptions in kernel on what STOP states and PSSCR
features are configured by the platform.

Ideally platform will configure a good set of stop states that can be
used in the kernel.  We would like to start with a clean slate, if the
platform choose to not configure any state or there is an error in
platform firmware that lead to no stop states being configured or
allowed to be requested.

This patch adds a fallback method for CPU-Hotplug that is similar to
snooze loop at idle where the threads are left to spin at low priority
and hence reduce the cycles consumed.

This is a safe fallback mechanism in the case when no stop state would
be requested if the platform firmware did not configure them most
likely due to an error condition.

Requesting a stop state when the platform has not configured them or
enabled them would lead to further error conditions which could be
difficult to debug.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

90061231

powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c · a7cd88da

由 Gautham R. Shenoy 提交于 3月 22, 2017

Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
transitions the CPU to the deepest available platform idle state to a
new function named pnv_cpu_offline() in powernv/idle.c. The rationale
behind this code movement is that the data required to determine the
deepest available platform state resides in powernv/idle.c.
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a7cd88da

powerpc/hugetlb: Add ABI defines for supported HugeTLB page sizes · 2c9faa76

由 Anshuman Khandual 提交于 4月 07, 2017

Add user space exported API definitions for 512KB, 1MB, 2MB, 8MB, 16MB,
1GB, 16GB non default huge page sizes to be used with mmap() system
call.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Reword the comment to emphasise that these are only needed to use
 the non-default huge page size, and updated the change log.]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2c9faa76

powerpc/mm: Remove reduntant initmem information from log · ea614555

由 Anshuman Khandual 提交于 4月 07, 2017

Generic core VM already prints these information in the log
buffer, hence there is no need for a second print. This just
removes the second print from arch powerpc NUMA init path.

Before the patch:

  $ dmesg | grep "Initmem"

  numa: Initmem setup node 0 [mem 0x00000000-0xffffffff]
  numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff]
  numa: Initmem setup node 2 [mem 0x200000000-0x2ffffffff]
  numa: Initmem setup node 3 [mem 0x300000000-0x3ffffffff]
  numa: Initmem setup node 4 [mem 0x400000000-0x4ffffffff]
  numa: Initmem setup node 5 [mem 0x500000000-0x5ffffffff]
  numa: Initmem setup node 6 [mem 0x600000000-0x6ffffffff]
  numa: Initmem setup node 7 [mem 0x700000000-0x7ffffffff]
  Initmem setup node 0 [mem 0x0000000000000000-0x00000000ffffffff]
  Initmem setup node 1 [mem 0x0000000100000000-0x00000001ffffffff]
  Initmem setup node 2 [mem 0x0000000200000000-0x00000002ffffffff]
  Initmem setup node 3 [mem 0x0000000300000000-0x00000003ffffffff]
  Initmem setup node 4 [mem 0x0000000400000000-0x00000004ffffffff]
  Initmem setup node 5 [mem 0x0000000500000000-0x00000005ffffffff]
  Initmem setup node 6 [mem 0x0000000600000000-0x00000006ffffffff]
  Initmem setup node 7 [mem 0x0000000700000000-0x00000007ffffffff]

After the patch just the latter set is printed.
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ea614555

powerpc: Make sparsemem the default on 64-bit Book3S · 7b3912f4

由 Michael Ellerman 提交于 4月 05, 2017

Make sparsemem the default on all 64-bit Book3S platforms. It already is
for pseries and ps3, and we need to enable it for powernv because on
POWER9 memory between chips is discontiguous.

For the other platforms sparsemem should work fine, though it might add
a small amount of overhead. We can always force FLATMEM in the
defconfigs if necessary.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7b3912f4

powerpc/nohash: Fix use of mmu_has_feature() in setup_initial_memory_limit() · 4868e350

由 Michael Ellerman 提交于 4月 03, 2017

setup_initial_memory_limit() is called from early_init_devtree(), which
runs prior to feature patching. If the kernel is built with CONFIG_JUMP_LABEL=y
and CONFIG_JUMP_LABEL_FEATURE_CHECKS=y then we will potentially get the
wrong value.

If we also have CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG=y we get a warning
and backtrace:

  Warning! mmu_has_feature() used prior to jump label init!
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.11.0-rc4-gccN-next-20170331-g6af2434 #1
  Call Trace:
  [c000000000fc3d50] [c000000000a26c30] .dump_stack+0xa8/0xe8 (unreliable)
  [c000000000fc3de0] [c00000000002e6b8] .setup_initial_memory_limit+0xa4/0x104
  [c000000000fc3e60] [c000000000d5c23c] .early_init_devtree+0xd0/0x2f8
  [c000000000fc3f00] [c000000000d5d3b0] .early_setup+0x90/0x11c
  [c000000000fc3f90] [c000000000000520] start_here_multiplatform+0x68/0x80

Fix it by using early_mmu_has_feature().

Fixes: c12e6f24 ("powerpc: Add option to use jump label for mmu_has_feature()")
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4868e350

powerpc: Remove unnecessary includes of asm/debug.h · 3ae05fb3

由 Michael Ellerman 提交于 2月 10, 2017

These files don't seem to have any need for asm/debug.h, now that all it
includes are the debugger hooks and breakpoint definitions.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3ae05fb3

powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there · 7644d581

由 Michael Ellerman 提交于 2月 10, 2017

powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.

Currently it sits in asm/debug.h, a long with some other things that
have "debug" in the name, but are otherwise unrelated.

Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7644d581

powerpc/powernv: Require MMU_NOTIFIER to fix NPU build · abfe8026

由 Alistair Popple 提交于 4月 10, 2017

In the recent commit 1ab66d1f ("powerpc/powernv: Introduce address
translation services for Nvlink2") the NPU code gained a dependency on MMU
notifiers.

All our defconfigs have KVM enabled, which selects MMU_NOTIFIER, but if KVM is
not enabled then the build breaks.

Fix it by always selecting MMU_NOTIFIER when we're building powernv.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
[mpe: Reword change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

abfe8026

powerpc/mm/radix: Remove unnecessary ptesync · f7327e0b

由 Aneesh Kumar K.V 提交于 4月 01, 2017

For a tlbiel with pid, we need to issue tlbiel with set number encoded. We
don't need to do ptesync for each of those. Instead we need one for the entire
tlbiel pid operation.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f7327e0b

powerpc/mm/radix: Don't do page walk cache flush when doing full mm flush · f6b0df55

由 Aneesh Kumar K.V 提交于 4月 01, 2017

For fullmm tlb flush, we do a flush with RIC_FLUSH_ALL which will invalidate all
related caches (radix__tlb_flush()). Hence the pwc flush is not needed.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f6b0df55

04 4月, 2017 3 次提交

powerpc/powernv: Add OPAL exports attributes to sysfs · 11fe909d

由 Matt Brown 提交于 3月 30, 2017

New versions of OPAL have a device node /ibm,opal/firmware/exports, each
property of which describes a range of memory in OPAL that Linux might
want to export to userspace for debugging.

This patch adds a sysfs file under 'opal/exports' for each property
found there, and makes it read-only by root.
Signed-off-by: NMatt Brown <matthew.brown.dev@gmail.com>
[mpe: Drop counting of props, rename to attr, free on sysfs error, c'log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

11fe909d

powerpc/prom: Increase minimum RMA size to 512MB · 687da8fc

由 Sukadev Bhattiprolu 提交于 3月 27, 2017

When booting very large systems with a large initrd, we run out of
space early in boot for either RTAS or the flattened device tree (FDT).
Boot fails with messages like:

	Could not allocate memory for RTAS
or
	No memory for flatten_device_tree (no room)

Increasing the minimum RMA size to 512MB fixes the problem. This
should not have an impact on smaller LPARs (with 256MB memory),
as the firmware will cap the RMA to the memory assigned to the LPAR.

Fix is based on input/discussions with Michael Ellerman. Thanks to
Praveen K. Pandey for testing on a large system.
Reported-by: NPraveen K. Pandey <preveen.pandey@in.ibm.com>
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

687da8fc

powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f

由 Alistair Popple 提交于 4月 03, 2017

Nvlink2 supports address translation services (ATS) allowing devices
to request address translations from an mmu known as the nest MMU
which is setup to walk the CPU page tables.

To access this functionality certain firmware calls are required to
setup and manage hardware context tables in the nvlink processing unit
(NPU). The NPU also manages forwarding of TLB invalidates (known as
address translation shootdowns/ATSDs) to attached devices.

This patch exports several methods to allow device drivers to register
a process id (PASID/PID) in the hardware tables and to receive
notification of when a device should stop issuing address translation
requests (ATRs). It also adds a fault handler to allow device drivers
to demand fault pages in.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
[mpe: Fix up comment formatting, use flush_tlb_mm()]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1ab66d1f

03 4月, 2017 6 次提交

powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev · 4c3b89ef

由 Alistair Popple 提交于 4月 03, 2017

The pnv_pci_get_{gpu|npu}_dev functions are used to find associations
between nvlink PCIe devices and standard PCIe devices. However they
lacked basic sanity checking which results in NULL pointer
dereferencing if they are incorrect called can be harder to spot than
an explicit WARN_ON.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4c3b89ef

drivers/of/base.c: Add of_property_read_u64_index · 2475a2b6

由 Alistair Popple 提交于 4月 03, 2017

There is of_property_read_u32_index but no u64 variant. This patch
adds one similar to the u32 version for u64.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Acked-by: NRob Herring <robh@kernel.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2475a2b6

powerpc/mm: Remove stale comment about the DART hole · f6f9195b

由 Oliver O'Halloran 提交于 4月 03, 2017

The code to fix the problem it describes was removed in commit
c40785ad ("powerpc/dart: Use a cachable DART"), and it uses the
stupid comment style. Away it goooooooooooooes!
Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f6f9195b

powerpc: Avoid taking a data miss on every userspace instruction miss · a7a9dcd8

由 Anton Blanchard 提交于 4月 03, 2017

Early on in do_page_fault() we call store_updates_sp(), regardless of
the type of exception. For an instruction miss this doesn't make
sense, because we only use this information to detect if a data miss
is the result of a stack expansion instruction or not.

Worse still, it results in a data miss within every userspace
instruction miss handler, because we try and load the very instruction
we are about to install a pte for!

A simple exec microbenchmark runs 6% faster on POWER8 with this fix:

 #include <stdlib.h>
 #include <stdio.h>
 #include <unistd.h>

int main(int argc, char *argv[])
{
	unsigned long left = atol(argv[1]);
	char leftstr[16];

	if (left-- == 0)
		return 0;

	sprintf(leftstr, "%ld", left);
	execlp(argv[0], argv[0], leftstr, NULL);
	perror("exec failed\n");

	return 0;
}

Pass the number of iterations on the command line (eg 10000) and time
how long it takes to execute.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a7a9dcd8

powerpc/book3s: Print task info if we take a machine check in user mode · 63f44d65

由 Michael Ellerman 提交于 4月 03, 2017

For an MCE (Machine Check Exception) that hits while in user mode
MSR(PR=1), print the task info to the console MCE error log. This may
help to identify an application that triggered the MCE.

After this patch the MCE console looks like:

  Severe Machine check interrupt [Recovered]
    NIP: [0000000010039778] PID: 762 Comm: ebizzy
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: 0000000010039778

  Severe Machine check interrupt [Not recovered]
    NIP: [0000000010039778] PID: 763 Comm: ebizzy
    Initiator: CPU
    Error type: UE [Page table walk ifetch]
      Effective address: 0000000010039778
  ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

63f44d65

powerpc/book3s: Print the kernel function name in machine check · 5b1d6fc2

由 Mahesh Salgaonkar 提交于 3月 28, 2017

For D-side errors we print the load/store address that caused the
machine check as 'Effective address'. But the instruction that may have
caused the machine check can also be helpful, so in addition to printing
the NIP, also print the kernel function name as well.

After this patch the MCE console log would look like:

  Severe Machine check interrupt [Recovered]
    NIP [d00000001bc70194]: init_module+0x194/0x2b0 [bork_kernel]
    Initiator: CPU
    Error type: SLB [Parity]
      Effective address: d000000026de0000
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5b1d6fc2

01 4月, 2017 5 次提交

powerpc/mm: Enable mappings above 128TB · f4ea6dcb

由 Aneesh Kumar K.V 提交于 3月 30, 2017

Not all user space application is ready to handle wide addresses. It's
known that at least some JIT compilers use higher bits in pointers to
encode their information. It collides with valid pointers with 512TB
addresses and leads to crashes.

To mitigate this, we are not going to allocate virtual address space
above 128TB by default.

But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 128TB.

If hint address set above 128TB, but MAP_FIXED is not specified, we try
to look for unmapped area by specified address. If it's already
occupied, we look for unmapped area in *full* address space, rather than
from 128TB window.

This approach helps to easily make application's memory allocator aware
about large address space without manually tracking allocated virtual
address space.

This is going to be a per mmap decision. ie, we can have some mmaps with
larger addresses and other that do not.

A sample memory layout looks like:

10000000-10010000 r-xp 00000000 fc:00 9057045 /home/max_addr_512TB
10010000-10020000 r--p 00000000 fc:00 9057045 /home/max_addr_512TB
10020000-10030000 rw-p 00010000 fc:00 9057045 /home/max_addr_512TB
10029630000-10029660000 rw-p 00000000 00:00 0 [heap]
7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0
7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so
7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so
7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so
7fff83690000-7fff836a0000 rw-p 00000000 00:00 0
7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0 [vdso]
7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so
7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so
7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so
7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0 [stack]
1000000000000-1000000010000 rw-p 00000000 00:00 0
1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f4ea6dcb

A
powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit · fbfef902
由 Aneesh Kumar K.V 提交于 3月 22, 2017
```
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
```
fbfef902

powerpc/pseries: Skip using reserved virtual address range · 82228e36

由 Aneesh Kumar K.V 提交于 3月 22, 2017

Now that we use all the available virtual address range, we need to make
sure we don't generate VSID such that it overlaps with the reserved vsid
range. Reserved vsid range include the virtual address range used by the
adjunct partition and also the VRMA virtual segment. We find the context
value that can result in generating such a VSID and reserve it early in
boot.

We don't look at the adjunct range, because for now we disable the
adjunct usage in a Linux LPAR via CAS interface.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rewrite hash__reserve_context_id(), move the rest into pseries]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

82228e36

powerpc/mm/hash: Store addr_limit in PACA · bb183221

由 Aneesh Kumar K.V 提交于 3月 22, 2017

We optmize the slice page size array copy to paca by copying only the
range based on addr_limit. This will require us to not look at page size
array beyond addr_limit in PACA on slb fault. To enable that copy task
size to paca which will be used during slb fault.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rename from task_size to addr_limit, consolidate #ifdefs]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bb183221

powerpc/mm: Add addr_limit to mm_context and use it to derive max slice index · 957b778a

由 Aneesh Kumar K.V 提交于 3月 22, 2017

In the followup patch, we will increase the slice array size to handle
512TB range, but will limit the max addr to 128TB. Avoid doing
unnecessary computation and avoid doing slice mask related operation
above address limit.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

957b778a

31 3月, 2017 14 次提交

powerpc/mm/hash: Increase VA range to 128TB · f6eedbba

由 Aneesh Kumar K.V 提交于 3月 22, 2017

We update the hash linux page table layout such that we can support
512TB. But we limit the TASK_SIZE to 128TB. We can switch to 128TB by
default without conditional because that is the max virtual address
supported by other architectures. We will later add a mechanism to
on-demand increase the application's effective address range to 512TB.

Having the page table layout changed to accommodate 512TB makes testing
large memory configuration easier with less code changes to kernel
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f6eedbba

powerpc/mm/hash: Convert mask to unsigned long · 59248aec

由 Aneesh Kumar K.V 提交于 3月 22, 2017

This doesn't have any functional change. But helps in avoiding mistakes
in case the shift bit changes
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

59248aec

powerpc/mm/hash: Support 68 bit VA · e6f81a92

由 Aneesh Kumar K.V 提交于 3月 29, 2017

Inorder to support large effective address range (512TB), we want to
increase the virtual address bits to 68. But we do have platforms like
p4 and p5 that can only do 65 bit VA. We support those platforms by
limiting context bits on them to 16.

The protovsid -> vsid conversion is verified to work with both 65 and 68
bit va values. I also documented the restrictions in a table format as
part of code comments.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e6f81a92

powerpc/mm/hash: Check for non-kernel address in get_kernel_vsid() · 85beb1c4

由 Michael Ellerman 提交于 3月 29, 2017

get_kernel_vsid() has a very stern comment saying that it's only valid
for kernel addresses, but there's nothing in the code to enforce that.

Rather than hoping our callers are well behaved, add a check and return
a VSID of 0 (invalid).
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

85beb1c4

powerpc/mm/hash: Use context ids 1-4 for the kernel · 941711a3

由 Aneesh Kumar K.V 提交于 3月 29, 2017

Currently we use the top 4 context ids (0x7fffc-0x7ffff) for the kernel.
Kernel VSIDs are built using these top context values and effective the
segement ID. In subsequent patches we want to increase the max effective
address to 512TB. We will achieve that by increasing the effective
segment IDs there by increasing virtual address range.

We will be switching to a 68bit virtual address in the following patch.
But platforms like Power4 and Power5 only support a 65 bit virtual
address. We will handle that by limiting the context bits to 16 instead
of 19 on those platforms. That means the max context id will have a
different value on different platforms.

So that we don't have to deal with the kernel context ids changing
between different platforms, move the kernel context ids down to use
context ids 1-4.

We can't use segment 0 of context-id 0, because that maps to VSID 0,
which we want to keep as invalid, so we avoid context-id 0 entirely.
Similarly we can't use the last segment of the maximum context, so we
avoid it too.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Switch from 0-3 to 1-4 so VSID=0 remains invalid]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

941711a3

powerpc/mm: Split radix vs hash mm context initialisation · 760573c1

由 Michael Ellerman 提交于 3月 29, 2017

Complete the split of the radix vs hash mm context initialisation.

This is mostly code movement, with the exception that we now limit the
context allocation to PRTB_ENTRIES - 1 on radix.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

760573c1

powerpc/mm/hash: Pull hash constants into hash__alloc_context_id() · c1ff840d

由 Michael Ellerman 提交于 3月 29, 2017

The min and max context id values used in alloc_context_id() are
currently the right values for use on hash, and happen to also be safe
for use on radix.

But we need to change that in a subsequent patch, so make the min/max
ids parameters and pull the hash values into hsah__alloc_context_id().
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c1ff840d

powerpc/mm/hash: Abstract context id allocation for KVM · a336f2f5

由 Michael Ellerman 提交于 3月 29, 2017

KVM wants to be able to allocate an MMU context id, which it does
currently by calling __init_new_context().

We're about to rework that code, so provide a wrapper for KVM so it
can not worry about the details.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a336f2f5

powerpc/mm/slice: Update slice mask printing to use bitmap printing. · 302413ca

由 Aneesh Kumar K.V 提交于 3月 22, 2017

We now get output like below which is much better.

[    0.935306]  good_mask low_slice: 0-15
[    0.935360]  good_mask high_slice: 0-511

Compared to

[    0.953414]  good_mask:1111111111111111 - 1111111111111.........

I also fixed an error with slice_dbg printing.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

302413ca

powerpc/mm/slice: Move slice_mask struct definition to slice.c · 82185222

由 Aneesh Kumar K.V 提交于 3月 22, 2017

This structure definition need not be in a header since this is used only by
slice.c file. So move it to slice.c. This also allow us to use SLICE_NUM_HIGH
instead of 64.

I also switch the low_slices type to u64 from u16. This doesn't have an impact
on size of struct due to padding added with u16 type. This helps in using
bitmap printing function for printing slice mask.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

82185222

powerpc/mm: Remove checks that TASK_SIZE_USER64 is too small · 1b2fa5a0

由 Aneesh Kumar K.V 提交于 3月 22, 2017

Remove the checks that TASK_SIZE_USER64 is smaller than H_PGTABLE_RANGE
and USER_VSID_RANGE.

In a following patch we will deliberately add support for a TASK_SIZE
smaller than both ranges, so this will no longer be an error condition.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Keep the check in pgtable_64.c that we don't exceed USER_VSID_RANGE]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1b2fa5a0

powerpc/mm: Move copy_mm_to_paca to paca.c · 52b1e665

由 Aneesh Kumar K.V 提交于 3月 22, 2017

We also update the function arg to struct mm_struct. Move this so that function
finds the definition of struct mm_struct. No functional change in this patch.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

52b1e665

powerpc/mm/slice: Update the function prototype · a4d36215

由 Aneesh Kumar K.V 提交于 3月 22, 2017

This avoid copying the slice_mask struct as function return value
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a4d36215

powerpc/mm/slice: Convert slice_mask high slice to a bitmap · f3207c12

由 Aneesh Kumar K.V 提交于 3月 22, 2017

In followup patch we want to increase the va range which will result
in us requiring high_slices to have more than 64 bits. To enable this
convert high_slices to bitmap. We keep the number bits same in this patch
and later change that to higher value
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fold in fix to use bitmap_empty()]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f3207c12