提交 · 3cdfcbfd32b9d1c0d4a6fa80ee9c390927aab948 · openanolis / cloud-kernel

01 9月, 2017 1 次提交

powerpc: Change analyse_instr so it doesn't modify *regs · 3cdfcbfd

由 Paul Mackerras 提交于 8月 30, 2017

The analyse_instr function currently doesn't just work out what an
instruction does, it also executes those instructions whose effect
is only to update CPU registers that are stored in struct pt_regs.
This is undesirable because optprobes uses analyse_instr to work out
if an instruction could be successfully emulated in future.

This changes analyse_instr so it doesn't modify *regs; instead it
stores information in the instruction_op structure to indicate what
registers (GPRs, CR, XER, LR) would be set and what value they would
be set to. A companion function called emulate_update_regs() can
then use that information to update a pt_regs struct appropriately.

As a minor cleanup, this replaces inline asm using the cntlzw and
cntlzd instructions with calls to __builtin_clz() and __builtin_clzl().
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3cdfcbfd

31 8月, 2017 18 次提交

powerpc: Correct instruction code for xxlor instruction · 93b2d3cf

由 Paul Mackerras 提交于 8月 30, 2017

The instruction code for xxlor that commit 0016a4cf ("powerpc:
Emulate most Book I instructions in emulate_step()", 2010-06-15)
added is actually the code for xxlnor.  It is used in get_vsr()
and put_vsr() and the effect of the error is that if emulate_step
is used to emulate a VSX load or store from any register other
than vsr0, the bitwise complement of the correct value will be
loaded or stored.  This corrects the error.

Fixes: 0016a4cf ("powerpc: Emulate most Book I instructions in emulate_step()")
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

93b2d3cf

powerpc/smp: Add cpu_l2_cache_map · 2a636a56

由 Oliver O'Halloran 提交于 6月 29, 2017

We want to add an extra level to the CPU scheduler topology to account
for cores which share a cache. To do this we need to build a cpumask
for each CPU that indicates which CPUs share this cache to use as an
input to the scheduler.
Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2a636a56

powerpc/asm: Convert .llong directives to .8byte · eb039161

由 Tobin C. Harding 提交于 3月 09, 2017

.llong is an undocumented PPC specific directive. The generic
equivalent is .quad, but even better (because it's self describing) is
.8byte.

Convert all .llong directives to .8byte.
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

eb039161

powerpc/xmon: Add ISA v3.0 SPRs to SPR dump · d1e1b351

由 Balbir Singh 提交于 8月 30, 2017

Add support for printing the PIDR/TIDR for ISA 300 and PSSCR and PTCR
in ISA 3.0 hypervisor mode.

SPRN_PSSCR_PR is the privileged mode access and is used when we are
not in hypervisor mode.
Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d1e1b351

powerpc/powernv/vas: Define copy/paste interfaces · 2392c8c8

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define interfaces (wrappers) to the 'copy' and 'paste'
instructions (which are new in PowerISA 3.0). These are intended to be
used to by NX driver(s) to submit Coprocessor Request Blocks (CRBs) to
the NX hardware engines.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2392c8c8

powerpc/powernv/vas: Define vas_tx_win_open() · 5239af67

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define an interface to open a VAS send window. This interface is
intended to be used the Nest Accelerator (NX) driver(s) to open
a send window and use it to submit compression/encryption requests
to a VAS receive window.

The receive window, identified by the [vasid, cop] parameters, must
already be open in VAS (i.e connected to an NX engine).
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5239af67

powerpc/powernv/vas: Define vas_win_close() interface · 98271d41

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define the vas_win_close() interface which should be used to close a
send or receive windows.

While the hardware configurations required to open send and receive
windows differ, the configuration to close a window is the same for
both. So we use a single interface to close the window.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

98271d41

powerpc/powernv/vas: Define vas_rx_win_open() interface · 62c4eda4

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define the vas_rx_win_open() interface. This interface is intended to
be used by the Nest Accelerator (NX) driver(s) to setup receive
windows for one or more NX engines (which implement compression &
encryption algorithms in the hardware).

Follow-on patches will provide an interface to close the window and to
open a send window that kernel subsystems can use to access the NX
engines.

The interface to open a receive window is expected to be invoked for
each instance of VAS in the system.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

62c4eda4

powerpc/powernv: Move GET_FIELD/SET_FIELD to vas.h · b6622a33

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Move the GET_FIELD and SET_FIELD macros to vas.h as VAS and other
users of VAS, including NX-842 can use those macros.

There is a lot of related code between the VAS/NX kernel drivers
and skiboot. For consistency, switch the order of parameters in
SET_FIELD to match the order in skiboot.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: NDan Streetman <ddstreet@ieee.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b6622a33

powerpc/powernv/vas: Define macros, register fields and structures · 96768914

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define macros for the VAS hardware registers and bit-fields as well
as couple of data structures needed by the VAS driver.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Fixup include guard to use _ASM_POWERPC_VAS_H]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

96768914

powerpc/pci: Remove OF node back pointer from pci_dn · f1e08232

由 Alexey Kardashevskiy 提交于 8月 29, 2017

The check_req() helper uses pci_get_pdn() to get an OF node pointer.
pci_get_pdn() returns a pci_dn pointer which either:
1) from the OF node returned by pci_device_to_OF_node();
2) from the parent child_list where entries don't have OF node pointers.
Since check_req() does not care about 2), it can call
pci_device_to_OF_node() directly, hence the change.

The find_pe_dn() helper uses embedded pci_dn to get an OF node which is
also stored in edev->pdev so let's take a shortcut and call
pci_device_to_OF_node() directly.

With these 2 changes, we can finally get rid of the OF node back pointer.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f1e08232

powerpc/eeh: Remove unnecessary config_addr from eeh_dev · 405b33a7

由 Alexey Kardashevskiy 提交于 8月 29, 2017

The eeh_dev struct hold a config space address of an associated node
and the very same address is also stored in the pci_dn struct which
is always present during the eeh_dev lifetime.

This uses bus:devfn directly from pci_dn instead of cached and packed
config_addr.

Since config_addr is made from device's bus:dev.fn, there is no point
in keeping it in the debugfs either so remove that too.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

405b33a7

powerpc/eeh: Remove unnecessary pointer to phb from eeh_dev · 69672bd7

由 Alexey Kardashevskiy 提交于 8月 29, 2017

The eeh_dev struct already holds a pointer to pci_dn which it does not
exist without and pci_dn itself holds the very same pointer so just
use it.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

69672bd7

powerpc/eeh: Reduce to one the number of places where edev is allocated · 8bae6a23

由 Alexey Kardashevskiy 提交于 8月 29, 2017

arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
is allocated; other 2 places allocate it on stack and in the heap for
a very short period of time to use eeh_pe_get() as takes edev.

This changes eeh_pe_get() to receive required parameters explicitly.

This removes unnecessary temporary allocation of edev.

This uses the "pe_no" name instead of the "pe_config_addr" name as
it actually is a PE number and not a config space address as it seemed.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8bae6a23

powerpc/powernv: Use kernel crash path for machine checks · 6fcd6baa

由 Nicholas Piggin 提交于 7月 19, 2017

There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
  [  127.426651616,0] OPAL: Reboot requested due to Platform error.
      Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
  opal: Reboot type 1 not supported
  Kernel panic - not syncing: PowerNV Unrecovered Machine Check
  CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #35
  Call Trace:
  [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
    buggy/mising code in OPAL for this BMC
    Rebooting in 10 seconds..
  Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
      Effective address: ffff000000000000
  Oops: Machine check, sig: 7 [#1]
  SMP NR_CPUS=2048
  NUMA
  PowerNV
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
  CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #36
  task: c000000932300000 task.stack: c000000932380000
  NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
  REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a26-dirty)
  MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
    CR: 24000484  XER: 20000000
  CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
  GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
  GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
  GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
  GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
  NIP [ffff000000000000] 0xffff000000000000
  LR [00000000217706a4] 0x217706a4
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6fcd6baa

powerpc/powernv: Flush console before platform error reboot · b746e3e0

由 Nicholas Piggin 提交于 7月 19, 2017

Unrecovered MCE and HMI errors are sent through a special restart OPAL
call to log the platform error. The downside is that they don't go
through normal Linux crash paths, so they don't give much information
to the Linux console.

Change this by providing a special crash function which does some of
the console flushing from the panic() path before calling firmware to
reboot.

The downside of this is a little more code to execute before reaching
the firmware reboot. However in practice, it's critical to get the
Linux console messages output in order to debug a problem. So this is
a desirable tradeoff.

Note on the implementation: It is difficult to plumb a custom reboot
handler into the panic path, because panic does a little bit too much
work. For example, it will try to delay with the timebase, but that
may be corrupted in some cases resulting in a hang without reaching
the platform reboot. Another problem is that panic can invoke the
crash dump code which is not what we want in the case of a hardware
platform error. Long-term the best solution will be to rework the
panic path so it can be suitable for this kind of panic, but for now
we just duplicate a bit of the code.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b746e3e0

powerpc: Do not call ppc_md.panic in fadump panic notifier · a3b2cb30

由 Nicholas Piggin 提交于 7月 05, 2017

If fadump is not registered, and no other crash or debug handlers are
registered, the powerpc panic handler stops the guest before the
generic panic code can push out debug information to the console.

Currently, system reset injection causes the guest to silently stop.

Stop calling ppc_md.panic in the panic notifier. crash_fadump already
does rtas_os_term() to terminate the guest if fadump is registered.

Remove ppc_md.panic. Move fadump panic notifier into fadump code.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a3b2cb30

powerpc/64: Fix watchdog configuration regressions · 70412c55

由 Nicholas Piggin 提交于 8月 28, 2017

This fixes a couple more bits of fallout from the new hard lockup watchdog
patch.

It restores the required hw_nmi_get_sample_period() function for the
perf watchdog, and removes some function declarations on 64e that are only
defined for 64s. This fixes the 64e build when the hardlockup detector is
enabled.

It restores the default behaviour of disabling the perf watchdog, and also
fixes disabling the 64s watchdog when running as a guest.

Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

70412c55

29 8月, 2017 2 次提交

powerpc/64s: Move IDLE_STATE_ENTER_SEQ[_NORET] into idle_book3s.S · aafc8a83

由 Nicholas Piggin 提交于 8月 29, 2017

This macro is only used in idle_book3s.S, move it in there and add a
more descriptive comment.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch and write change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aafc8a83

KVM: PPC: Book3S HV: POWER9 does not require secondary thread management · 94a04bc2

由 Nicholas Piggin 提交于 8月 25, 2017

POWER9 CPUs have independent MMU contexts per thread, so KVM does not
need to quiesce secondary threads, so the hwthread_req/hwthread_state
protocol does not have to be used. So patch it away on POWER9, and patch
away the branch from the Linux idle wakeup to kvm_start_guest that is
never used.

Add a warning and error out of kvmppc_grab_hwthread in case it is ever
called on POWER9.

This avoids a hwsync in the idle wakeup path on POWER9.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Acked-by: NPaul Mackerras <paulus@ozlabs.org>
[mpe: Use WARN(...) instead of WARN_ON()/pr_err(...)]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

94a04bc2

23 8月, 2017 4 次提交

powerpc/mm: Make switch_mm_irqs_off() out of line · 3a2df379

由 Benjamin Herrenschmidt 提交于 7月 24, 2017

It's too big to be inline, there is no reason to keep it
that way.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework to incorporate the comment changes via fixes branch]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3a2df379

powerpc/mm: Optimize detection of thread local mm's · a619e59c

由 Benjamin Herrenschmidt 提交于 7月 24, 2017

Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a619e59c

powerpc/mm: Avoid double irq save/restore in activate_mm · 058ccc34

由 Benjamin Herrenschmidt 提交于 7月 24, 2017

It calls switch_mm() which already does the irq save/restore
these days.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

058ccc34

powerpc/mm: Move pgdir setting into a helper · 43ed84a8

由 Benjamin Herrenschmidt 提交于 7月 24, 2017

Makes switch_mm_irqs_off() a bit more readable
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

43ed84a8

18 8月, 2017 1 次提交

powerpc/mm: Ensure cpumask update is ordered · 1a92a80a

由 Benjamin Herrenschmidt 提交于 7月 24, 2017

There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.

Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.

The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.

Cc: stable@vger.kernel.org
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
[mpe: Add comments in the code]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1a92a80a

17 8月, 2017 3 次提交

powerpc/mm: Don't send IPI to all cpus on THP updates · fa4531f7

由 Aneesh Kumar K.V 提交于 7月 27, 2017

Now that we made sure that lockless walk of linux page table is mostly
limitted to current task(current->mm->pgdir) we can update the THP
update sequence to only send IPI to CPUs on which this task has run.
This helps in reducing the IPI overload on systems with large number
of CPUs.

WRT kvm even though kvm is walking page table with vpc->arch.pgdir,
it is done only on secondary CPUs and in that case we have primary CPU
added to task's mm cpumask. Sending an IPI to primary will force the
secondary to do a vm exit and hence this mm cpumask usage is safe
here.

WRT CAPI, we still end up walking linux page table with capi context
MM. For now the pte lookup serialization sends an IPI to all CPUs in
CPI is in use. We can further improve this by adding the CAPI
interrupt handling CPU to task mm cpumask. That will be done in a
later patch.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fa4531f7

powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19

由 Aneesh Kumar K.V 提交于 7月 27, 2017

Add newer helpers to make the function usage simpler. It is always
recommended to use find_current_mm_pte() for walking the page table.
If we cannot use find_current_mm_pte(), it should be documented why
the said usage of __find_linux_pte() is safe against a parallel THP
split.

For now we have KVM code using __find_linux_pte(). This is because kvm
code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
with PACA soft_enabled = 1. We may want to fix that later and make
sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
we can switch kvm to use find_linux_pte().
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

94171b19

powerpc/string: Implement optimized memset variants · 694fc88c

由 Naveen N. Rao 提交于 3月 28, 2017

Based on Matthew Wilcox's patches for other architectures.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

694fc88c

16 8月, 2017 3 次提交

powerpc/topology: Remove the unused parent_node() macro · 7baebe54

由 Dou Liyang 提交于 7月 26, 2017

Commit a7be6e5a ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro in POWERPC platform is unnecessary.

Remove it for cleanup.
Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7baebe54

powerpc/mm/hugetlb: Allow runtime allocation of 16G. · 4ae279c2

由 Aneesh Kumar K.V 提交于 7月 28, 2017

Now that we have GIGANTIC_PAGE enabled on powerpc, use this for 16G hugepages
with hash translation mode. Depending on the total system memory we have, we may
be able to allocate 16G hugepages runtime. This also remove the hugetlb setup
difference between hash/radix translation mode.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4ae279c2

powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line · 79cc38de

由 Aneesh Kumar K.V 提交于 7月 28, 2017

With commit aa888a74 ("hugetlb: support larger than MAX_ORDER") we added
support for allocating gigantic hugepages via kernel command line. Switch
ppc64 arch specific code to use that.

W.r.t FSL support, we now limit our allocation range using BOOTMEM_ALLOC_ACCESSIBLE.

We use the kernel command line to do reservation of hugetlb pages on powernv
platforms. On pseries hash mmu mode the supported gigantic huge page size is
16GB and that can only be allocated with hypervisor assist. For pseries the
command line option doesn't do the allocation. Instead pseries does gigantic
hugepage allocation based on hypervisor hint that is specified via
"ibm,expected#pages" property of the memory node.

Cc: Scott Wood <oss@buserror.net>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

79cc38de

15 8月, 2017 5 次提交

powerpc/hugetlb: fix page rights verification in gup_hugepte() · ca8afd40

由 Christophe Leroy 提交于 7月 12, 2017

gup_hugepte() checks if pages are present and readable, and
when  'write' is set, also checks if the pages are writable.

Initially this was done by checking if _PAGE_PRESENT and
_PAGE_READ were set. In addition, _PAGE_WRITE was verified for write
accesses.

The problem is that we have to handle the three following cases:
1/ The target defines __PAGE_READ and __PAGE_WRITE
2/ The target defines __PAGE_RW
3/ The target defines __PAGE_RO

In case 1/, this is obvious
In case 2/, __PAGE_READ is defined as 0 and __PAGE_WRITE as __PAGE_RW
so it works as well.
But in case 3, __PAGE_RW is defined as 0, which means __PAGE_WRITE is 0
and then the test returns true (page writable) in all cases.

A first correction was attempted in commit 6b8cb66a ("powerpc: Fix
usage of _PAGE_RO in hugepage"), but that fix is wrong:
instead of checking that the page is writable when write is requested,
it checks that the page is NOT writable when write is NOT requested.

This patch adds a new pte_read() helper to check whether a page is
readable or not. This avoids handling all possible cases in
gup_hugepte().

Then gup_hugepte() is modified to use pte_present(), pte_read()
and pte_write() instead of the raw flags.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ca8afd40

powerpc/mm: Simplify __set_fixmap() · 4cfac2f9

由 Christophe Leroy 提交于 8月 02, 2017

__set_fixmap() uses __fix_to_virt() then does the boundary checks
by it self. Instead, we can use fix_to_virt() which does the
verification at build time. For this, we need to use it inline
so that GCC can see the real value of idx at buildtime.

In the meantime, we remove the 'fixmaps' variable.
This variable is set but has never been used from the beginning
(commit 2c419bde ("[POWERPC] Port fixmap from x86 and use
for kmap_atomic"))
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4cfac2f9

powerpc/mm: declare some local functions static · 86b19520

由 Christophe Leroy 提交于 8月 02, 2017

get_pteptr() and __mapin_ram_chunk() are only used locally,
so define them static
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

86b19520

powerpc/mm: Fix kernel RAM protection after freeing unused memory on PPC32 · 3184cc4b

由 Christophe Leroy 提交于 8月 02, 2017

As seen below, allthough the init sections have been freed, the
associated memory area is still marked as executable in the
page tables.

~ dmesg
[    5.860093] Freeing unused kernel memory: 592K (c0570000 - c0604000)

~ cat /sys/kernel/debug/kernel_page_tables
---[ Start of kernel VM ]---
0xc0000000-0xc0497fff        4704K  rw  X  present dirty accessed shared
0xc0498000-0xc056ffff         864K  rw     present dirty accessed shared
0xc0570000-0xc059ffff         192K  rw  X  present dirty accessed shared
0xc05a0000-0xc7ffffff      125312K  rw     present dirty accessed shared
---[ vmalloc() Area ]---

This patch fixes that.

The implementation is done by reusing the change_page_attr()
function implemented for CONFIG_DEBUG_PAGEALLOC
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3184cc4b

powerpc/mm/nohash: Move definition of PGALLOC_GFP to fix build errors · 5b6c133e

由 Michael Ellerman 提交于 8月 15, 2017

In some obscure Book3E configs (randconfig) we can end up missing a
definition for PGALLOC_GFP in pgtable_64.c.

Fix it by moving the definition to asm/pgalloc.h.

Fixes: de3b8761 ("powerpc/mm/book(e)(3s)/64: Add page table accounting")
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5b6c133e

10 8月, 2017 3 次提交

powerpc/8xx: Use symbolic PVR value · 3ee87674

由 Christophe Leroy 提交于 8月 08, 2017

For the 8xx, PVR values defined in arch/powerpc/include/asm/reg.h
are nowhere used.

Remove all defines and add PVR_8xx

Use it in arch/powerpc/kernel/cputable.c
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3ee87674

powerpc/8xx: Getting rid of remaining use of CONFIG_8xx · 968159c0

由 Christophe Leroy 提交于 8月 08, 2017

Two config options exist to define powerpc MPC8xx:
* CONFIG_PPC_8xx
* CONFIG_8xx

arch/powerpc/platforms/Kconfig.cputype has contained the following
comment about CONFIG_8xx item for some years:
"# this is temp to handle compat with arch=ppc"

arch/powerpc is now the only place with remaining use of
CONFIG_8xx: get rid of them.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

968159c0

powerpc/time: refactor MFTB() to limit number of ifdefs · 72e4b2cd

由 Christophe Leroy 提交于 8月 08, 2017

The 8xx cannot access the TBL and TBU registers using mfspr/mtspr
It must be accessed using mftb/mftbu

Due to this, there is a number of places with #ifdef CONFIG_8xx

This patch defines new macros MFTBL(x) and MFTBU(x) on the same model
as MFTB(x) and tries to make use of them as much as possible.

In arch/powerpc/include/asm/timex.h, we also remove the ifdef
for the asm() operands as the compiler doesn't mind unused operands
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

72e4b2cd

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功