- 01 December 2015, 11 commits
-
-
Committed by Anton Blanchard

Move the MSR modification into C. Removing it from the assembly function will allow us to avoid costly MSR writes by batching them up.

Check the FP and VMX bits before calling the relevant giveup_*() function. This makes giveup_vsx() and flush_vsx_to_thread() behave more like their sister functions, and allows us to use flush_vsx_to_thread() in the signal code.

Move the check_if_tm_restore_required() check in.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
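A minimal sketch of the resulting C shape (simplified, not the exact upstream code): VSX state overlaps the FP and VMX register sets, so giving up VSX means giving up whichever of those units is live, and the task's MSR bits tell us which.

  static void __giveup_vsx(struct task_struct *tsk)
  {
      unsigned long msr = tsk->thread.regs->msr;

      if (msr & MSR_FP)       /* FP state is live, save it */
          __giveup_fpu(tsk);
      if (msr & MSR_VEC)      /* VMX state is live, save it */
          __giveup_altivec(tsk);
  }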
-
Committed by Anton Blanchard

Move the MSR modification into new C functions. Removing it from the low level functions will allow us to avoid costly MSR writes by batching them up.

Move the check_if_tm_restore_required() check into these new functions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
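A sketch of the batching idea these C functions enable (the helper name follows the style of this series, but treat it as an assumption): read the MSR once, OR in the wanted bits, and only perform the slow MSR write if the value actually changed.

  static unsigned long msr_check_and_set(unsigned long bits)
  {
      unsigned long oldmsr = mfmsr();
      unsigned long newmsr = oldmsr | bits;

      if (oldmsr != newmsr)   /* skip the costly mtmsrd when nothing changes */
          mtmsrd_isync(newmsr);

      return newmsr;
  }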
-
Committed by Anton Blanchard

We used to allow giveup_*() to be called with a NULL task_struct pointer. Now that those cases are handled in the caller, we can remove the checks. We can also remove giveup_altivec_notask(), which is likewise unused.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Anton Blanchard

mtmsrd_isync() will do an mtmsrd followed by an isync on older processors. On newer processors we avoid the isync via a feature fixup.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
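A plausible shape for the helper, strictly as a sketch (the exact mnemonics, fixup macro usage, and feature bit are assumptions here): the feature-fixup machinery patches the isync to a nop at boot on CPUs that do not need it.

  static inline void mtmsrd_isync(unsigned long val)
  {
      /* ASM_FTR_IFCLR emits "isync" unless the CPU has the feature,
       * in which case the fixup pass replaces it with "nop" */
      asm volatile("mtmsrd %0; " ASM_FTR_IFCLR("isync", "nop", %1)
                   : : "r" (val), "i" (CPU_FTR_ARCH_206) : "memory");
  }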
-
Committed by Anton Blanchard

Instead of having multiple giveup_*_maybe_transactional() functions, separate the TM check out into a new function, check_if_tm_restore_required(). This will make it easier to optimise the giveup_*() functions in a subsequent patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
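A sketch of what the consolidated check can look like (field and flag names assumed from the powerpc TM code): if the current task has an active transaction, remember the live MSR and flag that TM state must be restored.

  static void check_if_tm_restore_required(struct task_struct *tsk)
  {
      /* only the current task's live register state can need a TM restore */
      if (tsk == current && tsk->thread.regs &&
          MSR_TM_ACTIVE(tsk->thread.regs->msr) &&
          !test_thread_flag(TIF_RESTORE_TM)) {
          tsk->thread.ckpt_regs.msr = tsk->thread.regs->msr;
          set_thread_flag(TIF_RESTORE_TM);
      }
  }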
-
Committed by Anton Blanchard

The UP-only lazy floating point and vector optimisations were written back when SMP was not common, and neither glibc nor gcc used vector instructions. Now SMP is very common, glibc aggressively uses vector instructions, and gcc autovectorises. We want to add new optimisations that apply to both UP and SMP, so in preparation remove these UP-only optimisations.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Anton Blanchard

No need to execute mflr twice.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Anton Blanchard

Move all our context switch SPR save and restore code into two helpers. We do a few optimisations:

- Group all mfsprs and all mtsprs. In many cases an mtspr sets a scoreboarding bit that an mfspr waits on, so the current practice of mfspr A; mtspr A; mfspr B; mtspr B is the worst scheduling we can do.

- SPR writes are slow, so check that the value is changing before writing it.

A context switch microbenchmark using yield():

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield 0 0

shows an improvement of almost 10% on POWER8.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
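A minimal sketch of the two helpers, using DSCR as the illustrative SPR (the real helpers cover several SPRs):

  static inline void save_sprs(struct thread_struct *t)
  {
      /* all the mfsprs are grouped here */
      t->dscr = mfspr(SPRN_DSCR);
  }

  static inline void restore_sprs(struct thread_struct *old_thread,
                                  struct thread_struct *new_thread)
  {
      /* SPR writes are slow, so only write when the value changes */
      if (old_thread->dscr != new_thread->dscr)
          mtspr(SPRN_DSCR, new_thread->dscr);
  }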
-
Committed by Anton Blanchard

Similar to the non-TM load_up_*() functions, don't disable the MSR bits on the way out.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Anton Blanchard

Writing the MSR is slow, so we want to avoid it whenever possible. A subsequent patch will add a debug option that strictly manages the FP/VMX/VSX unavailable bits. For now just remove the write, matching what we do in other areas of the kernel (e.g. enable_kernel_altivec()).

A context switch microbenchmark using yield():

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield --fp 0 0

shows an improvement of almost 3% on POWER8.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Paul Mackerras

Currently, if HV KVM is configured but PR KVM isn't, we don't include a test to see whether we were interrupted in KVM guest context for the set of interrupts which get delivered directly to the guest by hardware if they occur in the guest. This includes things like program interrupts.

However, the recent bug where userspace could set the MSR for a VCPU to have an illegal value in the TS field, and thus cause a TM Bad Thing type of program interrupt on the hrfid that enters the guest, showed that we can never be completely sure that these interrupts can never occur in the guest entry/exit code. If one of these interrupts does happen and we have HV KVM configured but not PR KVM, then we end up trying to run the handler in the host with the MMU set to the guest MMU context, which generally ends badly.

Thus, for robustness it is better to have the test in every interrupt vector, so that if some way is found to trigger some interrupt in the guest entry/exit path, we can handle it without immediately crashing the host.

This means that the distinction between KVMTEST and KVMTEST_PR goes away. Thus we delete KVMTEST_PR and the associated macros and use KVMTEST everywhere that we previously used either KVMTEST_PR or KVMTEST. It also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR, so we delete SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 26 November 2015, 3 commits
-
-
Committed by Rashmica Gupta

It is common practice on powerpc to use 'rN' to refer to register 'N'. However, when using the pt_regs_offset table we have to use 'gprN'. So add aliases such that both 'rN' and 'gprN' can be used.

For example, we can currently do:

  $ su -
  $ cd /sys/kernel/debug/tracing
  $ echo "p:probe/sys_fchownat sys_fchownat %gpr3:s32 +0(%gpr4):string %gpr5:s32 %gpr6:s32 %gpr7:s32" > kprobe_events
  $ echo 1 > events/probe/sys_fchownat/enable
  $ touch /tmp/foo
  $ chown root /tmp/foo
  $ echo 0 > events/enable
  $ cat trace
  chown-2925  [014] d...    76.160657: sys_fchownat: (SyS_fchownat+0x8/0x1a0) arg1=-100 arg2="/tmp/foo" arg3=0 arg4=-1 arg5=0

Instead we'd like to be able to use:

  $ echo "p:probe/sys_fchownat sys_fchownat %r3:s32 +0(%r4):string %r5:s32 %r6:s32 %r7:s32" > kprobe_events

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
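A sketch of how the alias table can be built so that both spellings resolve to the same pt_regs slot (the macro name is illustrative, not necessarily the upstream one):

  #define GPR_OFFSET_NAME(num)                                              \
      {.name = "r" #num,   .offset = offsetof(struct pt_regs, gpr[num])},   \
      {.name = "gpr" #num, .offset = offsetof(struct pt_regs, gpr[num])}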
-
Committed by Rashmica Gupta

Most architectures use NR_syscalls as the #define for the number of syscalls. We use __NR_syscalls, and then define NR_syscalls as __NR_syscalls. __NR_syscalls is not used outside arch code, whereas NR_syscalls is. So, as NR_syscalls must be defined and __NR_syscalls need not be, replace __NR_syscalls with NR_syscalls.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Rashmica Gupta

This function has been unused since commit 14cf11af ("powerpc: Merge enough to start building in arch/powerpc."), so remove it.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 28 October 2015, 12 commits
-
-
Committed by Benjamin Herrenschmidt

When turning this from an inline into an exported function I was a bit over-eager and made it GPL-only. This prevents the use of pretty much all non-GPL PCI drivers, which is a bit over the top. Let's bring it back in line with other architectures.

Fixes: 817820b0 ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
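The fix is essentially a one-line relaxation of the export; a sketch, with the symbol name inferred from the Fixes: context (so treat it as an assumption):

  -EXPORT_SYMBOL_GPL(dma_set_coherent_mask);
  +EXPORT_SYMBOL(dma_set_coherent_mask);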
-
Committed by Michael Ellerman

Use of_get_next_parent() to simplify the logic in of_get_ibm_chip_id().

Original-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
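A sketch of the simplified walk (assuming the usual 'ibm,chip-id' property): of_get_next_parent() drops the reference on the node it was handed, so the loop needs no separate of_node_put() bookkeeping.

  int of_get_ibm_chip_id(struct device_node *np)
  {
      of_node_get(np);
      while (np) {
          u32 chip_id;

          /* found it: drop the reference and return the id */
          if (!of_property_read_u32(np, "ibm,chip-id", &chip_id)) {
              of_node_put(np);
              return chip_id;
          }
          /* releases 'np' and returns its parent, with a reference held */
          np = of_get_next_parent(np);
      }
      return -1;
  }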
-
Committed by Tiejun Chen

Allow KEXEC for book3e, and bypass or convert non-book3e stuff in the kexec code.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood@freescale.com: move code to minimize diff, and cleanup]
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Scott Wood

book3e_secondary_core_init will only create a TLB entry if r4 = 0, so zero r4 before calling it.

Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Scott Wood

The SMP release mechanism for FSL book3e is different from when booting with normal hardware. In theory we could simulate the normal spin table mechanism, but not at the addresses U-Boot put in the device tree, so there'd need to be even more communication between the kernel and kexec to set that up. Instead, kexec-tools will set a boolean property linux,booted-from-kexec in the /chosen node.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: devicetree@vger.kernel.org
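On the kernel side, detecting this boils down to a one-line device tree check; a hedged sketch (the helper name is hypothetical, the property name is from the commit):

  static bool booted_from_kexec(void)
  {
      /* kexec-tools sets this boolean property in the /chosen node */
      return of_property_read_bool(of_chosen, "linux,booted-from-kexec");
  }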
-
Committed by Tiejun Chen

book3e has no real (MMU-off) mode, so we have to create an identity TLB mapping to make sure we can access the real physical address.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: cleanup, and split off some changes]
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Scott Wood

This limit only makes sense on book3s, and on book3e it can cause problems with kdump if we don't have any memory under 256 MiB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Scott Wood

While book3e doesn't have "real mode", we still want to wait for all the non-crash cpus to complete their shutdown.

Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Tiejun Chen

book3e differs from book3s here: book3s includes the exception vectors code in head_64.S, as it relies on absolute addressing, which is only possible within that compilation unit. So we have to get that label address via the GOT. And when booting a relocated kernel, we should reset IVPR properly again after .relocate.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: cleanup and ifdef removal]
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Tiejun Chen

Convert r4/r5, not r6, to a virtual address when calling copy_and_flush. Otherwise, since r3 is already virtual, copy_and_flush tries to access r3+r6 and PAGE_OFFSET gets added twice. This isn't normally seen, because on book3e we normally enter with the kernel at zero and thus skip copy_and_flush, but it will be needed for kexec support.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: split patch and rewrote changelog]
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Tiejun Chen

Rename 'interrupt_end_book3e' to '__end_interrupts' so that the symbol can be used by both book3s and book3e.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[scottwood: edit changelog]
Signed-off-by: Scott Wood <scottwood@freescale.com>
-
Committed by Scott Wood

The new kernel will be expecting secondary threads to be disabled, not spinning.

Signed-off-by: Scott Wood <scottwood@freescale.com>
-
- 23 October 2015, 1 commit
-
-
Committed by Scott Wood

Use an AS=1 trampoline TLB entry to allow all normal TLB1 entries to be loaded at once. This avoids the need to keep the translation that code is executing from in the same TLB entry in the final TLB configuration as during early boot, which in turn is helpful for relocatable kernels (e.g. kdump) where the kernel is not running from what would be the first TLB entry.

On e6500, we limit map_mem_in_cams() to the primary hwthread of a core (the boot cpu is always considered primary, as a kdump kernel can be entered on any cpu). Each TLB only needs to be set up once, and when we do, we don't want another thread to be running when we create a temporary trampoline TLB1 entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
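A sketch of the primary-hwthread restriction (cpu_thread_in_core() is from asm/cputhreads.h; the surrounding helper is illustrative, not the upstream code):

  /* only a core's primary thread sets up the TLB; the boot cpu always
   * counts as primary because a kdump kernel can enter on any cpu */
  static bool can_setup_tlb(int cpu)
  {
      return cpu == boot_cpuid || cpu_thread_in_core(cpu) == 0;
  }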
-
- 22 October 2015, 1 commit
-
-
Committed by Vasant Hegde

Currently we do not validate rtas.entry before calling enter_rtas(). This leads to a kernel oops when user space calls the rtas system call on a powernv platform (see below). This patch adds code to validate rtas.entry before making the enter_rtas() call.

  Oops: Exception in kernel mode, sig: 4 [#1]
  SMP NR_CPUS=1024 NUMA PowerNV
  task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
  NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
  REGS: c0000007e1a7b920 TRAP: 0e40  Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
  MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
  CFAR: c000000000009c0c SOFTE: 0
  NIP [0000000000000000]           (null)
  LR [0000000000009c14] 0x9c14
  Call Trace:
  [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
  [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
  [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org # v3.2+
Fixes: 55190f88 ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
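The guard itself is small; a sketch of the idea (on platforms without RTAS, such as powernv, rtas.entry is left as zero):

  /* in ppc_rtas(), before control can reach enter_rtas() */
  if (!rtas.entry)
      return -EINVAL;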
-
- 21 October 2015, 5 commits
-
-
Committed by Gavin Shan

When one or both of the two flags below are set in the PE state, the PE's IO path is regarded as enabled: EEH_STATE_MMIO_ACTIVE or EEH_STATE_MMIO_ENABLED.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
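In code form the rule reduces to a single mask test; a sketch (the helper name is illustrative):

  static bool eeh_io_path_enabled(int state)
  {
      /* either flag alone is enough to regard the IO path as enabled */
      return state & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_MMIO_ENABLED);
  }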
-
Committed by Gavin Shan

On a fenced PHB, the error handlers in the drivers of its subordinate devices could return PCI_ERS_RESULT_CAN_RECOVER, indicating that no reset will be issued during the recovery. That conflicts with the fact that a fenced PHB won't be recovered without a reset. This limits the return value from the error handlers in the drivers of the fenced PHB's subordinate devices to PCI_ERS_RESULT_NONE or PCI_ERS_RESULT_NEED_RESET, to ensure a reset will be issued during recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
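A sketch of the clamp applied to the aggregated result from the drivers' handlers:

  /* a fenced PHB cannot recover without a reset, so never let a
   * driver's CAN_RECOVER answer suppress it */
  if (rc == PCI_ERS_RESULT_CAN_RECOVER)
      rc = PCI_ERS_RESULT_NEED_RESET;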
-
Committed by Gavin Shan

Currently, we rely on the existence of struct pci_driver::err_handler to decide whether the corresponding PCI device should be unplugged during EEH recovery (the partial hotplug case). That check is not sufficient, however: some device drivers implement only some of the EEH error handlers, just to collect diag-data, which means the driver still expects a hotplug to recover from the EEH error. This makes the hotplug criterion more relaxed: if the device driver doesn't provide all the necessary EEH error handlers, it will experience hotplug during EEH recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
[mpe: Minor change log rewording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
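A sketch of the stricter test (callback names from struct pci_error_handlers; the helper name is illustrative): only a driver implementing all the relevant handlers can ride out recovery without being unplugged.

  static bool eeh_driver_can_recover(struct pci_driver *driver)
  {
      return driver->err_handler &&
             driver->err_handler->error_detected &&
             driver->err_handler->slot_reset &&
             driver->err_handler->resume;
  }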
-
Committed by Gavin Shan

On the PowerNV platform, a PE is kept in the frozen state until the PE reset is completed, to avoid recursive EEH errors caused by MMIO access during the period of the EEH reset. The PE's frozen state is cleared after the BARs of the PCI devices included in the PE are restored and enabled. However, we needn't clear the frozen state for a PHB PE explicitly at this point, as there is no real PE for a PHB PE. Because the PHB PE is always bound to PE#0, we actually clear PE#0, which is wrong, though it doesn't cause any problems. This checks whether the PE is a PHB PE and doesn't clear the frozen state if it is.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
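The check is a short early-out; a sketch (EEH_PE_PHB is the PE type flag for PHB PEs):

  /* a PHB PE has no real PE behind it; clearing would hit PE#0 */
  if (pe->type & EEH_PE_PHB)
      return;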
-
Committed by Christophe Jaillet

of_get_property() is used inside the loop, but the reference to the node is then dropped before dereferencing the prop pointer, which could by then point to junk if the node has been freed. Instead, use of_property_read_u32() to actually read the property value before dropping the reference.

of_property_read_u32() requires at least one cell (u32) to be present, which is stricter than the old logic, which would happily dereference a property of any size. However, we believe all device trees in the wild have at least one cell. Skiboot may produce memory nodes with more than one cell, but that is OK; of_property_read_u32() will return the first one.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[mpe: Expand change log with device tree details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
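The hazard and the fix, in miniature (the property name 'example,cells' is hypothetical):

  u32 val;
  const __be32 *prop;
  int rc;

  /* broken: 'prop' may point into freed memory after the put */
  prop = of_get_property(np, "example,cells", NULL);
  of_node_put(np);
  val = be32_to_cpup(prop);

  /* fixed: copy the value out while the reference is still held */
  rc = of_property_read_u32(np, "example,cells", &val);
  of_node_put(np);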
-
- 19 October 2015, 3 commits
-
-
Committed by Hon Ching (Vicky) Lo

The OS should ask Power Firmware (PFW) for the size of the buffer allocated for the event log, instead of the size of the actual event log. It then passes the buffer address and size to PFW in the handover process, into which PFW copies the log.

Signed-off-by: Hon Ching (Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
-
Committed by Hon Ching (Vicky) Lo

The event log generated by OpenFirmware on PowerPC is 4-byte aligned. This patch reformats the log to be byte-aligned for the Linux client.

Signed-off-by: Hon Ching (Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
-
Committed by Hon Ching (Vicky) Lo

Replace all occurrences of '/ibm,vtpm' with '/vdevice/vtpm', as only the latter is guaranteed to be available to the client OS. The '/ibm,vtpm' node should only be used by Open Firmware, and is subject to change.

Signed-off-by: Hon Ching (Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
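A sketch of a lookup against the guaranteed path:

  /* '/vdevice/vtpm' is what a client OS may rely on; '/ibm,vtpm'
   * is firmware-internal and subject to change */
  struct device_node *dn = of_find_node_by_path("/vdevice/vtpm");
  if (!dn)
      return -ENODEV;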
-
- 15 October 2015, 3 commits
-
-
Committed by Stephen Rothwell

.exit.text is discarded at run time, and there are some references from it to .exit.data, so we need to discard .exit.data at run time as well. Fixes these errors:

  `.exit.data' referenced in section `.exit.text' of drivers/built-in.o: defined in discarded section `.exit.data' of drivers/built-in.o
  `.exit.data' referenced in section `.exit.text' of drivers/built-in.o: defined in discarded section `.exit.data' of drivers/built-in.o

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Gavin Shan

No need to have two atomic operations (update and fetch/check) when decreasing the PE's number of passed devices, as one atomic operation is enough.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
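A sketch of the single-operation form (the counter field name is assumed): atomic_dec_if_positive() refuses to decrement below zero and returns a negative result when it would have, so the underflow check comes for free.

  WARN_ON(atomic_dec_if_positive(&pe->pass_dev_cnt) < 0);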
-
Committed by Andrew Donnellan

Export pcibios_free_controller(), so it can be used by the cxl module to free virtual PHBs.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
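The change is essentially the export itself; a sketch (whether the GPL-only variant was used is an assumption):

  EXPORT_SYMBOL_GPL(pcibios_free_controller);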
-
- 12 October 2015, 1 commit
-
-
Committed by Aneesh Kumar K.V

We need to properly identify whether a hugepage is an explicit or a transparent hugepage in follow_huge_addr(). We used to depend on the hugepage shift argument to do that, but in some cases that can give wrong results. For example: on finding a transparent hugepage we set the hugepage shift to PMD_SHIFT. But we can end up clearing the thp pte via pmdp_huge_get_and_clear. We do prevent reusing the pfn page via the usage of kick_all_cpus_sync(), but that happens after we have updated the pte to 0. Hence in follow_huge_addr() we can find the hugepage shift set, while the transparent hugepage check fails for a thp pte.

NOTE: We fixed a variant of this race against thp split in commit 691e95fd ("powerpc/mm/thp: Make page table walk safe against thp split/collapse")

Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in follow_page_mask occasionally.

In the long term, we may want to switch the ppc64 64k page size config to enable CONFIG_ARCH_WANT_GENERAL_HUGETLB.

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-