Commits · 2cd76629f6c40f7b227ba7b3885ce10e6ec0face · openeuler / raspberrypi-kernel

16 Nov, 2011 6 commits

powerpc/trace: Add a dummy stack frame for trace_hardirqs_off · 2cd76629

Authored by Kevin Hao on Nov 10, 2011

trace_hardirqs_off uses CALLER_ADDR0 and CALLER_ADDR1.
If an exception occurs in user mode, there is only one stack frame
on the stack, and accessing CALLER_ADDR1 causes the following
call trace. So we create a dummy stack frame to make
trace_hardirqs_off happy.

WARNING: at kernel/smp.c:459
Modules linked in:
NIP: c0093280 LR: c00930a0 CTR: c0010780
REGS: edb87ae0 TRAP: 0700   Not tainted  (3.1.0)
MSR: 00021002 <ME,CE>  CR: 28002888  XER: 00000000
TASK = edce2ac0[17658] 'mthread-lock-on' THREAD: edb86000 CPU: 5
GPR00: 00000001 edb87b90 edce2ac0 00000005 c0019594 edb87bd8 00000001 00000fe3
GPR08: 00041000 c084138c 4e20120d edb87b90 48002888 1001aa7c 00000000 00000000
GPR16: 48830000 10012a8c 00000000 10000af4 00000001 c0810000 00000000 00000000
GPR24: ee9aa920 c0816a18 00000000 00000005 c0019594 edb87bd8 ee20178c edb87b90
NIP [c0093280] smp_call_function_many+0x214/0x2b4
LR [c00930a0] smp_call_function_many+0x34/0x2b4
Call Trace:
[edb87b90] [c00930a0] smp_call_function_many+0x34/0x2b4 (unreliable)
[edb87bd0] [c00194ec] __flush_tlb_page+0xac/0x100
[edb87c00] [c001957c] flush_tlb_page+0x3c/0x54
[edb87c10] [c00180ac] ptep_set_access_flags+0x74/0x12c
[edb87c40] [c0128068] handle_pte_fault+0x2f0/0x9ac
[edb87cb0] [c0128c3c] handle_mm_fault+0x104/0x1dc
[edb87ce0] [c05f40f4] do_page_fault+0x2dc/0x630
[edb87e50] [c001078c] handle_page_fault+0xc/0x80
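
The frame-walking problem described above can be pictured with a small user-space toy model. This is only a sketch under stated assumptions: `struct frame`, `caller_addr()` and `push_dummy_frame()` are hypothetical names, the layout is not the real PowerPC ABI frame, and the actual patch does this in assembly. The model returns 0 where the real CALLER_ADDR1 lookup would simply follow a back chain that leaves the kernel stack.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stack frame: a back-chain pointer plus a saved return address. */
struct frame {
    struct frame *back;      /* NULL marks the end of the chain */
    unsigned long ret_addr;  /* saved return address */
};

/* Return the return address `level` frames up the chain, or 0 if the
 * chain ends first. Level 0 models CALLER_ADDR0, level 1 CALLER_ADDR1. */
static unsigned long caller_addr(const struct frame *fp, unsigned int level)
{
    while (fp != NULL && level > 0) {
        fp = fp->back;
        level--;
    }
    return fp != NULL ? fp->ret_addr : 0;
}

/* Link a dummy frame in front of the existing top frame so that a
 * level-1 lookup still lands on a valid frame. */
static void push_dummy_frame(struct frame *dummy, struct frame *top,
                             unsigned long ret_addr)
{
    assert(dummy != NULL && top != NULL);
    dummy->back = top;
    dummy->ret_addr = ret_addr;
}
```

With a single frame, a level-1 lookup runs off the end of the chain; once a dummy frame is linked in front of it, the same lookup resolves to the original frame.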

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Copy down exception vectors after feature fixups · d715e433

Authored by Anton Blanchard on Nov 14, 2011

kdump fails because we try to execute an HV only instruction. Feature
fixups are being applied after we copy the exception vectors down to 0
so they miss out on any updates.

We have always had this issue but it only became critical in v3.0
when we added CFAR support (breaks POWER5) and v3.1 when we added
POWERNV (breaks everyone).

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> [v3.0+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: panic if we can't instantiate RTAS · 6d1e2c6c

Authored by Anton Blanchard on Nov 14, 2011

I had to debug a strange situation where all manner of things were
failing. SMT threads, storage and network were all completely broken.

The root cause was we couldn't find enough memory to instantiate RTAS -
this was a network install so the initrd was huge.

Instead of limping along and failing in mysterious ways we should just
panic up front if RTAS exists and we can't allocate space for it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/4xx: Fix typos in kexec config dependencies · bbc24a25

Authored by Suzuki Poulose on Nov 14, 2011

Kexec is not supported on 47x. 47x is a variant of 44x with slightly
different MMU and SMP support. There was a typo in the config dependency
for kexec. This patch fixes the same.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/fsl: MCU_MPC8349EMITX wants I2C built-in, modular won't do... · 82640a6b

Authored by Al Viro on Nov 08, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Fix build breakage in jump_label.c · 9c8b3907

Authored by Al Viro on Nov 08, 2011

Should do what other architectures do and wrap all that code into
the appropriate ifdef.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

15 Nov, 2011 1 commit

fsl-rio: fix compile error · e0ce42e1

Authored by Liu Gang on Nov 11, 2011

The "#include <linux/module.h>" was replaced by "#include <linux/export.h>"
in the patch "powerpc: various straight conversions from module.h --> export.h".
This will cause the following compile problem:
arch/powerpc/sysdev/fsl_rio.c: In function 'fsl_rio_mcheck_exception':
arch/powerpc/sysdev/fsl_rio.c:296: error: implicit declaration of function 'search_exception_tables'.

The file fsl_rio.c needs the declaration of function "search_exception_tables"
in the header file "linux/module.h".

Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

08 Nov, 2011 8 commits

powerpc/kvm: Fix build failure with HV KVM and CBE · 5ccf55dd

Authored by Alexander Graf on Sep 13, 2011

When running with HV KVM and CBE config options enabled, I get
build failures like the following:

  arch/powerpc/kernel/head_64.o: In function `cbe_system_error_hv':
  (.text+0x1228): undefined reference to `do_kvm_0x1202'
  arch/powerpc/kernel/head_64.o: In function `cbe_maintenance_hv':
  (.text+0x1628): undefined reference to `do_kvm_0x1602'
  arch/powerpc/kernel/head_64.o: In function `cbe_thermal_hv':
  (.text+0x1828): undefined reference to `do_kvm_0x1802'

This is because we jump to a KVM handler when HV is enabled, but we
only generate the handler with PR KVM mode.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/ps3: Fix lv1_gpu_attribute hcall · 9fce85f7

Authored by Geoff Levand on Oct 12, 2011

The lv1_gpu_attribute hcall takes three, not five, input
arguments.  Adjust the lv1 hcall table and all calls.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/ps3: Fix PS3 repository build warnings · 5233e26e

Authored by Geoff Levand on Oct 12, 2011

Fix uninitialized variable warnings in the build of repository.c.

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/irq: Remove IRQF_DISABLED · a3a9f3b4

Authored by Yong Zhang on Oct 21, 2011

Since commit [e58aa3d2: genirq: Run irq handlers with interrupts disabled],
we run all interrupt handlers with interrupts disabled,
and we even check and yell when an interrupt handler
returns with interrupts enabled (see commit [b738a50a:
genirq: Warn when handler enables interrupts]).

So now this flag is a NOOP and can be removed.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/numa: NUMA topology support for PowerNV · 1c8ee733

Authored by Dipankar Sarma on Oct 28, 2011

This patch adds support for NUMA topology on powernv platforms running
OPAL firmware. It checks the type of platform at run time and
sets the affinity form accordingly so that the NUMA topology can be
discovered correctly.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Add System RAM to /proc/iomem · c40dd2f7

Authored by Anton Blanchard on Nov 02, 2011

We've resisted adding System RAM to /proc/iomem because it is
the wrong place for it. Unfortunately we continue to find tools
that rely on this behaviour, so we give up and add it in.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Add KVM as module to defconfigs · 88cf11b4

Authored by Michael Neuling on Nov 07, 2011

Add HV mode KVM to Book3 server 64bit defconfigs as a module.

Doesn't add much to the size:
   text    data     bss      dec     hex filename
8244109 4686767  994000 13924876  d47a0c vmlinux.vanilla
8256092 4691607  994128 13941827  d4bc43 vmlinux.kvm

This should enable more testing of this configuration.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/kvm: Fix build with older toolchains · ad61d64e

Authored by Nishanth Aravamudan on Nov 07, 2011

Fix KVM build for older toolchains (found with powerpc64-unknown-linux-gnu-gcc
(crosstool-NG-1.8.1) 4.3.2):

  AS      arch/powerpc/kvm/book3s_hv_rmhandlers.o
arch/powerpc/kvm/book3s_hv_rmhandlers.S: Assembler messages:
arch/powerpc/kvm/book3s_hv_rmhandlers.S:1388: Error: Unrecognized opcode: `popcntw'
make[1]: *** [arch/powerpc/kvm/book3s_hv_rmhandlers.o] Error 1
make: *** [_module_arch/powerpc/kvm] Error 2

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

04 Nov, 2011 6 commits

powerpc/p3060qds: Add support for P3060QDS board · 96cc017c

Authored by Shengzhou Liu on Aug 26, 2011

The P3060QDS is a Freescale reference board that hosts the six-core P3060 SoC.
The P3060 processor combines six e500mc Power Architecture processor cores with
a high-performance datapath acceleration architecture (DPAA), CoreNet fabric
infrastructure, and network and peripheral interfaces.

P3060QDS Board Overview:
Memory subsystem:
  - 2G Bytes unbuffered DDR3 SDRAM SO-DIMM(64bit bus)
  - 128M Bytes NOR flash single-chip memory
  - 16M Bytes SPI flash
  - 8K Bytes AT24C64 I2C EEPROM
Ethernet:
  - 4x1G + 4x1G/2.5G Ethernet controllers
  - 2xRGMII + 1xMII, three VSC8641 PHYs on board
  - Supports multiple Vitesse VSC8234 SGMII cards in Slot1/2/3
PCIe: Two PCI Express 2.0 controllers/ports
USB:  Two USB2.0, USB1(TYPE-A) and USB2(TYPE-AB) on board
I2C:  Four I2C controllers
UART: Supports up to four UARTs
RapidIO: Supports two serial RapidIO ports

Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX · 6ca6ca5d

Authored by Fabio Baltieri on Aug 15, 2011

This patch adds support for calling ctrl_alt_del() when the power button is
pressed for more than about 2 seconds on some Freescale MPC83xx evaluation
boards and reference designs.

The code uses a kthread to poll the CTRL_BTN bit each second.

Also change the Kconfig entry of the driver to bool, as the device's GPIO
registration is broken when the driver is loaded as a module.

Tested on an MPC8315E RDB board.

Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

powerpc/85xx: Make kexec to interate over online cpus · 43a327b7

Authored by Matthew McClintock on Oct 25, 2011

This is not strictly required, because this iterates over logical
cpus and they are not (currently) discontiguous. But it's cleaner
code and makes it more obvious what is going on.

Signed-off-by: Matthew McClintock <msm@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

powerpc/fsl_booke: Fix comment in head_fsl_booke.S · 7d0d3ad5

Authored by Matthew McClintock on Oct 25, 2011

Fix typo in comments introduced by:

commit 6dece0eb
Author: Scott Wood <scottwood@freescale.com>
Date:   Mon Jul 25 11:29:33 2011 +0000

    powerpc/32: Pass device tree address as u64 to machine_init

Signed-off-by: Matthew McClintock <msm@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices · 44f16fcf

Authored by Matthew McClintock on Oct 26, 2011

This is listed as a requirement for Freescale CoreNet based devices (e.g.
p4080ds with MPIC v4.x) after issuing a core reset to properly clear pending
interrupts.

Signed-off-by: Matthew McClintock <msm@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

powerpc/86xx: Correct Gianfar support for GE boards · 62f3de91

Authored by Martyn Welch on Nov 03, 2011

The GE DTBs were not updated when the Gianfar driver was converted to an
of_platform_driver in commit b31a1d8b. Update
the DTBs, adding the required TBI entries.

Signed-off-by: Martyn Welch <martyn.welch@ge.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

03 Nov, 2011 8 commits

arch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if... · e80dd9a7

Authored by Liu Gang on Nov 02, 2011

arch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if port failed to initialize

The "struct rio_mport" contains a member of master port I/O memory
resource structure "struct resource iores".  This resource will be read
from device tree and be used for rapidio R/W transaction memory space.
Rapidio requests the port I/O memory resource under the root resource
"iomem_resource".

			struct rio_mport *port;
			port = kzalloc(sizeof(struct rio_mport), GFP_KERNEL);

			request_resource(&iomem_resource, &port->iores);

When the port fails to initialize, the allocated "rio_mport" structure memory
is freed, and the port I/O memory resource pointer "&port->iores" becomes
invalid.  If another request is later made for a resource under
"iomem_resource", the stale "&port->iores" node may still be walked in the
child resources list, and this will cause the system to crash.

So the requested port I/O memory resource should be released before
freeing allocated "rio_mport" structure.
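
The ordering requirement can be sketched with a user-space model of a parent resource holding a list of children. Everything here is a simplified assumption for illustration: `struct res`, `struct res_root`, `struct port`, `res_request()`, `res_release()` and `port_teardown()` are hypothetical stand-ins, not the kernel's `struct resource` API or the code in fsl_rio.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a child resource node. */
struct res {
    struct res *sibling;
};

/* Hypothetical stand-in for the root resource and its child list. */
struct res_root {
    struct res *child;
};

/* Hypothetical stand-in for struct rio_mport with an embedded resource. */
struct port {
    struct res iores;
};

/* Link r at the head of the root's child list. */
static void res_request(struct res_root *root, struct res *r)
{
    assert(root != NULL && r != NULL);
    r->sibling = root->child;
    root->child = r;
}

/* Unlink r from the root's child list; 0 on success, -1 if not found. */
static int res_release(struct res_root *root, struct res *r)
{
    struct res **p = &root->child;

    while (*p != NULL) {
        if (*p == r) {
            *p = r->sibling;
            r->sibling = NULL;
            return 0;
        }
        p = &(*p)->sibling;
    }
    return -1;
}

/* Error path: unlink the embedded resource first, then free the port.
 * Freeing first would leave the root pointing into freed memory. */
static void port_teardown(struct res_root *root, struct port *port)
{
    res_release(root, &port->iores);
    free(port);
}
```

Because the resource node lives inside the port structure, the list must stop referring to it before the structure's memory is returned to the allocator.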

Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: share get_huge_page_tail() · b35a35b5

Authored by Andrea Arcangeli on Nov 02, 2011

This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: gup_huge_pmd() return 0 if pte changes · cf592bf7

Authored by Andrea Arcangeli on Nov 02, 2011

powerpc didn't return 0 in that case.  If it's rolling back the *nr pointer,
it should also return zero to avoid adding pages to the array at the wrong
offset.
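
The pairing of the rollback with a zero return can be sketched in a toy form. This is an illustration under assumptions, not the powerpc gup code: `record_pages()` is a hypothetical helper, pages are modelled as plain integers, and the `changed` flag stands in for the check that the pte changed underneath the walker.

```c
#include <assert.h>
#include <stddef.h>

/* Record `refs` entries starting at pages[*nr]. If `changed` is set,
 * undo the additions and return 0 so the caller stops the fast path;
 * otherwise return 1 and leave *nr advanced. */
static int record_pages(int *pages, int *nr, int first, int refs, int changed)
{
    assert(pages != NULL && nr != NULL);

    for (int i = 0; i < refs; i++)
        pages[(*nr)++] = first + i;

    if (changed) {
        *nr -= refs;   /* roll back the entries just added */
        return 0;      /* and report failure, matching the rollback */
    }
    return 1;
}
```

The return value and the state of `*nr` stay consistent: a caller that sees 0 knows nothing from this call remains in the array, and a caller that sees 1 knows `*nr` points just past the new entries.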

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: gup_hugepte() support THP based tail recounting · 3526741f

Authored by Andrea Arcangeli on Nov 02, 2011

Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: gup_hugepte() avoid freeing the head page too many times · 85964684

Authored by Andrea Arcangeli on Nov 02, 2011

We have only taken "refs" pins on the head page, not "*nr" pins.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: get_hugepte() don't put_page() the wrong page · 405e44f2

Authored by Andrea Arcangeli on Nov 02, 2011

"page" may have changed to point to the next hugepage after the loop
completed.  The references have been taken on the head page, so the
put_page must happen there too.

This is a longstanding issue pre-thp inclusion.

It's totally unclear how these page_cache_add_speculative and
pte_val(pte) != pte_val(*ptep) checks are necessary across all the
powerpc gup_fast code, when x86 doesn't need any of that: there's no way
the page can be freed with irq disabled so we're guaranteed the
atomic_inc will happen on a page with page_count > 0 (so not needing the
speculative check).

The pte check is also meaningless on x86: no need to rollback on x86 if
the pte changed, because the pte can still change a CPU tick after the
check succeeded and it won't be rolled back in that case.  The important
thing is we got a reference on a valid page that was mapped there a CPU
tick ago.  So not knowing the soft tlb refill code of ppc64 in great
detail I'm not removing the "speculative" page_count increase and the
pte checks across all the code, but unless there's a strong reason for
it they should be later cleaned up too.

If a pte can change from huge to non-huge (like it could happen with
THP) passing a pte_t *ptep to gup_hugepte() would also require to repeat
the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs
only so I'm not altering that.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: remove superfluous PageTail checks on the pte gup_fast · 2839bdc1

Authored by Andrea Arcangeli on Nov 02, 2011

This part of gup_fast doesn't seem capable of handling hugetlbfs ptes,
those should be handled by gup_hugepd only, so these checks are
superfluous.

Plus, if this wasn't a noop, it would have oopsed, because the insistence
on using the speculative refcounting would trigger a VM_BUG_ON if a tail
page was encountered in page_cache_get_speculative().

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: thp: tail page refcounting fix · 70b50f94

Authored by Andrea Arcangeli on Nov 02, 2011

Michel, while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).
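
The effect of keeping the tail page count at zero can be sketched with C11 atomics. This is a user-space model under assumptions, not the kernel implementation: `struct page_model`, `get_unless_zero()`, `get_tail_pin()` and `split_transfer()` are hypothetical names, head page accounting is left out, and the other adjustments the real split code makes are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page holding only the two counters discussed above. */
struct page_model {
    atomic_int count;     /* models _count */
    atomic_int mapcount;  /* models _mapcount; tail pins are kept here */
};

/* Take a reference only if the count is currently non-zero. */
static bool get_unless_zero(struct page_model *p)
{
    assert(p != NULL);
    int old = atomic_load(&p->count);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Pin a tail page: account the pin on mapcount so count stays zero. */
static void get_tail_pin(struct page_model *tail)
{
    atomic_fetch_add(&tail->mapcount, 1);
}

/* At split time, move the accumulated tail pins over to count. */
static void split_transfer(struct page_model *tail)
{
    int pins = atomic_exchange(&tail->mapcount, 0);

    atomic_fetch_add(&tail->count, pins);
}
```

While the page is a tail page, pins never touch `count`, so a speculative `get_unless_zero()` cannot succeed on it; only after the transfer at split time does the page look like an ordinary referenced page.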

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  By
contrast, other get_user_page users, such as the secondary-MMU page fault
that establishes the shadow pagetables, never call any superfluous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() can safely do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io will instead now take
the compound_lock, but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP, so very
fine-grained, so there's no risk of scalability issues with it.  A simple
direct-io benchmark with all lockdep prove-locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds, but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01 Nov, 2011 11 commits

Cross Memory Attach · fcf63409

Authored by Christopher Yeoh on Oct 31, 2011

The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.
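
As a rough illustration of the read direction, the sketch below uses the `process_vm_readv()` wrapper as later exposed by glibc in `<sys/uio.h>` (an assumption about the build environment; this commit itself only adds the syscalls). `copy_from_process()` and `copy_from_self()` are hypothetical helper names, and the self-read exists only to keep the example self-contained; the call can fail with -1 where the caller lacks permission or the syscall is filtered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy len bytes from remote_addr in process pid into local_buf.
 * Returns the number of bytes copied, or -1 with errno set. */
static ssize_t copy_from_process(pid_t pid, void *remote_addr,
                                 void *local_buf, size_t len)
{
    struct iovec local = { .iov_base = local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

    assert(local_buf != NULL);
    /* One local and one remote segment; the flags argument must be 0. */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}

/* Convenience wrapper: read from this process's own address space. */
static ssize_t copy_from_self(void *addr, void *local_buf, size_t len)
{
    return copy_from_process(getpid(), addr, local_buf, len);
}
```

In a real MPI-style exchange the source process would pass its pid, buffer address and size to the destination out of band, and the destination would then call something like `copy_from_process()` to pull the data in a single copy.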

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest; assuming
    preadv or pwritev was implemented, either the area read from or
    written to would need to be contiguous.
  - Currently mem_read allows only processes that are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but it's not clear exactly what race this restriction is stopping
    (the reason appears to have been lost).
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on a unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other.
  - Doesn't allow for some future uses of the interface that we would
    like to consider adding (see below).
  - Interestingly, reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily.)

As mentioned previously, use of vmsplice instead was considered, but it has
problems.  Since you need the reader and writer working co-operatively, if
the pipe is not drained then you block.  That requires some wrapping to
do non-blocking on the send side or polling on the receive side.  In
all-to-all communication it requires ordering, otherwise you can deadlock.
And in the example of many MPI tasks writing to one MPI task, vmsplice
serialises the copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

We don't have a "second user" of the interface yet (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  However, this interface is something
which hardware vendors are already doing for their custom drivers to
implement fast local communication.  So in addition to this being useful
for OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc; other architectures should
mainly (I think) just need to add syscall numbers for process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: remove non-required uses of include <linux/module.h> · ead53f22

Authored by Paul Gortmaker on Jul 22, 2011

None of the files touched here are modules, and they are not
exporting any symbols either, so there is no need to be including
module.h.  Builds of all the files remain successful.

Even kernel/module.c does not need to include it, since it includes
linux/moduleloader.h instead.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: various straight conversions from module.h --> export.h · 4b16f8e2

Authored by Paul Gortmaker on Jul 22, 2011

All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure.  We can shift them off to the
export.h header, which has a far smaller footprint, and thus
realize some compile time gains.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: convert hvconsole.c to export.h ; fix implicit use of errno.h · e9848d62

Authored by Paul Gortmaker on Jul 22, 2011

This file is only exporting symbols and so should use export.h
and not module.h header. But in doing the conversion, we will
uncover that it was implicitly using errno.h via module.h:

CC arch/powerpc/platforms/pseries/hvconsole.o
arch/powerpc/platforms/pseries/hvconsole.c: In function 'hvc_put_chars':
arch/powerpc/platforms/pseries/hvconsole.c:77: error: 'EIO' undeclared (first use in this function)

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: fix two implicit header uses in pseries/plpar_wrappers.h · 614f15b4

Authored by Paul Gortmaker on Jul 22, 2011

Removing the implicit presence of module.h from almost everywhere
will reveal this implicit usage of paca.h and string.h headers as
follows:

arch/powerpc/platforms/pseries/plpar_wrappers.h:22: error: implicit declaration of function 'get_lppaca'
arch/powerpc/platforms/pseries/plpar_wrappers.h:208: error: implicit declaration of function 'memcpy'

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: fix implicit use of mutex.h by include/asm/spu.h · e415372a

Authored by Paul Gortmaker on Jul 22, 2011

We've been getting the header implicitly via module.h in the past
but when we clean that up, we'll get this failure:

CC arch/powerpc/platforms/cell/beat_spu_priv1.o
In file included from arch/powerpc/platforms/cell/beat_spu_priv1.c:22:
arch/powerpc/include/asm/spu.h:190: error: field 'list_mutex' has incomplete type
make[2]: *** [arch/powerpc/platforms/cell/beat_spu_priv1.o] Error 1

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: fix implicit use of cache.h in kernel/firmware.c · cab2e052

Authored by Paul Gortmaker on Jul 22, 2011

This file only needs export.h to get EXPORT_SYMBOL, but in doing
so, it uncovers an implicit use of linux/cache.h as follows:

CC arch/powerpc/kernel/firmware.o
arch/powerpc/kernel/firmware.c:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__read_mostly'
arch/powerpc/kernel/firmware.c:21: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__used'
make[2]: *** [arch/powerpc/kernel/firmware.o] Error 1

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: fix implicit notifier use in converting to export.h · 2a7156b9

Authored by Paul Gortmaker on Jul 22, 2011

We can convert this file to using export.h since it only wants
to export symbols, but when we do we'll see also that it was
implicitly getting notifier.h from module.h via this failure:

CC arch/powerpc/platforms/cell/spu_notify.o
arch/powerpc/platforms/cell/spu_notify.c:28: warning: type defaults to 'int' in declaration of 'BLOCKING_NOTIFIER_HEAD'
arch/powerpc/platforms/cell/spu_notify.c:28: warning: parameter names (without types) in function declaration
arch/powerpc/platforms/cell/spu_notify.c: In function 'spu_switch_notify':
arch/powerpc/platforms/cell/spu_notify.c:32: error: implicit declaration of function 'blocking_notifier_call_chain'
arch/powerpc/platforms/cell/spu_notify.c:32: error: 'spu_switch_notifier' undeclared (first use in this function)

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: cell/beat_wrapper.h is implicitly using memcpy functions · 08f1e55c

Authored by Paul Gortmaker on Jul 22, 2011

This has been relying on the fact that the parent file would have
module.h (and thus nearly everything) present. But once we fix that,
we'll get stuck with this failure:

In file included from arch/powerpc/platforms/cell/beat_spu_priv1.c:26:
arch/powerpc/platforms/cell/beat_wrapper.h: In function 'beat_eeprom_write':
arch/powerpc/platforms/cell/beat_wrapper.h:160: error: implicit declaration of function 'memcpy'

and many more instances of the same. Fix it in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: Fix up implicit sched.h users · 62fe91bb

Authored by Paul Gortmaker on May 27, 2011

They are getting it through device.h --> module.h path, but we want
to clean that up. This is a sample of what will happen if we don't:

pseries/iommu.c: In function 'tce_build_pSeriesLP':
pseries/iommu.c:136: error: implicit declaration of function 'show_stack'

pseries/eeh.c: In function 'eeh_token_to_phys':
pseries/eeh.c:359: error: 'init_mm' undeclared (first use in this function)

pseries/eeh_event.c: In function 'eeh_event_handler':
pseries/eeh_event.c:63: error: implicit declaration of function 'daemonize'
pseries/eeh_event.c:64: error: implicit declaration of function 'set_current_state'
pseries/eeh_event.c:64: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function)
pseries/eeh_event.c:64: error: (Each undeclared identifier is reported only once
pseries/eeh_event.c:64: error: for each function it appears in.)
pseries/eeh_event.c: In function 'eeh_thread_launcher':
pseries/eeh_event.c:109: error: 'CLONE_KERNEL' undeclared (first use in this function)

hotplug-cpu.c: In function 'pseries_mach_cpu_die':
hotplug-cpu.c:115: error: implicit declaration of function 'idle_task_exit'

kernel/swsusp_64.c: In function 'do_after_copyback':
kernel/swsusp_64.c:17: error: implicit declaration of function 'touch_softlockup_watchdog'

cell/spufs/context.c: In function 'alloc_spu_context':
cell/spufs/context.c:60: error: implicit declaration of function 'get_task_mm'
cell/spufs/context.c:60: warning: assignment makes pointer from integer without a cast
cell/spufs/context.c: In function 'spu_forget':
cell/spufs/context.c:127: error: implicit declaration of function 'mmput'

pasemi/dma_lib.c: In function 'pasemi_dma_stop_chan':
pasemi/dma_lib.c:332: error: implicit declaration of function 'cond_resched'

sysdev/fsl_lbc.c: In function 'fsl_lbc_ctrl_irq':
sysdev/fsl_lbc.c:247: error: 'TASK_NORMAL' undeclared (first use in this function)

Add in sched.h so these get the definitions they are looking for.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: Fix up implicit stat.h users · b56eade5

Authored by Paul Gortmaker on May 27, 2011

They get it via module.h (via device.h) but we want to clean that up.
When we do, we'll get things like:

ibmebus.c:314: error: 'S_IWUSR' undeclared here (not in a function)
vio.c:972: error: 'S_IWUSR' undeclared here (not in a function)

so add in the stat header it is using explicitly in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>