- 16 11月, 2011 6 次提交
-
-
由 Kevin Hao 提交于
The trace_hardirqs_off will use CALLER_ADDR0 and CALLER_ADDR1. If an exception occurs in user mode, there is only one stack frame on the stack and accessing the CALLER_ADDR1 will causes the following call trace. So we create a dummy stack frame to make trace_hardirqs_off happy. WARNING: at kernel/smp.c:459 Modules linked in: NIP: c0093280 LR: c00930a0 CTR: c0010780 REGS: edb87ae0 TRAP: 0700 Not tainted (3.1.0) MSR: 00021002 <ME,CE> CR: 28002888 XER: 00000000 TASK = edce2ac0[17658] 'mthread-lock-on' THREAD: edb86000 CPU: 5 GPR00: 00000001 edb87b90 edce2ac0 00000005 c0019594 edb87bd8 00000001 00000fe3 GPR08: 00041000 c084138c 4e20120d edb87b90 48002888 1001aa7c 00000000 00000000 GPR16: 48830000 10012a8c 00000000 10000af4 00000001 c0810000 00000000 00000000 GPR24: ee9aa920 c0816a18 00000000 00000005 c0019594 edb87bd8 ee20178c edb87b90 NIP [c0093280] smp_call_function_many+0x214/0x2b4 LR [c00930a0] smp_call_function_many+0x34/0x2b4 Call Trace: [edb87b90] [c00930a0] smp_call_function_many+0x34/0x2b4 (unreliable) [edb87bd0] [c00194ec] __flush_tlb_page+0xac/0x100 [edb87c00] [c001957c] flush_tlb_page+0x3c/0x54 [edb87c10] [c00180ac] ptep_set_access_flags+0x74/0x12c [edb87c40] [c0128068] handle_pte_fault+0x2f0/0x9ac [edb87cb0] [c0128c3c] handle_mm_fault+0x104/0x1dc [edb87ce0] [c05f40f4] do_page_fault+0x2dc/0x630 [edb87e50] [c001078c] handle_page_fault+0xc/0x80 Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
kdump fails because we try to execute an HV only instruction. Feature fixups are being applied after we copy the exception vectors down to 0 so they miss out on any updates. We have always had this issue but it only became critical in v3.0 when we added CFAR support (breaks POWER5) and v3.1 when we added POWERNV (breaks everyone). Signed-off-by: NAnton Blanchard <anton@samba.org> Cc: <stable@kernel.org> [v3.0+] Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
I had to debug a strange situation where all manner of things were failing. SMT threads, storage and network were all completely broken. The root cause was we couldn't find enough memory to instantiate RTAS - this was a network install so the initrd was huge. Instead of limping along and failing in mysterious ways we should just panic up front if RTAS exists and we can't allocate space for it. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Suzuki Poulose 提交于
Kexec is not supported on 47x. 47x is a variant of 44x with slightly different MMU and SMP support. There was a typo in the config dependency for kexec. This patch fixes the same. Signed-off-by: NSuzuki K. Poulose <suzuki@in.ibm.com> Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Al Viro 提交于
Should do what other architectures do and wrap all that code into the appropriate ifdef Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 15 11月, 2011 1 次提交
-
-
由 Liu Gang 提交于
The "#include <linux/module.h>" was replaced by "#include <linux/export.h>" in the patch "powerpc: various straight conversions from module.h --> export.h". This will cause the following compile problem: arch/powerpc/sysdev/fsl_rio.c: In function 'fsl_rio_mcheck_exception': arch/powerpc/sysdev/fsl_rio.c:296: error: implicit declaration of function 'search_exception_tables'. The file fsl_rio.c needs the declaration of function "search_exception_tables" in the header file "linux/module.h". Signed-off-by: NLiu Gang <Gang.Liu@freescale.com> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 08 11月, 2011 8 次提交
-
-
由 Alexander Graf 提交于
When running with HV KVM and CBE config options enabled, I get build failures like the following: arch/powerpc/kernel/head_64.o: In function `cbe_system_error_hv': (.text+0x1228): undefined reference to `do_kvm_0x1202' arch/powerpc/kernel/head_64.o: In function `cbe_maintenance_hv': (.text+0x1628): undefined reference to `do_kvm_0x1602' arch/powerpc/kernel/head_64.o: In function `cbe_thermal_hv': (.text+0x1828): undefined reference to `do_kvm_0x1802' This is because we jump to a KVM handler when HV is enabled, but we only generate the handler with PR KVM mode. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Geoff Levand 提交于
The lv1_gpu_attribute hcall takes three, not five input arguments. Adjust the lv1 hcall table and all calls. Signed-off-by: NGeoff Levand <geoff@infradead.org> CC: Takashi Iwai <tiwai@suse.de> Acked-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Geoff Levand 提交于
Fix uninitialized variable warnings in build of repository.c Signed-off-by: NGeoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Yong Zhang 提交于
Since commit [e58aa3d2: genirq: Run irq handlers with interrupts disabled], We run all interrupt handlers with interrupts disabled and we even check and yell when an interrupt handler returns with interrupts enabled (see commit [b738a50a: genirq: Warn when handler enables interrupts]). So now this flag is a NOOP and can be removed. Signed-off-by: NYong Zhang <yong.zhang0@gmail.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NGeoff Levand <geoff@infradead.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Dipankar Sarma 提交于
This patch adds support for numa topology on powernv platforms running OPAL formware. It checks for the type of platform at run time and sets the affinity form correctly so that NUMA topology can be discovered correctly. Signed-off-by: NDipankar Sarma <dipankar@in.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
We've resisted adding System RAM to /proc/iomem because it is the wrong place for it. Unfortunately we continue to find tools that rely on this behaviour so give up and add it in. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
Add HV mode KVM to Book3 server 64bit defconfigs as a module. Doesn't add much to the size: text data bss dec hex filename 8244109 4686767 994000 13924876 d47a0c vmlinux.vanilla 8256092 4691607 994128 13941827 d4bc43 vmlinux.kvm This should enable more testing of this configuration. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nishanth Aravamudan 提交于
Fix KVM build for older toolchains (found with .powerpc64-unknown-linux-gnu-gcc (crosstool-NG-1.8.1) 4.3.2): AS arch/powerpc/kvm/book3s_hv_rmhandlers.o arch/powerpc/kvm/book3s_hv_rmhandlers.S: Assembler messages: arch/powerpc/kvm/book3s_hv_rmhandlers.S:1388: Error: Unrecognized opcode: `popcntw' make[1]: *** [arch/powerpc/kvm/book3s_hv_rmhandlers.o] Error 1 make: *** [_module_arch/powerpc/kvm] Error 2 Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 11月, 2011 6 次提交
-
-
由 Shengzhou Liu 提交于
The P3060QDS is a Freescale reference board that hosts the six-core P3060 SOC. The P3060 Processor combines six e500mc Power Architecture processor cores with high-performance datapath acceleration architecture(DPAA), CoreNet fabric infrastructure, as well as network and peripheral interfaces. P3060QDS Board Overview: Memory subsystem: - 2G Bytes unbuffered DDR3 SDRAM SO-DIMM(64bit bus) - 128M Bytes NOR flash single-chip memory - 16M Bytes SPI flash - 8K Bytes AT24C64 I2C EEPROM Ethernet: - 4x1G + 4x1G/2.5G Ethernet controllers - 2xRGMII + 1xMII, three VSC8641 PHYs on board - Suport multiple Vitesse VSC8234 SGMII Cards in Slot1/2/3 PCIe: Two PCI Express 2.0 controllers/ports USB: Two USB2.0, USB1(TYPE-A) and USB2(TYPE-AB) on board I2C: Four I2C controllers UART: Supports up to four UARTs RapidIO: Supports two serial RapidIO ports Signed-off-by: NShengzhou Liu <Shengzhou.Liu@freescale.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 Fabio Baltieri 提交于
This patch add support for calling ctrl_alt_del() when the power button is pressed for more than about 2 seconds on some freescale MPC83xx evaluation boards and reference design. The code uses a kthread to poll the CTRL_BTN bit each second. Also change Kconfig entry of the driver to bool, as device's gpio registration is broken when loading as module. Tested on an MPC8315E RDB board. Signed-off-by: NFabio Baltieri <fabio.baltieri@gmail.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 Matthew McClintock 提交于
This is not strictly required, because this iterates over logical cpus and they are not (currently) discontigous. But, it's cleaner code and more obvious what is going on Signed-off-by: NMatthew McClintock <msm@freescale.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 Matthew McClintock 提交于
Fix typo in comments introduced by: commit 6dece0eb Author: Scott Wood <scottwood@freescale.com> Date: Mon Jul 25 11:29:33 2011 +0000 powerpc/32: Pass device tree address as u64 to machine_init Signed-off-by: NMatthew McClintock <msm@freescale.com> cc: Scott Wood <scottwood@freescale.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 Matthew McClintock 提交于
This is listed as a requirement for Freescale CoreNet based devices (e.g p4080ds with MPIC v4.x) after issuing a core reset to properly clear pending interrupts. Signed-off-by: NMatthew McClintock <msm@freescale.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 Martyn Welch 提交于
The GE DTBs were not updated when the Gianfar driver was converted to an of_platform_driver in commit b31a1d8b. Update the DTBs, adding the required TBI entries. Signed-off-by: NMartyn Welch <martyn.welch@ge.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
- 03 11月, 2011 8 次提交
-
-
由 Liu Gang 提交于
arch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if port failed to initialize The "struct rio_mport" contains a member of master port I/O memory resource structure "struct resource iores". This resource will be read from device tree and be used for rapidio R/W transaction memory space. Rapidio requests the port I/O memory resource under the root resource "iomem_resource". struct rio_mport *port; port = kzalloc(sizeof(struct rio_mport), GFP_KERNEL); request_resource(&iomem_resource, &port->iores); When port failed to initialize, allocated "rio_mport" structure memory will be freed, and the port I/O memory resource structure pointer "&port->iores" will be invalid. If other requests resource under "iomem_resource", "&port->iores" node may be operated in the child resources list and this will cause the system to crash. So the requested port I/O memory resource should be released before freeing allocated "rio_mport" structure. Signed-off-by: NLiu Gang <Gang.Liu@freescale.com> Acked-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
This avoids duplicating the function in every arch gup_fast. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
powerpc didn't return 0 in that case, if it's rolling back the *nr pointer it should also return zero to avoid adding pages to the array at the wrong offset. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NDavid Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
Up to this point the code assumed old refcounting for hugepages (pre-thp). This updates the code directly to the thp mapcount tail page refcounting. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
We only taken "refs" pins on the head page not "*nr" pins. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
"page" may have changed to point to the next hugepage after the loop completed, The references have been taken on the head page, so the put_page must happen there too. This is a longstanding issue pre-thp inclusion. It's totally unclear how these page_cache_add_speculative and pte_val(pte) != pte_val(*ptep) checks are necessary across all the powerpc gup_fast code, when x86 doesn't need any of that: there's no way the page can be freed with irq disabled so we're guaranteed the atomic_inc will happen on a page with page_count > 0 (so not needing the speculative check). The pte check is also meaningless on x86: no need to rollback on x86 if the pte changed, because the pte can still change a CPU tick after the check succeeded and it won't be rolled back in that case. The important thing is we got a reference on a valid page that was mapped there a CPU tick ago. So not knowing the soft tlb refill code of ppc64 in great detail I'm not removing the "speculative" page_count increase and the pte checks across all the code, but unless there's a strong reason for it they should be later cleaned up too. If a pte can change from huge to non-huge (like it could happen with THP) passing a pte_t *ptep to gup_hugepte() would also require to repeat the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs only so I'm not altering that. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
This part of gup_fast doesn't seem capable of handling hugetlbfs ptes, those should be handled by gup_hugepd only, so these checks are superfluous. Plus if this wasn't a noop, it would have oopsed because, the insistence of using the speculative refcounting would trigger a VM_BUG_ON if a tail page was encountered in the page_cache_get_speculative(). Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 11月, 2011 11 次提交
-
-
由 Christopher Yeoh 提交于
The basic idea behind cross memory attach is to allow MPI programs doing intra-node communication to do a single copy of the message rather than a double copy of the message via shared memory. The following patch attempts to achieve this by allowing a destination process, given an address and size from a source process, to copy memory directly from the source process into its own address space via a system call. There is also a symmetrical ability to copy from the current process's address space into a destination process's address space. - Use of /proc/pid/mem has been considered, but there are issues with using it: - Does not allow for specifying iovecs for both src and dest, assuming preadv or pwritev was implemented either the area read from or written to would need to be contiguous. - Currently mem_read allows only processes who are currently ptrace'ing the target and are still able to ptrace the target to read from the target. This check could possibly be moved to the open call, but its not clear exactly what race this restriction is stopping (reason appears to have been lost) - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix domain socket is a bit ugly from a userspace point of view, especially when you may have hundreds if not (eventually) thousands of processes that all need to do this with each other - Doesn't allow for some future use of the interface we would like to consider adding in the future (see below) - Interestingly reading from /proc/pid/mem currently actually involves two copies! (But this could be fixed pretty easily) As mentioned previously use of vmsplice instead was considered, but has problems. Since you need the reader and writer working co-operatively if the pipe is not drained then you block. Which requires some wrapping to do non blocking on the send side or polling on the receive. In all to all communication it requires ordering otherwise you can deadlock. And in the example of many MPI tasks writing to one MPI task vmsplice serialises the copying. There are some cases of MPI collectives where even a single copy interface does not get us the performance gain we could. For example in an MPI_Reduce rather than copy the data from the source we would like to instead use it directly in a mathops (say the reduce is doing a sum) as this would save us doing a copy. We don't need to keep a copy of the data from the source. I haven't implemented this, but I think this interface could in the future do all this through the use of the flags - eg could specify the math operation and type and the kernel rather than just copying the data would apply the specified operation between the source and destination and store it in the destination. Although we don't have a "second user" of the interface (though I've had some nibbles from people who may be interested in using it for intra process messaging which is not MPI). This interface is something which hardware vendors are already doing for their custom drivers to implement fast local communication. And so in addition to this being useful for OpenMPI it would mean the driver maintainers don't have to fix things up when the mm changes. There was some discussion about how much faster a true zero copy would go. Here's a link back to the email with some testing I did on that: http://marc.info/?l=linux-mm&m=130105930902915&w=2 There is a basic man page for the proposed interface here: http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt This has been implemented for x86 and powerpc, other architecture should mainly (I think) just need to add syscall numbers for the process_vm_readv and process_vm_writev. There are 32 bit compatibility versions for 64-bit kernels. For arch maintainers there are some simple tests to be able to quickly verify that the syscalls are working correctly here: http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <linux-man@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Gortmaker 提交于
None of the files touched here are modules, and they are not exporting any symbols either -- so there is no need to be including the module.h. Builds of all the files remains successful. Even kernel/module.c does not need to include it, since it includes linux/moduleloader.h instead. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
This file is only exporting symbols and so should use export.h and not module.h header. But in doing the conversion, we will uncover that it was implicitly using errno.h via module.h: CC arch/powerpc/platforms/pseries/hvconsole.o arch/powerpc/platforms/pseries/hvconsole.c: In function 'hvc_put_chars': arch/powerpc/platforms/pseries/hvconsole.c:77: error: 'EIO' undeclared (first use in this function) Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
Removing the implicit presence of module.h from almost everywhere will reveal this implicit usage of paca.h and string.h headers as follows: arch/powerpc/platforms/pseries/plpar_wrappers.h:22: error: implicit declaration of function 'get_lppaca' arch/powerpc/platforms/pseries/plpar_wrappers.h:208: error: implicit declaration of function 'memcpy' Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
We've been getting the header implicitly via module.h in the past but when we clean that up, we'll get this failure: CC arch/powerpc/platforms/cell/beat_spu_priv1.o In file included from arch/powerpc/platforms/cell/beat_spu_priv1.c:22: arch/powerpc/include/asm/spu.h:190: error: field 'list_mutex' has incomplete type make[2]: *** [arch/powerpc/platforms/cell/beat_spu_priv1.o] Error 1 Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
This file only needs export.h to get EXPORT_SYMBOL, but in doing so, it uncovers an implicit use of linux/cache.h as follows: CC arch/powerpc/kernel/firmware.o arch/powerpc/kernel/firmware.c:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__read_mostly' arch/powerpc/kernel/firmware.c:21: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__used' make[2]: *** [arch/powerpc/kernel/firmware.o] Error 1 Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
We can convert this file to using export.h since it only wants to export symbols, but when we do we'll see also that it was implicitly getting notifier.h from module.h via this failure: CC arch/powerpc/platforms/cell/spu_notify.o arch/powerpc/platforms/cell/spu_notify.c:28: warning: type defaults to 'int' in declaration of 'BLOCKING_NOTIFIER_HEAD' arch/powerpc/platforms/cell/spu_notify.c:28: warning: parameter names (without types) in function declaration arch/powerpc/platforms/cell/spu_notify.c: In function 'spu_switch_notify': arch/powerpc/platforms/cell/spu_notify.c:32: error: implicit declaration of function 'blocking_notifier_call_chain' arch/powerpc/platforms/cell/spu_notify.c:32: error: 'spu_switch_notifier' undeclared (first use in this function) Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
This has been relying on the fact that the parent file would have module.h (and thus nearly everything) present. But once we fix that, we'll get stuck with this failure: In file included from arch/powerpc/platforms/cell/beat_spu_priv1.c:26: arch/powerpc/platforms/cell/beat_wrapper.h: In function 'beat_eeprom_write': arch/powerpc/platforms/cell/beat_wrapper.h:160: error: implicit declaration of function 'memcpy' and many more instances of the same. Fix it in advance. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
They are getting it through device.h --> module.h path, but we want to clean that up. This is a sample of what will happen if we don't: pseries/iommu.c: In function 'tce_build_pSeriesLP': pseries/iommu.c:136: error: implicit declaration of function 'show_stack' pseries/eeh.c: In function 'eeh_token_to_phys': pseries/eeh.c:359: error: 'init_mm' undeclared (first use in this function) pseries/eeh_event.c: In function 'eeh_event_handler': pseries/eeh_event.c:63: error: implicit declaration of function 'daemonize' pseries/eeh_event.c:64: error: implicit declaration of function 'set_current_state' pseries/eeh_event.c:64: error: 'TASK_INTERRUPTIBLE' undeclared (first use in this function) pseries/eeh_event.c:64: error: (Each undeclared identifier is reported only once pseries/eeh_event.c:64: error: for each function it appears in.) pseries/eeh_event.c: In function 'eeh_thread_launcher': pseries/eeh_event.c:109: error: 'CLONE_KERNEL' undeclared (first use in this function) hotplug-cpu.c: In function 'pseries_mach_cpu_die': hotplug-cpu.c:115: error: implicit declaration of function 'idle_task_exit' kernel/swsusp_64.c: In function 'do_after_copyback': kernel/swsusp_64.c:17: error: implicit declaration of function 'touch_softlockup_watchdog' cell/spufs/context.c: In function 'alloc_spu_context': cell/spufs/context.c:60: error: implicit declaration of function 'get_task_mm' cell/spufs/context.c:60: warning: assignment makes pointer from integer without a cast cell/spufs/context.c: In function 'spu_forget': cell/spufs/context.c:127: error: implicit declaration of function 'mmput' pasemi/dma_lib.c: In function 'pasemi_dma_stop_chan': pasemi/dma_lib.c:332: error: implicit declaration of function 'cond_resched' sysdev/fsl_lbc.c: In function 'fsl_lbc_ctrl_irq': sysdev/fsl_lbc.c:247: error: 'TASK_NORMAL' undeclared (first use in this function) Add in sched.h so these get the definitions they are looking for. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Paul Gortmaker 提交于
They get it via module.h (via device.h) but we want to clean that up. When we do, we'll get things like: ibmebus.c:314: error: 'S_IWUSR' undeclared here (not in a function) vio.c:972: error: 'S_IWUSR' undeclared here (not in a function) so add in the stat header it is using explicitly in advance. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-