- 17 2月, 2010 16 次提交
-
-
由 Anton Blanchard 提交于
Here is a patch from Paul Mackerras that improves the ppc64 copy_tofrom_user. The loop now does 32 bytes at a time and as well as pairing loads and stores. A quick test case that reads 8kB over and over shows the improvement: POWER6: 53% faster POWER7: 51% faster #define _XOPEN_SOURCE 500 #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> #include <sys/stat.h> #define BUFSIZE (8 * 1024) #define ITERATIONS 10000000 int main() { char tmpfile[] = "/tmp/copy_to_user_testXXXXXX"; int fd; char *buf[BUFSIZE]; unsigned long i; fd = mkstemp(tmpfile); if (fd < 0) { perror("open"); exit(1); } if (write(fd, buf, BUFSIZE) != BUFSIZE) { perror("open"); exit(1); } for (i = 0; i < 10000000; i++) { if (pread(fd, buf, BUFSIZE, 0) != BUFSIZE) { perror("pread"); exit(1); } } unlink(tmpfile); return 0; } Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
A number of our chips like loads and stores to be paired. A small kernel module testcase shows the improvement of pairing loads and stores in copy_4k_page: POWER6: +9% POWER7: +1.5% #include <linux/module.h> #include <linux/mm.h> #define ITERATIONS 10000000 static int __init copypage_init(void) { struct timespec before, after; unsigned long i; struct page *destpage, *srcpage; char *dest, *src; destpage = alloc_page(GFP_KERNEL); srcpage = alloc_page(GFP_KERNEL); dest = page_address(destpage); src = page_address(srcpage); getnstimeofday(&before); for (i = 0; i < ITERATIONS; i++) copy_4K_page(dest, src); getnstimeofday(&after); free_page((unsigned long)dest); free_page((unsigned long)src); printk(KERN_DEBUG "copy_4K_page loop took %lu ns\n", (after.tv_sec - before.tv_sec) * NSEC_PER_SEC + (after.tv_nsec - before.tv_nsec)); return 0; } static void __exit copypage_exit(void) { } module_init(copypage_init) module_exit(copypage_exit) MODULE_LICENSE("GPL"); MODULE_AUTHOR("Anton Blanchard"); Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Nick Piggin discovered that lwsync barriers around locks were faster than isync on 970. That was a long time ago and I completely dropped the ball in testing his patches across other ppc64 processors. Turns out the idea helps on other chips. Using a microbenchmark that uses a lot of threads to contend on a global pthread mutex (and therefore a global futex), POWER6 improves 8% and POWER7 improves 2%. I checked POWER5 and while I couldn't measure an improvement, there was no regression. This patch uses the lwsync patching code to replace the isyncs with lwsyncs on CPUs that support the instruction. We were marking POWER3 and RS64 as lwsync capable but in reality they treat it as a full sync (ie slow). Remove the CPU_FTR_LWSYNC bit from these CPUs so they continue to use the faster isync method. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
do_lwsync_fixups doesn't work on 64bit, we end up writing lwsyncs to the wrong addresses: 0:mon> di c0000001000bfacc c0000001000bfacc 7c2004ac lwsync Since the lwsync section has negative offsets we need to use a signed int pointer so we sign extend the value. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
For performance reasons we are about to change ISYNC_ON_SMP to sometimes be lwsync. Now that the macro name doesn't make sense, change it and LWSYNC_ON_SMP to better explain what the barriers are doing. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Now we have real bit locks use them instead of open coding it. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
This patch implements the lwarx/ldarx hint bit for bit locks. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Recent versions of the PowerPC architecture added a hint bit to the larx instructions to differentiate between an atomic operation and a lock operation: > 0 Other programs might attempt to modify the word in storage addressed by EA > even if the subsequent Store Conditional succeeds. > > 1 Other programs will not attempt to modify the word in storage addressed by > EA until the program that has acquired the lock performs a subsequent store > releasing the lock. To avoid a binutils dependency this patch create macros for the extended lwarx format and uses it in the spinlock code. To test this change I used a simple test case that acquires and releases a global pthread mutex: pthread_mutex_lock(&mutex); pthread_mutex_unlock(&mutex); On a 32 core POWER6, running 32 test threads we spend almost all our time in the futex spinlock code: 94.37% perf [kernel] [k] ._raw_spin_lock | |--99.95%-- ._raw_spin_lock | | | |--63.29%-- .futex_wake | | | |--36.64%-- .futex_wait_setup Which is a good test for this patch. The results (in lock/unlock operations per second) are: before: 1538203 ops/sec after: 2189219 ops/sec An improvement of 42% A 32 core POWER7 improves even more: before: 1279529 ops/sec after: 2282076 ops/sec An improvement of 78% Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
I often get asked if BAD interrupts are really bad. On some boxes (eg IBM machines running a hypervisor) there are valid cases where are presented with an interrupt that is not for us. These cases are common enough to show up as thousands of BAD interrupts a day. Tone them down by calling them spurious. Since they can be a significant cause of OS jitter, we may as well log them per cpu so we know where they are occurring. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
With NO_HZ it is useful to know how often the decrementer is going off. The patch below adds an entry for it and also adds it into the /proc/stat summaries. While here, I added performance monitoring and machine check exceptions. I found it useful to keep an eye on the PMU exception rate when using the perf tool. Since it's possible to take a completely handled machine check on a System p box it also sounds like a good idea to keep a machine check summary. The event naming matches x86 to keep gratuitous differences to a minimum. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Now we use printf style alignment there is no need to manually space these fields. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
On a large machine I noticed the columns of /proc/interrupts failed to line up with the header after CPU9. At sufficiently large numbers of CPUs it becomes impossible to line up the CPU number with the counts. While fixing this I noticed x86 has a number of updates that we may as well pull in. On PowerPC we currently omit an interrupt completely if there is no active handler, whereas on x86 it is printed if there is a non zero count. The x86 code also spaces the first column correctly based on nr_irqs. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Right now we allocate a cacheline sized NR_CPUS array for xics IPI communication. Use DECLARE_PER_CPU_SHARED_ALIGNED to put it in percpu data in its own cacheline since it is written to by other cpus. On a kernel with NR_CPUS=1024, this saves quite a lot of memory: text data bss dec hex filename 8767779 2944260 1505724 13217763 c9afe3 vmlinux.irq_cpustat 8767555 2813444 1505724 13086723 c7b003 vmlinux.xics A saving of around 128kB. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
PowerPC is currently using asm-generic/hardirq.h which statically allocates an NR_CPUS irq_stat array. Switch to an arch specific implementation which uses per cpu data: On a kernel with NR_CPUS=1024, this saves quite a lot of memory: text data bss dec hex filename 8767938 2944132 1636796 13348866 cbb002 vmlinux.baseline 8767779 2944260 1505724 13217763 c9afe3 vmlinux.irq_cpustat A saving of around 128kB. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Breno Leitao 提交于
During a EEH recover, the pci_dev structure can be null, mainly if an eeh event is detected during cpi config operation. In this case, the pci_dev will not be known (and will be null) the kernel will crash with the following message: Unable to handle kernel paging request for data at address 0x000000a0 Faulting instruction address: 0xc00000000006b8b4 Oops: Kernel access of bad area, sig: 11 [#1] NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0 LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 Call Trace: [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70 The bug occurs because pci_name() tries to access a null pointer. This patch just guarantee that pci_name() is not called on Null pointers. Signed-off-by: NBreno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: NLinas Vepstas <linasvepstas@gmail.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Corey Minyard 提交于
DMA ops requires that coherent_dma_mask be set properly for a device, but this was not being done for devices on the MV64x60 that use DMA. Both the serial and ethernet devices need this or they won't be able to allocate memory. Signed-off-by: NCorey Minyard <cminyard@mvista.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 2月, 2010 1 次提交
-
-
由 David Gibson 提交于
Commit f71dc176 'Make hpte_need_flush() correctly mask for multiple page sizes' introduced bug, which is triggered when a kernel with a 64k base page size is run on a system whose hardware does not 64k hash PTEs. In this case, we emulate 64k pages with multiple 4k hash PTEs, however in hpte_need_flush() we incorrectly only mask the hardware page size from the address, instead of the logical page size. This causes things to go wrong when we later attempt to iterate through the hardware subpages of the logical page. This patch corrects the error. It has been tested on pSeries bare metal by Michael Neuling. Signed-off-by: NDavid Gibson <dwg@au1.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 2月, 2010 11 次提交
-
-
由 Anton Blanchard 提交于
The clockevent multiplier and shift is useful information, but we only need to print it once. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
RTAS should never cause an exception but if it does (for example accessing outside our RMO) then we might go a long way through the kernel before oopsing. If we unset MSR_RI we should at least stop things on exception exit. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Frans Pop 提交于
Signed-off-by: NFrans Pop <elendil@planet.nl> Cc: linuxppc-dev@ozlabs.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
We use firmware_has_feature quite a lot these days, so it's worth putting powerpc_firmware_features into __read_mostly. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Clean up SD_NODE_INITS so we can easily compare it to x86. Similar to the work in 47734f89 (sched: Clean up topology.h) Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
We can use the much more lightweight ida allocator since we don't need the pointer storage idr provides. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Add printout of last accessed sysfs file, added to x86 in ae87221d (sysfs: crash debugging) Also add the notify_die hook that allows us to print out the ftrace buffer on oops. This is useful in conjunction with ftrace function_graph: Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=128 NUMA pSeries last sysfs file: /sys/class/net/tunl0/type Dumping ftrace buffer: ... 0) | .sysrq_handle_crash() { 0) 0.476 us | .hash_page(); 0) 0.488 us | .xmon_fault_handler(); 0) | .bad_page_fault() { 0) | .search_exception_tables() { 0) 0.590 us | .search_module_extables(); 0) 2.546 us | } 0) | .printk() { 0) | .vprintk() { 0) 0.488 us | ._raw_spin_lock(); 0) 0.572 us | .emit_log_char(); Showing the function graph of a sysrq-c crash. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
The pseries and ppc64 defconfigs have drifted apart over the years. Reduce some of the differences while still keeping the idea that the ppc64 defconfig is cross platform but enables fewer features than pseries, eg NR_CPUS is lower. Also enable a number of common adapters as modules. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
The cede latency stuff is relatively new and we don't need to complain about it not working on older firmware. Signed-off-by: NAnton Blanchard <anton@samba.org> Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Joe Perches 提交于
String constants that are continued on subsequent lines with \ are not good. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Will Schmidt 提交于
The tb_total and purr_total values reported via the hcall_stats code should be cumulative, rather than being replaced by the latest delta tb or purr value. Tested-by: NWill Schmidt <will_schmidt@vnet.ibm.com> Signed-off-by: NWill Schmidt <will_schmidt@vnet.ibm.com> Acked-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 08 2月, 2010 1 次提交
-
-
由 Mark Nelson 提交于
The code to track the CPPR values added by commit 49bd3647 ("powerpc/pseries: Track previous CPPR values to correctly EOI interrupts") broke kexec on pseries because the kexec code in xics.c calls xics_set_cpu_priority() before the IPI has been EOI'ed. This wasn't a problem previously but it now triggers a BUG_ON in xics_set_cpu_priority() because os_cppr->index isn't 0. Fix this problem by setting the index on the CPPR stack to 0 before calling xics_set_cpu_priority() in xics_teardown_cpu(). Also make it clear that we only want to set the priority when there's just one CPPR value in the stack, and enforce it by updating the value of os_cppr->stack[0] rather than os_cppr->stack[os_cppr->index]. While we're at it change the BUG_ON to a WARN_ON. Reported-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NMark Nelson <markn@au1.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 2月, 2010 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
Updated variant of a patch by Joel Schopp. The field containing the number of supported cores which we pass to firmware via the ibm,client-architecture call was set by a previous patch statically as high as is possible (NR_CPUS). However, that value isn't quite right for a system that supports multiple threads per core, thus permitting the firmware to assign more cores to a Linux partition than it can really cope with. This patch improves it by using the device-tree to determine the number of threads supported by the processors in order to adjust the value passed to firmware. Signed-off-by: NJoel Schopp <jschopp@austin.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 03 2月, 2010 7 次提交
-
-
由 jschopp@austin.ibm.com 提交于
This patch adds 2 fields to the ibm_architecture_vec array. The first of these fields indicates the number of cores which Linux can boot. It does not account for SMT, so it may result in cpus assigned to Linux which cannot be booted. A second patch follows that dynamically updates this for SMT. The second field just indicates that our OS is Linux, and not another OS. The system may or may not use this hint to performance tune settings for Linux. Signed-off-by: NJoel Schopp <jschopp@austin.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
With dynamic irq descriptors the overhead of a large NR_IRQS is much lower than it used to be. With more MSI-X capable adapters and drivers exploiting multiple vectors we may as well allow the user to increase it beyond the current maximum of 512. 32768 seems large enough that we'd never have to bump it again (although I bet my prediction is horribly wrong). It boot tests OK and the vmlinux footprint increase is only around 500kB due to: struct irq_map_entry irq_map[NR_IRQS]; We format /proc/interrupts correctly with the previous changes: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 286: 0 0 0 0 0 0 516: 0 0 0 0 0 0 16689: 1833 0 0 0 0 0 17157: 0 0 0 0 0 0 17158: 319 0 0 0 0 0 25092: 0 0 0 0 0 0 Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Peter Tyser 提交于
Recent U-Boot commit 5ccd29c3679b3669b0bde5c501c1aa0f325a7acb caused the "cpu-release-addr" device tree property to contain the physical RAM location that secondary cores were spinning at. Previously, the "cpu-release-addr" property contained a value referencing the boot page translation address range of 0xfffffxxx, which then indirectly accessed RAM. The "cpu-release-addr" is currently ioremapped and the secondary cores kicked. However, due to the recent change in "cpu-release-addr", it sometimes points to a memory location in low memory that cannot be ioremapped. For example on a P2020-based board with 512MB of RAM the following error occurs on bootup: <...> mpic: requesting IPIs ... __ioremap(): phys addr 0x1ffff000 is RAM lr c05df9a0 Unable to handle kernel paging request for data at address 0x00000014 Faulting instruction address: 0xc05df9b0 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2 P2020 RDB Modules linked in: <... eventual kernel panic> Adding logic to conditionally ioremap or access memory directly resolves the issue. Signed-off-by: NPeter Tyser <ptyser@xes-inc.com> Signed-off-by: NNate Case <ncase@xes-inc.com> Reported-by: NDipen Dudhat <B09055@freescale.com> Tested-by: NDipen Dudhat <B09055@freescale.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Using perf to trace L1 dcache misses and dumping data addresses I found a few variables taking a lot of misses. Since they are almost never written, they should go into the __read_mostly section. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
The cputime code has a few places that do per_cpu(, smp_processor_id()). Replace them with __get_cpu_var(). Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Robert P. J. Day 提交于
Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 01 2月, 2010 2 次提交
-
-
由 Andreas Schwab 提交于
Here are the powerpc bits to remove TIF_ABI_PENDING now that set_personality() is called at the appropriate place in exec. Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Benjamin Herrenschmidt 提交于
desc->affinity doesn't exit in that case. Let's use a macro for the UP variant of get_irq_server(), it's the easiest way, avoids evaluating arguments. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 29 1月, 2010 1 次提交
-
-
由 Stef van Os 提交于
Some of the newer 4xx pci cores need an explicit bit set to send type 1 transactions instead of just comparing the bus numbers. This patch enables type 1 transations for pcix nodes, thus enabling devices behind PCI bridges. Signed-off-by: NStef van Os <stef.van.os@gmail.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-