- 01 6月, 2013 1 次提交
-
-
由 Michael Neuling 提交于
No code changes, just documenting what's happening a little better. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 29 1月, 2013 1 次提交
-
-
由 Suzuki K. Poulose 提交于
Uprobes uses emulate_step in sstep.c, but we haven't explicitly specified the dependency. On pseries HAVE_HW_BREAKPOINT protects us, but 44x has no such luxury. Consolidate other users that depend on sstep and create a new config option. Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: NSuzuki K. Poulose <suzuki@in.ibm.com> Cc: linuxppc-dev@ozlabs.org Cc: stable@vger.kernel.org Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 1月, 2013 1 次提交
-
-
由 Anton Blanchard 提交于
Finally remove the two level TOC and build with -mcmodel=medium. Unfortunately we can't build modules with -mcmodel=medium due to the tricks the kernel module loader plays with percpu data: # -mcmodel=medium breaks modules because it uses 32bit offsets from # the TOC pointer to create pointers where possible. Pointers into the # percpu data area are created by this method. # # The kernel module loader relocates the percpu data section from the # original location (starting with 0xd...) to somewhere in the base # kernel percpu data space (starting with 0xc...). We need a full # 64bit relocation for this to work, hence -mcmodel=large. On older kernels we fall back to the two level TOC (-mminimal-toc) Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 10月, 2012 1 次提交
-
-
由 Nishanth Aravamudan 提交于
In 2fae7cdb ("powerpc: Fix VMX in interrupt check in POWER7 copy loops"), Anton inadvertently introduced a regression for memcpy on POWER7 machines. copyuser and memcpy diverge slightly in their use of cr1 (copyuser doesn't use it, but memcpy does) and you end up clobbering that register with your fix. That results in (taken from an FC18 kernel): [ 18.824604] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052f40 [ 18.824618] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1] [ 18.824623] SMP NR_CPUS=1024 NUMA pSeries [ 18.824633] Modules linked in: tg3(+) be2net(+) cxgb4(+) ipr(+) sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua squashfs cramfs [ 18.824705] NIP: c000000000052f40 LR: c00000000020b874 CTR: 0000000000000512 [ 18.824709] REGS: c000001f1fef7790 TRAP: 0f20 Not tainted (3.6.0-0.rc6.git0.2.fc18.ppc64) [ 18.824713] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 4802802e XER: 20000010 [ 18.824726] SOFTE: 0 [ 18.824728] CFAR: 0000000000000f20 [ 18.824731] TASK = c000000fa7128400[0] 'swapper/24' THREAD: c000000fa7480000 CPU: 24 GPR00: 00000000ffffffc0 c000001f1fef7a10 c00000000164edc0 c000000f9b9a8120 GPR04: c000000f9b9a8124 0000000000001438 0000000000000060 03ffffff064657ee GPR08: 0000000080000000 0000000000000010 0000000000000020 0000000000000030 GPR12: 0000000028028022 c00000000ff25400 0000000000000001 0000000000000000 GPR16: 0000000000000000 7fffffffffffffff c0000000016b2180 c00000000156a500 GPR20: c000000f968c7a90 c0000000131c31d8 c000001f1fef4000 c000000001561d00 GPR24: 000000000000000a 0000000000000000 0000000000000001 0000000000000012 GPR28: c000000fa5c04f80 00000000000008bc c0000000015c0a28 000000000000022e [ 18.824792] NIP [c000000000052f40] .memcpy_power7+0x5a0/0x7c4 [ 18.824797] LR [c00000000020b874] .pcpu_free_area+0x174/0x2d0 [ 18.824800] Call Trace: [ 18.824803] [c000001f1fef7a10] [c000000000052c14] .memcpy_power7+0x274/0x7c4 (unreliable) [ 18.824809] [c000001f1fef7b10] [c00000000020b874] .pcpu_free_area+0x174/0x2d0 [ 18.824813] [c000001f1fef7bb0] [c00000000020ba88] .free_percpu+0xb8/0x1b0 [ 18.824819] [c000001f1fef7c50] [c00000000043d144] .throtl_pd_exit+0x94/0xd0 [ 18.824824] [c000001f1fef7cf0] [c00000000043acf8] .blkg_free+0x88/0xe0 [ 18.824829] [c000001f1fef7d90] [c00000000018c048] .rcu_process_callbacks+0x2e8/0x8a0 [ 18.824835] [c000001f1fef7e90] [c0000000000a8ce8] .__do_softirq+0x158/0x4d0 [ 18.824840] [c000001f1fef7f90] [c000000000025ecc] .call_do_softirq+0x14/0x24 [ 18.824845] [c000000fa7483650] [c000000000010e80] .do_softirq+0x160/0x1a0 [ 18.824850] [c000000fa74836f0] [c0000000000a94a4] .irq_exit+0xf4/0x120 [ 18.824854] [c000000fa7483780] [c000000000020c44] .timer_interrupt+0x154/0x4d0 [ 18.824859] [c000000fa7483830] [c000000000003be0] decrementer_common+0x160/0x180 [ 18.824866] --- Exception: 901 at .plpar_hcall_norets+0x84/0xd4 [ 18.824866] LR = .check_and_cede_processor+0x48/0x80 [ 18.824871] [c000000fa7483b20] [c00000000007f018] .check_and_cede_processor+0x18/0x80 (unreliable) [ 18.824877] [c000000fa7483b90] [c00000000007f104] .dedicated_cede_loop+0x84/0x150 [ 18.824883] [c000000fa7483c50] [c0000000006bc030] .cpuidle_enter+0x30/0x50 [ 18.824887] [c000000fa7483cc0] [c0000000006bc9f4] .cpuidle_idle_call+0x104/0x720 [ 18.824892] [c000000fa7483d80] [c000000000070af8] .pSeries_idle+0x18/0x40 [ 18.824897] [c000000fa7483df0] [c000000000019084] .cpu_idle+0x1a4/0x380 [ 18.824902] [c000000fa7483ec0] [c0000000008a4c18] .start_secondary+0x520/0x528 [ 18.824907] [c000000fa7483f90] [c0000000000093f0] .start_secondary_prolog+0x10/0x14 [ 18.824911] Instruction dump: [ 18.824914] 38840008 90030000 90e30004 38630008 7ca62850 7cc300d0 78c7e102 7cf01120 [ 18.824923] 78c60660 39200010 39400020 39600030 <7e00200c> 7c0020ce 38840010 409f001c [ 18.824935] ---[ end trace 0bb95124affaaa45 ]--- [ 18.825046] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052d08 I believe the right fix is to make memcpy match usercopy and not use cr1. Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@kernel.org> [v3.6]
-
- 18 9月, 2012 1 次提交
-
-
由 Tiejun Chen 提交于
We don't do the real store operation for kprobing 'stwu Rx,(y)R1' since this may corrupt the exception frame, now we will do this operation safely in exception return code after migrate current exception frame below the kprobed function stack. So we only update gpr[1] here and trigger a thread flag to mask this. Note we should make sure if we trigger kernel stack over flow. Signed-off-by: NTiejun Chen <tiejun.chen@windriver.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 05 9月, 2012 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
patch_instruction() can be called very early on ppc32, when the kernel isn't yet running at it's linked address. That can cause the ! is_kernel_addr() test in __put_user() to trip and call might_sleep() which is very bad at that point during boot. Use a lower level function instead for now, at least until we get to rework ppc32 boot process to do the code patching later, like ppc64 does. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 24 8月, 2012 2 次提交
-
-
由 Anton Blanchard 提交于
The enhanced prefetch hint patches corrupt the condition register that was used to check if we are in interrupt. Fix this by using cr1. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
"powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user" was applied twice. Remove one. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 11 7月, 2012 1 次提交
-
-
由 Stephen Rothwell 提交于
This allows the linker to know that calls to them do not need to switch TOC and stop errors like the following when linking large configurations: powerpc64-linux-ld: drivers/built-in.o: In function `.gpiochip_is_requested': (.text+0x4): sibling call optimization to `_savegpr0_29' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `_savegpr0_29' extern Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 7月, 2012 4 次提交
-
-
由 Michael Neuling 提交于
These macros are using integers where they could be using logical names since they take registers. We are going to enforce this soon, so fix these up now. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
mtocrf define is just a wrapper around the real instructions so we can just use real register names here (ie. lower case). Also remove braces in macro so this is possible. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different places. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
Anything that uses a constructed instruction (ie. from ppc-opcode.h), need to use the new R0 macro, as %r0 is not going to work. Also convert usages of macros where we are just determining an offset (usually for a load/store), like: std r14,STK_REG(r14)(r1) Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since it's just calculating an offset. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 03 7月, 2012 8 次提交
-
-
由 Anton Blanchard 提交于
I blame Mikey for this. He elevated my slightly dubious testcase: to benchmark status. And naturally we need to be number 1 at creating zeros. So lets improve __clear_user some more. As Paul suggests we can use dcbz for large lengths. This patch gets the destination cacheline aligned then uses dcbz on whole cachelines. Before: 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s After: 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s 39 GB/s, a new record. Signed-off-by: NAnton Blanchard <anton@samba.org> Tested-by: NOlof Johansson <olof@lixom.net> Acked-by: NOlof Johansson <olof@lixom.net> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Implement a POWER7 optimised memcpy using VMX and enhanced prefetch instructions. This is a copy of the POWER7 optimised copy_to_user/copy_from_user loop. Detailed implementation and performance details can be found in commit a66086b8 (powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX). I noticed memcpy issues when profiling a RAID6 workload: .memcpy .async_memcpy .async_copy_data .__raid_run_ops .handle_stripe .raid5d .md_thread I created a simplified testcase by building a RAID6 array with 4 1GB ramdisks (booting with brd.rd_size=1048576): # mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3] I then timed how long it took to write to the entire array: # dd if=/dev/zero of=/dev/md0 bs=1M Before: 892 MB/s After: 999 MB/s A 12% improvement. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Version 2.06 of the POWER ISA introduced enhanced touch instructions, allowing us to specify a number of attributes including the length of a stream. This patch adds a software stream for both loads and stores in the POWER7 copy_tofrom_user loop. Since the setup is quite complicated and we have to use an eieio to ensure correct ordering of the "GO" command we only do this for copies above 4kB. To quantify any performance improvements we need a working set bigger than the caches so we operate on a 1GB file: # dd if=/dev/zero of=/tmp/foo bs=1M count=1024 And we compare how fast we can read the file: # dd if=/tmp/foo of=/dev/null bs=1M before: 7.7 GB/s after: 9.6 GB/s A 25% improvement. The worst case for this patch will be a completely L1 cache contained copy of just over 4kB. We can test this with the copy_to_user testcase we used to tune copy_tofrom_user originally: http://ozlabs.org/~anton/junkcode/copy_to_user.c # time ./copy_to_user2 -l 4224 -i 10000000 before: 6.807 s after: 6.946 s A 2% slowdown, which seems reasonable considering our data is unlikely to be completely L1 contained. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Implement a POWER7 optimised copy_page using VMX and enhanced prefetch instructions. We use enhanced prefetch hints to prefetch both the load and store side. We copy a cacheline at a time and fall back to regular loads and stores if we are unable to use VMX (eg we are in an interrupt). The following microbenchmark was used to assess the impact of the patch: http://ozlabs.org/~anton/junkcode/page_fault_file.c We test MAP_PRIVATE page faults across a 1GB file, 100 times: # time ./page_fault_file -p -l 1G -i 100 Before: 22.25s After: 18.89s 17% faster Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Subsequent patches will add more VMX library functions and it makes sense to keep all the c-code helper functions in the one file. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Version 2.06 of the POWER ISA introduced enhanced touch instructions, allowing us to specify a number of attributes including the length of a stream. This patch adds a software stream for both loads and stores in the POWER7 copy_tofrom_user loop. Since the setup is quite complicated and we have to use an eieio to ensure correct ordering of the "GO" command we only do this for copies above 4kB. To quantify any performance improvements we need a working set bigger than the caches so we operate on a 1GB file: # dd if=/dev/zero of=/tmp/foo bs=1M count=1024 And we compare how fast we can read the file: # dd if=/tmp/foo of=/dev/null bs=1M before: 7.7 GB/s after: 9.6 GB/s A 25% improvement. The worst case for this patch will be a completely L1 cache contained copy of just over 4kB. We can test this with the copy_to_user testcase we used to tune copy_tofrom_user originally: http://ozlabs.org/~anton/junkcode/copy_to_user.c # time ./copy_to_user2 -l 4224 -i 10000000 before: 6.807 s after: 6.946 s A 2% slowdown, which seems reasonable considering our data is unlikely to be completely L1 contained. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
I noticed __clear_user high up in a profile of one of my RAID stress tests. The testcase was doing a dd from /dev/zero which ends up calling __clear_user. __clear_user is basically a loop with a single 4 byte store which is horribly slow. We can do much better by aligning the desination and doing 32 bytes of 8 byte stores in a loop. The following testcase was used to verify the patch: http://ozlabs.org/~anton/junkcode/stress_clear_user.c To show the improvement in performance I ran a dd from /dev/zero to /dev/null on a POWER7 box: Before: # dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s After: # time dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s Over 5x faster. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Steven Rostedt 提交于
For ftrace to use the patch_instruction code, it needs to check for faults on write. Ftrace updates code all over the kernel, and we need to know if code is updated or not due to protections that are placed on some portions of the kernel. If ftrace does not detect a fault, it will error later on, and it will be much more difficult to find the problem. By changing patch_instruction() to detect faults, then ftrace will be able to make use of it too. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 5月, 2012 1 次提交
-
-
由 Paul Mackerras 提交于
This is much the same as for SPARC except that we can do the find_zero() function more efficiently using the count-leading-zeroes instructions. Tested on 32-bit and 64-bit PowerPC. Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 4月, 2012 1 次提交
-
-
由 Anton Blanchard 提交于
Remove CONFIG_POWER4_ONLY, the option is badly named and only does two things: - It wraps the MMU segment table code. With feature fixups there is little downside to compiling this in. - It uses the newer mtocrf instruction in various assembly functions. Instead of making this a compile option just do it at runtime via a feature fixup. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 29 3月, 2012 1 次提交
-
-
由 David Howells 提交于
Disintegrate asm/system.h for PowerPC. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org
-
- 21 3月, 2012 1 次提交
-
-
由 Stephen Rothwell 提交于
This is no longer selectable, so just remove all the dependent code. Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 19 12月, 2011 1 次提交
-
-
由 Anton Blanchard 提交于
Implement a POWER7 optimised copy_to_user/copy_from_user using VMX. For large aligned copies this new loop is over 10% faster, and for large unaligned copies it is over 200% faster. If we take a fault we fall back to the old version, this keeps things relatively simple and easy to verify. On POWER7 unaligned stores rarely slow down - they only flush when a store crosses a 4KB page boundary. Furthermore this flush is handled completely in hardware and should be 20-30 cycles. Unaligned loads on the other hand flush much more often - whenever crossing a 128 byte cache line, or a 32 byte sector if either sector is an L1 miss. Considering this information we really want to get the loads aligned and not worry about the alignment of the stores. Microbenchmarks confirm that this approach is much faster than the current unaligned copy loop that uses shifts and rotates to ensure both loads and stores are aligned. We also want to try and do the stores in cacheline aligned, cacheline sized chunks. If the store queue is unable to merge an entire cacheline of stores then the L2 cache will have to do a read/modify/write. Even worse, we will serialise this with the stores in the next iteration of the copy loop since both iterations hit the same cacheline. Based on this, the new loop does the following things: 1 - 127 bytes Get the source 8 byte aligned and use 8 byte loads and stores. Pretty boring and similar to how the current loop works. 128 - 4095 bytes Get the source 8 byte aligned and use 8 byte loads and stores, 1 cacheline at a time. We aren't doing the stores in cacheline aligned chunks so we will potentially serialise once per cacheline. Even so it is much better than the loop we have today. 4096 - bytes If both source and destination have the same alignment get them both 16 byte aligned, then get the destination cacheline aligned. Do cacheline sized loads and stores using VMX. If source and destination do not have the same alignment, we get the destination cacheline aligned, and use permute to do aligned loads. In both cases the VMX loop should be optimal - we always do aligned loads and stores and are always doing stores in cacheline aligned, cacheline sized chunks. To be able to use VMX we must be careful about interrupts and sleeping. We don't use the VMX loop when in an interrupt (which should be rare anyway) and we wrap the VMX loop in disable/enable_pagefault and fall back to the existing copy_tofrom_user loop if we do need to sleep. The VMX breakpoint of 4096 bytes was chosen using this microbenchmark: http://ozlabs.org/~anton/junkcode/copy_to_user.c Since we are using VMX and there is a cost to saving and restoring the user VMX state there are two broad cases we need to benchmark: - Best case - userspace never uses VMX - Worst case - userspace always uses VMX In reality a userspace process will sit somewhere between these two extremes. Since we need to test both aligned and unaligned copies we end up with 4 combinations. The point at which the VMX loop begins to win is: 0% VMX aligned 2048 bytes unaligned 2048 bytes 100% VMX aligned 16384 bytes unaligned 8192 bytes Considering this is a microbenchmark, the data is hot in cache and the VMX loop has better store queue merging properties we set the breakpoint to 4096 bytes, a little below the unaligned breakpoints. Some future optimisations we can look at: - Looking at the perf data, a significant part of the cost when a task is always using VMX is the extra exception we take to restore the VMX state. As such we should do something similar to the x86 optimisation that restores FPU state for heavy users. ie: /* * If the task has used fpu the last 5 timeslices, just do a full * restore of the math state immediately to avoid the trap; the * chances of needing FPU soon are obviously high now */ preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5; and /* * fpu_counter contains the number of consecutive context switches * that the FPU is used. If this is over a threshold, the lazy fpu * saving becomes unlazy to save the trap. This is an unsigned char * so that after 256 times the counter wraps and the behavior turns * lazy again; this to deal with bursty apps that only use FPU for * a short time */ - We could create a paca bit to mirror the VMX enabled MSR bit and check that first, avoiding multiple calls to calling enable_kernel_altivec. That should help with iovec based system calls like readv. - We could have two VMX breakpoints, one for when we know the user VMX state is loaded into the registers and one when it isn't. This could be a second bit in the paca so we can calculate the break points quickly. - One suggestion from Ben was to save and restore the VSX registers we use inline instead of using enable_kernel_altivec. [BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC] Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 16 11月, 2011 1 次提交
-
-
由 Anton Blanchard 提交于
kdump fails because we try to execute an HV only instruction. Feature fixups are being applied after we copy the exception vectors down to 0 so they miss out on any updates. We have always had this issue but it only became critical in v3.0 when we added CFAR support (breaks POWER5) and v3.1 when we added POWERNV (breaks everyone). Signed-off-by: NAnton Blanchard <anton@samba.org> Cc: <stable@kernel.org> [v3.0+] Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 01 11月, 2011 1 次提交
-
-
由 Paul Gortmaker 提交于
All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 21 5月, 2011 1 次提交
-
-
由 Linus Torvalds 提交于
Commit e66eed65 ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]') grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 5月, 2011 3 次提交
-
-
由 Milton Miller 提交于
Replace all remaining callers of alloc_maybe_bootmem with zalloc_maybe_bootmem. The callsite in pci_dn is followed with a memset to clear the memory, and not zeroing at the other callsites in the celleb fake pci code could lead to following uninitialized memory as pointers or even freeing said pointers on error paths. Signed-off-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
We have a confusing number of ioremap functions. Make things just a bit simpler by merging ioremap_flags and ioremap_prot. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
To make it easier to add optimised versions of copy_page, remove the 4kB loop for 64kB pages and just do all the work in copy_page. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 27 4月, 2011 1 次提交
-
-
由 Michael Ellerman 提交于
We check MSR_SF a lot in sstep.c, to decide if we need to emulate the truncation of values when running in 32-bit mode. Factor out that code into a helper, and convert it and the other uses to use MSR_64BIT. This fixes a bug on BOOK3E where kprobes would end up returning to a 32-bit address, because regs->nip was truncated, because (msr & MSR_SF) was false. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 21 1月, 2011 1 次提交
-
-
由 Michael Ellerman 提交于
When we create an alternative feature section, the else case must be the same size or smaller than the body. This is because when we patch the else case in we just overwrite the body, so there must be room. Up to now we just did this by inspection, but it's quite easy to enforce it in the assembler, so we should. The only change is to add the ifgt block, but that effects the alignment of the tabs and so the whole macro is modified. Also add a test, but #if 0 it because we don't want to break the build. Anyone who's modifying the feature macros should enable the test. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 12月, 2010 1 次提交
-
-
由 Anton Blanchard 提交于
The popcnt instructions went into binutils relatively recently. As with a number of other instructions, create macros and hardcode them. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 29 11月, 2010 1 次提交
-
-
由 Anton Blanchard 提交于
POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step this patch does all the work out of line, but it would be nice to implement them as inlines with an out of line fallback. The performance issue with hweight was noticed when disabling SMT on a large (192 thread) POWER7 box. The patch improves that testcase by about 8%. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 13 10月, 2010 1 次提交
-
-
由 matt mooney 提交于
Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y. Signed-off-by: Nmatt mooney <mfm@muteddisk.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 02 9月, 2010 3 次提交
-
-
由 Sean MacLennan 提交于
Replace the BOOK3S_64 specific mtmsrd with the generic MTMSRD macro. Only enable ldstfp when CONFIG_PPC_FPU is set. Signed-off-by: NSean MacLennan <smaclennan@pikatech.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Sean MacLennan 提交于
Signed-off-by: NSean MacLennan <smaclennan@pikatech.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Paul Mackerras 提交于
Currently we have the lppaca structs as a simple array of NR_CPUS entries, taking up space in the data section of the kernel image. In future we would like to allocate them dynamically, so this abstracts out the accesses to the array, making it easier to change how we locate the lppaca for a given cpu in future. Specifically, lppaca[cpu] changes to lppaca_of(cpu). Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-