提交 · f7d6a7283aa1de430c6323a9714d1a787bc2d1ea · openeuler / raspberrypi-kernel

03 3月, 2017 1 次提交

powerpc: emulate_step() tests for load/store instructions · 4ceae137

由 Ravi Bangoria 提交于 2月 14, 2017

Add new selftest that test emulate_step for Normal, Floating Point,
Vector and Vector Scalar - load/store instructions. Test should run
at boot time if CONFIG_KPROBES_SANITY_TEST and CONFIG_PPC64 is set.

Sample log:

  emulate_step_test: ld             : PASS
  emulate_step_test: lwz            : PASS
  emulate_step_test: lwzx           : PASS
  emulate_step_test: std            : PASS
  emulate_step_test: ldarx / stdcx. : PASS
  emulate_step_test: lfsx           : PASS
  emulate_step_test: stfsx          : PASS
  emulate_step_test: lfdx           : PASS
  emulate_step_test: stfdx          : PASS
  emulate_step_test: lvx            : PASS
  emulate_step_test: stvx           : PASS
  emulate_step_test: lxvd2x         : PASS
  emulate_step_test: stxvd2x        : PASS
Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
[mpe: Drop start/complete lines, make it all __init]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4ceae137

25 1月, 2017 1 次提交

powerpc/64: Use optimized checksum routines on little-endian · d4fde568

由 Paul Mackerras 提交于 11月 03, 2016

Currently we have optimized hand-coded assembly checksum routines for
big-endian 64-bit systems, but for little-endian we use the generic C
routines. This modifies the optimized routines to work for
little-endian. With this, we no longer need to enable
CONFIG_GENERIC_CSUM. This also fixes a couple of comments in
checksum_64.S so they accurately reflect what the associated instruction
does.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
[mpe: Use the more common __BIG_ENDIAN__]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d4fde568

13 9月, 2016 1 次提交

powerpc/Makefile: Drop CONFIG_WORD_SIZE for BITS · 68201fbb

由 Michael Ellerman 提交于 8月 11, 2016

Commit 2578bfae ("[POWERPC] Create and use CONFIG_WORD_SIZE") added
CONFIG_WORD_SIZE, and suggests that other arches were going to do
likewise.

But that never happened, powerpc is the only architecture which uses it.

So switch to using a simple make variable, BITS, like x86, sh, sparc and
tile. It is also easier to spell and simpler, avoiding any confusion
about whether it's defined due to ordering of make vs kconfig.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

68201fbb

08 8月, 2016 1 次提交
- A
  ppc: move exports to definitions · 9445aa1a
  由 Al Viro 提交于 1月 13, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  9445aa1a
07 3月, 2016 1 次提交

powerpc/ftrace: Use $(CC_FLAGS_FTRACE) when disabling ftrace · 9a7841ae

由 Torsten Duwe 提交于 3月 03, 2016

Rather than open-coding -pg whereever we want to disable ftrace, use the
existing $(CC_FLAGS_FTRACE) variable.

This has the advantage that it will work in future when we use a
different set of flags to enable ftrace.
Signed-off-by: NTorsten Duwe <duwe@suse.de>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9a7841ae

05 3月, 2016 1 次提交

powerpc32: checksum_wrappers_64 becomes checksum_wrappers · 03bc8b0f

由 Christophe Leroy 提交于 9月 22, 2015

The powerpc64 checksum wrapper functions adds csum_and_copy_to_user()
which otherwise is implemented in include/net/checksum.h by using
csum_partial() then copy_to_user()

Those two wrapper fonctions are also applicable to powerpc32 as it is
based on the use of csum_partial_copy_generic() which also
exists on powerpc32

This patch renames arch/powerpc/lib/checksum_wrappers_64.c to
arch/powerpc/lib/checksum_wrappers.c and
makes it non-conditional to CONFIG_WORD_SIZE
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <oss@buserror.net>

03bc8b0f

11 6月, 2015 1 次提交

powerpc: Only use -mabi=altivec if toolchain supports it · 1fb3f5a7

由 Anton Blanchard 提交于 5月 26, 2015

The -mabi=altivec option is not recognised on LLVM, so use call cc-option
to check for support.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1fb3f5a7

28 1月, 2015 2 次提交
- M
  powerpc/lib: Makefile, use obj64-y to consolidate 64-bit rules · 1dcee55f
  由 Michael Ellerman 提交于 12月 22, 2014
```
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
```
  1dcee55f
- M
  powerpc/lib: Makefile, consolidate obj-y sections · 564ec2f2
  由 Michael Ellerman 提交于 12月 22, 2014
```
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
```
  564ec2f2
23 1月, 2015 1 次提交

powerpc: Add 64bit optimised memcmp · 15c2d45d

由 Anton Blanchard 提交于 1月 21, 2015

I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rdlicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

15c2d45d

29 12月, 2014 1 次提交

powerpc/lib: Do not include string.o in obj-y twice · 803d57de

由 Andreas Ruprecht 提交于 12月 17, 2014

In the Makefile, string.o (which is generated from string.S) is
included into the list of objects being built unconditionally
(obj-y) in line 12.

Additionally, if CONFIG_PPC64 is set, it is included again in
line 17.

This patch removes the latter unnecessary inclusion.
Signed-off-by: NAndreas Ruprecht <rupran@einserver.de>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

803d57de

10 11月, 2014 1 次提交

powerpc: Remove unused devm_ioremap_prot() · dedd24a1

由 Kyle McMartin 提交于 5月 25, 2013

Added in 2008, but has never had any in-tree users, and no other
architectures provide it.
Signed-off-by: NKyle McMartin <kyle@redhat.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

dedd24a1

25 9月, 2014 1 次提交

powerpc: Move lib symbol exports into arch/powerpc/lib/ppc_ksyms.c · 7b20a955

由 Anton Blanchard 提交于 8月 20, 2014

Move the lib symbol exports closer to their function definitions
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7b20a955

30 4月, 2014 1 次提交

powerpc: memcpy optimization for 64bit LE · 00f554fa

由 Philippe Bergheaud 提交于 4月 30, 2014

Unaligned stores take alignment exceptions on POWER7 running in little-endian.
This is a dumb little-endian base memcpy that prevents unaligned stores.
Once booted the feature fixup code switches over to the VMX copy loops
(which are already endian safe).

The question is what we do before that switch over. The base 64bit
memcpy takes alignment exceptions on POWER7 so we can't use it as is.
Fixing the causes of alignment exception would slow it down, because
we'd need to ensure all loads and stores are aligned either through
rotate tricks or bytewise loads and stores. Either would be bad for
all other 64bit platforms.

[ I simplified the loop a bit - Anton ]
Signed-off-by: NPhilippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

00f554fa

30 10月, 2013 1 次提交

powerpc: Add VMX optimised xor for RAID5 · ef1313de

由 Anton Blanchard 提交于 10月 14, 2013

Add a VMX optimised xor, used primarily for RAID5. On a POWER7 blade
this is a decent win:

   32regs    : 17932.800 MB/sec
   altivec   : 19724.800 MB/sec

The bigger gain is when the same test is run in SMT4 mode, as it
would if there was a lot of work going on:

   8regs     :  8377.600 MB/sec
   altivec   : 15801.600 MB/sec

I tested this against an array created without the patch, and also
verified it worked as expected on a little endian kernel.

[ Fix !CONFIG_ALTIVEC build -- BenH ]
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ef1313de

11 10月, 2013 2 次提交

powerpc: Use generic memcpy code in little endian · de577a35

由 Anton Blanchard 提交于 9月 23, 2013

We need to fix some endian issues in our memcpy code. For now
just enable the generic memcpy routine for little endian builds.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

de577a35

powerpc: Use generic checksum code in little endian · 7a332b0c

由 Anton Blanchard 提交于 9月 23, 2013

We need to fix some endian issues in our checksum code. For now
just enable the generic checksum routines for little endian builds.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7a332b0c

29 1月, 2013 1 次提交

uprobes/powerpc: Add dependency on single step emulation · 5e249d45

由 Suzuki K. Poulose 提交于 1月 07, 2013

Uprobes uses emulate_step in sstep.c, but we haven't explicitly specified
the dependency. On pseries HAVE_HW_BREAKPOINT protects us, but 44x has no
such luxury.

Consolidate other users that depend on sstep and create a new config option.
Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: NSuzuki K. Poulose <suzuki@in.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Cc: stable@vger.kernel.org
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

5e249d45

10 1月, 2013 1 次提交

powerpc: Build kernel with -mcmodel=medium · 1fbe9cf2

由 Anton Blanchard 提交于 11月 26, 2012

Finally remove the two level TOC and build with -mcmodel=medium.

Unfortunately we can't build modules with -mcmodel=medium due to
the tricks the kernel module loader plays with percpu data:

# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
#
# The kernel module loader relocates the percpu data section from the
# original location (starting with 0xd...) to somewhere in the base
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.

On older kernels we fall back to the two level TOC (-mminimal-toc)
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

1fbe9cf2

03 7月, 2012 4 次提交

powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch · b3f271e8

由 Anton Blanchard 提交于 5月 30, 2012

Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
instructions.

This is a copy of the POWER7 optimised copy_to_user/copy_from_user
loop. Detailed implementation and performance details can be found in
commit a66086b8 (powerpc: POWER7 optimised
copy_to_user/copy_from_user using VMX).

I noticed memcpy issues when profiling a RAID6 workload:

	.memcpy
	.async_memcpy
	.async_copy_data
	.__raid_run_ops
	.handle_stripe
	.raid5d
	.md_thread

I created a simplified testcase by building a RAID6 array with 4 1GB
ramdisks (booting with brd.rd_size=1048576):

# mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

I then timed how long it took to write to the entire array:

# dd if=/dev/zero of=/dev/md0 bs=1M

Before: 892 MB/s
After:  999 MB/s

A 12% improvement.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b3f271e8

powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch · fde69282

由 Anton Blanchard 提交于 5月 29, 2012

Implement a POWER7 optimised copy_page using VMX and enhanced
prefetch instructions. We use enhanced prefetch hints to prefetch
both the load and store side. We copy a cacheline at a time and
fall back to regular loads and stores if we are unable to use VMX
(eg we are in an interrupt).

The following microbenchmark was used to assess the impact of
the patch:

http://ozlabs.org/~anton/junkcode/page_fault_file.c

We test MAP_PRIVATE page faults across a 1GB file, 100 times:

# time ./page_fault_file -p -l 1G -i 100

Before: 22.25s
After:  18.89s

17% faster
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fde69282

powerpc: Rename copyuser_power7_vmx.c to vmx-helper.c · 6f7839e5

由 Anton Blanchard 提交于 5月 29, 2012

Subsequent patches will add more VMX library functions and it makes
sense to keep all the c-code helper functions in the one file.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

6f7839e5

powerpc: 64bit optimised __clear_user · 17968fbb

由 Anton Blanchard 提交于 5月 27, 2012

I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which
is horribly slow. We can do much better by aligning the desination
and doing 32 bytes of 8 byte stores in a loop.

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

17968fbb

19 12月, 2011 1 次提交

powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX · a66086b8

由 Anton Blanchard 提交于 12月 07, 2011

Implement a POWER7 optimised copy_to_user/copy_from_user using VMX.
For large aligned copies this new loop is over 10% faster, and for
large unaligned copies it is over 200% faster.

If we take a fault we fall back to the old version, this keeps
things relatively simple and easy to verify.

On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is
handled completely in hardware and should be 20-30 cycles.

Unaligned loads on the other hand flush much more often - whenever
crossing a 128 byte cache line, or a 32 byte sector if either sector
is an L1 miss.

Considering this information we really want to get the loads aligned
and not worry about the alignment of the stores. Microbenchmarks
confirm that this approach is much faster than the current unaligned
copy loop that uses shifts and rotates to ensure both loads and
stores are aligned.

We also want to try and do the stores in cacheline aligned, cacheline
sized chunks. If the store queue is unable to merge an entire
cacheline of stores then the L2 cache will have to do a
read/modify/write. Even worse, we will serialise this with the stores
in the next iteration of the copy loop since both iterations hit
the same cacheline.

Based on this, the new loop does the following things:

1 - 127 bytes
Get the source 8 byte aligned and use 8 byte loads and stores. Pretty
boring and similar to how the current loop works.

128 - 4095 bytes
Get the source 8 byte aligned and use 8 byte loads and stores,
1 cacheline at a time. We aren't doing the stores in cacheline
aligned chunks so we will potentially serialise once per cacheline.
Even so it is much better than the loop we have today.

4096 - bytes
If both source and destination have the same alignment get them both
16 byte aligned, then get the destination cacheline aligned. Do
cacheline sized loads and stores using VMX.

If source and destination do not have the same alignment, we get the
destination cacheline aligned, and use permute to do aligned loads.

In both cases the VMX loop should be optimal - we always do aligned
loads and stores and are always doing stores in cacheline aligned,
cacheline sized chunks.

To be able to use VMX we must be careful about interrupts and
sleeping. We don't use the VMX loop when in an interrupt (which should
be rare anyway) and we wrap the VMX loop in disable/enable_pagefault
and fall back to the existing copy_tofrom_user loop if we do need to
sleep.

The VMX breakpoint of 4096 bytes was chosen using this microbenchmark:

http://ozlabs.org/~anton/junkcode/copy_to_user.c

Since we are using VMX and there is a cost to saving and restoring
the user VMX state there are two broad cases we need to benchmark:

- Best case - userspace never uses VMX

- Worst case - userspace always uses VMX

In reality a userspace process will sit somewhere between these two
extremes. Since we need to test both aligned and unaligned copies we
end up with 4 combinations. The point at which the VMX loop begins to
win is:

0% VMX
aligned		2048 bytes
unaligned	2048 bytes

100% VMX
aligned		16384 bytes
unaligned	8192 bytes

Considering this is a microbenchmark, the data is hot in cache and
the VMX loop has better store queue merging properties we set the
breakpoint to 4096 bytes, a little below the unaligned breakpoints.

Some future optimisations we can look at:

- Looking at the perf data, a significant part of the cost when a
  task is always using VMX is the extra exception we take to restore
  the VMX state. As such we should do something similar to the x86
  optimisation that restores FPU state for heavy users. ie:

        /*
         * If the task has used fpu the last 5 timeslices, just do a full
         * restore of the math state immediately to avoid the trap; the
         * chances of needing FPU soon are obviously high now
         */
        preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

  and

        /*
         * fpu_counter contains the number of consecutive context switches
         * that the FPU is used. If this is over a threshold, the lazy fpu
         * saving becomes unlazy to save the trap. This is an unsigned char
         * so that after 256 times the counter wraps and the behavior turns
         * lazy again; this to deal with bursty apps that only use FPU for
         * a short time
         */

- We could create a paca bit to mirror the VMX enabled MSR bit and check
  that first, avoiding multiple calls to calling enable_kernel_altivec.
  That should help with iovec based system calls like readv.

- We could have two VMX breakpoints, one for when we know the user VMX
  state is loaded into the registers and one when it isn't. This could
  be a second bit in the paca so we can calculate the break points quickly.

- One suggestion from Ben was to save and restore the VSX registers
  we use inline instead of using enable_kernel_altivec.

[BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC]
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a66086b8

29 11月, 2010 1 次提交

powerpc: Add support for popcnt instructions · 64ff3128

由 Anton Blanchard 提交于 8月 12, 2010

POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step
this patch does all the work out of line, but it would be nice to implement
them as inlines with an out of line fallback.

The performance issue with hweight was noticed when disabling SMT on a large
(192 thread) POWER7 box. The patch improves that testcase by about 8%.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

64ff3128

13 10月, 2010 1 次提交

powerpc/Makefiles: Change to new flag variables · 4108d9ba

由 matt mooney 提交于 9月 22, 2010

Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.
Signed-off-by: Nmatt mooney <mfm@muteddisk.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

4108d9ba

02 9月, 2010 1 次提交

powerpc: Optimise 64bit csum_partial_copy_generic and add csum_and_copy_from_user · fdd374b6

由 Anton Blanchard 提交于 8月 02, 2010

We use the same core loop as the new csum_partial, adding in the
stores and exception handling code. To keep things simple we do all the
exception fixup in csum_and_copy_from_user. This wrapper function is
modelled on the generic checksum code and is careful to always calculate
a complete checksum even if we only copied part of the data to userspace.

To test this I forced checksumming on over loopback and ran socklib (a
simple TCP benchmark). On a POWER6 575 throughput improved by 19% with
this patch. If I forced both the sender and receiver onto the same cpu
(with the hope of shifting the benchmark from being cache bandwidth limited
to cpu limited), adding this patch improved performance by 55%
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fdd374b6

08 7月, 2010 1 次提交

powerpc: Fix module building for gcc 4.5 and 64 bit · 7fca5dc8

由 Stephen Rothwell 提交于 6月 29, 2010

Gcc 4.5 is now generating out of line register save and restore
in the function prefix and postfix when we use -Os.
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7fca5dc8

22 6月, 2010 2 次提交

powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors · 5aae8a53

由 K.Prasad 提交于 6月 15, 2010

Implement perf-events based hw-breakpoint interfaces for PowerPC
64-bit server (Book III S) processors.  This allows access to a
given location to be used as an event that can be counted or
profiled by the perf_events subsystem.

This is done using the DABR (data breakpoint register), which can
also be used for process debugging via ptrace.  When perf_event
hw_breakpoint support is configured in, the perf_event subsystem
manages the DABR and arbitrates access to it, and ptrace then
creates a perf_event when it is requested to set a data breakpoint.

[Adopted suggestions from Paul Mackerras <paulus@samba.org> to
- emulate_step() all system-wide breakpoints and single-step only the
  per-task breakpoints
- perform arch-specific cleanup before unregistration through
  arch_unregister_hw_breakpoint()
]
Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

5aae8a53

powerpc: Emulate most Book I instructions in emulate_step() · 0016a4cf

由 Paul Mackerras 提交于 6月 15, 2010

This extends the emulate_step() function to handle a large proportion
of the Book I instructions implemented on current 64-bit server
processors.  The aim is to handle all the load and store instructions
used in the kernel, plus all of the instructions that appear between
l[wd]arx and st[wd]cx., so this handles the Altivec/VMX lvx and stvx
and the VSX lxv2dx and stxv2dx instructions (implemented in POWER7).

The new code can emulate user mode instructions, and checks the
effective address for a load or store if the saved state is for
user mode.  It doesn't handle little-endian mode at present.

For floating-point, Altivec/VMX and VSX instructions, it checks
that the saved MSR has the enable bit for the relevant facility
set, and if so, assumes that the FP/VMX/VSX registers contain
valid state, and does loads or stores directly to/from the
FP/VMX/VSX registers, using assembly helpers in ldstfp.S.

Instructions supported now include:
* Loads and stores, including some but not all VMX and VSX instructions,
  and lmw/stmw
* Atomic loads and stores (l[dw]arx, st[dw]cx.)
* Arithmetic instructions (add, subtract, multiply, divide, etc.)
* Compare instructions
* Rotate and mask instructions
* Shift instructions
* Logical instructions (and, or, xor, etc.)
* Condition register logical instructions
* mtcrf, cntlz[wd], exts[bhw]
* isync, sync, lwsync, ptesync, eieio
* Cache operations (dcbf, dcbst, dcbt, dcbtst)

The overflow-checking arithmetic instructions are not included, but
they appear not to be ever used in C code.

This uses decimal values for the minor opcodes in the switch statements
because that is what appears in the Power ISA specification, thus it is
easier to check that they are correct if they are in decimal.

If this is used to single-step an instruction where a data breakpoint
interrupt occurred, then there is the possibility that the instruction
is a lwarx or ldarx.  In that case we have to be careful not to lose the
reservation until we get to the matching st[wd]cx., or we'll never make
forward progress.  One alternative is to try to arrange that we can
return from interrupts and handle data breakpoint interrupts without
losing the reservation, which means not using any spinlocks, mutexes,
or atomic ops (including bitops).  That seems rather fragile.  The
other alternative is to emulate the larx/stcx and all the instructions
in between.  This is why this commit adds support for a wide range
of integer instructions.
Signed-off-by: NPaul Mackerras <paulus@samba.org>

0016a4cf

16 6月, 2009 1 次提交

powerpc: Add configurable -Werror for arch/powerpc · ba55bd74

由 Michael Ellerman 提交于 6月 09, 2009

Add the option to build the code under arch/powerpc with -Werror.

The intention is to make it harder for people to inadvertantly introduce
warnings in the arch/powerpc code. It needs to be configurable so that
if a warning is introduced, people can easily work around it while it's
being fixed.

The option is a negative, ie. don't enable -Werror, so that it will be
turned on for allyes and allmodconfig builds.

The default is n, in the hope that developers will build with -Werror,
that will probably lead to some build breaks, I am prepared to be flamed.

It's not enabled for math-emu, which is a steaming pile of warnings.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ba55bd74

27 5月, 2009 1 次提交
- B
  powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm · b16e7766
  由 Benjamin Herrenschmidt 提交于 5月 27, 2009
```
(pre-requisite to make the next patches more palatable)
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
```
  b16e7766
28 11月, 2008 1 次提交

powerpc/ppc32: static ftrace fixes for PPC32 · f1eecf0e

由 Steven Rostedt 提交于 11月 26, 2008

Impact: fix for PowerPC 32 code

There were some early init code that was not safe for static
ftrace to boot on my PowerBook. This code must only use relative
addressing, and static mcount performs a compare of the
ftrace_trace_function pointer, and gets that with an absolute address.
In the early init boot up code, this will cause a fault.

This patch removes tracing from the files containing the offending
functions.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f1eecf0e

04 8月, 2008 1 次提交

powerpc: Remove use of CONFIG_PPC_MERGE · 9c4cb825

由 Kumar Gala 提交于 8月 02, 2008

Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove
the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc
and include/asm-powerpc.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

9c4cb825

01 7月, 2008 3 次提交

powerpc: Add self-tests of the feature fixup code · 362e7701

由 Michael Ellerman 提交于 6月 24, 2008

This commit adds tests of the feature fixup code, they are run during
boot if CONFIG_FTR_FIXUP_SELFTEST=y. Some of the tests manually invoke
the patching routines to check their behaviour, and others use the
macros and so are patched during the normal patching done during boot.

Because we have two sets of macros with different names, we use a macro
to generate the test of the macros, very niiiice.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

362e7701

powerpc: Split out do_feature_fixups() from cputable.c · 51c52e86

由 Michael Ellerman 提交于 6月 24, 2008

The logic to patch CPU feature sections lives in cputable.c, but these days
it's used for CPU features as well as firmware features. Move it into
it's own file for neatness and as preparation for some additions.

While we're moving the code, we pull the loop body logic into a separate
routine, and remove a comment which doesn't apply anymore.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

51c52e86

powerpc: Move code patching code into arch/powerpc/lib/code-patching.c · aaddd3ea

由 Michael Ellerman 提交于 6月 24, 2008

We currently have a few routines for patching code in asm/system.h, because
they didn't fit anywhere else. I'd like to clean them up a little and add
some more, so first move them into a dedicated C file - they don't need to
be inlined.

While we're moving the code, drop create_function_call(), it's intended
caller never got merged and will be replaced in future with something
different.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Acked-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

aaddd3ea

16 6月, 2008 1 次提交

[POWERPC] Fix -Os kernel builds with newer gcc versions · da3de6df

由 Kumar Gala 提交于 6月 13, 2008

GCC 4.4.x looks to be adding support for generating out-of-line register
saves/restores based on:

http://gcc.gnu.org/ml/gcc-patches/2008-04/msg01678.html

This breaks the kernel if we enable CONFIG_CC_OPTIMIZE_FOR_SIZE.  To fix
this we add the use the save/restore code from gcc and simplified it down
for our needs (integer only).

Additionally, we have to link this code into each module.  The other
solution was to add EXPORT_SYMBOL() which meant going through the
trampoline which seemed nonsensical for these out-of-line routines.

Finally, we add some checks to prom_init_check.sh to ignore the
out-of-line save/restore functions.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

da3de6df

12 5月, 2008 1 次提交

[POWERPC] ppc: More compile fixes · 0d4b6b90

由 Paul Mackerras 提交于 5月 12, 2008

This fixes a few more miscellaneous compile problems with ARCH=ppc.

1. Don't compile devres.c on ARCH=ppc, it doesn't have ioremap_flags.
2. Include <asm/irq.h> in setup.c for the __DO_IRQ_CANON definition.
3. Include <linux/proc_fs.h> in residual.c for the
   definition of create_proc_read_entry.
4. Fix xchg_ptr to be a static inline to eliminate a compiler warning.
Signed-off-by: NPaul Mackerras <paulus@samba.org>

0d4b6b90

05 5月, 2008 1 次提交

[POWERPC] devres: Add devm_ioremap_prot() · b41e5fff

由 Emil Medve 提交于 5月 03, 2008

We provide an ioremap_flags, so this provides a corresponding
devm_ioremap_prot.  The slight name difference is at Ben
Herrenschmidt's request as he plans on changing ioremap_flags to
ioremap_prot in the future.
Signed-off-by: NEmil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Acked-by: NTejun Heo <htejun@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

b41e5fff