Commits · 87a156fb18fe15d012c3db506b6b8b001af2e58d · openanolis / cloud-kernel

14 Jun, 2016 2 commits

powerpc: Align hot loops of some string functions · 87a156fb

Committed by Anton Blanchard on May 26, 2016

Align the hot loops in our assembly implementation of strncpy(),
strncmp() and memchr().
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Remove assembly versions of strcpy, strcat, strlen and strcmp · 3ece1663

Committed by Anton Blanchard on May 26, 2016

A number of our assembly implementations of string functions do not
align their hot loops. I was going to align them manually, but I
realised that they are almost instruction for instruction
identical to what gcc produces, with the advantage that gcc does
align them.

In light of that, let's just remove the assembly versions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

23 Jan, 2015 1 commit

powerpc: Add 64bit optimised memcmp · 15c2d45d

Committed by Anton Blanchard on Jan 21, 2015

I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.
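
Illustration (not part of the commit): a minimal C sketch of the two paths listed above, assuming a hypothetical function `sketch_memcmp` and helper `load_be64`. The real change is hand-written, unrolled and modulo-scheduled assembly; this sketch only shows the control flow of "8 bytes at a time when large and aligned, otherwise byte at a time". The portable `load_be64` helper stands in for a single 8-byte load on a big-endian machine, so the sketch demonstrates the logic and not the speed-up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes as a big-endian integer, so that an
 * integer comparison orders the same way as a byte-wise comparison. */
uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Illustrative only; sketch_memcmp is not the kernel's implementation. */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Wide path: only for large (at least 32 byte) comparisons where
     * both pointers are 8-byte aligned, as described in the text. */
    if (n >= 32 && (((uintptr_t)pa | (uintptr_t)pb) & 7) == 0) {
        while (n >= 8) {
            uint64_t va = load_be64(pa);
            uint64_t vb = load_be64(pb);
            if (va != vb)
                return va < vb ? -1 : 1;
            pa += 8;
            pb += 8;
            n -= 8;
        }
    }

    /* Byte path: small or unaligned buffers, and any tail left over. */
    while (n > 0) {
        if (*pa != *pb)
            return *pa < *pb ? -1 : 1;
        pa++;
        pb++;
        n--;
    }
    return 0;
}
```

The sketch returns only -1, 0 or 1; callers should rely on the sign alone, as with the standard `memcmp`.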

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rldicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

03 Jul, 2012 1 commit

powerpc: 64bit optimised __clear_user · 17968fbb

Committed by Anton Blanchard on May 27, 2012

I noticed __clear_user high up in a profile of one of my RAID stress
tests. The testcase was doing a dd from /dev/zero which ends up
calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which
is horribly slow. We can do much better by aligning the destination
and doing 32 bytes of 8 byte stores in a loop.
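
Illustration (not part of the commit): a minimal userspace C sketch of the approach described above, assuming a hypothetical function `sketch_clear`. It aligns the destination with byte stores, then clears 32 bytes per iteration using four 8-byte stores, then finishes the tail. The real `__clear_user` is assembly that writes to user memory and must cope with faults; none of that is modelled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: sketch_clear is a hypothetical userspace stand-in,
 * not the kernel's __clear_user, and it has no fault handling. */
void sketch_clear(void *dst, size_t n)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;

    /* Head: byte stores until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = 0;
        n--;
    }

    /* Main loop: four 8-byte stores, i.e. 32 bytes per iteration.
     * memcpy of a fixed 8 bytes is used to stay within C aliasing
     * rules; compilers typically turn it into a single store. */
    while (n >= 32) {
        memcpy(p, &zero, 8);
        memcpy(p + 8, &zero, 8);
        memcpy(p + 16, &zero, 8);
        memcpy(p + 24, &zero, 8);
        p += 32;
        n -= 32;
    }

    /* Tail: whatever is left, one byte at a time. */
    while (n > 0) {
        *p++ = 0;
        n--;
    }
}
```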

The following testcase was used to verify the patch:

http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero
to /dev/null on a POWER7 box:

Before:

# dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

# time dd if=/dev/zero of=/dev/null bs=1M count=10000
10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

28 May, 2012 1 commit

powerpc: Use the new generic strncpy_from_user() and strnlen_user() · 1629372c

Committed by Paul Mackerras on May 28, 2012

This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.
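
Illustration (not part of the commit): a minimal C sketch of locating the first zero byte in a word with a count-leading-zeroes operation, assuming a hypothetical function `sketch_find_zero` and the GCC/Clang builtin `__builtin_clzll`. The 8 bytes are assembled with the first byte most significant, as a big-endian load would give, so the leading-zero count of the zero-byte mask divided by 8 is the index of the first zero byte. The mask expression used here is a standard bit trick and is not necessarily the one in the kernel headers.

```c
#include <stdint.h>

/* Illustrative only. Returns the index (0..7) of the first zero byte in
 * the 8 bytes at p, or 8 if none of them is zero. */
int sketch_find_zero(const unsigned char *p)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;
    uint64_t v = 0;
    uint64_t mask;

    /* Assemble the word with the first byte in the top position. */
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];

    /* mask has 0x80 in every byte position where v has a zero byte,
     * and 0x00 everywhere else (no false positives). */
    mask = ~(((v & low7) + low7) | v | low7);

    if (mask == 0)
        return 8;               /* clz of 0 is undefined, so check first */

    /* Leading zero bits / 8 = number of non-zero bytes before the
     * first zero byte. */
    return __builtin_clzll(mask) >> 3;
}
```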
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 May, 2010 1 commit

powerpc: Fix string library functions · ca5d0674

Committed by Andreas Schwab on May 18, 2010

The powerpc strncmp implementation does not correctly handle a zero
length, despite the claim in 0119536c
(Add hand-coded assembly strcmp).

Additionally, all the length arguments are size_t, not int, so use
PPC_LCMPI and eq instead of cmpwi and le throughout.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

07 Apr, 2010 1 commit

powerpc: Fix handling of strncmp with zero len · 637a9902

Committed by Jeff Mahoney on Mar 17, 2010

Commit 0119536c, which added the assembly version of strncmp to
powerpc, mentions that it adds two instructions to the version from
boot/string.S to allow it to handle len=0. Unfortunately, it doesn't
always return 0 when that is the case. The length is passed in r5, but
the return value is passed back in r3. In certain cases, this will
happen to work. Otherwise it will pass back the address of the first
string as the return value.

This patch lifts the len <= 0 handling code from memcpy to handle that
case.
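
Illustration (not part of the commit): a minimal C reference for the behaviour the fix restores, assuming a hypothetical function `sketch_strncmp`. With a length of zero nothing is compared, so the result must be 0 regardless of the arguments. In C the return value is explicit; in the assembly described above the first argument and the return value share r3, which is why a missing early return can leak the address of the first string.

```c
#include <stddef.h>

/* Illustrative only; sketch_strncmp is not the kernel's implementation. */
int sketch_strncmp(const char *s1, const char *s2, size_t n)
{
    /* Zero length: nothing to compare, so the answer is always 0.
     * n is a size_t (unsigned), so the test is for equality with 0. */
    if (n == 0)
        return 0;

    do {
        unsigned char c1 = (unsigned char)*s1++;
        unsigned char c2 = (unsigned char)*s2++;

        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == 0)
            break;              /* both strings ended together */
    } while (--n != 0);

    return 0;
}
```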

Reported by: Christian_Sellars@symantec.com
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
CC: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

22 Jul, 2008 1 commit

powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S · 76bfdcf7

Committed by Michael Ellerman on Jul 17, 2008

Replace ifdef clutter with the PPC_LONG and PPC_LONG_ALIGN macros
for readability.

No change to the generated code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

07 Apr, 2008 1 commit

[POWERPC] Add hand-coded assembly strcmp · 0119536c

Committed by Steven Rostedt on Mar 01, 2008

We have an assembly version of strncmp for the bootwrapper, but not
for the kernel, so we end up using the C version in the kernel. This
takes the strncmp code from the bootup and copies it to the kernel
proper, adding two instructions so it copes correctly with len==0.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

01 Jul, 2006 1 commit

Remove obsolete #include <linux/config.h> · 6ab3d562

Committed by Jörn Engel on Jun 30, 2006

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

10 Oct, 2005 1 commit

powerpc: Use reg.h instead of processor.h when we just want reg names · b3b8dc6c

Committed by Paul Mackerras on Oct 10, 2005

Now that the register names and bit definitions are all in reg.h,
use that instead of processor.h in assembly code in a few places.
Signed-off-by: Paul Mackerras <paulus@samba.org>

26 Sep, 2005 1 commit

powerpc: Merge enough to start building in arch/powerpc. · 14cf11af

Committed by Paul Mackerras on Sep 26, 2005

This creates the directory structure under arch/powerpc and a bunch
of Kconfig files. It does a first-cut merge of arch/powerpc/mm,
arch/powerpc/lib and arch/powerpc/platforms/powermac. This is enough
to build a 32-bit powermac kernel with ARCH=powerpc.

For now we are getting some unmerged files from arch/ppc/kernel and
arch/ppc/syslib, or arch/ppc64/kernel. This makes some minor changes
to files in those directories and files outside arch/powerpc.

The boot directory is still not merged. That's going to be interesting.
Signed-off-by: Paul Mackerras <paulus@samba.org>

17 Apr, 2005 1 commit

Linux-2.6.12-rc2 · 1da177e4

Committed by Linus Torvalds on Apr 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

openanolis / cloud-kernel synced successfully almost 2 years ago
