Commits · bb916ff7cda2ad52be433b7b1f355b79b5d7d5ee · openeuler / Kernel

27 Jan, 2012 2 commits

x86-64: Handle byte-wise tail copying in memcpy() without a loop · 9d8e2277

Authored by Jan Beulich on Jan 26, 2012

While hard to measure, reducing the number of possibly/likely
mis-predicted branches can generally be expected to be slightly
better.

Contrary to what it may seem at first glance, this also doesn't grow
the function size (the alignment gap to the next function just
gets smaller).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F218584020000780006F422@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9d8e2277
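
The idea of copying a short byte tail without a loop can be sketched in C. This is only an illustration: the patch itself is x86-64 assembly and is not shown on this page, and copy_tail_upto3() is a hypothetical helper name. The sketch loads up to three bytes at computed indices and then stores them, so the only branch left is the check for an empty tail.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy a tail of 0..3 bytes without a loop.
 * Illustrative only: copy_tail_upto3() is a hypothetical helper,
 * not the kernel's memcpy() implementation.
 */
void copy_tail_upto3(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 3);
    if (n == 0)
        return;

    /* Load everything first; the indices may coincide for n < 3. */
    unsigned char first = src[0];
    unsigned char mid   = src[n / 2];  /* index 0, 1, 1 for n = 1, 2, 3 */
    unsigned char last  = src[n - 1];  /* index 0, 1, 2 for n = 1, 2, 3 */

    /* Redundant stores to the same index write the same value. */
    dst[0]     = first;
    dst[n / 2] = mid;
    dst[n - 1] = last;
}
```

For n = 1 all three stores hit dst[0], and for n = 2 the last two hit dst[1], so the result is the same as a byte-by-byte loop with fewer conditional branches.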

x86-64: Fix memcpy() to support sizes of 4Gb and above · 2ab56091

Authored by Jan Beulich on Jan 26, 2012

While currently there doesn't appear to be any reachable in-tree
case where such large memory blocks may be passed to memcpy(),
we had already hit the problem in our Xen kernels. Just as was
done recently for memset(), rather than working around it,
prevent others from falling into the same trap by fixing this
long-standing limitation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F21846F020000780006F3FA@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

2ab56091
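
The commit message does not spell out the cause of the limit. A plausible reading, stated here as an assumption, is that the length was handled in 32-bit quantities, so only the low 32 bits of a 64-bit size survived. The C sketch below models that effect; truncated_len() and split_len() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: the byte count a copy routine would see if it
 * kept only the low 32 bits of a 64-bit length.
 */
uint32_t truncated_len(uint64_t len)
{
    return (uint32_t)len;
}

/*
 * Split a 64-bit length into 8-byte words plus a byte tail using
 * full-width arithmetic, so lengths of 4 GiB and above stay intact.
 */
void split_len(uint64_t len, uint64_t *qwords, unsigned *tail)
{
    assert(qwords != 0 && tail != 0);
    *qwords = len >> 3;           /* number of whole 8-byte words */
    *tail   = (unsigned)(len & 7); /* remaining 0..7 bytes */
}
```

Under this model a request for exactly 4 GiB would be seen as a request for zero bytes, which is the kind of trap the message refers to.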

18 May, 2011 1 commit

x86, mem: memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB · 101068c1

Authored by Fenghua Yu on May 17, 2011

Support memcpy() with enhanced rep movsb. On processors supporting
enhanced rep movsb, the alternative memcpy() function using enhanced
rep movsb overrides the original function and the fast string
function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

101068c1

02 May, 2011 1 commit

x86: Fix spelling error in the memcpy() source code comment · 9de4966a

Authored by Bart Van Assche on May 01, 2011

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/201105011409.21629.bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9de4966a

24 Aug, 2010 1 commit

x86, mem: Optimize memcpy by avoiding memory false dependence · 59daa706

Authored by Ma Ling on Jun 29, 2010

All read operations after the allocation stage can run speculatively,
while all write operations run in program order. If the addresses are
different, a read may run before an older write; otherwise it must wait
until the write commits. However, the CPU does not check every address
bit, so a read can fail to recognize that the addresses differ even
when they are in different pages. For example, if rsi is 0xf004 and
rdi is 0xe008, the following sequence incurs a large latency penalty:
1. movq (%rsi),	%rax
2. movq %rax,	(%rdi)
3. movq 8(%rsi), %rax
4. movq %rax,	8(%rdi)

If %rsi and %rdi really were in the same memory page, there would be
a TRUE read-after-write dependence, because instruction 2 writes 0x008
and instruction 3 reads 0x00c, so the two accesses partially overlap.
In fact they are in different pages and there is no conflict, but
without checking every address bit the CPU may assume they are in the
same page. Instruction 3 then has to wait for instruction 2 to write
its data from the write buffer into the cache before loading it from
the cache; the time the read costs is comparable to an mfence
instruction. We can avoid this by reordering the operations as follows.

1. movq 8(%rsi), %rax
2. movq %rax,	8(%rdi)
3. movq (%rsi),	%rax
4. movq %rax,	(%rdi)

Instruction 3 reads 0x004 and instruction 2 writes 0x010, so there is
no dependence. On Core 2 we gain a 1.83x speedup compared with the
original instruction sequence. In this patch we first handle small
sizes (less than 20 bytes), then jump to different copy modes. Based
on our micro-benchmark, for small sizes from 1 to 127 bytes we got up
to 2X improvement, and up to 1.5X improvement for 1024 bytes on
Core i7. (We use our micro-benchmark, and will do further tests
according to your requirements.)

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

59daa706
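
The address arithmetic in this message can be checked with a small C model. It assumes, purely for illustration, that the partial check compares only the low 12 bits (the page offset) of each address; the message says only that the CPU does not check every bit. The names PAGE_OFFSET_MASK, ranges_overlap() and may_alias_partial() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: only the low 12 address bits are compared. */
#define PAGE_OFFSET_MASK 0xfffu

/* True if the byte ranges [a, a+len) and [b, b+len) overlap. */
bool ranges_overlap(uint64_t a, uint64_t b, uint64_t len)
{
    assert(len > 0);
    return a < b + len && b < a + len;
}

/*
 * Model of a partial check that looks only at the page-offset bits.
 * Wrap-around at the page boundary is ignored for simplicity.
 */
bool may_alias_partial(uint64_t load, uint64_t store, uint64_t len)
{
    return ranges_overlap(load & PAGE_OFFSET_MASK,
                          store & PAGE_OFFSET_MASK, len);
}
```

With rsi = 0xf004 and rdi = 0xe008, the original order has an 8-byte load from 0xf00c following a store to 0xe008: the full addresses do not overlap, but the page offsets 0x00c and 0x008 do. In the reordered sequence the load from 0xf004 follows a store to 0xe010, and the offsets 0x004 and 0x010 no longer overlap.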

08 Jul, 2010 1 commit

x86, alternatives: Use 16-bit numbers for cpufeature index · 83a7a2ad

Authored by H. Peter Anvin on Jun 10, 2010

We already have cpufeature indices above 255, so use a 16-bit number
for the alternatives index.  This consumes a padding field and so
doesn't add any size, but it means that abusing the padding field to
create assembly errors on overflow no longer works.  We can retain the
test simply by redirecting it to the .discard section, however.

[ v3: updated to include open-coded locations ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-f88731e3068f9d1392ba71cc9f50f035d26a0d4f@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

83a7a2ad
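
Why an index above 255 needs a wider field can be shown with a short C sketch. The helper names and the value 300 are made up for illustration and do not correspond to any real cpufeature number or kernel structure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature number above 255, for illustration only. */
#define EXAMPLE_FEATURE_INDEX 300u

/* An 8-bit field keeps only the low 8 bits of the index. */
uint8_t store_index_u8(unsigned idx)
{
    return (uint8_t)idx;
}

/* A 16-bit field holds any index in 0..65535 unchanged. */
uint16_t store_index_u16(unsigned idx)
{
    assert(idx <= UINT16_MAX);
    return (uint16_t)idx;
}
```

Stored in 8 bits, index 300 silently becomes 44 (300 - 256), which would select the wrong feature; the 16-bit field preserves it.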

30 Dec, 2009 1 commit

x86-64: Modify memcpy()/memset() alternatives mechanism · 7269e881

Authored by Jan Beulich on Dec 18, 2009

In order to avoid unnecessary chains of branches, rather than
implementing memcpy()/memset()'s access to their alternative
implementations via a jump, patch the (larger) original function
directly.

The memcpy() part of this is slightly subtle: while alternative
instruction patching does itself use memcpy(), with the
replacement block being less than 64 bytes in size the main loop
of the original function doesn't get used for copying memcpy_c()
over memcpy(), and hence we can safely write over its beginning.

Also note that the CFI annotations are fine for both variants of
each of the functions.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B2BB8D30200007800026AF2@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7269e881

12 Mar, 2009 2 commits

x86: memcpy, clean up · f3b6eaf0

Authored by Ingo Molnar on Mar 12, 2009

Impact: cleanup

Make this file more readable by bringing it more in line
with the usual kernel style.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

f3b6eaf0

x86-64: remove unnecessary spill/reload of rbx from memcpy · dd1ef4ec

Authored by Jan Beulich on Mar 12, 2009

Impact: micro-optimization

This should slightly improve its performance.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B8F641.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

dd1ef4ec

11 Oct, 2007 2 commits

x86_64: move lib · 185f3d38

Authored by Thomas Gleixner on Oct 11, 2007

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

185f3d38

x86_64: prepare shared lib/memcpy.S · ff8e90da

Authored by Thomas Gleixner on Oct 11, 2007

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ff8e90da

12 Aug, 2007 1 commit

Do not replace whole memcpy in apply alternatives · b8d3f244

Authored by Petr Vandrovec on Aug 12, 2007

apply_alternatives uses memcpy() to apply alternatives, which has the
unfortunate effect that while applying the memcpy alternative to memcpy
itself it tries to overwrite itself with nops. This causes a #UD fault,
as it overwrites half of an instruction in the copy loop, and from this
point on the only possible outcome is a triple fault and reboot.

So let's overwrite only the first two instructions of memcpy - as long
as the main memcpy loop is not in the first two bytes it will work fine.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b8d3f244

04 Oct, 2006 1 commit

Remove all inclusions of <linux/config.h> · 038b0a6d

Authored by Dave Jones on Oct 04, 2006

kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>

038b0a6d

26 Sep, 2006 1 commit

[PATCH] annotate arch/x86_64/lib/*.S · 8d379dad

Authored by Jan Beulich on Sep 26, 2006

Add unwind annotations to arch/x86_64/lib/*.S, and also use the macros
provided by linux/linkage.h wherever possible.

Some of the alternative instructions handling needed to be adjusted so
that the replacement code would also have valid unwind information.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>

8d379dad

05 Feb, 2006 1 commit

[PATCH] x86_64: Undo the earlier changes to remove unrolled copy/memset functions · 7bcd3f34

Authored by Andi Kleen on Feb 03, 2006

They cause quite bad performance regressions on Netburst.
This is temporary until we can get new optimized functions
for these CPUs.

This undoes changes that were done in 2.6.15 and in 2.6.16-rc1,
essentially bringing the code back to the 2.6.14 level. The only
change is that I renamed the X86_FEATURE_K8_C flag to
X86_FEATURE_REP_GOOD, fixed the check for the flag, and also fixed
some comments.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7bcd3f34

15 Nov, 2005 1 commit

[PATCH] x86_64: Remove optimization for B stepping AMD K8 · a5b250a4

Authored by Andi Kleen on Nov 05, 2005

B stepping parts were the first shipping Opterons. memcpy/memset/
copy_page/clear_page had special optimized versions for them. These
are really old and in the minority now, and the difference from the
generic versions (using rep microcode) is not that big anyway. So
just remove them.

TODO: figure out optimized versions for Intel Netburst based EM64T

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a5b250a4

17 Apr, 2005 1 commit

Linux-2.6.12-rc2 · 1da177e4

Authored by Linus Torvalds on Apr 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

1da177e4
