1. 21 Jan, 2012 · 1 commit
    • x86: Adjust asm constraints in atomic64 wrappers · 819165fb
      Committed by Jan Beulich
      Eric pointed out overly restrictive constraints in atomic64_set(), but
      there are issues throughout the file. In the cited case, %ebx and %ecx
      are inputs only (they don't get changed by either of the two low-level
      implementations). The same was true elsewhere.
      
      Further, in many cases early-clobber indicators were missing.
      
      Finally, the previous implementation rolled a custom alternative-
      instruction macro from scratch rather than using alternative_call()
      (which was introduced by the very commit that the description of the
      change in question refers to). Switching over has the benefit of not
      hiding the referenced symbols from the compiler; it does, however,
      require them to be declared in more than just the exporting source
      file (which, as a desirable side effect, allows that exporting file
      to become a real 5-line stub).
      
      This patch does not eliminate the overly restrictive memory clobbers,
      however: doing so would occasionally make the compiler set up a second
      register for accessing the memory object (to satisfy the added "m"
      constraint), and it is not clear which of the two non-optimal
      alternatives is better. (A minimal constraint sketch follows this
      entry.)
      
      v2: Re-do the declaration and exporting of the internal symbols.
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/4F19A2A5020000780006E0D9@nat28.tlf.novell.com
      Cc: Luca Barbieri <luca@luca-barbieri.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      819165fb
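      To make the constraint points concrete, here is a minimal sketch of an
      atomic64 store for 32-bit x86 (illustrative only; my_atomic64_t and
      my_atomic64_set are hypothetical names, not the kernel's code). The new
      value sits in %ebx:%ecx as plain "b"/"c" inputs, since cmpxchg8b never
      modifies them, while %edx:%eax is read-write ("+A") because cmpxchg8b
      reloads it on failure; the broad "memory" clobber mirrors the one the
      commit deliberately keeps:

          #include <stdint.h>

          typedef struct { int64_t counter; } my_atomic64_t;

          static inline void my_atomic64_set(my_atomic64_t *v, int64_t n)
          {
                  int64_t old = v->counter;

                  /* Loop until cmpxchg8b succeeds: on failure it reloads
                   * %edx:%eax ("old") from memory, hence the "+A"
                   * constraint; the new value in %ebx:%ecx is input-only. */
                  asm volatile("1:     lock; cmpxchg8b %0\n\t"
                               "       jne 1b"
                               : "+m" (v->counter), "+A" (old)
                               : "b" ((uint32_t)n), "c" ((uint32_t)(n >> 32))
                               : "memory");
          }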
  2. 18 Jan, 2012 · 1 commit
  3. 16 Jan, 2012 · 1 commit
  4. 13 Dec, 2011 · 1 commit
  5. 05 Dec, 2011 · 2 commits
  6. 10 Oct, 2011 · 1 commit
  7. 27 Jul, 2011 · 1 commit
  8. 22 Jul, 2011 · 1 commit
  9. 21 Jul, 2011 · 3 commits
  10. 14 Jul, 2011 · 1 commit
  11. 04 Jun, 2011 · 1 commit
  12. 18 May, 2011 · 6 commits
  13. 02 May, 2011 · 1 commit
  14. 28 Mar, 2011 · 1 commit
  15. 18 Mar, 2011 · 2 commits
  16. 02 Mar, 2011 · 1 commit
  17. 01 Mar, 2011 · 3 commits
  18. 28 Feb, 2011 · 1 commit
  19. 26 Jan, 2011 · 1 commit
  20. 04 Jan, 2011 · 1 commit
  21. 25 Sep, 2010 · 1 commit
    • x86, mem: Optimize memmove for small size and unaligned cases · 3b4b682b
      Committed by Ma Ling
      The movs instruction combines data accesses to accelerate moving data;
      however, two cases need attention (see the dispatch sketch after this
      entry):
      
      1. movs needs a long latency to start up, so for small sizes we use
         general mov instructions to copy the data.
      2. movs is not good for unaligned cases: even if the source offset is
         0x10 and the destination offset is 0x0, we avoid movs and handle
         the case with general mov instructions.
      Signed-off-by: Ma Ling <ling.ma@intel.com>
      LKML-Reference: <1284664360-6138-1-git-send-email-ling.ma@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      3b4b682b
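      A rough C rendering of that dispatch (a sketch under assumptions: the
      64-byte threshold and the relative-alignment test are invented for
      illustration, and copy_forward_sketch is not a kernel function):

          #include <stddef.h>

          static void *copy_forward_sketch(void *dst, const void *src, size_t n)
          {
                  char *d = dst;
                  const char *s = src;

                  /* Small or relatively misaligned copies take the general
                   * mov path: rep movs pays a long startup latency and
                   * handles mismatched src/dst offsets poorly. */
                  if (n < 64 || (((size_t)d ^ (size_t)s) & 0xf)) {
                          while (n--)
                                  *d++ = *s++;
                  } else {
                          /* Aligned bulk path: let rep movs combine data. */
                          asm volatile("rep movsb"
                                       : "+D" (d), "+S" (s), "+c" (n)
                                       : : "memory");
                  }
                  return dst;
          }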
  22. 24 Aug, 2010 · 2 commits
    • x86, mem: Optimize memcpy by avoiding memory false dependence · 59daa706
      Committed by Ma Ling
      All read operations after the allocation stage can run speculatively,
      while all write operations run in program order; a read may run before
      an older write if their addresses differ, otherwise it must wait until
      the write commits. However, the CPU does not compare every address
      bit, so a read can fail to recognize a different address even when the
      two are in different pages. For example, if %rsi is 0xf004 and %rdi is
      0xe008, the following sequence incurs a large performance penalty:
      
      1. movq (%rsi),  %rax
      2. movq %rax,   (%rdi)
      3. movq 8(%rsi), %rax
      4. movq %rax,   8(%rdi)
      
      If %rsi and %rdi really were in the same memory page, there would be a
      true read-after-write dependence, because instruction 2 writes offset
      0x008 and instruction 3 reads offset 0x00c; the two accesses partially
      overlap. Here they are actually in different pages and there is no
      real issue, but because it does not check every address bit the CPU
      may assume they are in the same page, so instruction 3 has to wait for
      instruction 2 to write its data from the write buffer into the cache
      and then load the data from the cache; the time the read spends is
      comparable to an mfence instruction. We can avoid this by reordering
      the sequence as follows:
      
      1. movq 8(%rsi), %rax
      2. movq %rax,   8(%rdi)
      3. movq (%rsi),  %rax
      4. movq %rax,   (%rdi)
      
      Now instruction 3 reads offset 0x004 while instruction 2 writes offset
      0x010, so there is no dependence at all. On Core2 this gains a 1.83x
      speedup over the original instruction sequence. In this patch we first
      handle small sizes (less than 20 bytes), then jump to the appropriate
      copy mode. In our micro-benchmark we measured up to a 2x improvement
      for small sizes from 1 to 127 bytes, and up to a 1.5x improvement for
      1024 bytes on Core i7. (These numbers come from our micro-benchmark;
      we will do further testing according to your requirements.) A
      reordered-copy sketch follows this entry.
      Signed-off-by: Ma Ling <ling.ma@intel.com>
      LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      59daa706
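      The same reordering idea can be sketched in C: within each 16-byte
      chunk, touch the higher-addressed quadword first so that consecutive
      store/load pairs never share low address bits (copy16_reordered is an
      illustrative name, and the casts assume the relaxed aliasing typical
      of low-level copy code; this is not the kernel's actual memcpy):

          #include <stddef.h>
          #include <stdint.h>

          static void *copy16_reordered(void *dst, const void *src, size_t n)
          {
                  char *d = dst;
                  const char *s = src;

                  while (n >= 16) {
                          /* High quadword first, low quadword second: the
                           * following load's low address bits differ from
                           * the preceding store's, so the CPU cannot infer
                           * a false read-after-write dependence. */
                          uint64_t hi = *(const uint64_t *)(s + 8);
                          *(uint64_t *)(d + 8) = hi;
                          uint64_t lo = *(const uint64_t *)s;
                          *(uint64_t *)d = lo;
                          s += 16;
                          d += 16;
                          n -= 16;
                  }
                  while (n--)     /* byte tail for small remainders */
                          *d++ = *s++;
                  return dst;
          }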
    • x86, mem: Don't implement forward memmove() as memcpy() · fdf42896
      Committed by Ma, Ling
      memmove() allows the source and destination addresses to overlap, but
      memcpy() has no such requirement. Therefore, explicitly implement
      memmove() in both the forward and backward directions, to give us the
      freedom to optimize memcpy(). (A direction-choice sketch follows this
      entry.)
      Signed-off-by: Ma Ling <ling.ma@intel.com>
      LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      fdf42896
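      For reference, the direction choice reads roughly like this in C (a
      minimal textbook sketch named memmove_sketch, not the optimized
      assembly the patch adds):

          #include <stddef.h>

          static void *memmove_sketch(void *dst, const void *src, size_t n)
          {
                  char *d = dst;
                  const char *s = src;

                  if (d < s || d >= s + n) {
                          /* No harmful overlap: a forward copy is safe. */
                          while (n--)
                                  *d++ = *s++;
                  } else {
                          /* dst overlaps the tail of src: copy backward so
                           * each byte is read before it is overwritten. */
                          d += n;
                          s += n;
                          while (n--)
                                  *--d = *--s;
                  }
                  return dst;
          }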
  23. 12 Aug, 2010 · 2 commits
  24. 29 Jul, 2010 · 2 commits
  25. 14 Jul, 2010 · 1 commit
  26. 08 Jul, 2010 · 1 commit
    • x86, alternatives: Use 16-bit numbers for cpufeature index · 83a7a2ad
      Committed by H. Peter Anvin
      We already have cpufeature indices above 255, so use a 16-bit number
      for the alternatives index.  This consumes a padding field and so
      doesn't add any size, but it means that abusing the padding field to
      create assembly errors on overflow no longer works.  We can retain the
      test simply by redirecting it to the .discard section, however. (A
      layout sketch follows this entry.)
      
      [ v3: updated to include open-coded locations ]
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <tip-f88731e3068f9d1392ba71cc9f50f035d26a0d4f@git.kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      83a7a2ad
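      A rough sketch of the record layout the change implies (field names
      are illustrative and may not match the kernel's struct alt_instr
      exactly): widening the feature index from 8 to 16 bits consumes the
      adjacent padding byte, so the struct size is unchanged:

          #include <stdint.h>

          struct alt_instr_sketch {
                  int32_t  instr_offset;    /* original instruction      */
                  int32_t  repl_offset;     /* replacement instruction   */
                  uint16_t cpuid;           /* feature index: formerly a
                                               u8 plus one padding byte */
                  uint8_t  instrlen;        /* length of original        */
                  uint8_t  replacementlen;  /* length of replacement     */
          };

      The overflow check that previously abused the padding byte now emits
      its assertion into the .discard section, where it still fails the
      build on an out-of-range index but occupies no space in the image.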