提交 · c35f77417ebfc7c21c02aa9c8c30aa4cecf331d6 · OpenHarmony / kernel_linux

14 6月, 2012 1 次提交

x86: Define early read-mostly per-cpu macros · c35f7741

由 Ido Yariv 提交于 6月 11, 2012

Some read-mostly per-cpu data may need to be declared or defined
early, so it can be initialized and accessed before per_cpu
areas are allocated.

Only the data that resides in the per_cpu areas should be
read-mostly, as there is little benefit in optimizing cache
lines on initialization.
Signed-off-by: NIdo Yariv <ido@wizery.com>
[ Added the missing declarations in !SMP code. ]
Signed-off-by: NVlad Zolotarov <vlad@scalemp.com>
Acked-by: NShai Fultheim <shai@scalemp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/46188571.ddB8aVQYWo@vladSigned-off-by: NIngo Molnar <mingo@kernel.org>

c35f7741

15 5月, 2012 2 次提交

percpu: remove percpu_xxx() functions · 641b695c

由 Alex Shi 提交于 5月 14, 2012

Remove percpu_xxx serial functions, all of them were replaced by
this_cpu_xxx or __this_cpu_xxx serial functions
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Acked-by: NChristoph Lameter <cl@gentwo.org>
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

641b695c

x86: replace percpu_xxx funcs with this_cpu_xxx · c6ae41e7

由 Alex Shi 提交于 5月 11, 2012

Since percpu_xxx() serial functions are duplicated with this_cpu_xxx().
Removing percpu_xxx() definition and replacing them by this_cpu_xxx()
in code. There is no function change in this patch, just preparation for
later percpu_xxx serial function removing.

On x86 machine the this_cpu_xxx() serial functions are same as
__this_cpu_xxx() without no unnecessary premmpt enable/disable.

Thanks for Stephen Rothwell, he found and fixed a i386 build error in
the patch.

Also thanks for Andrew Morton, he kept updating the patchset in Linus'
tree.
Signed-off-by: NAlex Shi <alex.shi@intel.com>
Acked-by: NChristoph Lameter <cl@gentwo.org>
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

c6ae41e7

23 12月, 2011 1 次提交

percpu: Remove irqsafe_cpu_xxx variants · 933393f5

由 Christoph Lameter 提交于 12月 22, 2011

We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state.  That has no material change for x86
and s390 implementations of this_cpu operations.  However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.

-tj: This is part of on-going percpu API cleanup.  For detailed
     discussion of the subject, please refer to the following thread.

     http://thread.gmane.org/gmane.linux.kernel/1222078Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>

933393f5

15 12月, 2011 1 次提交

x86: Fix and improve percpu_cmpxchg{8,16}b_double() · cebef5be

由 Jan Beulich 提交于 12月 14, 2011

They had several problems/shortcomings:

Only the first memory operand was mentioned in the 2x32bit asm()
operands, and 2x64-bit version had a memory clobber. The first
allowed the compiler to not recognize the need to re-load the
data in case it had it cached in some register, and the second
was overly destructive.

The memory operand in the 2x32-bit asm() was declared to only be
an output.

The types of the local copies of the old and new values were
incorrect (as in other per-CPU ops, the types of the per-CPU
variables accessed should be used here, to make sure the
respective types are compatible).

The __dummy variable was pointless (and needlessly initialized
in the 2x32-bit case), given that local copies of the inputs
already exist.

The 2x64-bit variant forced the address of the first object into
%rsi, even though this is needed only for the call to the
emulation function. The real cmpxchg16b can operate on an
memory.

At once also change the return value type to what it really is -
'bool'.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4EE86D6502000078000679FE@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

cebef5be

12 7月, 2011 1 次提交

percpu: Fixup __this_cpu_xchg* operations · 688d3be8

由 Christoph Lameter 提交于 7月 12, 2011

Somehow we got into a situation where the __this_cpu_xchg() operations were
not defined in the same way as this_cpu_xchg() and friends. I had some build
failures under 32 bit that were addressed by these fixes.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

688d3be8

19 4月, 2011 1 次提交

x86, percpu: Use ASM_NOP4 instead of hardcoding P6_NOP4 · b1e7734f

由 H. Peter Anvin 提交于 4月 18, 2011

For use in assembly constants, use the ASM_NOP* defines.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1303166160-10315-2-git-send-email-hpa@linux.intel.com

b1e7734f

29 3月, 2011 2 次提交

x86: A fast way to check capabilities of the current cpu · 349c004e

由 Christoph Lameter 提交于 3月 12, 2011

Add this_cpu_has() which determines if the current cpu has a certain
ability using a segment prefix and a bit test operation.

For that we need to add bit operations to x86s percpu.h.

Many uses of cpu_has use a pointer passed to a function to determine
the current flags. That is no longer necessary after this patch.

However, this patch only converts the straightforward cases where
cpu_has is used with this_cpu_ptr. The rest is work for later.

-tj: Rolled up patch to add x86_ prefix and use percpu_read() instead
     of percpu_read_stable().
Signed-off-by: NChristoph Lameter <cl@linux.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

349c004e

percpu: Avoid extra NOP in percpu_cmpxchg16b_double · 5f55924d

由 Eric Dumazet 提交于 3月 28, 2011

percpu_cmpxchg16b_double() uses alternative_io() and looks like :

e8 .. .. .. ..  call this_cpu_cmpxchg16b_emu
X bytes	    NOPX

or, once patched (if cpu supports native instruction) on SMP build :

65 48 0f c7 0e  cmpxchg16b %gs:(%rsi)
0f 94 c0        sete %al

on !SMP build :

48 0f c7 0e     cmpxchg16b (%rsi)
0f 94 c0        sete %al

Therefore, NOPX should be :

P6_NOP3 on SMP
P6_NOP2 on !SMP
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

5f55924d

28 3月, 2011 1 次提交

percpu: Omit segment prefix in the UP case for cmpxchg_double · d7c3f8ce

由 Christoph Lameter 提交于 3月 26, 2011

Omit the segment prefix in the UP case. GS is not used then
and we will generate segfaults if cmpxchg16b is used otherwise.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d7c3f8ce

28 2月, 2011 1 次提交

percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support · b9ec40af

由 Christoph Lameter 提交于 2月 28, 2011

Support this_cpu_cmpxchg_double() using the cmpxchg16b and cmpxchg8b
instructions.

-tj: s/percpu_cmpxchg16b/percpu_cmpxchg16b_double/ for consistency and
     other cosmetic changes.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

b9ec40af

26 1月, 2011 1 次提交

percpu, x86: Fix percpu_xchg_op() · 889a7a6a

由 Eric Dumazet 提交于 1月 25, 2011

These recent percpu commits:

  2485b646: x86,percpu: Move out of place 64 bit ops into X86_64 section
  8270137a: cpuops: Use cmpxchg for xchg to avoid lock semantics

Caused this 'perf top' crash:

 Kernel panic - not syncing: Fatal exception in interrupt
 Pid: 0, comm: swapper Tainted: G     D
 2.6.38-rc2-00181-gef71723 #413 Call Trace: <IRQ> [<ffffffff810465b5>]
    ? panic
    ? kmsg_dump
    ? kmsg_dump
    ? oops_end
    ? no_context
    ? __bad_area_nosemaphore
    ? perf_output_begin
    ? bad_area_nosemaphore
    ? do_page_fault
    ? __task_pid_nr_ns
    ? perf_event_tid
    ? __perf_event_header__init_id
    ? validate_chain
    ? perf_output_sample
    ? trace_hardirqs_off
    ? page_fault
    ? irq_work_run
    ? update_process_times
    ? tick_sched_timer
    ? tick_sched_timer
    ? __run_hrtimer
    ? hrtimer_interrupt
    ? account_system_vtime
    ? smp_apic_timer_interrupt
    ? apic_timer_interrupt
 ...

Looking at assembly code, I found:

list = this_cpu_xchg(irq_work_list, NULL);

gives this wrong code : (gcc-4.1.2 cross compiler)

ffffffff810bc45e:
	mov    %gs:0xead0,%rax
	cmpxchg %rax,%gs:0xead0
	jne    ffffffff810bc45e <irq_work_run+0x3e>
	test   %rax,%rax
	je     ffffffff810bc4aa <irq_work_run+0x8a>

Tell gcc we dirty eax/rax register in percpu_xchg_op()

Compiler must use another register to store pxo_new__

We also dont need to reload percpu value after a jump,
since a 'failed' cmpxchg already updated eax/rax

Wrong generated code was :
	xor     %rax,%rax   /* load 0 into %rax */
1:	mov     %gs:0xead0,%rax
	cmpxchg %rax,%gs:0xead0
	jne     1b
	test    %rax,%rax

After patch :

	xor     %rdx,%rdx   /* load 0 into %rdx */
	mov     %gs:0xead0,%rax
1:	cmpxchg %rdx,%gs:0xead0
	jne     1b:
	test    %rax,%rax
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

889a7a6a

12 1月, 2011 1 次提交

x86,percpu: Move out of place 64 bit ops into X86_64 section · 2485b646

由 Christoph Lameter 提交于 1月 11, 2011

Some operations that operate on 64 bit operands are defined for 32 bit.
Move them into the correct section.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

2485b646

18 12月, 2010 2 次提交

cpuops: Use cmpxchg for xchg to avoid lock semantics · 8270137a

由 Christoph Lameter 提交于 12月 14, 2010

Use cmpxchg instead of xchg to realize this_cpu_xchg.

xchg will cause LOCK overhead since LOCK is always implied but cmpxchg
will not.

Baselines:

xchg()		= 18 cycles (no segment prefix, LOCK semantics)
__this_cpu_xchg = 1 cycle

(simulated using this_cpu_read/write, two prefixes. Looks like the
cpu can use loop optimization to get rid of most of the overhead)

Cycles before:

this_cpu_xchg	 = 37 cycles (segment prefix and LOCK (implied by xchg))

After:

this_cpu_xchg	= 11 cycle (using cmpxchg without lock semantics)
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

8270137a

x86: this_cpu_cmpxchg and this_cpu_xchg operations · 7296e08a

由 Christoph Lameter 提交于 12月 14, 2010

Provide support as far as the hardware capabilities of the x86 cpus
allow.

Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for
fast cpuops implementations.

V1->V2:
	- Take out the definition for this_cpu_cmpxchg_8 and move it into
	  a separate patch.

tj: - Reordered ops to better follow this_cpu_* organization.
    - Renamed macro temp variables similar to their existing
      neighbours.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

7296e08a

17 12月, 2010 2 次提交

percpu,x86: relocate this_cpu_add_return() and friends · 40304775

由 Tejun Heo 提交于 12月 17, 2010

- include/linux/percpu.h: this_cpu_add_return() and friends were
  located next to __this_cpu_add_return().  However, the overall
  organization is to first group by preemption safeness.  Relocate
  this_cpu_add_return() and friends to preemption-safe area.

- arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after
  other more basic operations.  Relocate [__]this_cpu_add_return_8()
  so that they're first grouped by preemption safeness.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>

40304775

x86: Support for this_cpu_add, sub, dec, inc_return · 8f1d97c7

由 Christoph Lameter 提交于 12月 06, 2010

Supply an implementation for x86 in order to generate more efficient code.

V2->V3:
	- Cleanup
	- Remove strange type checking from percpu_add_return_op.

tj: - Dropped unused typedef from percpu_add_return_op().
    - Renamed ret__ to paro_ret__ in percpu_add_return_op().
    - Minor indentation adjustments.
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

8f1d97c7

10 9月, 2010 1 次提交

x86, percpu: Optimize this_cpu_ptr · db7829c6

由 Brian Gerst 提交于 9月 09, 2010

Allow arches to implement __this_cpu_ptr, and provide an x86 version.

Before:
	movq $foo, %rax
	movq %gs:this_cpu_off, %rdx
	addq %rdx, %rax

After:
	movq $foo, %rax
	addq %gs:this_cpu_off, %rax

The benefit is doing it in one less instruction and not clobbering
a temporary register.

tj: * Beefed up the comment a bit and renamed in-macro temp variable
      to match neighboring macros.

    * Folded fix for const pointer case found in linux-next.

    * Fixed sparse notation.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

db7829c6

11 6月, 2010 1 次提交

percpu, x86: Avoid warnings of unused variables in per cpu · 23b764d0

由 Andi Kleen 提交于 6月 10, 2010

Avoid hundreds of warnings with a gcc 4.6 -Wall build.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

23b764d0

29 4月, 2010 1 次提交

x86, asm: Introduce and use percpu_inc() · 402af0d7

由 Jan Beulich 提交于 4月 21, 2010

... generating slightly smaller code.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF261F020000780003B33C@vpn.id2.novell.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

402af0d7

20 4月, 2010 1 次提交

Fix comment typo in percpu.h · 40f0a5d0

由 Justin P. Mattock 提交于 4月 19, 2010

Fix a typo in arch/x86/include/asm/percpu.h
Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

40f0a5d0

05 1月, 2010 1 次提交

percpu, x86: Generic inc / dec percpu instructions · 5917dae8

由 Christoph Lameter 提交于 1月 05, 2010

Optimize code generated for percpu access by checking for increment and
decrements.

tj: fix incorrect usage of __builtin_constant_p() and restructure
    percpu_add_op() macro.
Signed-off-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

5917dae8

29 10月, 2009 2 次提交

percpu: remove per_cpu__ prefix. · dd17c8f7

由 Rusty Russell 提交于 10月 29, 2009

Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables.  To make this sane, we remove the per_cpu__ prefix we used
created to stop people accidentally using these vars directly.

Now we have sparse, we can use that (next patch).

tj: * Updated to convert stuff which were missed by or added after the
      original patch.

    * Kill per_cpu_var() macro.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>

dd17c8f7

percpu: remove some sparse warnings · 0f5e4816

由 Tejun Heo 提交于 10月 29, 2009

Make the following changes to remove some sparse warnings.

* Make DEFINE_PER_CPU_SECTION() declare __pcpu_unique_* before
  defining it.

* Annotate pcpu_extend_area_map() that it is entered with pcpu_lock
  held, releases it and then reacquires it.

* Make percpu related macros use unique nested variable names.

* While at it, add pcpu prefix to __size_call[_return]() macros as
  to-be-implemented sparse annotations will add percpu specific stuff
  to these macros.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>

0f5e4816

03 10月, 2009 1 次提交

this_cpu: Implement X86 optimized this_cpu operations · 30ed1a79

由 Christoph Lameter 提交于 10月 03, 2009

Basically the existing percpu ops can be used for this_cpu variants that allow
operations also on dynamically allocated percpu data. However, we do not pass a
reference to a percpu variable in. Instead a dynamically or statically
allocated percpu variable is provided.

Preempt, the non preempt and the irqsafe operations generate the same code.
It will always be possible to have the requires per cpu atomicness in a single
RMW instruction with segment override on x86.

64 bit this_cpu operations are not supported on 32 bit.
Signed-off-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NTejun Heo <tj@kernel.org>

30ed1a79

04 8月, 2009 1 次提交

x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses · ed8d9adf

由 Linus Torvalds 提交于 8月 03, 2009

This is very useful for some common things like 'get_current()' and
'get_thread_info()', which can be used multiple times in a function, and
where the result is cacheable.

tj: Added the magical undocumented "P" modifier to UP __percpu_arg()
    to force gcc to dereference the pointer value passed in via the
    "p" input constraint.  Without this, percpu_read_stable() returns
    the address of the percpu variable.  Also added comment explaining
    the difference between percpu_read() and percpu_read_stable().
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

ed8d9adf

04 7月, 2009 1 次提交

x86,percpu: generalize lpage first chunk allocator · 8c4bfc6e

由 Tejun Heo 提交于 7月 04, 2009

Generalize and move x86 setup_pcpu_lpage() into
pcpu_lpage_first_chunk().  setup_pcpu_lpage() now is a simple wrapper
around the generalized version.  Other than taking size parameters and
using arch supplied callbacks to allocate/free/map memory,
pcpu_lpage_first_chunk() is identical to the original implementation.

This simplifies arch code and will help converting more archs to
dynamic percpu allocator.

While at it, factor out pcpu_calc_fc_sizes() which is common to
pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().

[ Impact: code reorganization and generalization ]
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>

8c4bfc6e

22 6月, 2009 1 次提交

x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2

由 Tejun Heo 提交于 6月 22, 2009

lpage allocator aliases a PMD page for each cpu and returns whatever
is unused to the page allocator.  When the pageattr of the recycled
pages are changed, this makes the two aliases point to the overlapping
regions with different attributes which isn't allowed and known to
cause subtle data corruption in certain cases.

This can be handled in simliar manner to the x86_64 highmap alias.
pageattr code should detect if the target pages have PMD alias and
split the PMD alias and synchronize the attributes.

pcpur allocator is updated to keep the allocated PMD pages map sorted
in ascending address order and provide pcpu_lpage_remapped() function
which binary searches the array to determine whether the given address
is aliased and if so to which address.  pageattr is updated to use
pcpu_lpage_remapped() to detect the PMD alias and split it up as
necessary from cpa_process_alias().

Jan Beulich spotted the original problem and incorrect usage of vaddr
instead of laddr for lookup.

With this, lpage percpu allocator should work correctly.  Re-enable
it.

[ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NJan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>

e59a1bb2

11 5月, 2009 1 次提交

x86: fix percpu_{to,from}_op() · 3c598766

由 Jan Beulich 提交于 5月 11, 2009

- the byte operand constraints were wrong for 32-bit
- the to-op's input operands weren't properly parenthesized

[ Impact: fix possible miscompilation or build failure ]
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

3c598766

10 3月, 2009 1 次提交

percpu: make x86 addr <-> pcpu ptr conversion macros generic · e0100983

由 Tejun Heo 提交于 3月 10, 2009

Impact: generic addr <-> pcpu ptr conversion macros

There's nothing arch specific about x86 __addr_to_pcpu_ptr() and
__pcpu_ptr_to_addr().  With proper __per_cpu_load and __per_cpu_start
defined, they'll do the right thing regardless of actual layout.

Move these macros from arch/x86/include/asm/percpu.h to mm/percpu.c
and allow archs to override it as necessary.
Signed-off-by: NTejun Heo <tj@kernel.org>

e0100983

20 2月, 2009 1 次提交

x86: convert to the new dynamic percpu allocator · 11124411

由 Tejun Heo 提交于 2月 20, 2009

Impact: use new dynamic allocator, unified access to static/dynamic
        percpu memory

Convert to the new dynamic percpu allocator.

* implement populate_extra_pte() for both 32 and 64
* update setup_per_cpu_areas() to use pcpu_setup_static()
* define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr()
* define config HAVE_DYNAMIC_PER_CPU_AREA
Signed-off-by: NTejun Heo <tj@kernel.org>

11124411

09 2月, 2009 1 次提交

x86: use linker to offset symbols by __per_cpu_load · 2add8e23

由 Brian Gerst 提交于 2月 08, 2009

Impact: cleanup and bug fix

Use the linker to create symbols for certain per-cpu variables
that are offset by __per_cpu_load.  This allows the removal of
the runtime fixup of the GDT pointer, which fixes a bug with
resume reported by Jiri Slaby.
Reported-by: NJiri Slaby <jirislaby@gmail.com>
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Acked-by: NJiri Slaby <jirislaby@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2add8e23

21 1月, 2009 1 次提交

x86: fix percpu_write with 64-bit constants · 299e2699

由 Brian Gerst 提交于 1月 21, 2009

Impact: slightly better code generation for percpu_to_op()

The processor will sign-extend 32-bit immediate values in 64-bit
operations.  Use the 'e' constraint ("32-bit signed integer constant,
or a symbolic reference known to fit that range") for 64-bit constants.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

299e2699

20 1月, 2009 1 次提交

x86: move stack_canary into irq_stack · 947e76cd

由 Brian Gerst 提交于 1月 19, 2009

Impact: x86_64 percpu area layout change, irq_stack now at the beginning

Now that the PDA is empty except for the stack canary, it can be removed.
The irqstack is moved to the start of the per-cpu section.  If the stack
protector is enabled, the canary overlaps the bottom 48 bytes of the irqstack.

tj: * updated subject
    * dropped asm relocation of irq_stack_ptr
    * updated comments a bit
    * rebased on top of stack canary changes
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

947e76cd

18 1月, 2009 1 次提交

x86-64: Use absolute displacements for per-cpu accesses. · 87b26406

由 Brian Gerst 提交于 1月 19, 2009

Accessing memory through %gs should not use rip-relative addressing.
Adding a P prefix for the argument tells gcc to not add (%rip) to
the memory references.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

87b26406

16 1月, 2009 5 次提交

percpu: add optimized generic percpu accessors · 6dbde353

由 Ingo Molnar 提交于 1月 15, 2009

It is an optimization and a cleanup, and adds the following new
generic percpu methods:

  percpu_read()
  percpu_write()
  percpu_add()
  percpu_sub()
  percpu_and()
  percpu_or()
  percpu_xor()

and implements support for them on x86. (other architectures will fall
back to a default implementation)

The advantage is that for example to read a local percpu variable,
instead of this sequence:

 return __get_cpu_var(var);

 ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
 ffffffff8102ca32:	81
 ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
 ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax

We can get a single instruction by using the optimized variants:

 return percpu_read(var);

 ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax

I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.

tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
    * added percpu_and() for completeness's sake
    * made generic percpu ops atomic against preemption
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NTejun Heo <tj@kernel.org>

6dbde353

x86: convert pda ops to wrappers around x86 percpu accessors · 49357d19

由 Tejun Heo 提交于 1月 13, 2009

pda is now a percpu variable and there's no reason it can't use plain
x86 percpu accessors.  Add x86_test_and_clear_bit_percpu() and replace
pda op implementations with wrappers around x86 percpu accessors.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

49357d19

x86: merge 64 and 32 SMP percpu handling · 9939ddaf

由 Tejun Heo 提交于 1月 13, 2009

Now that pda is allocated as part of percpu, percpu doesn't need to be
accessed through pda.  Unify x86_64 SMP percpu access with x86_32 SMP
one.  Other than the segment register, operand size and the base of
percpu symbols, they behave identical now.

This patch replaces now unnecessary pda->data_offset with a dummy
field which is necessary to keep stack_canary at its place.  This
patch also moves per_cpu_offset initialization out of init_gdt() into
setup_per_cpu_areas().  Note that this change also necessitates
explicit per_cpu_offset initializations in voyager_smp.c.

With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
x86_32 and also x86_64 can use assembly PER_CPU macros.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9939ddaf

x86: fold pda into percpu area on SMP · 1a51e3a0

由 Tejun Heo 提交于 1月 13, 2009

[ Based on original patch from Christoph Lameter and Mike Travis. ]

Currently pdas and percpu areas are allocated separately.  %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.

Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40.  To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.

After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda.  This means that setup_per_cpu_areas() need
to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
already has modified it when control reaches setup_per_cpu_areas().

This patch also removes now unnecessary get_local_pda() and its call
sites.

A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1a51e3a0

x86: make early_per_cpu() a lvalue and use it · f10fcd47

由 Tejun Heo 提交于 1月 13, 2009

Make early_per_cpu() a lvalue as per_cpu() is and use it where
applicable.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f10fcd47

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多