- 14 6月, 2012 1 次提交
-
-
由 Ido Yariv 提交于
Some read-mostly per-cpu data may need to be declared or defined early, so it can be initialized and accessed before per_cpu areas are allocated. Only the data that resides in the per_cpu areas should be read-mostly, as there is little benefit in optimizing cache lines on initialization. Signed-off-by: NIdo Yariv <ido@wizery.com> [ Added the missing declarations in !SMP code. ] Signed-off-by: NVlad Zolotarov <vlad@scalemp.com> Acked-by: NShai Fultheim <shai@scalemp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/46188571.ddB8aVQYWo@vladSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 5月, 2012 2 次提交
-
-
由 Alex Shi 提交于
Remove percpu_xxx serial functions, all of them were replaced by this_cpu_xxx or __this_cpu_xxx serial functions Signed-off-by: NAlex Shi <alex.shi@intel.com> Acked-by: NChristoph Lameter <cl@gentwo.org> Acked-by: NTejun Heo <tj@kernel.org> Acked-by: N"H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Alex Shi 提交于
Since percpu_xxx() serial functions are duplicated with this_cpu_xxx(). Removing percpu_xxx() definition and replacing them by this_cpu_xxx() in code. There is no function change in this patch, just preparation for later percpu_xxx serial function removing. On x86 machine the this_cpu_xxx() serial functions are same as __this_cpu_xxx() without no unnecessary premmpt enable/disable. Thanks for Stephen Rothwell, he found and fixed a i386 build error in the patch. Also thanks for Andrew Morton, he kept updating the patchset in Linus' tree. Signed-off-by: NAlex Shi <alex.shi@intel.com> Acked-by: NChristoph Lameter <cl@gentwo.org> Acked-by: NTejun Heo <tj@kernel.org> Acked-by: N"H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 23 12月, 2011 1 次提交
-
-
由 Christoph Lameter 提交于
We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. -tj: This is part of on-going percpu API cleanup. For detailed discussion of the subject, please refer to the following thread. http://thread.gmane.org/gmane.linux.kernel/1222078Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org> LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>
-
- 15 12月, 2011 1 次提交
-
-
由 Jan Beulich 提交于
They had several problems/shortcomings: Only the first memory operand was mentioned in the 2x32bit asm() operands, and 2x64-bit version had a memory clobber. The first allowed the compiler to not recognize the need to re-load the data in case it had it cached in some register, and the second was overly destructive. The memory operand in the 2x32-bit asm() was declared to only be an output. The types of the local copies of the old and new values were incorrect (as in other per-CPU ops, the types of the per-CPU variables accessed should be used here, to make sure the respective types are compatible). The __dummy variable was pointless (and needlessly initialized in the 2x32-bit case), given that local copies of the inputs already exist. The 2x64-bit variant forced the address of the first object into %rsi, even though this is needed only for the call to the emulation function. The real cmpxchg16b can operate on an memory. At once also change the return value type to what it really is - 'bool'. Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4EE86D6502000078000679FE@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 7月, 2011 1 次提交
-
-
由 Christoph Lameter 提交于
Somehow we got into a situation where the __this_cpu_xchg() operations were not defined in the same way as this_cpu_xchg() and friends. I had some build failures under 32 bit that were addressed by these fixes. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 19 4月, 2011 1 次提交
-
-
由 H. Peter Anvin 提交于
For use in assembly constants, use the ASM_NOP* defines. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1303166160-10315-2-git-send-email-hpa@linux.intel.com
-
- 29 3月, 2011 2 次提交
-
-
由 Christoph Lameter 提交于
Add this_cpu_has() which determines if the current cpu has a certain ability using a segment prefix and a bit test operation. For that we need to add bit operations to x86s percpu.h. Many uses of cpu_has use a pointer passed to a function to determine the current flags. That is no longer necessary after this patch. However, this patch only converts the straightforward cases where cpu_has is used with this_cpu_ptr. The rest is work for later. -tj: Rolled up patch to add x86_ prefix and use percpu_read() instead of percpu_read_stable(). Signed-off-by: NChristoph Lameter <cl@linux.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NTejun Heo <tj@kernel.org> -
由 Eric Dumazet 提交于
percpu_cmpxchg16b_double() uses alternative_io() and looks like : e8 .. .. .. .. call this_cpu_cmpxchg16b_emu X bytes NOPX or, once patched (if cpu supports native instruction) on SMP build : 65 48 0f c7 0e cmpxchg16b %gs:(%rsi) 0f 94 c0 sete %al on !SMP build : 48 0f c7 0e cmpxchg16b (%rsi) 0f 94 c0 sete %al Therefore, NOPX should be : P6_NOP3 on SMP P6_NOP2 on !SMP Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 28 3月, 2011 1 次提交
-
-
由 Christoph Lameter 提交于
Omit the segment prefix in the UP case. GS is not used then and we will generate segfaults if cmpxchg16b is used otherwise. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 2月, 2011 1 次提交
-
-
由 Christoph Lameter 提交于
Support this_cpu_cmpxchg_double() using the cmpxchg16b and cmpxchg8b instructions. -tj: s/percpu_cmpxchg16b/percpu_cmpxchg16b_double/ for consistency and other cosmetic changes. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 26 1月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
These recent percpu commits: 2485b646: x86,percpu: Move out of place 64 bit ops into X86_64 section 8270137a: cpuops: Use cmpxchg for xchg to avoid lock semantics Caused this 'perf top' crash: Kernel panic - not syncing: Fatal exception in interrupt Pid: 0, comm: swapper Tainted: G D 2.6.38-rc2-00181-gef71723 #413 Call Trace: <IRQ> [<ffffffff810465b5>] ? panic ? kmsg_dump ? kmsg_dump ? oops_end ? no_context ? __bad_area_nosemaphore ? perf_output_begin ? bad_area_nosemaphore ? do_page_fault ? __task_pid_nr_ns ? perf_event_tid ? __perf_event_header__init_id ? validate_chain ? perf_output_sample ? trace_hardirqs_off ? page_fault ? irq_work_run ? update_process_times ? tick_sched_timer ? tick_sched_timer ? __run_hrtimer ? hrtimer_interrupt ? account_system_vtime ? smp_apic_timer_interrupt ? apic_timer_interrupt ... Looking at assembly code, I found: list = this_cpu_xchg(irq_work_list, NULL); gives this wrong code : (gcc-4.1.2 cross compiler) ffffffff810bc45e: mov %gs:0xead0,%rax cmpxchg %rax,%gs:0xead0 jne ffffffff810bc45e <irq_work_run+0x3e> test %rax,%rax je ffffffff810bc4aa <irq_work_run+0x8a> Tell gcc we dirty eax/rax register in percpu_xchg_op() Compiler must use another register to store pxo_new__ We also dont need to reload percpu value after a jump, since a 'failed' cmpxchg already updated eax/rax Wrong generated code was : xor %rax,%rax /* load 0 into %rax */ 1: mov %gs:0xead0,%rax cmpxchg %rax,%gs:0xead0 jne 1b test %rax,%rax After patch : xor %rdx,%rdx /* load 0 into %rdx */ mov %gs:0xead0,%rax 1: cmpxchg %rdx,%gs:0xead0 jne 1b: test %rax,%rax Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 1月, 2011 1 次提交
-
-
由 Christoph Lameter 提交于
Some operations that operate on 64 bit operands are defined for 32 bit. Move them into the correct section. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 18 12月, 2010 2 次提交
-
-
由 Christoph Lameter 提交于
Use cmpxchg instead of xchg to realize this_cpu_xchg. xchg will cause LOCK overhead since LOCK is always implied but cmpxchg will not. Baselines: xchg() = 18 cycles (no segment prefix, LOCK semantics) __this_cpu_xchg = 1 cycle (simulated using this_cpu_read/write, two prefixes. Looks like the cpu can use loop optimization to get rid of most of the overhead) Cycles before: this_cpu_xchg = 37 cycles (segment prefix and LOCK (implied by xchg)) After: this_cpu_xchg = 11 cycle (using cmpxchg without lock semantics) Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Christoph Lameter 提交于
Provide support as far as the hardware capabilities of the x86 cpus allow. Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for fast cpuops implementations. V1->V2: - Take out the definition for this_cpu_cmpxchg_8 and move it into a separate patch. tj: - Reordered ops to better follow this_cpu_* organization. - Renamed macro temp variables similar to their existing neighbours. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 17 12月, 2010 2 次提交
-
-
由 Tejun Heo 提交于
- include/linux/percpu.h: this_cpu_add_return() and friends were located next to __this_cpu_add_return(). However, the overall organization is to first group by preemption safeness. Relocate this_cpu_add_return() and friends to preemption-safe area. - arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after other more basic operations. Relocate [__]this_cpu_add_return_8() so that they're first grouped by preemption safeness. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> -
由 Christoph Lameter 提交于
Supply an implementation for x86 in order to generate more efficient code. V2->V3: - Cleanup - Remove strange type checking from percpu_add_return_op. tj: - Dropped unused typedef from percpu_add_return_op(). - Renamed ret__ to paro_ret__ in percpu_add_return_op(). - Minor indentation adjustments. Acked-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 10 9月, 2010 1 次提交
-
-
由 Brian Gerst 提交于
Allow arches to implement __this_cpu_ptr, and provide an x86 version. Before: movq $foo, %rax movq %gs:this_cpu_off, %rdx addq %rdx, %rax After: movq $foo, %rax addq %gs:this_cpu_off, %rax The benefit is doing it in one less instruction and not clobbering a temporary register. tj: * Beefed up the comment a bit and renamed in-macro temp variable to match neighboring macros. * Folded fix for const pointer case found in linux-next. * Fixed sparse notation. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 11 6月, 2010 1 次提交
-
-
由 Andi Kleen 提交于
Avoid hundreds of warnings with a gcc 4.6 -Wall build. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 29 4月, 2010 1 次提交
-
-
由 Jan Beulich 提交于
... generating slightly smaller code. Signed-off-by: NJan Beulich <jbeulich@novell.com> LKML-Reference: <4BCF261F020000780003B33C@vpn.id2.novell.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 20 4月, 2010 1 次提交
-
-
由 Justin P. Mattock 提交于
Fix a typo in arch/x86/include/asm/percpu.h Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 05 1月, 2010 1 次提交
-
-
由 Christoph Lameter 提交于
Optimize code generated for percpu access by checking for increment and decrements. tj: fix incorrect usage of __builtin_constant_p() and restructure percpu_add_op() macro. Signed-off-by: NChristoph Lameter <cl@linux-foundation.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 29 10月, 2009 2 次提交
-
-
由 Rusty Russell 提交于
Now that the return from alloc_percpu is compatible with the address of per-cpu vars, it makes sense to hand around the address of per-cpu variables. To make this sane, we remove the per_cpu__ prefix we used created to stop people accidentally using these vars directly. Now we have sparse, we can use that (next patch). tj: * Updated to convert stuff which were missed by or added after the original patch. * Kill per_cpu_var() macro. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> -
由 Tejun Heo 提交于
Make the following changes to remove some sparse warnings. * Make DEFINE_PER_CPU_SECTION() declare __pcpu_unique_* before defining it. * Annotate pcpu_extend_area_map() that it is entered with pcpu_lock held, releases it and then reacquires it. * Make percpu related macros use unique nested variable names. * While at it, add pcpu prefix to __size_call[_return]() macros as to-be-implemented sparse annotations will add percpu specific stuff to these macros. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
- 03 10月, 2009 1 次提交
-
-
由 Christoph Lameter 提交于
Basically the existing percpu ops can be used for this_cpu variants that allow operations also on dynamically allocated percpu data. However, we do not pass a reference to a percpu variable in. Instead a dynamically or statically allocated percpu variable is provided. Preempt, the non preempt and the irqsafe operations generate the same code. It will always be possible to have the requires per cpu atomicness in a single RMW instruction with segment override on x86. 64 bit this_cpu operations are not supported on 32 bit. Signed-off-by: NChristoph Lameter <cl@linux-foundation.org> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 04 8月, 2009 1 次提交
-
-
由 Linus Torvalds 提交于
This is very useful for some common things like 'get_current()' and 'get_thread_info()', which can be used multiple times in a function, and where the result is cacheable. tj: Added the magical undocumented "P" modifier to UP __percpu_arg() to force gcc to dereference the pointer value passed in via the "p" input constraint. Without this, percpu_read_stable() returns the address of the percpu variable. Also added comment explaining the difference between percpu_read() and percpu_read_stable(). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 04 7月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
Generalize and move x86 setup_pcpu_lpage() into pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free/map memory, pcpu_lpage_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, factor out pcpu_calc_fc_sizes() which is common to pcpu_embed_first_chunk() and pcpu_lpage_first_chunk(). [ Impact: code reorganization and generalization ] Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
-
- 22 6月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
lpage allocator aliases a PMD page for each cpu and returns whatever is unused to the page allocator. When the pageattr of the recycled pages are changed, this makes the two aliases point to the overlapping regions with different attributes which isn't allowed and known to cause subtle data corruption in certain cases. This can be handled in simliar manner to the x86_64 highmap alias. pageattr code should detect if the target pages have PMD alias and split the PMD alias and synchronize the attributes. pcpur allocator is updated to keep the allocated PMD pages map sorted in ascending address order and provide pcpu_lpage_remapped() function which binary searches the array to determine whether the given address is aliased and if so to which address. pageattr is updated to use pcpu_lpage_remapped() to detect the PMD alias and split it up as necessary from cpa_process_alias(). Jan Beulich spotted the original problem and incorrect usage of vaddr instead of laddr for lookup. With this, lpage percpu allocator should work correctly. Re-enable it. [ Impact: fix subtle lpage pageattr bug and re-enable lpage ] Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NJan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
-
- 11 5月, 2009 1 次提交
-
-
由 Jan Beulich 提交于
- the byte operand constraints were wrong for 32-bit - the to-op's input operands weren't properly parenthesized [ Impact: fix possible miscompilation or build failure ] Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 10 3月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
Impact: generic addr <-> pcpu ptr conversion macros There's nothing arch specific about x86 __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr(). With proper __per_cpu_load and __per_cpu_start defined, they'll do the right thing regardless of actual layout. Move these macros from arch/x86/include/asm/percpu.h to mm/percpu.c and allow archs to override it as necessary. Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 20 2月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
Impact: use new dynamic allocator, unified access to static/dynamic percpu memory Convert to the new dynamic percpu allocator. * implement populate_extra_pte() for both 32 and 64 * update setup_per_cpu_areas() to use pcpu_setup_static() * define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() * define config HAVE_DYNAMIC_PER_CPU_AREA Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 09 2月, 2009 1 次提交
-
-
由 Brian Gerst 提交于
Impact: cleanup and bug fix Use the linker to create symbols for certain per-cpu variables that are offset by __per_cpu_load. This allows the removal of the runtime fixup of the GDT pointer, which fixes a bug with resume reported by Jiri Slaby. Reported-by: NJiri Slaby <jirislaby@gmail.com> Signed-off-by: NBrian Gerst <brgerst@gmail.com> Acked-by: NJiri Slaby <jirislaby@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 21 1月, 2009 1 次提交
-
-
由 Brian Gerst 提交于
Impact: slightly better code generation for percpu_to_op() The processor will sign-extend 32-bit immediate values in 64-bit operations. Use the 'e' constraint ("32-bit signed integer constant, or a symbolic reference known to fit that range") for 64-bit constants. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 20 1月, 2009 1 次提交
-
-
由 Brian Gerst 提交于
Impact: x86_64 percpu area layout change, irq_stack now at the beginning Now that the PDA is empty except for the stack canary, it can be removed. The irqstack is moved to the start of the per-cpu section. If the stack protector is enabled, the canary overlaps the bottom 48 bytes of the irqstack. tj: * updated subject * dropped asm relocation of irq_stack_ptr * updated comments a bit * rebased on top of stack canary changes Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 18 1月, 2009 1 次提交
-
-
由 Brian Gerst 提交于
Accessing memory through %gs should not use rip-relative addressing. Adding a P prefix for the argument tells gcc to not add (%rip) to the memory references. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 16 1月, 2009 5 次提交
-
-
由 Ingo Molnar 提交于
It is an optimization and a cleanup, and adds the following new generic percpu methods: percpu_read() percpu_write() percpu_add() percpu_sub() percpu_and() percpu_or() percpu_xor() and implements support for them on x86. (other architectures will fall back to a default implementation) The advantage is that for example to read a local percpu variable, instead of this sequence: return __get_cpu_var(var); ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx ffffffff8102ca32: 81 ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax We can get a single instruction by using the optimized variants: return percpu_read(var); ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax I also cleaned up the x86-specific APIs and made the x86 code use these new generic percpu primitives. tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out * added percpu_and() for completeness's sake * made generic percpu ops atomic against preemption Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NTejun Heo <tj@kernel.org> -
由 Tejun Heo 提交于
pda is now a percpu variable and there's no reason it can't use plain x86 percpu accessors. Add x86_test_and_clear_bit_percpu() and replace pda op implementations with wrappers around x86 percpu accessors. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Tejun Heo 提交于
Now that pda is allocated as part of percpu, percpu doesn't need to be accessed through pda. Unify x86_64 SMP percpu access with x86_32 SMP one. Other than the segment register, operand size and the base of percpu symbols, they behave identical now. This patch replaces now unnecessary pda->data_offset with a dummy field which is necessary to keep stack_canary at its place. This patch also moves per_cpu_offset initialization out of init_gdt() into setup_per_cpu_areas(). Note that this change also necessitates explicit per_cpu_offset initializations in voyager_smp.c. With this change, x86_OP_percpu()'s are as efficient on x86_64 as on x86_32 and also x86_64 can use assembly PER_CPU macros. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Tejun Heo 提交于
[ Based on original patch from Christoph Lameter and Mike Travis. ] Currently pdas and percpu areas are allocated separately. %gs points to local pda and percpu area can be reached using pda->data_offset. This patch folds pda into percpu area. Due to strange gcc requirement, pda needs to be at the beginning of the percpu area so that pda->stack_canary is at %gs:40. To achieve this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is added and used to reserve pda sized chunk at the start of the percpu area. After this change, for boot cpu, %gs first points to pda in the data.init area and later during setup_per_cpu_areas() gets updated to point to the actual pda. This means that setup_per_cpu_areas() need to reload %gs for CPU0 while clearing pda area for other cpus as cpu0 already has modified it when control reaches setup_per_cpu_areas(). This patch also removes now unnecessary get_local_pda() and its call sites. A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into per cpu area" patch. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Tejun Heo 提交于
Make early_per_cpu() a lvalue as per_cpu() is and use it where applicable. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-