1. 18 1月, 2009 6 次提交
  2. 16 1月, 2009 10 次提交
    • I
      percpu: add optimized generic percpu accessors · 6dbde353
      Ingo Molnar 提交于
      It is an optimization and a cleanup, and adds the following new
      generic percpu methods:
      
        percpu_read()
        percpu_write()
        percpu_add()
        percpu_sub()
        percpu_and()
        percpu_or()
        percpu_xor()
      
      and implements support for them on x86. (other architectures will fall
      back to a default implementation)
      
      The advantage is that for example to read a local percpu variable,
      instead of this sequence:
      
       return __get_cpu_var(var);
      
       ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
       ffffffff8102ca32:	81
       ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
       ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
      
      We can get a single instruction by using the optimized variants:
      
       return percpu_read(var);
      
       ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
      
      I also cleaned up the x86-specific APIs and made the x86 code use
      these new generic percpu primitives.
      
      tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
          * added percpu_and() for completeness's sake
          * made generic percpu ops atomic against preemption
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6dbde353
    • T
      x86: misc clean up after the percpu update · 004aa322
      Tejun Heo 提交于
      Do the following cleanups:
      
      * kill x86_64_init_pda() which now is equivalent to pda_init()
      
      * use per_cpu_offset() instead of cpu_pda() when initializing
        initial_gs
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      004aa322
    • T
      x86: convert pda ops to wrappers around x86 percpu accessors · 49357d19
      Tejun Heo 提交于
      pda is now a percpu variable and there's no reason it can't use plain
      x86 percpu accessors.  Add x86_test_and_clear_bit_percpu() and replace
      pda op implementations with wrappers around x86 percpu accessors.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      49357d19
    • T
      x86: make pda a percpu variable · b12d8db8
      Tejun Heo 提交于
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      As pda is now allocated in percpu area, it can easily be made a proper
      percpu variable.  Make it so by defining per cpu symbol from linker
      script and declaring it in C code for SMP and simply defining it for
      UP.  This change cleans up code and brings SMP and UP closer a bit.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b12d8db8
    • T
      x86: merge 64 and 32 SMP percpu handling · 9939ddaf
      Tejun Heo 提交于
      Now that pda is allocated as part of percpu, percpu doesn't need to be
      accessed through pda.  Unify x86_64 SMP percpu access with x86_32 SMP
      one.  Other than the segment register, operand size and the base of
      percpu symbols, they behave identical now.
      
      This patch replaces now unnecessary pda->data_offset with a dummy
      field which is necessary to keep stack_canary at its place.  This
      patch also moves per_cpu_offset initialization out of init_gdt() into
      setup_per_cpu_areas().  Note that this change also necessitates
      explicit per_cpu_offset initializations in voyager_smp.c.
      
      With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
      x86_32 and also x86_64 can use assembly PER_CPU macros.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9939ddaf
    • T
      x86: fold pda into percpu area on SMP · 1a51e3a0
      Tejun Heo 提交于
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      Currently pdas and percpu areas are allocated separately.  %gs points
      to local pda and percpu area can be reached using pda->data_offset.
      This patch folds pda into percpu area.
      
      Due to strange gcc requirement, pda needs to be at the beginning of
      the percpu area so that pda->stack_canary is at %gs:40.  To achieve
      this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
      added and used to reserve pda sized chunk at the start of the percpu
      area.
      
      After this change, for boot cpu, %gs first points to pda in the
      data.init area and later during setup_per_cpu_areas() gets updated to
      point to the actual pda.  This means that setup_per_cpu_areas() need
      to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
      already has modified it when control reaches setup_per_cpu_areas().
      
      This patch also removes now unnecessary get_local_pda() and its call
      sites.
      
      A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1a51e3a0
    • T
      x86: use static _cpu_pda array · c8f3329a
      Tejun Heo 提交于
      _cpu_pda array first uses statically allocated storage in data.init
      and then switches to allocated bootmem to conserve space.  However,
      after folding pda area into percpu area, _cpu_pda array will be
      removed completely.  Drop the reallocation part to simplify the code
      for soon-to-follow changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c8f3329a
    • T
      x86: load pointer to pda into %gs while brining up a CPU · f32ff538
      Tejun Heo 提交于
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      CPU startup code in head_64.S loaded address of a zero page into %gs
      for temporary use till pda is loaded but address to the actual pda is
      available at the point.  Load the real address directly instead.
      
      This will help unifying percpu and pda handling later on.
      
      This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f32ff538
    • T
      x86: make early_per_cpu() a lvalue and use it · f10fcd47
      Tejun Heo 提交于
      Make early_per_cpu() a lvalue as per_cpu() is and use it where
      applicable.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f10fcd47
    • T
      x86: fix pda_to_op() · 7de6883f
      Tejun Heo 提交于
      There's no instruction to move a 64bit immediate into memory location.
      Drop "i".
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7de6883f
  3. 14 1月, 2009 5 次提交
  4. 13 1月, 2009 2 次提交
  5. 12 1月, 2009 3 次提交
    • M
      cpumask, irq: non-x86 build failures · 92296c6d
      Mike Travis 提交于
      Ingo Molnar wrote:
      
      > All non-x86 architectures fail to build:
      >
      > In file included from /home/mingo/tip/include/linux/random.h:11,
      >                  from /home/mingo/tip/include/linux/stackprotector.h:6,
      >                  from /home/mingo/tip/init/main.c:17:
      > /home/mingo/tip/include/linux/irqnr.h:26:63: error: asm/irq_vectors.h: No such file or directory
      
      Do not include asm/irq_vectors.h in generic code - it's not available
      on all architectures.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      92296c6d
    • M
      irq: initialize nr_irqs based on nr_cpu_ids · 9332fccd
      Mike Travis 提交于
      Impact: Reduce memory usage.
      
      This is the second half of the changes to make the irq_desc_ptrs be
      variable sized based on nr_cpu_ids.  This is done by adding a new
      "max_nr_irqs" macro to irq_vectors.h (and a dummy in irqnr.h) to
      return a max NR_IRQS value based on NR_CPUS or nr_cpu_ids.
      
      This necessitated moving the define of MAX_IO_APICS to a separate
      file (asm/apicnum.h) so it could be included without the baggage
      of the other asm/apicdef.h declarations.
      Signed-off-by: NMike Travis <travis@sgi.com>
      9332fccd
    • R
      x86: change flush_tlb_others to take a const struct cpumask · 4595f962
      Rusty Russell 提交于
      Impact: reduce stack usage, use new cpumask API.
      
      This is made a little more tricky by uv_flush_tlb_others which
      actually alters its argument, for an IPI to be sent to the remaining
      cpus in the mask.
      
      I solve this by allocating a cpumask_var_t for this case and falling back
      to IPI should this fail.
      
      To eliminate temporaries in the caller, all flush_tlb_others implementations
      now do the this-cpu-elimination step themselves.
      
      Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
      which has been there since pre-git and yet f->flush_cpumask is always zero
      at this point.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NMike Travis <travis@sgi.com>
      4595f962
  6. 11 1月, 2009 4 次提交
  7. 10 1月, 2009 1 次提交
  8. 08 1月, 2009 7 次提交
  9. 07 1月, 2009 2 次提交