1. 18 December 2010, 6 commits
    • vmstat: Use per cpu atomics to avoid interrupt disable / enable · 7c839120
      Committed by Christoph Lameter
      Currently the operations that increment vm counters must disable
      interrupts so as not to corrupt their per-cpu counter housekeeping.
      
      So use this_cpu_cmpxchg() to avoid the overhead. Since we can no longer
      count on preemption being disabled, we still have some minor issues.
      The fetching of the counter thresholds is racy:
      a threshold from another cpu may be applied if we happen to be
      rescheduled on another cpu.  However, the following vmstat operation
      will then bring the counter back under the threshold limit.
      
      The operations for __xxx_zone_state are not changed since the caller
      has taken care of the synchronization needs (and therefore their cycle
      count is even lower than that of the optimized irq-disable version
      provided here).
      
      The optimization using this_cpu_cmpxchg will only be used if the arch
      supports efficient this_cpu_ops (CONFIG_CMPXCHG_LOCAL must be set).
      
      The use of this_cpu_cmpxchg reduces the cycle count for the counter
      operations by 80% (inc_zone_page_state goes from 170 cycles to 32);
      a simplified sketch of the resulting update loop follows this entry.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      7c839120
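      A minimal sketch of the update pattern this commit introduces, condensed
      from the mm/vmstat.c changes (the overstep handling and names are
      simplified here, not quoted verbatim from the patch):

      	static inline void mod_state(struct zone *zone, enum zone_stat_item item,
      				     int delta, int overstep_mode)
      	{
      		struct per_cpu_pageset __percpu *pcp = zone->pageset;
      		long o, n, t, z;

      		do {
      			z = 0;	/* amount to flush into the zone-wide counter */
      			/*
      			 * Racy read: after a migration we may apply another
      			 * cpu's threshold, but the next vmstat operation
      			 * brings the counter back under the limit.
      			 */
      			t = this_cpu_read(pcp->stat_threshold);
      			o = this_cpu_read(pcp->vm_stat_diff[item]);
      			n = o + delta;
      			if (n > t || n < -t) {
      				int os = overstep_mode * (t >> 1);
      				z = n + os;
      				n = -os;
      			}
      		/* No irq disable: if we migrated, the cmpxchg fails and we retry */
      		} while (this_cpu_cmpxchg(pcp->vm_stat_diff[item], o, n) != o);

      		if (z)
      			zone_page_state_add(z, zone, item);
      	}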
    • irq_work: Use per cpu atomics instead of regular atomics · 20b87691
      Committed by Christoph Lameter
      The irq work queue is a per cpu object, so per cpu atomics are
      sufficient for synchronization. Using them simplifies the code and
      reduces its overhead (a before/after sketch follows this entry).
      
      Before:
      
      christoph@linux-2.6$ size kernel/irq_work.o
         text	   data	    bss	    dec	    hex	filename
          451	      8	      1	    460	    1cc	kernel/irq_work.o
      
      After:
      
      christoph@linux-2.6$ size kernel/irq_work.o 
         text	   data	    bss	    dec	    hex	filename
          438	      8	      1	    447	    1bf	kernel/irq_work.o
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      20b87691
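      A hedged sketch of the kind of change this makes on the queueing path
      (condensed; next_flags() and IRQ_WORK_FLAGS are helpers from the
      surrounding code):

      	struct irq_work *next;

      	/* Before: take the per cpu address, then a LOCK-prefixed cmpxchg */
      	do {
      		next = __get_cpu_var(irq_work_list);
      		entry->next = next_flags(next, IRQ_WORK_FLAGS);
      	} while (cmpxchg(&__get_cpu_var(irq_work_list), next, entry) != next);

      	/* After: one per cpu atomic addresses and updates the slot */
      	do {
      		next = __this_cpu_read(irq_work_list);
      		entry->next = next_flags(next, IRQ_WORK_FLAGS);
      	} while (this_cpu_cmpxchg(irq_work_list, next, entry) != next);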
    • Merge branch 'this_cpu_ops' into for-2.6.38 · 05c2d088
      Committed by Tejun Heo
      05c2d088
    • cpuops: Use cmpxchg for xchg to avoid lock semantics · 8270137a
      Committed by Christoph Lameter
      Use cmpxchg instead of xchg to implement this_cpu_xchg.
      
      xchg always implies LOCK semantics and the corresponding bus overhead,
      whereas cmpxchg does not (a sketch of the cmpxchg loop follows this
      entry).
      
      Baselines:
      
      xchg()		= 18 cycles (no segment prefix, LOCK semantics)
      __this_cpu_xchg = 1 cycle
      
      (simulated using this_cpu_read/write, two prefixes; the cpu apparently
      uses loop optimizations to hide most of the overhead)
      
      Cycles before:
      
      this_cpu_xchg	 = 37 cycles (segment prefix and LOCK (implied by xchg))
      
      After:
      
      this_cpu_xchg	= 11 cycles (using cmpxchg without lock semantics)
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      8270137a
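      A minimal sketch of realizing xchg through a cmpxchg loop (the macro
      name here is hypothetical; the real x86 implementation dispatches on
      operand size in assembly):

      	/*
      	 * Retry until no other update slips in between the read and
      	 * the cmpxchg. No LOCK prefix is involved: only this cpu
      	 * writes its own per cpu slot.
      	 */
      	#define this_cpu_xchg_via_cmpxchg(pcp, nval)			\
      	({								\
      		typeof(pcp) old__;					\
      		do {							\
      			old__ = this_cpu_read(pcp);			\
      		} while (this_cpu_cmpxchg(pcp, old__, nval) != old__);	\
      		old__;							\
      	})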
    • x86: this_cpu_cmpxchg and this_cpu_xchg operations · 7296e08a
      Committed by Christoph Lameter
      Provide support as far as the hardware capabilities of the x86 cpus
      allow (a sketch of the segment-prefixed cmpxchg follows this entry).
      
      Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for
      fast cpuops implementations.
      
      V1->V2:
      	- Take out the definition for this_cpu_cmpxchg_8 and move it into
      	  a separate patch.
      
      tj: - Reordered ops to better follow the this_cpu_* organization.
          - Renamed macro temp variables to match their existing
            neighbours.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      7296e08a
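      A hedged sketch of the mechanism (modeled on the x86 percpu macros;
      only the 4-byte case is shown and the macro name is hypothetical):

      	/*
      	 * __percpu_arg() expands to a segment override (%gs: on 64-bit,
      	 * %fs: on 32-bit), so the per cpu address calculation is part of
      	 * the instruction itself. No LOCK prefix is needed: only this
      	 * cpu writes its own per cpu slot.
      	 */
      	#define percpu_cmpxchg_4(var, oval, nval)			\
      	({								\
      		typeof(var) pco_ret__ = (oval);				\
      		asm("cmpxchgl %2, "__percpu_arg(1)			\
      		    : "+a" (pco_ret__), "+m" (var)			\
      		    : "r" ((typeof(var))(nval))				\
      		    : "memory");					\
      		pco_ret__;						\
      	})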
    • percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support · 2b712442
      Committed by Christoph Lameter
      Generic code to provide the new per cpu atomic operations
      
      	this_cpu_cmpxchg
      	this_cpu_xchg
      
      The fallback uses interrupt disable/enable to ensure correct per cpu
      atomicity (a sketch of the fallback follows this entry).
      
      Falling back to regular cmpxchg and xchg is not possible, since per cpu
      atomic semantics guarantee that the current cpu's per cpu data is
      accessed atomically. With regular cmpxchg and xchg, the address of the
      per cpu data would have to be determined first, and that address
      calculation cannot be folded atomically into the xchg or cmpxchg
      without a segment override.
      
      tj: - Relocated new ops to conform better to the general organization.
          - This patch contains a trivial comment fix.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      2b712442
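      A minimal sketch of the interrupt-based fallback described above (the
      macro name is hypothetical; the kernel provides several named variants):

      	/*
      	 * With interrupts off, no other update on this cpu can
      	 * interleave, so the read-compare-write sequence is atomic
      	 * with respect to this cpu's per cpu data.
      	 */
      	#define _this_cpu_cmpxchg_fallback(pcp, oval, nval)		\
      	({								\
      		typeof(pcp) ret__;					\
      		unsigned long flags;					\
      		local_irq_save(flags);					\
      		ret__ = __this_cpu_read(pcp);				\
      		if (ret__ == (oval))					\
      			__this_cpu_write(pcp, nval);			\
      		local_irq_restore(flags);				\
      		ret__;							\
      	})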
  2. 17 December 2010, 34 commits