1. 03 Oct, 2009 11 commits
    • C
      this_cpu: Use this_cpu operations in RCU · e800879d
      Christoph Lameter committed
      RCU does not do dynamic allocations, but it increments per cpu variables
      a lot. These increments currently result in a move to a register and then
      back to memory. This patch makes RCU use the inc/dec instructions on x86,
      which do not need a register.
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
      this_cpu: Use this_cpu ops for VM statistics · 4dac3e98
      Christoph Lameter committed
      Using per cpu atomics for the vm statistics reduces their overhead.
      On x86 we are also guaranteed that they will never race, even
      in the lax form used for vm statistics.
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
      this_cpu: Use this_cpu_ptr in crypto subsystem · 0b44f486
      Christoph Lameter committed
      Just a slight optimization that removes one array lookup.
      The processor number is needed for other things as well, so the
      get/put_cpu cannot be removed.
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
      this_cpu: xfs_icsb_modify_counters does not need "cpu" variable · 7a9e02d6
      Christoph Lameter committed
      The xfs_icsb_modify_counters() function no longer needs the cpu variable
      if we use this_cpu_ptr(), and we can get rid of get/put_cpu().
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Olaf Weber <olaf@sgi.com>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
      this_cpu: Eliminate get/put_cpu · e7dcaa47
      Christoph Lameter committed
      There are cases where we can use this_cpu_ptr() and, as a result,
      no longer need to determine the currently executing cpu.
      
      In those places no get/put_cpu combination is needed anymore.
      The local cpu variable can be eliminated.
      
      Preemption still needs to be disabled and enabled since the
      modifications of the per cpu variables are not atomic. There may
      be multiple per cpu variables modified and those must all
      be from the same processor.
      Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      cc: Eric Biederman <ebiederm@aristanetworks.com>
      cc: Stephen Hemminger <shemminger@vyatta.com>
      cc: David L Stevens <dlstevens@us.ibm.com>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
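
The transformation described in the "Eliminate get/put_cpu" message can be pictured with a small userspace model. This is only a sketch under stated assumptions: every `model_*` helper, the `stats` structure and the `account_*` functions are hypothetical stand-ins, not kernel code. In the kernel the corresponding calls are get_cpu()/put_cpu(), preempt_disable()/preempt_enable() and this_cpu_ptr().

```c
#include <assert.h>

#define NR_CPUS 2                     /* hypothetical CPU count for the model */

struct stats {
	long packets;
	long bytes;
};

static struct stats stats[NR_CPUS];  /* one instance per simulated CPU */
static int current_cpu;              /* which simulated CPU is "running" */
static int preempt_count;            /* model of the preemption counter */

/* Model of get_cpu(): disable preemption and return the CPU number. */
static int model_get_cpu(void)
{
	preempt_count++;
	return current_cpu;
}

/* Model of put_cpu(): re-enable preemption. */
static void model_put_cpu(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

static void model_preempt_disable(void)
{
	preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Model of this_cpu_ptr(): no CPU number has to be passed in. */
static struct stats *model_this_cpu_ptr(struct stats *base)
{
	return &base[current_cpu];
}

/* Before: the CPU number is fetched only to index the per cpu data. */
static void account_old(long len)
{
	int cpu = model_get_cpu();
	struct stats *s = &stats[cpu];

	s->packets++;
	s->bytes += len;
	model_put_cpu();
}

/*
 * After: the local "cpu" variable is gone. Preemption is still disabled
 * because two fields are updated and both must belong to the same CPU.
 */
static void account_new(long len)
{
	struct stats *s;

	model_preempt_disable();
	s = model_this_cpu_ptr(stats);
	s->packets++;
	s->bytes += len;
	model_preempt_enable();
}
```

Both functions update the same per cpu instance; the second form simply never needs to know which CPU number it is running on.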
    • C
      this_cpu: Straight transformations · ca0c9584
      Christoph Lameter committed
      Use this_cpu_ptr and __this_cpu_ptr in locations where straight
      transformations are possible because per_cpu_ptr is used with
      either smp_processor_id() or raw_smp_processor_id().
      
      cc: David Howells <dhowells@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      cc: Ingo Molnar <mingo@elte.hu>
      cc: Rusty Russell <rusty@rustcorp.com.au>
      cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
    • C
      this_cpu: Use this_cpu operations for NFS statistics · fce22848
      Christoph Lameter committed
      Simplify NFS statistics and allow the use of optimized
      arch instructions.
      Acked-by: Tejun Heo <tj@kernel.org>
      CC: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
      this_cpu: Use this_cpu operations for SNMP statistics · 4eb41d10
      Christoph Lameter committed
      SNMP statistic macros can be significantly simplified.
      This will also reduce code size if the arch supports these operations
      in hardware.
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
      this_cpu: Implement X86 optimized this_cpu operations · 30ed1a79
      Christoph Lameter committed
      Basically the existing percpu ops can be used for this_cpu variants that allow
      operations also on dynamically allocated percpu data. However, we do not pass a
      reference to a percpu variable in. Instead a dynamically or statically
      allocated percpu variable is provided.
      
      The preempt, non-preempt and irqsafe operations generate the same code.
      It will always be possible to have the required per cpu atomicity in a single
      RMW instruction with segment override on x86.
      
      64 bit this_cpu operations are not supported on 32 bit.
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • C
      this_cpu: Introduce this_cpu_ptr() and generic this_cpu_* operations · 7340a0b1
      Christoph Lameter committed
      This patch introduces two things: First this_cpu_ptr and then per cpu
      atomic operations.
      
      this_cpu_ptr
      ------------
      
      A common operation when dealing with cpu data is to get the instance of the
      cpu data associated with the currently executing processor. This can be
      optimized by
      
      this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id()).
      
      The problem with per_cpu_ptr(x, smp_processor_id()) is that it requires
      an array lookup to find the offset for the cpu. Processors typically
      have the offset for the current cpu area in some kind of (arch dependent)
      efficiently accessible register or memory location.
      
      We can use that instead of doing the array lookup to speed up the
      determination of the address of the percpu variable. This is particularly
      significant because these lookups occur in performance critical paths
      of the core kernel. this_cpu_ptr() can avoid memory accesses and
      
      this_cpu_ptr comes in two flavors. The preemption context matters since we
      are referring to the currently executing processor. In many cases we must
      ensure that the processor does not change while a code segment is executed.
      
      __this_cpu_ptr 	-> Do not check for preemption context
      this_cpu_ptr	-> Check preemption context
      
      The parameter to these operations is a per cpu pointer. This can be the
      address of a statically defined per cpu variable (&per_cpu_var(xxx)) or
      the address of a per cpu variable allocated with the per cpu allocator.
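
The difference between the two address calculations can be sketched in plain userspace C. This is a model under stated assumptions, not the kernel implementation: the `model_*` functions, the `cpu_offset` table and the `this_cpu_off` variable (standing in for the arch-dependent register or memory location mentioned above) are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                     /* hypothetical CPU count for the model */

struct counters {
	long events;
};

/* One copy of the data per simulated CPU. */
static struct counters area[NR_CPUS];

/* Table of per cpu offsets, indexed by CPU number. */
static ptrdiff_t cpu_offset[NR_CPUS];

/* Stand-in for the register that holds the current CPU's offset. */
static ptrdiff_t this_cpu_off;
static int current_cpu;

/* Fill the offset table relative to the first per cpu copy. */
static void model_init(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		cpu_offset[i] = (char *)&area[i] - (char *)&area[0];
}

/* Pretend the code is now running on the given CPU. */
static void model_run_on(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
	this_cpu_off = cpu_offset[cpu];   /* loaded once, like an arch register */
}

/* Models per_cpu_ptr(): needs the CPU number and an array lookup. */
static struct counters *model_per_cpu_ptr(struct counters *base, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (struct counters *)((char *)base + cpu_offset[cpu]);
}

/* Models this_cpu_ptr(): adds the readily available offset, no lookup. */
static struct counters *model_this_cpu_ptr(struct counters *base)
{
	return (struct counters *)((char *)base + this_cpu_off);
}
```

In this model both functions return the same address for the current CPU; the second simply skips the table access.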
      
      per cpu atomic operations: this_cpu_*(var, val)
      -----------------------------------------------
      this_cpu_* operations (like this_cpu_add(struct->y, value)) operate on
      arbitrary scalars that are members of structures allocated with the new
      per cpu allocator. They can also operate on static per_cpu variables
      if they are passed to per_cpu_var() (see the patch that uses this_cpu_*
      operations for vm statistics).
      
      These operations are guaranteed to be atomic vs preemption when modifying
      the scalar. The calculation of the per cpu offset is also guaranteed to
      be atomic at the same time. This means that a this_cpu_* operation can be
      safely used to modify a per cpu variable in a context where interrupts are
      enabled and preemption is allowed. Many architectures can perform such
      a per cpu atomic operation with a single instruction.
      
      Note that the atomicity here is different from regular atomic operations.
      Atomicity is only guaranteed for data accessed from the currently executing
      processor. Modifications from other processors are still possible. There
      must be other guarantees that the per cpu data is not modified from another
      processor when using these instructions. The per cpu atomicity comes
      from the fact that the processor either executes an instruction or does not.
      Embedded in the instruction is the relocation of the per cpu address to
      the area reserved for the current processor and the RMW action. Therefore
      interrupts or preemption cannot occur in the midst of this processing.
      
      Generic fallback functions are used if an arch does not define optimized
      this_cpu operations. The functions also come in the two flavors used
      for this_cpu_ptr().
      
      The first parameter is a scalar that is a member of a structure allocated
      through allocpercpu or a per cpu variable (use per_cpu_var(xxx)). The
      operations are similar to what percpu_add() and friends do.
      
      this_cpu_read(scalar)
      this_cpu_write(scalar, value)
      this_cpu_add(scalar, value)
      this_cpu_sub(scalar, value)
      this_cpu_inc(scalar)
      this_cpu_dec(scalar)
      this_cpu_and(scalar, value)
      this_cpu_or(scalar, value)
      this_cpu_xor(scalar, value)
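
A generic fallback for the operations listed above can be imagined as "disable preemption, do a plain read-modify-write on the current CPU's copy, enable preemption". The following userspace sketch illustrates that idea only; the `model_*` names, the `slot` array and the simulated preemption counter are assumptions for illustration and are not the kernel's macros.

```c
#include <assert.h>

#define NR_CPUS 2                     /* hypothetical CPU count for the model */

static long slot[NR_CPUS];           /* one scalar per simulated CPU */
static int current_cpu;              /* which simulated CPU is "running" */
static int preempt_count;            /* model of the preemption counter */

static void model_preempt_disable(void)
{
	preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/*
 * Generic read-modify-write: without a single-instruction arch version,
 * the offset calculation and the update are separate steps, so preemption
 * must be off around both of them.
 */
#define model_this_cpu_generic_to_op(base, val, op)	\
do {							\
	model_preempt_disable();			\
	(base)[current_cpu] op (val);			\
	model_preempt_enable();				\
} while (0)

#define model_this_cpu_add(base, val) model_this_cpu_generic_to_op(base, val, +=)
#define model_this_cpu_sub(base, val) model_this_cpu_generic_to_op(base, val, -=)
#define model_this_cpu_inc(base)      model_this_cpu_add(base, 1)
#define model_this_cpu_dec(base)      model_this_cpu_sub(base, 1)
#define model_this_cpu_and(base, val) model_this_cpu_generic_to_op(base, val, &=)
#define model_this_cpu_or(base, val)  model_this_cpu_generic_to_op(base, val, |=)
#define model_this_cpu_xor(base, val) model_this_cpu_generic_to_op(base, val, ^=)

/* Read the current CPU's copy with preemption disabled. */
static long model_this_cpu_read(const long *base)
{
	long ret;

	model_preempt_disable();
	ret = base[current_cpu];
	model_preempt_enable();
	return ret;
}

/* Overwrite the current CPU's copy with preemption disabled. */
static void model_this_cpu_write(long *base, long val)
{
	model_preempt_disable();
	base[current_cpu] = val;
	model_preempt_enable();
}
```

An arch that can do the relocation and the update in one instruction would replace the body of the generic macro and drop the preemption calls entirely, which is where the gain comes from.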
      
      Arch code can override the generic functions and provide optimized atomic
      per cpu operations. These atomic operations must provide both the relocation
      (x86 does it through a segment override) and the operation on the data in a
      single instruction. Otherwise preempt needs to be disabled and there is no
      gain from providing arch implementations.
      
      A third variant is provided prefixed by irqsafe_. These variants are safe
      against hardware interrupts on the *same* processor (all per cpu atomic
      primitives are *always* *only* providing safety for code running on the
      *same* processor!). The increment needs to be implemented by the hardware
      in such a way that it is a single RMW instruction that is either processed
      before or after an interrupt.
      
      cc: David Howells <dhowells@redhat.com>
      cc: Ingo Molnar <mingo@elte.hu>
      cc: Rusty Russell <rusty@rustcorp.com.au>
      cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 02 Oct, 2009 29 commits