1. 09 2月, 2009 1 次提交
  2. 31 1月, 2009 1 次提交
    • J
      x86/paravirt: add register-saving thunks to reduce caller register pressure · ecb93d1c
      Jeremy Fitzhardinge 提交于
      Impact: Optimization
      
      One of the problems with inserting a pile of C calls where previously
      there were none is that the register pressure is greatly increased.
      The C calling convention says that the caller must expect a certain
      set of registers may be trashed by the callee, and that the callee can
      use those registers without restriction.  This includes the function
      argument registers, and several others.
      
      This patch seeks to alleviate this pressure by introducing wrapper
      thunks that will do the register saving/restoring, so that the
      callsite doesn't need to worry about it, but the callee function can
      be conventional compiler-generated code.  In many cases (particularly
      performance-sensitive cases) the callee will be in assembler anyway,
      and need not use the compiler's calling convention.
      
      Standard calling convention is:
      	 arguments	    return	scratch
      x86-32	 eax edx ecx	    eax		?
      x86-64	 rdi rsi rdx rcx    rax		r8 r9 r10 r11
      
      The thunk preserves all argument and scratch registers.  The return
      register is not preserved, and is available as a scratch register for
      unwrapped callee code (and of course the return value).
      
      Wrapped function pointers are themselves wrapped in a struct
      paravirt_callee_save structure, in order to get some warning from the
      compiler when functions with mismatched calling conventions are used.
      
      The most common paravirt ops, both statically and dynamically, are
      interrupt enable/disable/save/restore, so handle them first.  This is
      particularly easy since their calls are handled specially anyway.
      
      XXX Deal with VMI.  What's their calling convention?
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ecb93d1c
  3. 16 1月, 2009 1 次提交
    • I
      percpu: add optimized generic percpu accessors · 6dbde353
      Ingo Molnar 提交于
      It is an optimization and a cleanup, and adds the following new
      generic percpu methods:
      
        percpu_read()
        percpu_write()
        percpu_add()
        percpu_sub()
        percpu_and()
        percpu_or()
        percpu_xor()
      
      and implements support for them on x86. (other architectures will fall
      back to a default implementation)
      
      The advantage is that for example to read a local percpu variable,
      instead of this sequence:
      
       return __get_cpu_var(var);
      
       ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
       ffffffff8102ca32:	81
       ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
       ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
      
      We can get a single instruction by using the optimized variants:
      
       return percpu_read(var);
      
       ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
      
      I also cleaned up the x86-specific APIs and made the x86 code use
      these new generic percpu primitives.
      
      tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
          * added percpu_and() for completeness's sake
          * made generic percpu ops atomic against preemption
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6dbde353
  4. 16 10月, 2008 1 次提交
  5. 31 7月, 2008 1 次提交