1. 28 Jun, 2012 (1 commit)
    • x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range · e7b52ffd
      Authored by Alex Shi
      x86 has no flush_tlb_range support at the instruction level. Currently
      flush_tlb_range is implemented by flushing the whole TLB. That is not
      the best solution for all scenarios. In fact, if we just use 'invlpg'
      to flush a few lines from the TLB, we gain performance later when the
      remaining TLB lines are accessed.
      
      But the 'invlpg' instruction itself costs a lot of time. Its execution
      time is comparable to a cr3 rewrite, and even a bit higher on SNB CPUs.
      
      So, on a CPU with 512 4KB TLB entries, the balance point is at:
      	(512 - X) * 100ns(assumed TLB refill cost) =
      		X(TLB flush entries) * 100ns(assumed invlpg cost)
      
      Here, X is 256, that is 1/2 of 512 entries.
      
      But with the mysterious CPU prefetcher and page miss handler unit, the
      assumed TLB refill cost is far lower than 100ns for sequential access.
      And 2 HT siblings in one core make memory access even faster when they
      are accessing the same memory. So, in the patch, I only do the change
      when the number of target entries is less than 1/16 of the whole
      active TLB entries. Actually, I have no data to support the '1/16'
      percentage, so any suggestions are welcome.
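
      As a rough illustration of that heuristic (a sketch only, with made-up
      names such as flush_tlb_range_sketch() and FLUSHALL_SHIFT; the real
      patch wires this into the x86 flush_tlb_range() path):

      	#define FLUSHALL_SHIFT	4	/* flush all if range > 1/16 of TLB */

      	static void flush_one_page(unsigned long addr)
      	{
      		/* 'invlpg' drops a single 4KB translation from the TLB */
      		asm volatile("invlpg (%0)" :: "r" (addr) : "memory");
      	}

      	static void flush_tlb_range_sketch(unsigned long start,
      					   unsigned long end,
      					   unsigned long tlb_entries)
      	{
      		unsigned long nr_pages = (end - start) / PAGE_SIZE;
      		unsigned long addr;

      		if (nr_pages > (tlb_entries >> FLUSHALL_SHIFT)) {
      			local_flush_tlb();	/* cr3 rewrite: flush everything */
      			return;
      		}

      		for (addr = start; addr < end; addr += PAGE_SIZE)
      			flush_one_page(addr);
      	}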
      
      As for hugetlb, I guess due to the smaller page tables and fewer active
      TLB entries, I didn't see a benefit in my benchmark, so no optimization
      for now.
      
      My micro benchmark shows that in the ideal scenario the read
      performance improves by 70 percent, and in the worst scenario the
      reading/writing performance is similar to the unpatched 3.4-rc4 kernel.
      
      Here is the read data on my 2P * 4 cores * HT NHM EP machine, with THP
      set to 'always':
      
      multi thread testing, '-t' parameter is the thread number:
      	       	        with patch   unpatched 3.4-rc4
      ./mprotect -t 1           14ns		24ns
      ./mprotect -t 2           13ns		22ns
      ./mprotect -t 4           12ns		19ns
      ./mprotect -t 8           14ns		16ns
      ./mprotect -t 16          28ns		26ns
      ./mprotect -t 32          54ns		51ns
      ./mprotect -t 128         200ns		199ns
      
      Single process with sequential flushing and memory accessing:
      
      		       	with patch   unpatched 3.4-rc4
      ./mprotect		    7ns			11ns
      ./mprotect -p 4096  -l 8 -n 10240
      			    21ns		21ns
      
      [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
        has additional performance numbers. ]
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      e7b52ffd
  2. 20 Apr, 2012 (1 commit)
  3. 05 Mar, 2012 (1 commit)
    • BUG: headers with BUG/BUG_ON etc. need linux/bug.h · 187f1882
      Authored by Paul Gortmaker
      If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
      other BUG variant in a static inline (i.e. not in a #define), then
      that header really should include <linux/bug.h> itself rather than
      just expecting it to be implicitly present.
      
      We can make this change risk-free, since if the files using these
      headers didn't have exposure to linux/bug.h already, they would have
      been causing compile failures/warnings.
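
      For illustration, a hypothetical header of the kind this change
      targets (the header and function names below are made up):

      	/* hypothetical_header.h -- illustrative only */
      	#ifndef _HYPOTHETICAL_HEADER_H
      	#define _HYPOTHETICAL_HEADER_H

      	#include <linux/bug.h>	/* needed for BUG_ON() below */

      	static inline void check_slot(unsigned int idx, unsigned int max)
      	{
      		BUG_ON(idx >= max);	/* BUG_ON in a static inline, not a #define */
      	}

      	#endif /* _HYPOTHETICAL_HEADER_H */
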
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      187f1882
  4. 24 Feb, 2012 (1 commit)
    • static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]() · c5905afb
      Authored by Ingo Molnar
      
      So here's a boot tested patch on top of Jason's series that does
      all the cleanups I talked about and turns jump labels into a
      more intuitive to use facility. It should also address the
      various misconceptions and confusions that surround jump labels.
      
      Typical usage scenarios:
      
              #include <linux/static_key.h>
      
              struct static_key key = STATIC_KEY_INIT_TRUE;
      
              if (static_key_false(&key))
                      do unlikely code
              else
                      do likely code
      
      Or:
      
              if (static_key_true(&key))
                      do likely code
              else
                      do unlikely code
      
      The static key is modified via:
      
              static_key_slow_inc(&key);
              ...
              static_key_slow_dec(&key);
      
      The 'slow' prefix makes it abundantly clear that this is an
      expensive operation.
      
      I've updated all in-kernel code to use this everywhere. Note
      that I (intentionally) have not pushed the rename blindly through
      to the lowest levels: the actual jump-label patching arch facility
      should keep that name, so we want to decouple jump labels from the
      static-key facility a bit.
      
      On non-jump-label enabled architectures static keys default to
      likely()/unlikely() branches.
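
      A small sketch tying the calls above together (the function names here
      are made up, not taken from the patch):

      	#include <linux/static_key.h>

      	static struct static_key tracing_key = STATIC_KEY_INIT_FALSE;

      	static void do_expensive_tracing(void)
      	{
      		/* slow-path placeholder */
      	}

      	void hot_path(void)
      	{
      		/* compiles to a "likely off" branch, patched at runtime
      		 * once the key is enabled */
      		if (static_key_false(&tracing_key))
      			do_expensive_tracing();
      	}

      	void tracing_register(void)
      	{
      		static_key_slow_inc(&tracing_key);	/* enable the branch */
      	}

      	void tracing_unregister(void)
      	{
      		static_key_slow_dec(&tracing_key);	/* disable it again */
      	}
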
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: mathieu.desnoyers@efficios.com
      Cc: davem@davemloft.net
      Cc: ddaney.cavm@gmail.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c5905afb
  5. 14 Jul, 2011 (1 commit)
  6. 26 Jan, 2011 (1 commit)
  7. 14 Jan, 2011 (1 commit)
  8. 28 Dec, 2010 (1 commit)
    • x86, paravirt: Use native_halt on a halt, not native_safe_halt · c8217b83
      Authored by Cliff Wickman
      halt() should use native_halt()
      safe_halt() uses native_safe_halt()
      
      If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as
      
      static inline void halt(void)
      {
              PVOP_VCALL0(pv_irq_ops.safe_halt);
      }
      
      Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is
      
      static inline void halt(void)
      {
              native_halt();
      }
      
      So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt()
      for a halt() is an oversight.
      Am I missing something?
      
      It probably hasn't shown up as a problem because the local APIC is
      disabled on a shutdown or restart.  But if we disable interrupts and
      call halt(), we shouldn't expect halt() to re-enable interrupts.
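
      Presumably the fix boils down to routing halt() through the plain halt
      op (a sketch of the intended distinction, not the literal patch hunk):

      	static inline void halt(void)
      	{
      		PVOP_VCALL0(pv_irq_ops.halt);		/* plain hlt, IRQs untouched */
      	}

      	static inline void safe_halt(void)
      	{
      		PVOP_VCALL0(pv_irq_ops.safe_halt);	/* sti; hlt */
      	}
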
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      LKML-Reference: <E1PSBcz-0001g1-FM@eag09.americas.sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      c8217b83
  9. 11 Nov, 2010 (1 commit)
    • tracing: Force arch_local_irq_* notrace for paravirt · b5908548
      Authored by Steven Rostedt
      When running ktest.pl randconfig tests, I would sometimes trigger
      a lockdep annotation bug (possible reason: unannotated irqs-on).
      
      This happened right after the function tracer self-test was executed.
      After doing a config bisect I found that it was caused by having the
      function tracer, paravirt guest, prove locking, and rcu torture all
      enabled.
      
      The rcu torture just increased the likelihood of triggering the bug.
      Prove locking was needed, since it was the thing reporting the bug.
      The function tracer would trace and disable interrupts in all sorts
      of funny places, and the paravirt guest would turn arch_local_irq_*
      into functions that would be traced.
      
      Besides the fact that tracing arch_local_irq_* is just a bad idea,
      this is what is happening.
      
      The bug happened simply in the local_irq_restore() code:
      
      		if (raw_irqs_disabled_flags(flags)) {	\
      			raw_local_irq_restore(flags);	\
      			trace_hardirqs_off();		\
      		} else {				\
      			trace_hardirqs_on();		\
      			raw_local_irq_restore(flags);	\
      		}					\
      
      The raw_local_irq_restore() was defined as arch_local_irq_restore().
      
      Now imagine we are about to enable interrupts. We go into the else
      case and call trace_hardirqs_on(), which tells lockdep that we are
      enabling interrupts, so it sets current->hardirqs_enabled = 1.
      
      Then we call raw_local_irq_restore() which calls arch_local_irq_restore()
      which gets traced!
      
      Now in the function tracer we disable interrupts with local_irq_save().
      This is fine, but the saved flags record that interrupts are disabled.
      
      When the function tracer calls local_irq_restore(), it does so with
      flags that say interrupts are disabled, so we go into the if () path.
      This keeps interrupts disabled and calls trace_hardirqs_off(), which
      sets current->hardirqs_enabled = 0.
      
      When the tracer is finished and proceeds with the original code,
      we enable interrupts but leave current->hardirqs_enabled as 0, which
      now breaks lockdep's internal processing.
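
      The fix implied by the subject is to keep the function tracer out of
      these wrappers. A sketch, assuming the paravirt irq-flag inlines look
      roughly like this:

      	static inline notrace unsigned long arch_local_save_flags(void)
      	{
      		return PVOP_CALLEE0(unsigned long, pv_irq_ops.save_fl);
      	}

      	static inline notrace void arch_local_irq_restore(unsigned long f)
      	{
      		PVOP_VCALLEE1(pv_irq_ops.restore_fl, f);
      	}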
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b5908548
  10. 07 Oct, 2010 (1 commit)
    • Fix IRQ flag handling naming · df9ee292
      Authored by David Howells
      Fix the IRQ flag handling naming.  In linux/irqflags.h under one configuration,
      it maps:
      
      	local_irq_enable() -> raw_local_irq_enable()
      	local_irq_disable() -> raw_local_irq_disable()
      	local_irq_save() -> raw_local_irq_save()
      	...
      
      and under the other configuration, it maps:
      
      	raw_local_irq_enable() -> local_irq_enable()
      	raw_local_irq_disable() -> local_irq_disable()
      	raw_local_irq_save() -> local_irq_save()
      	...
      
      This is quite confusing.  There should be one set of names expected of the
      arch, and this should be wrapped to give another set of names that are expected
      by users of this facility.
      
      Change this to have the arch provide:
      
      	flags = arch_local_save_flags()
      	flags = arch_local_irq_save()
      	arch_local_irq_restore(flags)
      	arch_local_irq_disable()
      	arch_local_irq_enable()
      	arch_irqs_disabled_flags(flags)
      	arch_irqs_disabled()
      	arch_safe_halt()
      
      Then linux/irqflags.h wraps these to provide:
      
      	raw_local_save_flags(flags)
      	raw_local_irq_save(flags)
      	raw_local_irq_restore(flags)
      	raw_local_irq_disable()
      	raw_local_irq_enable()
      	raw_irqs_disabled_flags(flags)
      	raw_irqs_disabled()
      	raw_safe_halt()
      
      with type checking on the flags 'arguments', and then wraps those to provide:
      
      	local_save_flags(flags)
      	local_irq_save(flags)
      	local_irq_restore(flags)
      	local_irq_disable()
      	local_irq_enable()
      	irqs_disabled_flags(flags)
      	irqs_disabled()
      	safe_halt()
      
      with tracing included if enabled.
      
      The arch functions can now all be inline functions rather than some of them
      having to be macros.
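
      For reference, a sketch of how the raw_ layer can wrap the arch_ layer
      with type checking (illustrative macro bodies, not the verbatim
      linux/irqflags.h text):

      	#include <linux/typecheck.h>

      	#define raw_local_save_flags(flags)			\
      		do {						\
      			typecheck(unsigned long, flags);	\
      			(flags) = arch_local_save_flags();	\
      		} while (0)

      	#define raw_local_irq_restore(flags)			\
      		do {						\
      			typecheck(unsigned long, flags);	\
      			arch_local_irq_restore(flags);		\
      		} while (0)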
      
      Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300]
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile]
      Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze]
      Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM]
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR]
      Acked-by: Tony Luck <tony.luck@intel.com> [IA-64]
      Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R]
      Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU]
      Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS]
      Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC]
      Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC]
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390]
      Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score]
      Acked-by: Matt Fleming <matt@console-pimps.org> [SH]
      Acked-by: David S. Miller <davem@davemloft.net> [Sparc]
      Acked-by: Chris Zankel <chris@zankel.net> [Xtensa]
      Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha]
      Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300]
      Cc: starvik@axis.com [CRIS]
      Cc: jesper.nilsson@axis.com [CRIS]
      Cc: linux-cris-kernel@axis.com
      df9ee292
  11. 24 Aug, 2010 (1 commit)
  12. 28 Feb, 2010 (1 commit)
  13. 15 Dec, 2009 (2 commits)
  14. 13 Oct, 2009 (1 commit)
    • x86/paravirt: Use normal calling sequences for irq enable/disable · 71999d98
      Authored by Jeremy Fitzhardinge
      Bastian Blank reported a boot crash with stackprotector enabled,
      and debugged it back to edx register corruption.
      
      For historical reasons irq enable/disable/save/restore had special
      calling sequences to make them more efficient.  With the more
      recent introduction of higher-level and more general optimisations
      this is no longer necessary so we can just use the normal PVOP_
      macros.
      
      This fixes some residual bugs in the old implementations which left
      edx liable to inadvertent clobbering. Also, fix some bugs in
      __PVOP_VCALLEESAVE which were revealed by actual use.
      Reported-by: Bastian Blank <bastian@waldi.eu.org>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stable Kernel <stable@kernel.org>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      LKML-Reference: <4AD3BC9B.7040501@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      71999d98
  15. 16 Sep, 2009 (1 commit)
  16. 01 Sep, 2009 (2 commits)
  17. 31 Aug, 2009 (7 commits)
  18. 18 Jun, 2009 (1 commit)
  19. 16 May, 2009 (1 commit)
    • x86: Fix performance regression caused by paravirt_ops on native kernels · b4ecc126
      Authored by Jeremy Fitzhardinge
      Xiaohui Xin and some other folks at Intel have been looking into what's
      behind the performance hit of paravirt_ops when running native.
      
      It appears that the hit is entirely due to the paravirtualized
      spinlocks introduced by:
      
       | commit 8efcbab6
       | Date:   Mon Jul 7 12:07:51 2008 -0700
       |
       |     paravirt: introduce a "lock-byte" spinlock implementation
      
      The extra call/return in the spinlock path is somehow
      causing an increase in the cycles/instruction of somewhere around 2-7%
      (seems to vary quite a lot from test to test).  The working theory is
      that the CPU's pipeline is getting upset about the
      call->call->locked-op->return->return, and seems to be failing to
      speculate (though I haven't seen anything definitive about the precise
      reasons).  This doesn't entirely make sense, because the performance
      hit is also visible on unlock and other operations which don't involve
      locked instructions.  But spinlock operations clearly swamp all the
      other pvops operations, even though I can't imagine that they're
      nearly as common (there's only a .05% increase in instructions
      executed).
      
      If I disable just the pv-spinlock calls, my tests show that pvops is
      identical to non-pvops performance on native (my measurements show that
      it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
      
      Summary of results, averaging 10 runs of the "mmperf" test, using a
      no-pvops build as baseline:
      
      		nopv		Pv-nospin	Pv-spin
      CPU cycles	100.00%		99.89%		102.18%
      instructions	100.00%		100.10%		100.15%
      CPI		100.00%		99.79%		102.03%
      cache ref	100.00%		100.84%		100.28%
      cache miss	100.00%		90.47%		88.56%
      cache miss rate	100.00%		89.72%		88.31%
      branches	100.00%		99.93%		100.04%
      branch miss	100.00%		103.66%		107.72%
      branch miss rt	100.00%		103.73%		107.67%
      wallclock	100.00%		99.90%		102.20%
      
      The clear effect here is that the 2% increase in CPI is
      directly reflected in the final wallclock time.
      
      (The other interesting effect is that the more ops are
      out of line calls via pvops, the lower the cache access
      and miss rates.  Not too surprising, but it suggests that
      the non-pvops kernel is over-inlined.  On the flipside,
      the branch misses go up correspondingly...)
      
      So, what's the fix?
      
      Paravirt patching turns all the pvops calls into direct calls, so
      _spin_lock etc do end up having direct calls.  For example, the compiler
      generated code for paravirtualized _spin_lock is:
      
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq  *0xffffffff805a5b30
      <_spin_lock+22>:	retq
      
      The indirect call will get patched to:
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq <__ticket_spin_lock>
      <_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
      <_spin_lock+22>:	retq
      
      One possibility is to inline _spin_lock, etc, when building an
      optimised kernel (ie, when there's no spinlock/preempt
      instrumentation/debugging enabled).  That will remove the outer
      call/return pair, returning the instruction stream to a single
      call/return, which will presumably execute the same as the non-pvops
      case.  The downsides are: 1) it will replicate the
      preempt_disable/enable code at each lock/unlock callsite; this code is
      fairly small, but not nothing; and 2) the spinlock definitions are
      already a very heavily tangled mass of #ifdefs and other preprocessor
      magic, and making any changes will be non-trivial.
      
      The other obvious answer is to disable pv-spinlocks.  Making them a
      separate config option is fairly easy, and it would be trivial to
      enable them only when Xen is enabled (as the only non-default user).
      But it doesn't really address the common case of a distro build which
      is going to have Xen support enabled, and leaves the open question of
      whether the native performance cost of pv-spinlocks is worth the
      performance improvement on a loaded Xen system (10% saving of overall
      system CPU when guests block rather than spin).  Still it is a
      reasonable short-term workaround.
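
      A sketch of the config-option approach (assuming a
      CONFIG_PARAVIRT_SPINLOCKS-style switch; the guard shown here is
      illustrative rather than the exact hunk):

      	#ifdef CONFIG_PARAVIRT_SPINLOCKS
      	static __always_inline void __raw_spin_lock(raw_spinlock_t *lock)
      	{
      		/* patchable pvops call, only when the option is enabled */
      		PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
      	}
      	#else
      	static __always_inline void __raw_spin_lock(raw_spinlock_t *lock)
      	{
      		__ticket_spin_lock(lock);	/* direct call, no pvops layer */
      	}
      	#endif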
      
      [ Impact: fix pvops performance regression when running native ]
      Analysed-by: N"Xin Xiaohui" <xiaohui.xin@intel.com>
      Analysed-by: N"Li Xin" <xin.li@intel.com>
      Analysed-by: N"Nakajima Jun" <jun.nakajima@intel.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      LKML-Reference: <4A0B62F7.5030802@goop.org>
      [ fixed the help text ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b4ecc126
  20. 11 Apr, 2009 (1 commit)
  21. 10 Apr, 2009 (1 commit)
  22. 30 Mar, 2009 (3 commits)
  23. 19 Mar, 2009 (1 commit)
  24. 17 Mar, 2009 (1 commit)
    • x86, paravirt: prevent gcc from generating the wrong addressing mode · 42854dc0
      Authored by Jeremy Fitzhardinge
      Impact: fix crash on VMI (VMware)
      
      When we generate a call sequence for calling a paravirtualized
      function, we presume that the generated code is "call *0xXXXXX",
      which is a 6 byte opcode; this is larger than a normal
      direct call, and so we can patch a direct call over it.
      
      At the moment, however, we give gcc enough rope to hang us by
      putting the address in a register and generating a two-byte
      indirect-via-register call.  Prevent this by explicitly
      dereferencing the function pointer and passing it into the
      asm as a constant.
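
      Roughly, the difference looks like this (a sketch with a made-up
      pointer name; real code also lists the clobbered registers):

      	void (*pv_op_ptr)(void);	/* stand-in for one pv_ops slot */

      	static void indirect_call_examples(void)
      	{
      		/* gcc may load the pointer into a register and emit a
      		 * 2-byte "call *%reg" -- too short to patch over */
      		asm volatile("call *%0" : : "r" (pv_op_ptr));

      		/* passing the address of the pointer as an "i" constant
      		 * forces the longer memory-indirect "call *addr" form,
      		 * leaving room to patch in a direct call later */
      		asm volatile("call *%c0" : : "i" (&pv_op_ptr));
      	}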
      
      This prevents crashes in VMI, as it cannot handle unpatchable
      callsites.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Alok Kataria <akataria@vmware.com>
      LKML-Reference: <49BEEDC2.2070809@goop.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      42854dc0
  25. 13 Feb, 2009 (1 commit)
  26. 12 Feb, 2009 (2 commits)
  27. 10 Feb, 2009 (1 commit)
  28. 04 Feb, 2009 (1 commit)
  29. 03 Feb, 2009 (1 commit)