1. 15 Jul, 2011 1 commit
  2. 14 Jul, 2011 1 commit
  3. 18 May, 2011 1 commit
  4. 19 Apr, 2011 2 commits
  5. 05 Apr, 2011 1 commit
    • jump label: Introduce static_branch() interface · d430d3d7
      Committed by Jason Baron
      Introduce:
      
      static __always_inline bool static_branch(struct jump_label_key *key);
      
      instead of the old JUMP_LABEL(key, label) macro.
      
      In this way, jump labels become really easy to use:
      
      Define:
      
              struct jump_label_key jump_key;
      
      Can be used as:
      
              if (static_branch(&jump_key))
                      do unlikely code
      
      enable/disable via:
      
              jump_label_inc(&jump_key);
              jump_label_dec(&jump_key);
      
      that's it!
      
      For the jump labels disabled case, the static_branch() becomes an
      atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(),
      atomic_dec() operations. We show testing results for this change below.
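
The disabled-case fallback described above can be sketched in userspace C. This is a minimal model, assuming C11 atomics stand in for the kernel's atomic_t; the struct layout and the assertion in jump_label_dec() are illustrative assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for struct jump_label_key (layout is an assumption). */
struct jump_label_key {
    atomic_int enabled;   /* reference count; non-zero means the branch is taken */
};

/* Fallback static_branch(): a plain atomic read instead of a patched jump. */
static inline bool static_branch(struct jump_label_key *key)
{
    return atomic_load(&key->enabled) != 0;
}

/* Fallback enable: a simple atomic increment. */
static inline void jump_label_inc(struct jump_label_key *key)
{
    atomic_fetch_add(&key->enabled, 1);
}

/* Fallback disable: a simple atomic decrement. */
static inline void jump_label_dec(struct jump_label_key *key)
{
    int old = atomic_fetch_sub(&key->enabled, 1);
    assert(old > 0);      /* sanity check added for this sketch only */
}
```

Because the key is a counter rather than a flag, several independent users can enable the same branch, and it stays enabled until the last one calls jump_label_dec().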
      
      Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct.
      
      Since we now require a 'struct jump_label_key *key', we can store a pointer into
      the jump table addresses. In this way, we can enable/disable jump labels, in
      basically constant time. This change allows us to completely remove the previous
      hashtable scheme. Thanks to Peter Zijlstra for this re-write.
      
      Testing:
      
      I ran a series of 'tbench 20' runs 5 times (with reboots) for 3
      configurations, where tracepoints were disabled.
      
      jump label configured in
      avg: 815.6
      
      jump label *not* configured in (using atomic reads)
      avg: 800.1
      
      jump label *not* configured in (regular reads)
      avg: 803.4
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110316212947.GA8792@redhat.com>
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Suggested-by: H. Peter Anvin <hpa@linux.intel.com>
      Tested-by: David Daney <ddaney@caviumnetworks.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 18 Mar, 2011 1 commit
  7. 15 Mar, 2011 1 commit
    • x86: stop_machine_text_poke() should issue sync_core() · 0e00f7ae
      Committed by Mathieu Desnoyers
      Intel Architecture Software Developer's Manual section 7.1.3 specifies that a
      core serializing instruction such as "cpuid" should be executed on _each_ core
      before the new instruction is made visible.
      
      Failure to do so can lead to unspecified behavior (Intel XMC errata include
      General Protection Fault in the list), so we should avoid this at all cost.
      
      This problem can affect modified code executed by interrupt handlers after
      interrupts are re-enabled at the end of stop_machine, because no core serializing
      instruction is executed between the code modification and the moment interrupts
      are re-enabled.
      
      Because stop_machine_text_poke performs the text modification from the first CPU
      decrementing stop_machine_first, modified code executed in thread context is
      also affected by this problem. To explain why, we have to split the CPUs in two
      categories: the CPU that initiates the text modification (calls text_poke_smp)
      and all the others. The scheduler, executed on all other CPUs after
      stop_machine, issues an "iret" core serializing instruction, and therefore
      handles core serialization for all these CPUs. However, the text modification
      initiator can continue its execution on the same thread and access the modified
      text without any scheduler call. Given that the CPU that initiates the code
      modification is not guaranteed to be the one actually performing the code
      modification, it falls into the XMC errata.
      
      Q: Isn't this executed from an IPI handler, which will return with IRET (a
         serializing instruction) anyway?
      A: No, now stop_machine uses per-cpu workqueue, so that handler will be
         executed from worker threads. There is no iret anymore.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      LKML-Reference: <20110303160137.GB1590@Krystal>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: <stable@kernel.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  8. 12 Feb, 2011 1 commit
    • x86: Fix text_poke_smp_batch() deadlock · d91309f6
      Committed by Peter Zijlstra
      Fix this deadlock - we are already holding the mutex:
      
      =======================================================
      [ INFO: possible circular locking dependency detected ] 2.6.38-rc4-test+ #1
      -------------------------------------------------------
      bash/1850 is trying to acquire lock:
       (text_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      but task is already holding lock:
       (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (smp_alt){+.+...}:
             [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
             [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
             [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
             [<ffffffff8101050f>] alternatives_smp_switch+0x77/0x1d8
             [<ffffffff81926a6f>] do_boot_cpu+0xd7/0x762
             [<ffffffff819277dd>] native_cpu_up+0xe6/0x16a
             [<ffffffff81928e28>] _cpu_up+0x9d/0xee
             [<ffffffff81928f4c>] cpu_up+0xd3/0xe7
             [<ffffffff82268d4b>] kernel_init+0xe8/0x20a
             [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10
      
      -> #1 (cpu_hotplug.lock){+.+.+.}:
             [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
             [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
             [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
             [<ffffffff810568cc>] get_online_cpus+0x41/0x55
             [<ffffffff810a1348>] stop_machine+0x1e/0x3e
             [<ffffffff819314c1>] text_poke_smp_batch+0x3a/0x3c
             [<ffffffff81932b6c>] arch_optimize_kprobes+0x10d/0x11c
             [<ffffffff81933a51>] kprobe_optimizer+0x152/0x222
             [<ffffffff8106bb71>] process_one_work+0x1d3/0x335
             [<ffffffff8106cfae>] worker_thread+0x104/0x1a4
             [<ffffffff810707c4>] kthread+0x9d/0xa5
             [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10
      
      -> #0 (text_mutex){+.+.+.}:
      
      other info that might help us debug this:
      
      6 locks held by bash/1850:
       #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #1:  (s_active#75){.+.+.+}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #2:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #5:  (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      stack backtrace:
      Pid: 1850, comm: bash Not tainted 2.6.38-rc4-test+ #1
      Call Trace:
      
       [<ffffffff81080eb2>] print_circular_bug+0xa8/0xb7
       [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
       [<ffffffff81010302>] alternatives_smp_unlock+0x3d/0x93
       [<ffffffff81010630>] alternatives_smp_switch+0x198/0x1d8
       [<ffffffff8102568a>] native_cpu_die+0x65/0x95
       [<ffffffff818cc4ec>] _cpu_down+0x13e/0x202
       [<ffffffff8117a619>] sysfs_write_file+0x108/0x144
       [<ffffffff8111f5a2>] vfs_write+0xac/0xff
       [<ffffffff8111f7a9>] sys_write+0x4a/0x6e
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: mathieu.desnoyers@efficios.com
      Cc: rusty@rustcorp.com.au
      Cc: ananth@in.ibm.com
      Cc: masami.hiramatsu.pt@hitachi.com
      Cc: fweisbec@gmail.com
      Cc: jbeulich@novell.com
      Cc: jbaron@redhat.com
      Cc: mhiramat@redhat.com
      LKML-Reference: <1297458466.5226.93.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 14 Dec, 2010 1 commit
  10. 07 Dec, 2010 1 commit
    • x86: Introduce text_poke_smp_batch() for batch-code modifying · 7deb18dc
      Committed by Masami Hiramatsu
      Introduce text_poke_smp_batch(). This function modifies several
      text areas with one stop_machine() on SMP. Because calling
      stop_machine() is a heavy task, it is better to aggregate
      text_poke requests.
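
The batching idea can be illustrated with a small userspace model. This is a sketch under stated assumptions: fake_stop_machine(), struct batch, apply_batch() and text_poke_batch_model() are hypothetical names, memcpy() stands in for the real text patching, and the fields of struct text_poke_param are chosen for illustration. The point is only that N requests cost one stop-the-world round instead of N.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One pending modification: copy len bytes from opcode to addr. */
struct text_poke_param {
    void *addr;
    const void *opcode;
    size_t len;
};

/* Counts simulated stop_machine() rounds (hypothetical instrumentation). */
static int stop_machine_calls;

/* Stand-in for stop_machine(): run fn(data) as if all other CPUs were halted. */
static void fake_stop_machine(void (*fn)(void *), void *data)
{
    stop_machine_calls++;
    fn(data);
}

struct batch {
    struct text_poke_param *params;
    int n;
};

/* Apply every queued modification inside a single stopped section. */
static void apply_batch(void *data)
{
    struct batch *b = data;

    for (int i = 0; i < b->n; i++)
        memcpy(b->params[i].addr, b->params[i].opcode, b->params[i].len);
}

/* Model of the batched interface: n requests, one stop-the-world round. */
static void text_poke_batch_model(struct text_poke_param *params, int n)
{
    struct batch b = { params, n };

    assert(n >= 0);
    fake_stop_machine(apply_batch, &b);
}
```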
      
      ( Note: I've talked with Rusty about this interface, and
        he would not like to expand stop_machine() interface, since
        it is not for generic use. )
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 30 Oct, 2010 2 commits
  12. 14 Oct, 2010 1 commit
  13. 23 Sep, 2010 1 commit
    • jump label: Base patch for jump label · bf5438fc
      Committed by Jason Baron
      Base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
      assembly gcc mechanism, we can now branch to labels from an 'asm goto'
      statement. This allows us to create a 'no-op' fastpath, which can subsequently
      be patched with a jump to the slowpath code. This is useful for code which
      might be rarely used, but which we'd like to be able to call, if needed.
      Tracepoints are the current use case that these are being implemented for.
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com>
      
      [ cleaned up some formatting ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  14. 21 Sep, 2010 2 commits
  15. 14 Jul, 2010 1 commit
  16. 29 Apr, 2010 1 commit
  17. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  18. 04 Mar, 2010 1 commit
  19. 26 Feb, 2010 2 commits
    • x86: Add support for lock prefix in alternatives · b3ac891b
      Committed by Luca Barbieri
      The current lock prefix UP/SMP alternative code doesn't allow
      LOCK_PREFIX to be used in alternatives code.
      
      This patch solves the problem by adding a new LOCK_PREFIX_ALTERNATIVE_PATCH
      macro that only records the lock prefix location but does not emit
      the prefix.
      
      The user of this macro can then start any alternative sequence with
      "lock" and have it UP/SMP patched.
      
      To make this work, the UP/SMP alternative code is changed to do the
      lock/DS prefix switching only if the byte actually contains a lock or
      DS prefix.
      
      Thus, if an alternative without the "lock" is selected, it will now do
      nothing instead of clobbering the code.
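
The guarded prefix switch can be sketched in userspace C. This is a minimal model, assuming 0xF0 is the x86 lock prefix and 0x3E the DS segment override prefix; the function names, the offset table and the byte buffer are hypothetical stand-ins for the kernel's recorded lock locations, not the actual alternatives code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_PREFIX_BYTE 0xF0   /* x86 lock prefix */
#define DS_PREFIX_BYTE   0x3E   /* x86 DS segment override prefix */

/*
 * Switch recorded sites to UP mode: lock -> DS, but only where the byte
 * really is a lock prefix. Returns the number of bytes patched.
 */
static size_t model_smp_unlock(uint8_t *text, const size_t *sites, size_t n)
{
    size_t patched = 0;

    assert(text != NULL);
    for (size_t i = 0; i < n; i++) {
        if (text[sites[i]] == LOCK_PREFIX_BYTE) {
            text[sites[i]] = DS_PREFIX_BYTE;
            patched++;
        }
    }
    return patched;
}

/* Switch recorded sites to SMP mode: DS -> lock, with the same guard. */
static size_t model_smp_lock(uint8_t *text, const size_t *sites, size_t n)
{
    size_t patched = 0;

    assert(text != NULL);
    for (size_t i = 0; i < n; i++) {
        if (text[sites[i]] == DS_PREFIX_BYTE) {
            text[sites[i]] = LOCK_PREFIX_BYTE;
            patched++;
        }
    }
    return patched;
}
```

A recorded site whose byte is neither prefix, for example because an alternative without "lock" was selected, is left untouched in both directions.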
      
      Changes in v2:
      - Naming change
      - Change label to not conflict with alternatives
      Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: Add text_poke_smp for SMP cross modifying code · 3d55cc8a
      Committed by Masami Hiramatsu
      Add generic text_poke_smp for SMP which uses stop_machine()
      to synchronize modifying code.
      This stop_machine() method is officially described in section "7.1.3
      Handling Self- and Cross-Modifying Code" of Intel's
      Software Developer's Manual, volume 3A.
      
      Since stop_machine() can't protect code against NMI/MCE, this
      function can not modify those handlers. And also, this function
      is basically for modifying multibyte-single-instruction. For
      modifying multibyte-multi-instructions, we need another special
      trap & detour code.
      
      This code originally comes from the stop_machine() version of
      immediate values. Thanks Jason and Mathieu!
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Anders Kaseorg <andersk@ksplice.com>
      Cc: Tim Abbott <tabbott@ksplice.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      LKML-Reference: <20100225133438.6725.80273.stgit@localhost6.localdomain6>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 08 Feb, 2010 1 commit
    • x86/alternatives: Fix build warning · 076dc4a6
      Committed by Masami Hiramatsu
      Fixes these warnings:
      
       arch/x86/kernel/alternative.c: In function 'alternatives_text_reserved':
       arch/x86/kernel/alternative.c:402: warning: comparison of distinct pointer types lacks a cast
       arch/x86/kernel/alternative.c:402: warning: comparison of distinct pointer types lacks a cast
       arch/x86/kernel/alternative.c:405: warning: comparison of distinct pointer types lacks a cast
       arch/x86/kernel/alternative.c:405: warning: comparison of distinct pointer types lacks a cast
      
      Caused by:
      
        2cfa1978: ftrace/alternatives: Introducing *_text_reserved functions
      
      Changes in v2:
        - Use local variables to compare, instead of type casts.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      LKML-Reference: <20100205171647.15750.37221.stgit@dhcp-100-2-132.bos.redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  21. 04 Feb, 2010 1 commit
    • ftrace/alternatives: Introducing *_text_reserved functions · 2cfa1978
      Committed by Masami Hiramatsu
      Introduce *_text_reserved functions for checking whether a text
      address range is partially reserved. This patch provides
      checking routines for x86 smp alternatives and dynamic ftrace.
      Since both functions modify fixed pieces of kernel text, they
      should reserve and protect those from other dynamic text
      modifiers, such as kprobes.
      
      This will also be extended when introducing other subsystems
      which modify fixed pieces of kernel text. Dynamic text modifiers
      should avoid those.
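
A reservation check of this kind reduces to a range test. The following is a minimal sketch, assuming the reserved locations are available as a flat array of addresses; model_text_reserved() and its parameters are hypothetical, and the real routines walk subsystem-specific tables instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if any reserved address lies inside [start, end], i.e. the
 * candidate range overlaps text that another patching mechanism owns.
 */
static bool model_text_reserved(const uintptr_t *reserved, size_t n,
                                uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    for (size_t i = 0; i < n; i++) {
        if (reserved[i] >= start && reserved[i] <= end)
            return true;
    }
    return false;
}
```

A dynamic text modifier would run such a check before installing a probe and refuse the request when it returns true.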
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: przemyslaw@pawelczyk.it
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <20100202214911.4694.16587.stgit@dhcp-100-2-132.bos.redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 30 Dec, 2009 1 commit
    • x86-64: Modify copy_user_generic() alternatives mechanism · 1b1d9258
      Committed by Jan Beulich
      In order to avoid unnecessary chains of branches, rather than
      implementing copy_user_generic() as a function consisting of
      just a single (possibly patched) branch, instead properly deal
      with patching call instructions in the alternative instructions
      framework, and move the patching into the callers.
      
      As a follow-on, one could also introduce something like
      __EXPORT_SYMBOL_ALT() to avoid patching call sites in modules.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB8180200007800026AE7@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 11 Sep, 2009 1 commit
  24. 22 Aug, 2009 1 commit
  25. 10 Mar, 2009 1 commit
  26. 06 Mar, 2009 2 commits
  27. 25 Feb, 2009 1 commit
  28. 18 Feb, 2009 1 commit
    • x86, mce: don't disable machine checks during code patching · 123aa76e
      Committed by Andi Kleen
      Impact: low priority bug fix
      
      This removes part of a patch I added myself some time ago. After some
      consideration the patch was a bad idea. In particular it stopped machine check
      exceptions during code patching.
      
      To quote the comment:
      
              * MCEs only happen when something got corrupted and in this
              * case we must do something about the corruption.
              * Ignoring it is worse than an unlikely patching race.
              * Also machine checks tend to be broadcast and if one CPU
              * goes into machine check the others follow quickly, so we don't
              * expect a machine check to cause undue problems during code
              * patching.
      
      So undo the machine check related parts of
      8f4e956b. NMIs are still disabled.
      
      This only removes code; the only addition is a new comment.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  29. 13 Oct, 2008 1 commit
  30. 06 Sep, 2008 1 commit
  31. 19 Aug, 2008 1 commit
  32. 16 Aug, 2008 1 commit
    • x86: alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug · f88f07e0
      Committed by Mathieu Desnoyers
      If a kernel thread is preempted in single-cpu mode right after the NOP (nop
      about to be turned into a lock prefix), then we CPU hotplug a CPU, and then the
      thread is scheduled back again, a SMP-unsafe atomic operation will be used on
      shared SMP variables, leading to corruption. No corruption would happen in the
      reverse case : going from SMP to UP is ok because we split a bit instruction
      into tiny pieces, which does not present this condition.
      
      Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment
      override prefix should fix this issue. Since the default of the atomic
      instructions is to use the DS segment anyway, it should not affect the
      behavior.
      
      The exception to this are references that use ESP/RSP and EBP/RBP as
      the base register (they will use the SS segment), however, in Linux
      (a) DS == SS at all times, and (b) we do not distinguish between
      segment violations reported as #SS as opposed to #GP, so there is no
      need to disassemble the instruction to figure out the suitable segment.
      
      This patch assumes that the 0x3E prefix will leave atomic operations as-is (thus
      assuming they normally touch data in the DS segment). Since there seem to be no
      obvious ill-use of other segment override prefixes for atomic operations, it
      should be safe. It can be verified with a quick
      
      grep -r LOCK_PREFIX include/asm-x86/
      grep -A 1 -r LOCK_PREFIX arch/x86/
      
      Taken from
      
      This source :
      AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System
      Instructions
      States
      "Instructions that Reference a Non-Stack Segment—If an instruction encoding
      references any base register other than rBP or rSP, or if an instruction
      contains an immediate offset, the default segment is the data segment (DS).
      These instructions can use the segment-override prefix to select one of the
      non-default segments, as shown in Table 1-5."
      
      Therefore, forcing the DS segment on the atomic operations, which already use
      the DS segment, should not change.
      
      This source :
      http://wiki.osdev.org/X86_Instruction_Encoding
      States
      "In 64-bit the CS, SS, DS and ES segment overrides are ignored."
      
      Confirmed by "AMD 64-Bit Technology" A.7
      http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_overview.pdf
      
      "In 64-bit mode, the DS, ES, SS and CS segment-override prefixes have no effect.
      These four prefixes are no longer treated as segment-override prefixes in the
      context of multiple-prefix rules. Instead, they are treated as null prefixes."
      
      This patch applies to 2.6.27-rc2, but would also have to be applied to earlier
      kernels (2.6.26, 2.6.25, ...).
      
      Performance impact of the fix : tests done on "xaddq" and "xaddl" show it
      actually improves performance on Intel Xeon, AMD64, Pentium M. It does not
      change the performance on Pentium II, Pentium 3 and Pentium 4.
      
      Xeon E5405 2.0GHz :
      NR_TESTS                                    10000000
      test empty cycles :                        162207948
      test test 1-byte nop xadd cycles :         170755422
      test test DS override prefix xadd cycles : 170000118 *
      test test LOCK xadd cycles :               472012134
      
      AMD64 2.0GHz :
      NR_TESTS                                    10000000
      test empty cycles :                        146674549
      test test 1-byte nop xadd cycles :         150273860
      test test DS override prefix xadd cycles : 149982382 *
      test test LOCK xadd cycles :               270000690
      
      Pentium 4 3.0GHz
      NR_TESTS                                    10000000
      test empty cycles :                        290001195
      test test 1-byte nop xadd cycles :         310000560
      test test DS override prefix xadd cycles : 310000575 *
      test test LOCK xadd cycles :              1050103740
      
      Pentium M 2.0GHz
      NR_TESTS 10000000
      test empty cycles :                        180000523
      test test 1-byte nop xadd cycles :         320000345
      test test DS override prefix xadd cycles : 310000374 *
      test test LOCK xadd cycles :               480000357
      
      Pentium 3 550MHz
      NR_TESTS                                    10000000
      test empty cycles :                        510000231
      test test 1-byte nop xadd cycles :         620000128
      test test DS override prefix xadd cycles : 620000110 *
      test test LOCK xadd cycles :               800000088
      
      Pentium II 350MHz
      NR_TESTS                                    10000000
      test empty cycles :                        200833494
      test test 1-byte nop xadd cycles :         340000130
      test test DS override prefix xadd cycles : 340000126 *
      test test LOCK xadd cycles :               530000078
      
      Speed test modules can be found at
      http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed-32.c
      http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed.c
      
      Macro-benchmarks
      
      2.0GHz E5405 Core 2 dual Quad-Core Xeon
      
      Summary
      
      * replace smp lock prefixes with DS segment selector prefixes
                        no lock prefix (s)   with lock prefix (s)    Speedup
      make -j1 kernel/      33.94 +/- 0.07         34.91 +/- 0.27      2.8 %
      hackbench 50           2.99 +/- 0.01          3.74 +/- 0.01     25.1 %
      
      * replace smp lock prefixes with 0x90 nops
                        no lock prefix (s)   with lock prefix (s)    Speedup
      make -j1 kernel/      34.16 +/- 0.32         34.91 +/- 0.27      2.2 %
      hackbench 50           3.00 +/- 0.01          3.74 +/- 0.01     24.7 %
      
      Detail :
      
      1 CPU, replace smp lock prefixes with DS segment selector prefixes
      
      make -j1 kernel/
      
      real	0m34.067s
      user	0m30.630s
      sys	0m2.980s
      
      real	0m33.867s
      user	0m30.582s
      sys	0m3.024s
      
      real	0m33.939s
      user	0m30.738s
      sys	0m2.876s
      
      real	0m33.913s
      user	0m30.806s
      sys	0m2.808s
      
      avg : 33.94s
      std. dev. : 0.07s
      
      hackbench 50
      
      Time: 2.978
      Time: 2.982
      Time: 3.010
      Time: 2.984
      Time: 2.982
      
      avg : 2.99
      std. dev. : 0.01
      
      1 CPU, noreplace-smp
      
      make -j1 kernel/
      
      real	0m35.326s
      user	0m30.630s
      sys	0m3.260s
      
      real	0m34.325s
      user	0m30.802s
      sys	0m3.084s
      
      real	0m35.568s
      user	0m30.722s
      sys	0m3.168s
      
      real	0m34.435s
      user	0m30.886s
      sys	0m2.996s
      
      avg.: 34.91s
      std. dev. : 0.27s
      
      hackbench 50
      
      Time: 3.733
      Time: 3.750
      Time: 3.761
      Time: 3.737
      Time: 3.741
      
      avg : 3.74
      std. dev. : 0.01
      
      1 CPU, replace smp lock prefixes with 0x90 nops
      
      make -j1 kernel/
      
      real	0m34.139s
      user	0m30.782s
      sys	0m2.820s
      
      real	0m34.010s
      user	0m30.630s
      sys	0m2.976s
      
      real	0m34.777s
      user	0m30.658s
      sys	0m2.916s
      
      real	0m33.924s
      user	0m30.634s
      sys	0m2.924s
      
      real	0m33.962s
      user	0m30.774s
      sys	0m2.800s
      
      real	0m34.141s
      user	0m30.770s
      sys	0m2.828s
      
      avg : 34.16
      std. dev. : 0.32
      
      hackbench 50
      
      Time: 2.999
      Time: 2.994
      Time: 3.004
      Time: 2.991
      Time: 2.988
      
      avg : 3.00
      std. dev. : 0.01
      
      I did more runs (20 runs of each) to compare the nop case to the DS
      prefix case. Results in seconds. They do not actually seem to show a
      significant difference.
      
      NOP
      
      34.155
      33.955
      34.012
      35.299
      35.679
      34.141
      33.995
      35.016
      34.254
      33.957
      33.957
      34.008
      35.013
      34.494
      33.893
      34.295
      34.314
      34.854
      33.991
      34.132
      
      DS
      
      34.080
      34.304
      34.374
      35.095
      34.291
      34.135
      33.940
      34.208
      35.276
      34.288
      33.861
      33.898
      34.610
      34.709
      33.851
      34.256
      35.161
      34.283
      33.865
      35.078
      
      Used http://www.graphpad.com/quickcalcs/ttest1.cfm?Format=C to do the
      T-test (yeah, I'm lazy) :
      
       Group      Group One (DS prefix)       Group Two (nops)
       Mean                    34.37815               34.37070
       SD                       0.46108                0.51905
       SEM                      0.10310                0.11606
       N                             20                     20
      
      P value and statistical significance:
        The two-tailed P value equals 0.9620
        By conventional criteria, this difference is considered to be not statistically significant.
      
      Confidence interval:
        The mean of Group One minus Group Two equals 0.00745
        95% confidence interval of this difference: From -0.30682 to 0.32172
      
      Intermediate values used in calculations:
        t = 0.0480
        df = 38
        standard error of difference = 0.155
      
      So, unless these calculations are completely bogus, the difference between the nop
      and the DS case seems not to be statistically significant.
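
The reported intermediate values can be cross-checked by hand from the summary statistics in the table. The working below assumes an unpaired two-sample t-test with equal group sizes, which is consistent with the reported df = 38; it is a check of the arithmetic, not a record of how the online calculator computes it.

```latex
\begin{align*}
\mathrm{SE}_{\text{diff}}
  &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
   = \sqrt{\frac{0.46108^2}{20} + \frac{0.51905^2}{20}}
   \approx \sqrt{0.02410} \approx 0.155 \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}_{\text{diff}}}
   = \frac{34.37815 - 34.37070}{0.155}
   = \frac{0.00745}{0.155} \approx 0.048 \\
\mathrm{df} &= n_1 + n_2 - 2 = 20 + 20 - 2 = 38
\end{align*}
```

A t value this close to zero with 38 degrees of freedom is in line with the reported two-tailed P value of 0.9620.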
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      CC: Linus Torvalds <torvalds@linux-foundation.org>
      CC: Jeremy Fitzhardinge <jeremy@goop.org>
      CC: Roland McGrath <roland@redhat.com>
      CC: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      CC: Steven Rostedt <srostedt@redhat.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: David Miller <davem@davemloft.net>
      CC: Ulrich Drepper <drepper@redhat.com>
      CC: Rusty Russell <rusty@rustcorp.com.au>
      CC: Gregory Haskins <ghaskins@novell.com>
      CC: Arnaldo Carvalho de Melo <acme@redhat.com>
      CC: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      CC: Clark Williams <williams@redhat.com>
      CC: Christoph Lameter <cl@linux-foundation.org>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  33. 24 May, 2008 2 commits
  34. 26 Apr, 2008 1 commit