1. 11 Jan, 2018 1 commit
  2. 10 Aug, 2017 1 commit
  3. 24 May, 2017 1 commit
  4. 20 Dec, 2016 1 commit
  5. 29 Apr, 2016 1 commit
  6. 04 Sep, 2015 1 commit
    • x86/alternatives: Make optimize_nops() interrupt safe and synced · 66c117d7
      Committed by Thomas Gleixner
      Richard reported the following crash:
      
      [    0.036000] BUG: unable to handle kernel paging request at 55501e06
      [    0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
      [    0.036000] Call Trace:
      [    0.036000]  [<c0409c80>] ? add_nops+0x90/0xa0
      [    0.036000]  [<c040a054>] apply_alternatives+0x274/0x630
      
      Chuck decoded:
      
       "  0:   8d 90 90 83 04 24       lea    0x24048390(%eax),%edx
          6:   80 fc 0f                cmp    $0xf,%ah
          9:   a8 0f                   test   $0xf,%al
       >> b:   a0 06 1e 50 55          mov    0x55501e06,%al
         10:   57                      push   %edi
         11:   56                      push   %esi
      
       Interrupt 0x30 occurred while the alternatives code was replacing the
       initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the
       optimized version, 0x8d,0x76,0x00. Only the first byte has been
       replaced so far, and it makes a mess out of the insn decoding."
      
      optimize_nops() is buggy in two aspects:
      
      - It's not disabling interrupts across the modification
      - It's lacking a sync_core() call
      
      Add both.
      
      Fixes: 4fd4b6e5 'x86/alternatives: Use optimized NOPs for padding'
      Reported-and-tested-by: "Richard W.M. Jones" <rjones@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Richard W.M. Jones <rjones@redhat.com>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 19 May, 2015 1 commit
    • x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it in xsave_state() · 5e907bb0
      Committed by Ingo Molnar
      We'd like to use xsave_state() earlier, but its SYSTEM_BOOTING check
      is too imprecise.
      
      The real condition that xsave_state() would like to check is whether
      alternative XSAVE instructions were patched into the kernel image
      already.
      
      Add such a (read-mostly) debug flag and use it in xsave_state().
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 11 May, 2015 1 commit
    • x86/alternatives: Switch AMD F15h and later to the P6 NOPs · f21262b8
      Committed by Borislav Petkov
      Software optimization guides for both F15h and F16h cite those
      NOPs as the optimal ones. A microbenchmark confirms that
      actually even older families are better with the single-insn
      NOPs so switch to them for the alternatives.
      
      Cycles count below includes the loop overhead of the measurement
      but that overhead is the same with all runs.
      
      	F10h, revE:
      	-----------
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     288.212282 cycles
      			   66 90     288.220840 cycles
      			66 66 90     288.219447 cycles
      		     66 66 66 90     288.223204 cycles
      		  66 66 90 66 90     571.393424 cycles
      	       66 66 90 66 66 90     571.374919 cycles
      	    66 66 66 90 66 66 90     572.249281 cycles
      	 66 66 66 90 66 66 66 90     571.388651 cycles
      
      	P6:
      			      90     288.214193 cycles
      			   66 90     288.225550 cycles
      			0f 1f 00     288.224441 cycles
      		     0f 1f 40 00     288.225030 cycles
      		  0f 1f 44 00 00     288.233558 cycles
      	       66 0f 1f 44 00 00     324.792342 cycles
      	    0f 1f 80 00 00 00 00     325.657462 cycles
      	 0f 1f 84 00 00 00 00 00     430.246643 cycles
      
      	F14h:
      	----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     510.404890 cycles
      			   66 90     510.432117 cycles
      			66 66 90     510.561858 cycles
      		     66 66 66 90     510.541865 cycles
      		  66 66 90 66 90    1014.192782 cycles
      	       66 66 90 66 66 90    1014.226546 cycles
      	    66 66 66 90 66 66 90    1014.334299 cycles
      	 66 66 66 90 66 66 66 90    1014.381205 cycles
      
      	P6:
      			      90     510.436710 cycles
      			   66 90     510.448229 cycles
      			0f 1f 00     510.545100 cycles
      		     0f 1f 40 00     510.502792 cycles
      		  0f 1f 44 00 00     510.589517 cycles
      	       66 0f 1f 44 00 00     510.611462 cycles
      	    0f 1f 80 00 00 00 00     511.166794 cycles
      	 0f 1f 84 00 00 00 00 00     511.651641 cycles
      
      	F15h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     243.128396 cycles
      			   66 90     243.129883 cycles
      			66 66 90     243.131631 cycles
      		     66 66 66 90     242.499324 cycles
      		  66 66 90 66 90     481.829083 cycles
      	       66 66 90 66 66 90     481.884413 cycles
      	    66 66 66 90 66 66 90     481.851446 cycles
      	 66 66 66 90 66 66 66 90     481.409220 cycles
      
      	P6:
      			      90     243.127026 cycles
      			   66 90     243.130711 cycles
      			0f 1f 00     243.122747 cycles
      		     0f 1f 40 00     242.497617 cycles
      		  0f 1f 44 00 00     245.354461 cycles
      	       66 0f 1f 44 00 00     361.930417 cycles
      	    0f 1f 80 00 00 00 00     362.844944 cycles
      	 0f 1f 84 00 00 00 00 00     480.514948 cycles
      
      	F16h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     507.793298 cycles
      			   66 90     507.789636 cycles
      			66 66 90     507.826490 cycles
      		     66 66 66 90     507.859075 cycles
      		  66 66 90 66 90    1008.663129 cycles
      	       66 66 90 66 66 90    1008.696259 cycles
      	    66 66 66 90 66 66 90    1008.692517 cycles
      	 66 66 66 90 66 66 66 90    1008.755399 cycles
      
      	P6:
      			      90     507.795232 cycles
      			   66 90     507.794761 cycles
      			0f 1f 00     507.834901 cycles
      		     0f 1f 40 00     507.822629 cycles
      		  0f 1f 44 00 00     507.838493 cycles
      	       66 0f 1f 44 00 00     507.908597 cycles
      	    0f 1f 80 00 00 00 00     507.946417 cycles
      	 0f 1f 84 00 00 00 00 00     507.954960 cycles
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1431332153-18566-2-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 06 Apr, 2015 1 commit
    • x86/alternatives: Guard NOPs optimization · 69df353f
      Committed by Borislav Petkov
      Take a look at the first instruction byte before optimizing the NOP -
      there might be something else there already, like the ALTERNATIVE_2()
      in rdtsc_barrier() which NOPs out on AMD even though we just
      patched in an MFENCE.
      
      This happens because the alternatives sees X86_FEATURE_MFENCE_RDTSC,
      AMD CPUs set it, we patch in the MFENCE and right afterwards it sees
      X86_FEATURE_LFENCE_RDTSC which AMD CPUs don't set and we blindly
      optimize the NOP.
      
      Checking whether at least the first byte is 0x90 prevents that.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1428181662-18020-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
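The guard described in the "Guard NOPs optimization" commit can be illustrated with a small userspace sketch. This is not the kernel's code: `optimize_padding()` and the `p6_nops` table are hypothetical names, the byte sequences are the P6 NOPs quoted in the F15h benchmark tables elsewhere in this log, and the sketch only handles padding of up to 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* P6-style single-instruction NOPs for lengths 1..8 (illustrative table). */
static const uint8_t p6_nops[8][8] = {
    {0x90},
    {0x66, 0x90},
    {0x0f, 0x1f, 0x00},
    {0x0f, 0x1f, 0x40, 0x00},
    {0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/*
 * Hypothetical helper: rewrite a padding area with one optimized NOP,
 * but only if the area still starts with a single-byte NOP (0x90).
 * Returns 1 if the bytes were rewritten, 0 if they were left alone.
 */
static int optimize_padding(uint8_t *pad, size_t padlen)
{
    assert(pad != NULL);

    if (padlen == 0 || padlen > 8)
        return 0;               /* out of range for this sketch */
    if (pad[0] != 0x90)
        return 0;               /* something else was already patched in */

    memcpy(pad, p6_nops[padlen - 1], padlen);
    return 1;
}
```

With this check, a site that already holds a patched-in MFENCE (`0f ae f0`) is left untouched, while a site that still holds `90 90 90` is replaced by `0f 1f 00`.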
  10. 04 Apr, 2015 1 commit
    • x86/alternatives: Fix ALTERNATIVE_2 padding generation properly · dbe4058a
      Committed by Borislav Petkov
      Quentin caught a corner case with the generation of instruction
      padding in the ALTERNATIVE_2 macro: if len(orig_insn) <
      len(alt1) < len(alt2), then not enough padding gets added and
      that is not good(tm) as we could overwrite the beginning of the
      next instruction.
      
      Luckily, at the time of this writing, we don't have
      ALTERNATIVE_2() invocations which have that problem and even if
      we did, a simple fix would be to prepend the instructions with
      enough prefixes so that that corner case doesn't happen.
      
      However, it would be best to fix it properly. See below for
      a simple, abstracted example of what we're doing.
      
      So what we ended up doing is, we compute the
      
      	max(len(alt1), len(alt2)) - len(orig_insn)
      
      and feed that value to the .skip gas directive. The max() cannot
      have conditionals due to gas limitations, thus the fancy integer
      math.
      
      With this patch, all ALTERNATIVE_2 sites get padded correctly;
      generated obscure test cases pass too:
      
        #define alt_max_short(a, b)    ((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))
      
        #define gen_skip(orig, alt1, alt2, marker)	\
        	.skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \
        		(alt_max_short(alt1, alt2) - (orig)),marker
      
        	.pushsection .text, "ax"
        .globl main
        main:
        	gen_skip(1, 2, 4, 0x09)
        	gen_skip(4, 1, 2, 0x10)
        	...
        	.popsection
      
      Thanks to Quentin for catching it and double-checking the fix!
      Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
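The "fancy integer math" in the ALTERNATIVE_2 padding commit relies on GNU as evaluating a true comparison to -1 (all bits set) rather than 1. The following C sketch emulates that behavior to show why the expression yields the maximum and how the pad length follows from it. The helpers `gas_lt()` and `pad_len()` are hypothetical names introduced here for illustration; only `alt_max_short` comes from the commit message.

```c
#include <assert.h>

/* GNU as comparisons yield -1 when true and 0 when false. C yields 1 or 0,
 * so the assembler's behavior is emulated explicitly here. */
static long gas_lt(long a, long b)
{
    return (a < b) ? -1L : 0L;
}

/* Same shape as the alt_max_short() macro: branch-free max(a, b). */
static long alt_max_short(long a, long b)
{
    long mask = -(-gas_lt(a, b));   /* -1 if a < b, else 0 */

    /* mask == -1: a ^ (a ^ b) == b;  mask == 0: a ^ 0 == a */
    return a ^ ((a ^ b) & mask);
}

/* Number of bytes the gen_skip() expression would emit as padding. */
static long pad_len(long orig, long alt1, long alt2)
{
    assert(orig >= 0 && alt1 >= 0 && alt2 >= 0);

    long diff = alt_max_short(alt1, alt2) - orig;
    long is_positive = -((diff > 0) ? -1L : 0L);    /* gas: -(diff > 0) */

    return is_positive * diff;      /* diff when positive, otherwise 0 */
}
```

For the two invocations shown in the commit message, `pad_len(1, 2, 4)` evaluates to 3 and `pad_len(4, 1, 2)` evaluates to 0, so padding is only emitted when the longest replacement exceeds the original instruction.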
  11. 23 Mar, 2015 1 commit
  12. 23 Feb, 2015 4 commits
    • x86/alternatives: Use optimized NOPs for padding · 4fd4b6e5
      Committed by Borislav Petkov
      Alternatives now allow for an empty old instruction. In this case we go
      and pad the space with NOPs at assembly time. However, there are the
      optimal, longer NOPs which should be used. Do that at patching time by
      adding alt_instr.padlen-sized NOPs at the old instruction address.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • x86/alternatives: Make JMPs more robust · 48c7a250
      Committed by Borislav Petkov
      Up until now we had to pay attention to relative JMPs in alternatives
      about how their relative offset gets computed so that the jump target
      is still correct. Or, as it is the case for near CALLs (opcode e8), we
      still have to go and readjust the offset at patching time.
      
      What is more, the static_cpu_has_safe() facility had to forcefully
      generate 5-byte JMPs since we couldn't rely on the compiler to generate
      properly sized ones so we had to force the longest ones. Worse than
      that, sometimes it would generate a replacement JMP which is longer than
      the original one, thus overwriting the beginning of the next instruction
      at patching time.
      
      So, in order to alleviate all that and make using JMPs more
      straight-forward we go and pad the original instruction in an
      alternative block with NOPs at build time, should the replacement(s) be
      longer. This way, alternatives users shouldn't pay special attention
      so that original and replacement instruction sizes are fine but the
      assembler would simply add padding where needed and not do anything
      otherwise.
      
      As a second aspect, we go and recompute JMPs at patching time so that we
      can try to make 5-byte JMPs into two-byte ones if possible. If not, we
      still have to recompute the offsets as the replacement JMP gets put far
      away in the .altinstr_replacement section leading to a wrong offset if
      copied verbatim.
      
      For example, on a locally generated kernel image
      
        old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
        __switch_to:
         ffffffff810014bd:      eb 21                   jmp ffffffff810014e0
        repl insn: size: 5
        ffffffff81d0b23c:       e9 b1 62 2f ff          jmpq ffffffff810014f2
      
      gets corrected to a 2-byte JMP:
      
        apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
        alt_insn: e9 b1 62 2f ff
        recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
        converted to: eb 33 90 90 90
      
      and a 5-byte JMP:
      
        old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
        __switch_to:
         ffffffff81001516:      eb 30                   jmp ffffffff81001548
        repl insn: size: 5
         ffffffff81d0b241:      e9 10 63 2f ff          jmpq ffffffff81001556
      
      gets shortened into a two-byte one:
      
        apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
        alt_insn: e9 10 63 2f ff
        recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
        converted to: eb 3e 90 90 90
      
      ... and so on.
      
      This leads to a net win of around
      
      40ish replacements * 3 bytes savings =~ 120 bytes of I$
      
      on an AMD guest which means some savings of precious instruction cache
      bandwidth. The padding after the shorter 2-byte JMPs consists of single-byte
      NOPs, which smart microarchitectures discard at decode time, thus freeing
      up execution bandwidth.
      Signed-off-by: Borislav Petkov <bp@suse.de>
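The JMP recomputation described in "Make JMPs more robust" can be reproduced in userspace from the numbers in the debug output. The sketch below is an assumption-laden illustration, not the kernel's `recompute_jumps()`: the function name `recompute_jump()` and its signature are hypothetical, and it only handles a replacement that is a 5-byte `e9 rel32` JMP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper:
 *   repl       5-byte "e9 rel32" JMP as assembled in the replacement section
 *   repl_addr  address the replacement was assembled at
 *   orig_addr  address of the original instruction that gets patched
 *   out        5-byte buffer receiving the fixed-up instruction
 * Returns the length of the resulting JMP (2 or 5). Bytes after a 2-byte
 * JMP are filled with single-byte NOPs (0x90).
 */
static int recompute_jump(const uint8_t repl[5], uint64_t repl_addr,
                          uint64_t orig_addr, uint8_t out[5])
{
    assert(repl[0] == 0xe9);

    /* Decode the little-endian rel32 operand. */
    uint32_t raw = (uint32_t)repl[1] | ((uint32_t)repl[2] << 8) |
                   ((uint32_t)repl[3] << 16) | ((uint32_t)repl[4] << 24);
    int32_t rel32 = (int32_t)raw;

    /* Jump target as seen from the replacement section. */
    uint64_t target = repl_addr + 5 + (uint64_t)(int64_t)rel32;

    /* Distance from the start of the original instruction to the target. */
    int64_t displ = (int64_t)(target - orig_addr);

    memset(out, 0x90, 5);

    if (displ - 2 >= -128 && displ - 2 <= 127) {
        out[0] = 0xeb;                      /* short JMP rel8 */
        out[1] = (uint8_t)(displ - 2);
        return 2;
    }

    uint32_t enc = (uint32_t)(int32_t)(displ - 5);
    out[0] = 0xe9;                          /* near JMP rel32 */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(enc >> (8 * i));
    return 5;
}
```

Fed with the first example from the commit message (`e9 b1 62 2f ff` assembled at `ffffffff81d0b23c`, original instruction at `ffffffff810014bd`), the sketch produces `eb 33 90 90 90`, matching the "converted to" line.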
    • x86/alternatives: Add instruction padding · 4332195c
      Committed by Borislav Petkov
      Up until now we have always paid attention to make sure the length of
      the new instruction replacing the old one is less than or equal to
      the length of the old instruction. If the new instruction is longer, at
      the time it replaces the old instruction it will overwrite the beginning
      of the next instruction in the kernel image and cause your pants to
      catch fire.
      
      So instead of having to pay attention, teach the alternatives framework
      to pad shorter old instructions with NOPs at buildtime - but only in the
      case when
      
        len(old instruction(s)) < len(new instruction(s))
      
      and add nothing in the >= case. (In that case we do add_nops() when
      patching).
      
      This way the alternatives user shouldn't have to care about instruction
      sizes and simply use the macros.
      
      Add asm ALTERNATIVE* flavor macros too, while at it.
      
      Also, we need to save the pad length in a separate struct alt_instr
      member for NOP optimization and the way to do that reliably is to carry
      the pad length instead of trying to detect whether we're looking at
      single-byte NOPs or at pathological instruction offsets like e9 90 90 90
      90, for example, which is a valid instruction.
      
      Thanks to Michael Matz for the great help with toolchain questions.
      Signed-off-by: Borislav Petkov <bp@suse.de>
    • x86/alternatives: Cleanup DPRINTK macro · db477a33
      Committed by Borislav Petkov
      Make it pass __func__ implicitly. Also, dump info about each replacing
      we're doing. Fixup comments and style while at it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
  13. 24 Apr, 2014 1 commit
  14. 28 Sep, 2013 1 commit
  15. 23 Jul, 2013 1 commit
    • kprobes/x86: Call out into INT3 handler directly instead of using notifier · 17f41571
      Committed by Jiri Kosina
      In fd4363ff ("x86: Introduce int3 (breakpoint)-based
      instruction patching"), the mechanism that was introduced for
      notifying alternatives code from the int3 exception handler that an
      exception occurred was die_notifier.
      
      This is however problematic, as early code might be using jump
      labels even before the notifier registration has been performed,
      which will then lead to an oops due to an unhandled exception. One
      such occurrence has been encountered by Fengguang:
      
       int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
       task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
       RIP: 0010:[<ffffffff811098cc>]  [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
       RSP: 0000:ffff88000dd03f10  EFLAGS: 00000006
       RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
       RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
       RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
       R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
       R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
       FS:  0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
       Stack:
        ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
        ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
        ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
       Call Trace:
        <IRQ>
        [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
        [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
        [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
        [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
        [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
        <EOI>
        [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
        [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
        [<ffffffff81015f10>] default_idle+0x147/0x282
        [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
        [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
        [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
        [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
        [...]
      
      Fix this by directly calling poke_int3_handler() from the int3
      exception handler (analogous to what ftrace has been doing
      already), instead of relying on the notifier, whose registration
      might not yet have been finalized by the time of the first trap.
      Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 19 Jul, 2013 1 commit
  17. 17 Jul, 2013 1 commit
  18. 03 Apr, 2013 1 commit
  19. 19 Sep, 2012 1 commit
  20. 23 Aug, 2012 1 commit
    • x86/smp: Don't ever patch back to UP if we unplug cpus · 816afe4f
      Committed by Rusty Russell
      We still patch SMP instructions to UP variants if we boot with a
      single CPU, but not at any other time.  In particular, not if we
      unplug CPUs to return to a single cpu.
      
      Paul McKenney points out:
      
       mean offline overhead is 6251/48=130.2 milliseconds.
      
       If I remove the alternatives_smp_switch() from the offline
       path [...] the mean offline overhead is 550/42=13.1 milliseconds
      
      Basically, we're never going to get those 120ms back, and the
      code is pretty messy.
      
      We get rid of:
      
       1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
          documentation is wrong. It's now the default.
      
       2) The skip_smp_alternatives flag used by suspend.
      
       3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
          which were only used to set this one flag.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul McKenney <paul.mckenney@us.ibm.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  21. 22 Aug, 2012 1 commit
  22. 25 Jul, 2012 1 commit
  23. 13 Jun, 2012 1 commit
  24. 06 Jun, 2012 1 commit
  25. 14 Nov, 2011 1 commit
  26. 15 Jul, 2011 1 commit
  27. 14 Jul, 2011 1 commit
  28. 18 May, 2011 1 commit
  29. 19 Apr, 2011 2 commits
  30. 05 Apr, 2011 1 commit
    • jump label: Introduce static_branch() interface · d430d3d7
      Committed by Jason Baron
      Introduce:
      
      static __always_inline bool static_branch(struct jump_label_key *key);
      
      instead of the old JUMP_LABEL(key, label) macro.
      
      In this way, jump labels become really easy to use:
      
      Define:
      
              struct jump_label_key jump_key;
      
      Can be used as:
      
              if (static_branch(&jump_key))
                      do unlikely code
      
      enable/disable via:
      
              jump_label_inc(&jump_key);
              jump_label_dec(&jump_key);
      
      that's it!
      
      For the jump labels disabled case, the static_branch() becomes an
      atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(),
      atomic_dec() operations. We show testing results for this change below.
      
      Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct.
      
      Since we now require a 'struct jump_label_key *key', we can store a pointer into
      the jump table addresses. In this way, we can enable/disable jump labels, in
      basically constant time. This change allows us to completely remove the previous
      hashtable scheme. Thanks to Peter Zijlstra for this re-write.
      
      Testing:
      
      I ran a series of 'tbench 20' runs 5 times (with reboots) for 3
      configurations, where tracepoints were disabled.
      
      jump label configured in
      avg: 815.6
      
      jump label *not* configured in (using atomic reads)
      avg: 800.1
      
      jump label *not* configured in (regular reads)
      avg: 803.4
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110316212947.GA8792@redhat.com>
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Suggested-by: H. Peter Anvin <hpa@linux.intel.com>
      Tested-by: David Daney <ddaney@caviumnetworks.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
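The fallback described in the static_branch() commit, where jump labels are not configured in and the key degenerates into an atomic counter, can be sketched in userspace with C11 atomics. This is only an illustration of that fallback: the field name `enabled` and the use of `<stdatomic.h>` in place of the kernel's `atomic_t` are assumptions made here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the key; the field name is an assumption. */
struct jump_label_key {
    atomic_int enabled;
};

/* Fallback form: a plain atomic read instead of a patched jump. */
static inline bool static_branch(struct jump_label_key *key)
{
    return atomic_load(&key->enabled) != 0;
}

/* Fallback form of jump_label_inc(): just an atomic increment. */
static inline void jump_label_inc(struct jump_label_key *key)
{
    atomic_fetch_add(&key->enabled, 1);
}

/* Fallback form of jump_label_dec(): just an atomic decrement. */
static inline void jump_label_dec(struct jump_label_key *key)
{
    int old = atomic_fetch_sub(&key->enabled, 1);

    assert(old > 0);    /* unbalanced dec would be a caller bug */
    (void)old;
}
```

Because the key is a counter rather than a flag, the branch stays enabled until every `jump_label_inc()` has been matched by a `jump_label_dec()`.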
  31. 18 Mar, 2011 1 commit
  32. 15 Mar, 2011 1 commit
    • x86: stop_machine_text_poke() should issue sync_core() · 0e00f7ae
      Committed by Mathieu Desnoyers
      Intel Architecture Software Developer's Manual section 7.1.3 specifies that a
      core serializing instruction such as "cpuid" should be executed on _each_ core
      before the new instruction is made visible.
      
      Failure to do so can lead to unspecified behavior (Intel XMC errata include
      General Protection Fault in the list), so we should avoid this at all cost.
      
      This problem can affect modified code executed by interrupt handlers after
      interrupts are re-enabled at the end of stop_machine, because no core serializing
      instruction is executed between the code modification and the moment interrupts
      are reenabled.
      
      Because stop_machine_text_poke performs the text modification from the first CPU
      decrementing stop_machine_first, modified code executed in thread context is
      also affected by this problem. To explain why, we have to split the CPUs in two
      categories: the CPU that initiates the text modification (calls text_poke_smp)
      and all the others. The scheduler, executed on all other CPUs after
      stop_machine, issues an "iret" core serializing instruction, and therefore
      handles core serialization for all these CPUs. However, the text modification
      initiator can continue its execution on the same thread and access the modified
      text without any scheduler call. Given that the CPU that initiates the code
      modification is not guaranteed to be the one actually performing the code
      modification, it falls into the XMC errata.
      
      Q: Isn't this executed from an IPI handler, which will return with IRET (a
         serializing instruction) anyway?
      A: No, now stop_machine uses per-cpu workqueue, so that handler will be
         executed from worker threads. There is no iret anymore.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      LKML-Reference: <20110303160137.GB1590@Krystal>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: <stable@kernel.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  33. 12 Feb, 2011 1 commit
    • x86: Fix text_poke_smp_batch() deadlock · d91309f6
      Committed by Peter Zijlstra
      Fix this deadlock - we are already holding the mutex:
      
      =======================================================
      [ INFO: possible circular locking dependency detected ] 2.6.38-rc4-test+ #1
      -------------------------------------------------------
      bash/1850 is trying to acquire lock:
       (text_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      but task is already holding lock:
       (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (smp_alt){+.+...}:
             [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
             [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
             [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
             [<ffffffff8101050f>] alternatives_smp_switch+0x77/0x1d8
             [<ffffffff81926a6f>] do_boot_cpu+0xd7/0x762
             [<ffffffff819277dd>] native_cpu_up+0xe6/0x16a
             [<ffffffff81928e28>] _cpu_up+0x9d/0xee
             [<ffffffff81928f4c>] cpu_up+0xd3/0xe7
             [<ffffffff82268d4b>] kernel_init+0xe8/0x20a
             [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10
      
      -> #1 (cpu_hotplug.lock){+.+.+.}:
             [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
             [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
             [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
             [<ffffffff810568cc>] get_online_cpus+0x41/0x55
             [<ffffffff810a1348>] stop_machine+0x1e/0x3e
             [<ffffffff819314c1>] text_poke_smp_batch+0x3a/0x3c
             [<ffffffff81932b6c>] arch_optimize_kprobes+0x10d/0x11c
             [<ffffffff81933a51>] kprobe_optimizer+0x152/0x222
             [<ffffffff8106bb71>] process_one_work+0x1d3/0x335
             [<ffffffff8106cfae>] worker_thread+0x104/0x1a4
             [<ffffffff810707c4>] kthread+0x9d/0xa5
             [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10
      
      -> #0 (text_mutex){+.+.+.}:
      
      other info that might help us debug this:
      
      6 locks held by bash/1850:
       #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #1:  (s_active#75){.+.+.+}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #2:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
       #5:  (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
      
      stack backtrace:
      Pid: 1850, comm: bash Not tainted 2.6.38-rc4-test+ #1
      Call Trace:
      
       [<ffffffff81080eb2>] print_circular_bug+0xa8/0xb7
       [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
       [<ffffffff81010302>] alternatives_smp_unlock+0x3d/0x93
       [<ffffffff81010630>] alternatives_smp_switch+0x198/0x1d8
       [<ffffffff8102568a>] native_cpu_die+0x65/0x95
       [<ffffffff818cc4ec>] _cpu_down+0x13e/0x202
       [<ffffffff8117a619>] sysfs_write_file+0x108/0x144
       [<ffffffff8111f5a2>] vfs_write+0xac/0xff
       [<ffffffff8111f7a9>] sys_write+0x4a/0x6e
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: mathieu.desnoyers@efficios.com
      Cc: rusty@rustcorp.com.au
      Cc: ananth@in.ibm.com
      Cc: masami.hiramatsu.pt@hitachi.com
      Cc: fweisbec@gmail.com
      Cc: jbeulich@novell.com
      Cc: jbaron@redhat.com
      Cc: mhiramat@redhat.com
      LKML-Reference: <1297458466.5226.93.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  34. 14 Dec, 2010 1 commit
  35. 07 Dec, 2010 1 commit
    • x86: Introduce text_poke_smp_batch() for batch-code modifying · 7deb18dc
      Committed by Masami Hiramatsu
      Introduce text_poke_smp_batch(). This function modifies several
      text areas with one stop_machine() on SMP. Because calling
      stop_machine() is a heavy task, it is better to aggregate
      text_poke requests.
      
      ( Note: I've talked with Rusty about this interface, and
        he would not like to expand stop_machine() interface, since
        it is not for generic use. )
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
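The batching idea behind text_poke_smp_batch(), paying the cost of one stop for many modifications, can be sketched in userspace as follows. Everything here is a simulation under stated assumptions: `fake_stop_machine()`, `apply_batch()` and `text_poke_batch_sketch()` are hypothetical names, the `text_poke_param` layout is assumed for illustration, and the "text" is ordinary writable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One pending modification: where to write, what to write, how many bytes.
 * The layout is an assumption made for this sketch. */
struct text_poke_param {
    void *addr;
    const void *opcode;
    size_t len;
};

struct poke_batch {
    struct text_poke_param *params;
    int n;
};

/* Counts how often the simulated expensive stop happens. */
static int stop_machine_calls;

/* Stand-in for stop_machine(): it only counts and runs the callback. */
static void fake_stop_machine(void (*fn)(void *), void *data)
{
    stop_machine_calls++;
    fn(data);
}

/* Apply every queued modification while everything is "stopped". */
static void apply_batch(void *data)
{
    struct poke_batch *b = data;

    for (int i = 0; i < b->n; i++)
        memcpy(b->params[i].addr, b->params[i].opcode, b->params[i].len);
}

/* All n modifications share a single stop. */
static void text_poke_batch_sketch(struct text_poke_param *params, int n)
{
    struct poke_batch b = { params, n };

    assert(params != NULL && n > 0);
    fake_stop_machine(apply_batch, &b);
}
```

Queuing two modifications and calling `text_poke_batch_sketch()` once leaves `stop_machine_calls` at 1, whereas poking each site separately would have incurred one stop per site.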
  36. 30 Oct, 2010 1 commit