1. 06 Dec 2011, 1 commit
  2. 29 Sep 2011, 1 commit
    • x86-64: Fix CFI data for interrupt frames · eab9e613
      Committed by Jan Beulich
      The patch titled "x86: Don't use frame pointer to save old stack
      on irq entry" did not properly adjust CFI directives, so this
      patch is a follow-up to that one.
      
      With the old stack pointer no longer stored in a callee-saved
      register (plus some offset), we now have to use a CFA expression
      to describe the memory location where it can be found. This
      requires the use of .cfi_escape (which allows arbitrary byte
      streams to be emitted into .eh_frame), as there is no
      .cfi_def_cfa_expression (which also could not reasonably be
      expected, as it would require a full expression parser).
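
      As a rough illustration (not the exact byte stream from this
      patch), a CFA expression meaning "the CFA is the value stored at
      rsp+8" can be emitted like this:

          /* 0x0f = DW_CFA_def_cfa_expression, 0x03 = expression length,
           * 0x77 = DW_OP_breg7 (rsp on x86-64), 0x08 = offset +8,
           * 0x06 = DW_OP_deref */
          .cfi_escape 0x0f, 0x03, 0x77, 0x08, 0x06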
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/4E8360200200007800058467@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      eab9e613
  3. 11 Aug 2011, 1 commit
  4. 03 Jul 2011, 4 commits
    • x86: Don't use frame pointer to save old stack on irq entry · a2bbe750
      Committed by Frederic Weisbecker
      rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
      in order to restore it later in ret_from_intr.
      
      It is convenient because we save its value in the irq regs
      and it's easily restored using the leave instruction.
      
      However, this is an abuse of the frame pointer, whose role is
      to help unwind the kernel by chaining frames together, each
      node following the return address to the previous frame.

      Since we break the frame by changing the stack pointer, there
      is no return address preceding the new frame. Using the frame
      pointer to link the two stacks therefore breaks stack
      unwinders, which find a random value instead of a return
      address there.
      
      There is no workaround that works in every case. We use the
      fixup_bp_irq_link() function to dereference that abused frame
      pointer in the case of a non-nesting interrupt (which means the
      stack changed).
      But that doesn't fix the case of interrupts that don't change the
      stack (where we still have the unconditional frame link), which
      is the case of a hardirq interrupting a softirq. We have no way
      to detect this transition, so the irq frame link is treated as a
      real frame pointer and the return address is dereferenced, but it
      is still a spurious one.
      
      There are two possible results of this: either the spurious return
      address, a random stack value, luckily belongs to the kernel text
      and then the unwinding can continue and we just have a weird entry
      in the stack trace. Or it doesn't belong to the kernel text and
      unwinding stops there.
      
      This is the reason why stacktraces (including perf callchains) on
      irqs that interrupted softirqs don't work very well.
      
      To solve this, we no longer save the old stack pointer in rbp.
      Instead, we save it in a scratch register that we push onto the
      new stack and pop back later on irq return.
      
      This preserves the whole frame chain without spurious return addresses
      in the middle and drops the need for the horrid fixup_bp_irq_link()
      workaround.
      
      And finally, irqs that interrupt a softirq are sanely unwound.
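
      A sketch of the scheme (simplified; the register choice and the
      per-cpu references are illustrative, not the literal
      SAVE_ARGS_IRQ code):

          movq %rsp, %rsi                  /* remember the old stack */
          incl PER_CPU_VAR(irq_count)      /* ZF set on non-nested entry */
          cmovzq PER_CPU_VAR(irq_stack_ptr), %rsp  /* switch stacks */
          pushq %rsi                       /* old rsp lives on the new stack */
          /* ... handler runs ... */
          popq %rsi                        /* irq return */
          movq %rsi, %rsp                  /* back to the old stack */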
      
      Before:
      
          99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                         |
                         --- perf_pending_event
                             irq_work_run
                             smp_irq_work_interrupt
                             irq_work_interrupt
                            |
                            |--41.60%-- __read
                            |          |
                            |          |--99.90%-- create_worker
                            |          |          bench_sched_messaging
                            |          |          cmd_bench
                            |          |          run_builtin
                            |          |          main
                            |          |          __libc_start_main
                            |           --0.10%-- [...]
      
      After:
      
           1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
                  |
                  --- perf_pending_event
                      irq_work_run
                      smp_irq_work_interrupt
                      irq_work_interrupt
                     |
                     |--95.00%-- arch_irq_work_raise
                     |          irq_work_queue
                     |          __perf_event_overflow
                     |          perf_swevent_overflow
                     |          perf_swevent_event
                     |          perf_tp_event
                     |          perf_trace_softirq
                     |          __do_softirq
                     |          call_softirq
                     |          do_softirq
                     |          irq_exit
                     |          |
                     |          |--73.68%-- smp_apic_timer_interrupt
                     |          |          apic_timer_interrupt
                     |          |          |
                     |          |          |--96.43%-- amd_e400_idle
                     |          |          |          cpu_idle
                     |          |          |          start_secondary
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      a2bbe750
    • x86: Remove useless unwinder backlink from irq regs saving · 48ffee7d
      Committed by Frederic Weisbecker
      The unwinder backlink in the interrupt entry is useless: it is
      not actually part of the stack frame chain and thus is never
      used.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      48ffee7d
    • x86,64: Separate arg1 from rbp handling in SAVE_REGS_IRQ · 3b99a3ef
      Committed by Frederic Weisbecker
      Just for clarity in the code: have a first block that handles the
      frame pointer and a separate one that handles the pt_regs pointer
      and its use.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      3b99a3ef
    • x86,64: Simplify save_regs() · 1871853f
      Committed by Frederic Weisbecker
      The save_regs function, which saves the registers on low-level
      irq entry, is complicated by the fact that it switches stacks in
      the middle, and also because it manipulates data allocated in the
      caller's frame: accesses to that data are calculated directly
      from the callee's rsp value, with the return address sitting in
      the way.
      
      This complicates the calculation of static stack offsets and
      requires more dynamic ones. It also needs a save/restore of the
      function's return address.
      
      To simplify and optimize this, turn save_regs() into a
      macro.
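
      A minimal sketch of the resulting shape (illustrative, not the
      actual macro from the patch): a macro expands inline in the
      caller, so there is no return address between rsp and the saved
      data, and every offset is a static constant.

          .macro SAVE_IRQ_REGS
          subq $6*8, %rsp          /* room for six saved registers */
          movq %rdi, 5*8(%rsp)     /* plain static offsets from rsp */
          movq %rsi, 4*8(%rsp)
          movq %rdx, 3*8(%rsp)
          movq %rcx, 2*8(%rsp)
          movq %rax, 1*8(%rsp)
          movq %r8,  0*8(%rsp)
          .endm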
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      1871853f
  5. 16 Jun 2011, 1 commit
  6. 07 Jun 2011, 1 commit
    • x86-64: Emulate legacy vsyscalls · 5cec93c2
      Committed by Andy Lutomirski
      There's a fair amount of code in the vsyscall page.  It contains
      a syscall instruction (in the gettimeofday fallback) and who
      knows what will happen if an exploit jumps into the middle of
      some other code.
      
      Reduce the risk by replacing the vsyscalls with short magic
      incantations that cause the kernel to emulate the real
      vsyscalls. These incantations are useless if entered in the
      middle.
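
      A sketch of the shape of such a stub (illustrative only: the
      label, alignment and trap vector here are assumptions, not the
      patch's exact code):

          .balign 1024                /* one stub per fixed vsyscall slot */
      vsyscall_time_stub:             /* hypothetical label */
          int $0xcc                   /* trap immediately; the kernel
                                       * identifies which vsyscall was
                                       * meant and emulates it */
          ret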
      
      This causes vsyscalls to be a little more expensive than real
      syscalls.  Fortunately sensible programs don't use them.
      The only exception is time() which is still called by glibc
      through the vsyscall - but calling time() millions of times
      per second is not sensible. glibc has this fixed in the
      development tree.
      
      This patch is not perfect: the vread_tsc and vread_hpet
      functions are still at a fixed address.  Fixing that might
      involve making alternative patching work in the vDSO.
      Signed-off-by: Andy Lutomirski <luto@mit.edu>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Juhl <jj@chaosbits.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
      Cc: Mikael Pettersson <mikpe@it.uu.se>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
      Cc: Valdis.Kletnieks@vt.edu
      Cc: pageexec@freemail.hu
      Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
      [ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5cec93c2
  7. 06 Jun 2011, 1 commit
  8. 04 Jun 2011, 2 commits
  9. 18 Mar 2011, 1 commit
  10. 12 Mar 2011, 1 commit
    • x86, binutils, xen: Fix another wrong size directive · 371c394a
      Committed by Alexander van Heukelum
      The latest binutils (2.21.0.20110302/Ubuntu) breaks the build
      yet another time, under CONFIG_XEN=y due to a .size directive that
      refers to a slightly differently named (hence, to the now very
      strict and unforgiving assembler, non-existent) symbol.
      
      [ mingo:
      
         This unnecessary build breakage, caused by new binutils
         version 2.21, reaches back through several kernel releases
         spanning several years of Linux history, affecting over
         130,000 upstream kernel commits (!), on CONFIG_XEN=y 64-bit
         kernels (i.e. essentially affecting all major Linux distro
         kernel configs).
      
         Git annotate tells us that this slight debug symbol code
         mismatch bug was introduced in 2008 in commit 3d75e1b8:
      
           3d75e1b8        (Jeremy Fitzhardinge    2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
      
         The 'bug' is just a slight asymmetry in the ENTRY()/END()
         debug-symbols sequences, with lots of assembly code between the
         ENTRY() and the END():
      
           ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
             ...
           END(do_hypervisor_callback)
      
         Human reviewers almost never catch such small mismatches, and binutils
         never even warned about it either.
      
         This new binutils version thus breaks the Xen build on all upstream kernels
         since v2.6.27, out of the blue.
      
         This makes a straightforward Git bisection of all 64-bit
         Xen-enabled kernels impossible on such binutils, for a bisection
         window of over a hundred thousand historic commits. (!)
      
         This is a major fail on the side of binutils and binutils needs to turn
         this show-stopper build failure into a warning ASAP. ]
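
      For reference, a matched pair roughly expands at the assembler
      level to something like this (a sketch; the kernel's ENTRY() also
      emits alignment):

          .globl xen_do_hypervisor_callback
      xen_do_hypervisor_callback:
          /* ... assembly body ... */
          .size xen_do_hypervisor_callback, .-xen_do_hypervisor_callback

      The .size operand must name the exact symbol, or the new binutils
      rejects the file instead of merely warning.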
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kees Cook <kees.cook@canonical.com>
      LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      371c394a
  11. 09 Mar 2011, 1 commit
    • x86: Separate out entry text section · ea714547
      Committed by Jiri Olsa
      Put x86 entry code into a separate link section: .entry.text.
      
      Separating out the entry text section seems to have performance
      benefits, caused by more efficient instruction cache usage.
      
      Running hackbench with perf stat --repeat showed that the change
      compresses the icache footprint. The icache load miss rate went
      down by about 15%:
      
       before patch:
               19417627  L1-icache-load-misses      ( +-   0.147% )
      
       after patch:
               16490788  L1-icache-load-misses      ( +-   0.180% )
      
      The motivation for the patch was to fix a particular kprobes bug
      related to the entry text section; the performance advantage was
      discovered accidentally.
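
      Conceptually the change amounts to tagging entry code with its
      own section, roughly like this (a sketch; the kernel wraps this
      in macros and adds the section to the linker script):

          .section .entry.text, "ax"    /* allocatable + executable */
          .globl example_entry_stub     /* hypothetical symbol */
      example_entry_stub:
          ret
          .size example_entry_stub, .-example_entry_stub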
      
      Whole perf output follows:
      
       - results for current tip tree:
      
        Performance counter stats for './hackbench/hackbench 10' (500 runs):
      
               19417627  L1-icache-load-misses      ( +-   0.147% )
             2676914223  instructions             #      0.497 IPC     ( +- 0.079% )
             5389516026  cycles                     ( +-   0.144% )
      
            0.206267711  seconds time elapsed   ( +-   0.138% )
      
       - results for current tip tree with the patch applied:
      
        Performance counter stats for './hackbench/hackbench 10' (500 runs):
      
               16490788  L1-icache-load-misses      ( +-   0.180% )
             2717734941  instructions             #      0.502 IPC     ( +- 0.079% )
             5414756975  cycles                     ( +-   0.148% )
      
            0.206747566  seconds time elapsed   ( +-   0.137% )
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: masami.hiramatsu.pt@hitachi.com
      Cc: ananth@in.ibm.com
      Cc: davem@davemloft.net
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ea714547
  12. 14 Feb 2011, 1 commit
  13. 12 Jan 2011, 1 commit
  14. 08 Jan 2011, 1 commit
    • x86: Save rbp in pt_regs on irq entry · 625dbc3b
      Committed by Frederic Weisbecker
      In the x86_64 low-level interrupt handlers, the frame pointer is
      saved right after the partial pt_regs frame.
      
      rbp is not supposed to be part of the partial irq register save,
      but including it only requires extending the pt_regs frame by
      8 bytes, plus a tiny stack offset fixup on irq exit.
      
      This slightly changes the semantics of get_irq_regs(), which is
      supposed to provide only the values of the caller-saved registers
      and the cpu-saved frame. However, it's a win for unwinders that
      can walk through stack frames on top of get_irq_regs() snapshots.

      A noticeable impact is that it makes callchains based on the perf
      cpu-clock and task-clock events work on x86_64.
      
      Let's then save rbp into the irq pt_regs.
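
      A sketch of the bookkeeping involved (illustrative, not the
      literal entry_64.S code):

          /* entry: grow the partial frame by one slot and save rbp */
          subq $8, %rsp
          movq %rbp, 0(%rsp)       /* rbp is now part of the irq pt_regs */

          /* exit: restore rbp and drop the extra slot again */
          movq 0(%rsp), %rbp
          addq $8, %rsp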
      
      As a result with:
      
      	perf record -e cpu-clock perf bench sched messaging
      	perf report --stdio
      
      Before:
          20.94%             perf  [kernel.kallsyms]        [k] lock_acquire
                             |
                             --- lock_acquire
                                |
                                |--44.01%-- __write_nocancel
                                |
                                |--43.18%-- __read
                                |
                                |--6.08%-- fork
                                |          create_worker
                                |
                                |--0.88%-- _dl_fixup
                                |
                                |--0.65%-- do_lookup_x
                                |
                                |--0.53%-- __GI___libc_read
                                 --4.67%-- [...]
      
      After:
          19.23%         perf  [kernel.kallsyms]    [k] __lock_acquire
                         |
                         --- __lock_acquire
                            |
                            |--97.74%-- lock_acquire
                            |          |
                            |          |--21.82%-- _raw_spin_lock
                            |          |          |
                            |          |          |--37.26%-- unix_stream_recvmsg
                            |          |          |          sock_aio_read
                            |          |          |          do_sync_read
                            |          |          |          vfs_read
                            |          |          |          sys_read
                            |          |          |          system_call
                            |          |          |          __read
                            |          |          |
                            |          |          |--24.09%-- unix_stream_sendmsg
                            |          |          |          sock_aio_write
                            |          |          |          do_sync_write
                            |          |          |          vfs_write
                            |          |          |          sys_write
                            |          |          |          system_call
                            |          |          |          __write_nocancel
      
      v2: Fix cfi annotations.
      Reported-by: Soeren Sandmann Pedersen <sandmann@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      625dbc3b
  15. 18 Nov 2010, 1 commit
  16. 20 Oct 2010, 1 commit
    • x86, asm: Fix CFI macro invocations to deal with shortcomings in gas · 3234282f
      Committed by Jan Beulich
      gas prior to (perhaps) 2.16.90 has problems with passing non-
      parenthesized expressions containing spaces to macros. Spaces, however,
      get inserted by cpp between any macro expanding to a number and a
      subsequent + or -. For the +, current x86 gas then removes the space
      again (future gas may not do so), but for the - the space gets retained
      and is then considered a separator between macro arguments.
      
      Fix the respective definitions for both the - and + cases, so that they
      neither contain spaces nor make cpp insert any (the latter by adding
      seemingly redundant parentheses).
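
      A sketch of the kind of definition being fixed (hypothetical
      macro names):

          #define FRAME 72             /* hypothetical constant */

          /* risky: cpp can expand "FRAME-8" with a space before the
           * '-', which old gas treats as an argument separator when
           * the expression is passed to an assembler macro */
          #define SWREGS_BAD   FRAME-8

          /* safe: seemingly redundant parentheses keep the expansion
           * free of macro-splitting spaces */
          #define SWREGS_GOOD  (FRAME-8)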
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      3234282f
  17. 19 Oct 2010, 1 commit
    • irq_work: Add generic hardirq context callbacks · e360adbe
      Committed by Peter Zijlstra
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like waking up a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this make do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue
      such work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e360adbe
  18. 03 Sep 2010, 4 commits
  19. 14 Aug 2010, 1 commit
  20. 02 Aug 2010, 1 commit
  21. 23 Jul 2010, 1 commit
  22. 22 Jul 2010, 1 commit
    • x86: auditsyscall: fix fastpath return value after reschedule · 03275591
      Committed by Roland McGrath
      In the CONFIG_AUDITSYSCALL fast-path for x86 64-bit system calls,
      we can pass a bad return value and/or error indication for the
      system call to audit_syscall_exit().  This happens when
      TIF_NEED_RESCHED was set as the system call returned, so we went
      out to schedule() and came back to the exit-audit fast-path.  The
      fix is to reload the user return value register from the pt_regs
      before using it for audit_syscall_exit().
      
      Both the 32-bit kernel's fast path and the 64-bit kernel's 32-bit
      system call fast paths work slightly differently: they always leave
      the fast path entirely to reschedule and don't return there, so
      they don't have the analogous bugs.
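
      The core of the fix, sketched (illustrative; the real code uses
      the entry_64.S offset macros and sets up both arguments):

          /* don't trust the live register after schedule(): re-read
           * the user return value from the saved pt_regs */
          movq RAX-ARGOFFSET(%rsp), %rsi   /* syscall return value */
          call audit_syscall_exit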
      Reported-by: Alexander Viro <aviro@redhat.com>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      03275591
  23. 11 Dec 2009, 1 commit
  24. 06 Dec 2009, 1 commit
    • x86: Fixup wrong debug exception frame link in stacktraces · b625b3b3
      Committed by Frederic Weisbecker
      While dumping a stacktrace, the end of the exception stack won't
      link the frame pointer to the previous stack.

      The interrupted stack will then be considered unreliable and
      ignored by perf, as the frame pointer is unreliable itself.
      
      This happens because we overwrite the frame pointer that links to
      the interrupted frame with the address of the exception stack, in
      order to reserve space inside it.
      rbp was chosen here only because it is not a scratch register: the
      address of the exception stack remains in rbp after calling
      do_debug(), so we can release the exception stack space without
      needing to retrieve its address again.
      
      But we can pick another non-scratch register to do that, so that we
      preserve the link to the interrupted stack frame in the stacktraces.
      
      Just randomly choose r12. All registers are saved just before and
      restored just after calling do_debug(), and r12 is not used in
      between, which makes it a perfect candidate.
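
      A sketch of the scheme (simplified; the real code lives in the
      debug exception entry path):

          movq %rsp, %r12          /* non-scratch, but unlike rbp not
                                    * part of the frame chain */
          /* ... reserve and use the exception stack ... */
          call do_debug
          movq %r12, %rsp          /* release the exception stack space */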
      
      Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw
      
      Before:
          44.18%  [k] _raw_read_lock
                  |
                  |
                  ---  |--6.31%-- waitid
                       |
                       |--4.26%-- writev
                       |
                       |--3.63%-- __select
                       |
                       |--3.15%-- __waitpid
                       |          |
                       |          |--28.57%-- 0x8b52e00000139f
                       |          |
                       |          |--28.57%-- 0x8b52e0000013c6
                       |          |
                       |          |--14.29%-- 0x7fde786dc000
                       |          |
                       |          |--14.29%-- 0x62696c2f7273752f
                       |          |
                       |           --14.29%-- 0x1ea9df800000000
                       |
                       |--3.00%-- __poll
      
      After:
      
          43.94%  [k] _raw_read_lock
                  |
                  --- _read_lock
                     |
                     |--60.53%-- send_sigio
                     |          __kill_fasync
                     |          kill_fasync
                     |          evdev_pass_event
                     |          evdev_event
                     |          input_pass_event
                     |          input_handle_event
                     |          input_event
                     |          synaptics_process_byte
                     |          psmouse_handle_byte
                     |          psmouse_interrupt
                     |          serio_interrupt
                     |          i8042_interrupt
                     |          handle_IRQ_event
                     |          handle_edge_irq
                     |          handle_irq
                     |          __irqentry_text_start
                     |          ret_from_intr
                     |          |
                     |          |--30.43%-- __select
                     |          |
                     |          |--17.39%-- 0x454f15
                     |          |
                     |          |--13.04%-- __read
                     |          |
                     |          |--13.04%-- vread_hpet
                     |          |
                     |          |--13.04%-- _xcb_lock_io
                     |          |
                     |           --13.04%-- 0x7f630878ce87
      
      Note: this does not only affect perf events but also other
      stacktraces on x86-64. They were considered unreliable once we
      left the debug stack frame.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      b625b3b3
  25. 04 Nov 2009, 1 commit
  26. 15 Oct 2009, 1 commit
  27. 14 Oct 2009, 1 commit
    • function-graph/x86: Replace unbalanced ret with jmp · 194ec341
      Committed by Steven Rostedt
      The function graph tracer replaces the return address with a hook
      to trace the exit of the function call. This hook will finish by
      returning to the real location the function should return to.
      
      But the current implementation uses a ret to jump to the real
      return location. This causes an imbalance between calls and rets:
      the original function does a call, the ret goes to the handler,
      and then the handler does a ret without a matching call.
      
      Although the function graph tracer itself already breaks the
      branch predictor by replacing the original ret, using a second
      ret and causing an imbalance breaks the predictor even more.
      
      This patch replaces the ret with a jmp to keep the calls and ret
      balanced. I tested this on one box and it showed a 1.7% increase in
      performance. Another box only showed a small 0.3% increase. But no
      box that I tested this on showed a decrease in performance by
      making this change.
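
      The change, sketched (illustrative register choice):

          /* before: a ret with no matching call */
          pushq %rax               /* real return location */
          ret                      /* misbalances the return-stack predictor */

          /* after: a plain indirect jump, calls and rets stay paired */
          jmp *%rax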
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091013203425.042034383@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      194ec341
  28. 13 Oct 2009, 1 commit
  29. 23 Sep 2009, 1 commit
  30. 21 Sep 2009, 1 commit
    • perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Committed by Ingo Molnar
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown
      beyond its initial role of counting hardware events, and has
      become (and is becoming) a much broader generic event
      enumeration, reporting, logging, monitoring and analysis
      facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI-compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cdd6c482
  31. 13 Sep 2009, 1 commit
  32. 30 Aug 2009, 1 commit
  33. 19 Jun 2009, 1 commit
    • function-graph: add stack frame test · 71e308a2
      Committed by Steven Rostedt
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86, where it passes in the
      stack frame of the parent function and tests that frame on exit.
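
      A sketch of the x86 side of the idea (illustrative; only the
      frame-pointer argument matters here, and the surrounding entry
      code is omitted):

          /* on entry, hand the tracer a value unique to this frame --
           * the frame pointer -- so the exit hook can verify it is
           * leaving the frame it instrumented */
          movq %rbp, %rdx          /* frame pointer as an extra argument */
          call prepare_ftrace_return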
      
      There was a case on x86_32 with optimization for size (-Os) where,
      for a few functions, gcc would align the stack frame and place a
      copy of the return address into it. The function graph tracer
      modified the copy and not the actual return address. On return
      from the function, it did not go to the tracer hook, but returned
      to the parent. This broke the function graph tracer, because the
      return of the parent (where gcc did not do this funky
      manipulation) returned to the location that the child function was
      supposed to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I noticed that the parisc arch implements its own
      push_return_trace. This is now a generic function, and
      ftrace_push_return_trace should be used instead. This patch does
      not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      71e308a2