1. 09 1月, 2011 1 次提交
    • C
      perf, x86: P4 PMU - Fix unflagged overflows handling · 047a3772
      Cyrill Gorcunov 提交于
      Don found that P4 PMU reads CCCR register instead of counter
      itself (in attempt to catch unflagged event) this makes P4
      NMI handler to consume all NMIs it observes. So the other
      NMI users such as kgdb simply have no chance to get NMI
      on their hands.
      
      Side note: at moment there is no way to run nmi-watchdog
      together with perf tool. This is because both 'perf top' and
      nmi-watchdog use same event. So while nmi-watchdog reserves
      one event/counter for own needs there is no room for perf tool
      left (there is a way to disable nmi-watchdog on boot of course).
      
      Ming has tested this patch with the following results
      
       | 1. watchdog disabled
       |
       | kgdb tests on boot OK
       | perf works OK
       |
       | 2. watchdog enabled, without patch perf-x86-p4-nmi-4
       |
       | kgdb tests on boot hang
       |
       | 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb
       | tests on boot
       |
       | "perf top" partialy works
       |   cpu-cycles            no
       |   instructions          yes
       |   cache-references      no
       |   cache-misses          no
       |   branch-instructions   no
       |   branch-misses         yes
       |   bus-cycles            no
       |
       | 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied
       |
       | kgdb tests on boot OK
       | perf does not work, NMI "Dazed and confused" messages show up
       |
      
      Which means we still have problems with p4 box due to 'unknown'
      nmi happens but at least it should fix kgdb test cases.
      Reported-by: NJason Wessel <jason.wessel@windriver.com>
      Reported-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Acked-by: NLin Ming <ming.m.lin@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4D275E7E.3040903@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      047a3772
  2. 08 1月, 2011 1 次提交
    • F
      x86: Save rbp in pt_regs on irq entry · 625dbc3b
      Frederic Weisbecker 提交于
      From the x86_64 low level interrupt handlers, the frame pointer is
      saved right after the partial pt_regs frame.
      
      rbp is not supposed to be part of the irq partial saved registers,
      but it only requires to extend the pt_regs frame by 8 bytes to
      do so, plus a tiny stack offset fixup on irq exit.
      
      This changes a bit the semantics or get_irq_entry() that is supposed
      to provide only the value of caller saved registers and the cpu
      saved frame. However it's a win for unwinders that can walk through
      stack frames on top of get_irq_regs() snapshots.
      
      A noticeable impact is that it makes perf events cpu-clock and
      task-clock events based callchains working on x86_64.
      
      Let's then save rbp into the irq pt_regs.
      
      As a result with:
      
      	perf record -e cpu-clock perf bench sched messaging
      	perf report --stdio
      
      Before:
          20.94%             perf  [kernel.kallsyms]        [k] lock_acquire
                             |
                             --- lock_acquire
                                |
                                |--44.01%-- __write_nocancel
                                |
                                |--43.18%-- __read
                                |
                                |--6.08%-- fork
                                |          create_worker
                                |
                                |--0.88%-- _dl_fixup
                                |
                                |--0.65%-- do_lookup_x
                                |
                                |--0.53%-- __GI___libc_read
                                 --4.67%-- [...]
      
      After:
          19.23%         perf  [kernel.kallsyms]    [k] __lock_acquire
                         |
                         --- __lock_acquire
                            |
                            |--97.74%-- lock_acquire
                            |          |
                            |          |--21.82%-- _raw_spin_lock
                            |          |          |
                            |          |          |--37.26%-- unix_stream_recvmsg
                            |          |          |          sock_aio_read
                            |          |          |          do_sync_read
                            |          |          |          vfs_read
                            |          |          |          sys_read
                            |          |          |          system_call
                            |          |          |          __read
                            |          |          |
                            |          |          |--24.09%-- unix_stream_sendmsg
                            |          |          |          sock_aio_write
                            |          |          |          do_sync_write
                            |          |          |          vfs_write
                            |          |          |          sys_write
                            |          |          |          system_call
                            |          |          |          __write_nocancel
      
      v2: Fix cfi annotations.
      Reported-by: NSoeren Sandmann Pedersen <sandmann@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      625dbc3b
  3. 07 1月, 2011 7 次提交
  4. 05 1月, 2011 3 次提交
  5. 04 1月, 2011 1 次提交
    • T
      perf: Clean up power events by introducing new, more generic ones · 25e41933
      Thomas Renninger 提交于
      Add these new power trace events:
      
       power:cpu_idle
       power:cpu_frequency
       power:machine_suspend
      
      The old C-state/idle accounting events:
        power:power_start
        power:power_end
      
      Have now a replacement (but we are still keeping the old
      tracepoints for compatibility):
      
        power:cpu_idle
      
      and
        power:power_frequency
      
      is replaced with:
        power:cpu_frequency
      
      power:machine_suspend is newly introduced.
      
      Jean Pihet has a patch integrated into the generic layer
      (kernel/power/suspend.c) which will make use of it.
      
      the type= field got removed from both, it was never
      used and the type is differed by the event type itself.
      
      perf timechart userspace tool gets adjusted in a separate patch.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NJean Pihet <jean.pihet@newoldbits.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rjw@sisk.pl
      LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
      25e41933
  6. 27 12月, 2010 1 次提交
    • J
      x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree() · 5cdd2de0
      Jesper Juhl 提交于
      In arch/x86/kernel/microcode_intel.c::generic_load_microcode()
      we have  this:
      
      	while (leftover) {
      		...
      		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
      		    microcode_sanity_check(mc) < 0) {
      			vfree(mc);
      			break;
      		}
      		...
      	}
      
      	if (mc)
      		vfree(mc);
      
      This will cause a double free of 'mc'. This patch fixes that by
      just  removing the vfree() call in the loop since 'mc' will be
      freed nicely just  after we break out of the loop.
      
      There's also a second change in the patch. I noticed a lot of
      checks for  pointers being NULL before passing them to vfree().
      That's completely  redundant since vfree() deals gracefully with
      being passed a NULL pointer.  Removing the redundant checks
      yields a nice size decrease for the object  file.
      
      Size before the patch:
         text    data     bss     dec     hex filename
         4578     240    1032    5850    16da arch/x86/kernel/microcode_intel.o
      Size after the patch:
         text    data     bss     dec     hex filename
         4489     240     984    5713    1651 arch/x86/kernel/microcode_intel.o
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Acked-by: NTigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Shaohua Li <shaohua.li@intel.com>
      LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5cdd2de0
  7. 23 12月, 2010 1 次提交
    • D
      x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR · 4a7863cc
      Don Zickus 提交于
      The x86 arch has shifted its use of the nmi_watchdog from a
      local implementation to the global one provide by
      kernel/watchdog.c.  This shift has caused a whole bunch of
      compile problems under different config options.  I attempt to
      simplify things with the patch below.
      
      In order to simplify things, I had to come to terms with the
      meaning of two terms ARCH_HAS_NMI_WATCHDOG and
      CONFIG_HARDLOCKUP_DETECTOR.  Basically they mean the same thing,
      the former on a local level and the latter on a global level.
      
      With the old x86 nmi watchdog gone, there is no need to rely on
      defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
      make sense any more.  x86 will now use the global
      implementation.
      
      The changes below do a few things.  First it changes the few
      places that relied on ARCH_HAS_NMI_WATCHDOG to use
      CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
      anyway, so nothing unusual here).  Those pieces of code were
      relying more on local apic functionality the nmi watchdog
      functionality, so the change should make sense.
      
      Second, I removed the x86 implementation of
      touch_nmi_watchdog().  It isn't need now, instead x86 will rely
      on kernel/watchdog.c's implementation.
      
      Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
      x86.  And tweaked the include/linux/nmi.h file to tell users to
      look for an externally defined touch_nmi_watchdog in the case of
      ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
      changes removes some of the ugliness in that file.
      
      Finally, I added a Kconfig dependency for
      CONFIG_HARDLOCKUP_DETECTOR that said you can't have
      ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR.  You can
      only have one nmi_watchdog.
      
      Tested with
      ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
      configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
      (various broken configs)
      
      Hopefully, after this patch I won't get any more compile broken
      emails. :-)
      
      v3:
        changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
        prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4a7863cc
  8. 18 12月, 2010 5 次提交
  9. 17 12月, 2010 1 次提交
  10. 16 12月, 2010 4 次提交
    • P
      perf, x86: Provide a PEBS capable cycle event · 7639dae0
      Peter Zijlstra 提交于
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7639dae0
    • P
      perf: Dynamic pmu types · 2e80a82a
      Peter Zijlstra 提交于
      Extend the perf_pmu_register() interface to allow for named and
      dynamic pmu types.
      
      Because we need to support the existing static types we cannot use
      dynamic types for everything, hence provide a type argument.
      
      If we want to enumerate the PMUs they need a name, provide one.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101117222056.259707703@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2e80a82a
    • P
      perf, x86: Detect broken BIOSes that corrupt the PMU · 4407204c
      Peter Zijlstra 提交于
      Some BIOSes use PMU resources, which can cause various bugs:
      
       - Non-working or erratic PMU based statistics - the PMU can end up
         counting the wrong thing, resulting in misleading statistics
      
       - Profiling can stop working or it can profile the wrong thing
      
       - A non-working or erratic NMI watchdog that cannot be relied on
      
       - The kernel may disturb whatever thing the BIOS tries to use the
         PMU for - possibly causing hardware malfunction in extreme cases.
      
       - ... and other forms of potential misbehavior
      
      Various forms of such misbehavior has been observed in practice - there are
      BIOSes that just corrupt the PMU state, consequences be damned.
      
      The PMU is a CPU resource that is handled by the kernel and the BIOS
      stealing+corrupting it is not acceptable nor robust, so we detect it,
      warn about it and further refuse to touch the PMU ourselves.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4407204c
    • R
      lguest: populate initial_page_table · da32dac1
      Rusty Russell 提交于
      Two x86 patches broke lguest:
      1) v2.6.35-492-g72d7c3b3, which changed x86 to use the memblock allocator.
      
      In lguest, the host places linear page tables at the top of mem, which
      used to be enough to get us up to the swapper_pg_dir page tables.  With
      the first patch, the direct mapping tables used that memory:
      
      Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
      After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000
      
      I initially fixed this by lying about the amount of memory we had, so
      the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...
      
      2) v2.6.36-rc8-54-gb40827fa, which made x86 boot use initial_page_table.
      
      This was initialized in a part of head_32.S which isn't executed by
      lguest; it is then copied into swapper_pg_dir.  So we have to initialize
      it; and anyway we switch to it before we blatt the old tables, so that
      fixes the previous damage as well.
      
      For the moment, I cut & pasted the code into lguest's boot code, but
      next merge window I will merge them.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      To: x86@kernel.org
      da32dac1
  11. 14 12月, 2010 4 次提交
  12. 13 12月, 2010 1 次提交
    • T
      x86: HPET: Chose a paranoid safe value for the ETIME check · f1c18071
      Thomas Gleixner 提交于
      commit 995bd3bb (x86: Hpet: Avoid the comparator readback penalty)
      chose 8 HPET cycles as a safe value for the ETIME check, as we had the
      confirmation that the posted write to the comparator register is
      delayed by two HPET clock cycles on Intel chipsets which showed
      readback problems.
      
      After that patch hit mainline we got reports from machines with newer
      AMD chipsets which seem to have an even longer delay. See
      http://thread.gmane.org/gmane.linux.kernel/1054283 and
      http://thread.gmane.org/gmane.linux.kernel/1069458 for further
      information.
      
      Boris tried to come up with an ACPI based selection of the minimum
      HPET cycles, but this failed on a couple of test machines. And of
      course we did not get any useful information from the hardware folks.
      
      For now our only option is to chose a paranoid high and safe value for
      the minimum HPET cycles used by the ETIME check. Adjust the minimum ns
      value for the HPET clockevent accordingly.
      Reported-Bistected-and-Tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6>
      Cc: Simon Kirby <sim@hostway.ca>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      f1c18071
  13. 10 12月, 2010 3 次提交
  14. 09 12月, 2010 2 次提交
  15. 07 12月, 2010 4 次提交
    • M
      kprobes: Use text_poke_smp_batch for unoptimizing · f984ba4e
      Masami Hiramatsu 提交于
      Use text_poke_smp_batch() on unoptimization path for reducing
      the number of stop_machine() issues. If the number of
      unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256),
      kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks
      optimizer for remaining probes.
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f984ba4e
    • M
      kprobes: Use text_poke_smp_batch for optimizing · cd7ebe22
      Masami Hiramatsu 提交于
      Use text_poke_smp_batch() in optimization path for reducing
      the number of stop_machine() issues. If the number of optimizing
      probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes
      first MAX_OPTIMIZE_PROBES probes and kicks optimizer for
      remaining probes.
      
      Changes in v5:
      - Use kick_kprobe_optimizer() instead of directly calling
        schedule_delayed_work().
      - Rescheduling optimizer outside of kprobe mutex lock.
      
      Changes in v2:
      - Allocate code buffer and parameters in arch_init_kprobes()
        instead of using static arraies.
      - Merge previous max optimization limit patch into this patch.
        So, this patch introduces upper limit of optimization at
        once.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cd7ebe22
    • M
      x86: Introduce text_poke_smp_batch() for batch-code modifying · 7deb18dc
      Masami Hiramatsu 提交于
      Introduce text_poke_smp_batch(). This function modifies several
      text areas with one stop_machine() on SMP. Because calling
      stop_machine() is heavy task, it is better to aggregate
      text_poke requests.
      
      ( Note: I've talked with Rusty about this interface, and
        he would not like to expand stop_machine() interface, since
        it is not for generic use. )
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7deb18dc
    • M
      kprobes: Support delayed unoptimizing · 6274de49
      Masami Hiramatsu 提交于
      Unoptimization occurs when a probe is unregistered or disabled,
      and is heavy because it recovers instructions by using
      stop_machine(). This patch delays unoptimization operations and
      unoptimize several probes at once by using
      text_poke_smp_batch(). This can avoid unexpected system slowdown
      coming from stop_machine().
      
      Changes in v5:
      - Split this patch into several cleanup patches and this patch.
      - Fix some text_mutex lock miss.
      - Use bool instead of int for behavior flags.
      - Add additional comment for (un)optimizing path.
      
      Changes in v2:
      - Use dynamic allocated buffers and params.
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: 2nddept-manager@sdl.hitachi.co.jp
      LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6274de49
  16. 28 11月, 2010 1 次提交