1. 01 7月, 2011 2 次提交
    • P
      perf: Remove the nmi parameter from the swevent and overflow interface · a8b0ca17
      Peter Zijlstra 提交于
      The nmi parameter indicated if we could do wakeups from the current
      context, if not, we would set some state and self-IPI and let the
      resulting interrupt do the wakeup.
      
      For the various event classes:
      
        - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
          the PMI-tail (ARM etc.)
        - tracepoint: nmi=0; since tracepoint could be from NMI context.
        - software: nmi=[0,1]; some, like the schedule thing cannot
          perform wakeups, and hence need 0.
      
      As one can see, there is very little nmi=1 usage, and the down-side of
      not using it is that on some platforms some software events can have a
      jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
      
      The up-side however is that we can remove the nmi parameter and save a
      bunch of conditionals in fast paths.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Cree <mcree@orcon.net.nz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      a8b0ca17
    • C
      perf, x86: Add hw_watchdog_set_attr() in a sake of nmi-watchdog on P4 · 1880c4ae
      Cyrill Gorcunov 提交于
      Due to restriction and specifics of Netburst PMU we need a separated
      event for NMI watchdog. In particular every Netburst event
      consumes not just a counter and a config register, but also an
      additional ESCR register.
      
      Since ESCR registers are grouped upon counters (i.e. if ESCR is occupied
      for some event there is no room for another event to enter until its
      released) we need to pick up the "least" used ESCR (or the most available
      one) for nmi-watchdog purposes -- so MSR_P4_CRU_ESCR2/3 was chosen.
      
      With this patch nmi-watchdog and perf top should be able to run simultaneously.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      CC: Lin Ming <ming.m.lin@intel.com>
      CC: Arnaldo Carvalho de Melo <acme@redhat.com>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      Tested-and-reviewed-by: NDon Zickus <dzickus@redhat.com>
      Tested-and-reviewed-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110623124918.GC13050@sunSigned-off-by: NIngo Molnar <mingo@elte.hu>
      1880c4ae
  2. 12 5月, 2011 1 次提交
  3. 27 4月, 2011 1 次提交
    • D
      perf, x86, nmi: Move LVT un-masking into irq handlers · 2bce5dac
      Don Zickus 提交于
      It was noticed that P4 machines were generating double NMIs for
      each perf event.  These extra NMIs lead to 'Dazed and confused'
      messages on the screen.
      
      I tracked this down to a P4 quirk that said the overflow bit had
      to be cleared before re-enabling the apic LVT mask.  My first
      attempt was to move the un-masking inside the perf nmi handler
      from before the chipset NMI handler to after.
      
      This broke Nehalem boxes that seem to like the unmasking before
      the counters themselves are re-enabled.
      
      In order to keep this change simple for 2.6.39, I decided to
      just simply move the apic LVT un-masking to the beginning of all
      the chipset NMI handlers, with the exception of Pentium4's to
      fix the double NMI issue.
      
      Later on we can move the un-masking to later in the handlers to
      save a number of 'extra' NMIs on those particular chipsets.
      
      I tested this change on a P4 machine, an AMD machine, a Nehalem
      box, and a core2quad box.  'perf top' worked correctly along
      with various other small 'perf record' runs.  Anything high
      stress breaks all the machines but that is a different problem.
      
      Thanks to various people for testing different versions of this
      patch.
      Reported-and-tested-by: NShaun Ruffell <sruffell@digium.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      CC: Cyrill Gorcunov <gorcunov@gmail.com>
      2bce5dac
  4. 26 4月, 2011 1 次提交
  5. 22 4月, 2011 1 次提交
    • I
      x86, perf event: Turn off unstructured raw event access to offcore registers · b52c55c6
      Ingo Molnar 提交于
      Andi Kleen pointed out that the Intel offcore support patches were merged
      without user-space tool support to the functionality:
      
       |
       | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
       | user space bits were not. This made it impossible to set the extra mask
       | and actually do the OFFCORE profiling
       |
      
      Andi submitted a preliminary patch for user-space support, as an
      extension to perf's raw event syntax:
      
       |
       | Some raw events -- like the Intel OFFCORE events -- support additional
       | parameters. These can be appended after a ':'.
       |
       | For example on a multi socket Intel Nehalem:
       |
       |    perf stat -e r1b7:20ff -a sleep 1
       |
       | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
       | that measures any access to DRAM on another socket.
       |
      
      But this kind of usability is absolutely unacceptable - users should not
      be expected to type in magic, CPU and model specific incantations to get
      access to useful hardware functionality.
      
      The proper solution is to expose useful offcore functionality via
      generalized events - that way users do not have to care which specific
      CPU model they are using, they can use the conceptual event and not some
      model specific quirky hexa number.
      
      We already have such generalization in place for CPU cache events,
      and it's all very extensible.
      
      "Offcore" events measure general DRAM access patters along various
      parameters. They are particularly useful in NUMA systems.
      
      We want to support them via generalized DRAM events: either as the
      fourth level of cache (after the last-level cache), or as a separate
      generalization category.
      
      That way user-space support would be very obvious, memory access
      profiling could be done via self-explanatory commands like:
      
        perf record -e dram ./myapp
        perf record -e dram-remote ./myapp
      
      ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
      accesses.
      
      These generalized events would work on all CPUs and architectures that
      have comparable PMU features.
      
      ( Note, these are just examples: actual implementation could have more
        sophistication and more parameter - as long as they center around
        similarly simple usecases. )
      
      Now we do not want to revert *all* of the current offcore bits, as they
      are still somewhat useful for generic last-level-cache events, implemented
      in this commit:
      
        e994d7d2: perf: Fix LLC-* events on Intel Nehalem/Westmere
      
      But we definitely do not yet want to expose the unstructured raw events
      to user-space, until better generalization and usability is implemented
      for these hardware event features.
      
      ( Note: after generalization has been implemented raw offcore events can be
        supported as well: there can always be an odd event that is marginally
        useful but not useful enough to generalize. DRAM profiling is definitely
        *not* such a category so generalization must be done first. )
      
      Furthermore, PERF_TYPE_RAW access to these registers was not intended
      to go upstream without proper support - it was a side-effect of the above
      e994d7d2 commit, not mentioned in the changelog.
      
      As v2.6.39 is nearing release we go for the simplest approach: disable
      the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
      kernel and becomes an ABI.
      
      Once proper structure is implemented for these hardware events and users
      are offered usable solutions we can revisit this issue.
      Reported-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      b52c55c6
  6. 19 4月, 2011 1 次提交
  7. 25 3月, 2011 1 次提交
    • I
      perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue · 45daae57
      Ingo Molnar 提交于
      Eric Dumazet reported that hardware PMU events do not work on his
      system, due to the BIOS corrupting PMU state:
      
          Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
          [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)
      
      Linus suggested that we continue in the face of such BIOS-induced CPU
      state corruption:
      
         http://lkml.org/lkml/2011/3/24/608
      
      Such BIOSes will have to be fixed - Linux developers rely on a working and
      fully capable PMU and the BIOS interfering with the CPU's PMU state is simply
      not acceptable.
      
      So this patch changes perf to continue when it detects such BIOS
      interaction, some hardware events may be unreliable due to the BIOS
      writing and re-writing them - there's not much the kernel can do
      about that but to detect the corruption and report it.
      Reported-and-tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      45daae57
  8. 20 3月, 2011 1 次提交
    • S
      perf, x86: Fix Intel fixed counters base initialization · fc66c521
      Stephane Eranian 提交于
      The following patch solves the problems introduced by Robert's
      commit 41bf4989 and reported by Arun Sharma. This commit gets rid
      of the base + index notation for reading and writing PMU msrs.
      
      The problem is that for fixed counters, the new calculation for
      the base did not take into account the fixed counter indexes,
      thus all fixed counters were read/written from fixed counter 0.
      Although all fixed counters share the same config MSR, they each
      have their own counter register.
      
      Without:
      
       $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds
      
        242202299 unhalted_core_cycles (0.00% scaling, ena=1000790892, run=1000790892)
       2389685946 instructions_retired (0.00% scaling, ena=1000790892, run=1000790892)
            49473 baclears             (0.00% scaling, ena=1000790892, run=1000790892)
      
      With:
      
       $ task -e unhalted_core_cycles -e instructions_retired -e baclears noploop 1 noploop for 1 seconds
      
       2392703238 unhalted_core_cycles (0.00% scaling, ena=1000840809, run=1000840809)
       2389793744 instructions_retired (0.00% scaling, ena=1000840809, run=1000840809)
            47863 baclears             (0.00% scaling, ena=1000840809, run=1000840809)
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: peterz@infradead.org
      Cc: ming.m.lin@intel.com
      Cc: robert.richter@amd.com
      Cc: asharma@fb.com
      Cc: perfmon2-devel@lists.sf.net
      LKML-Reference: <20110319172005.GB4978@quad>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fc66c521
  9. 18 3月, 2011 2 次提交
    • N
      x86, dumpstack: Correct stack dump info when frame pointer is available · e8e999cf
      Namhyung Kim 提交于
      Current stack dump code scans entire stack and check each entry
      contains a pointer to kernel code. If CONFIG_FRAME_POINTER=y it
      could mark whether the pointer is valid or not based on value of
      the frame pointer. Invalid entries could be preceded by '?' sign.
      
      However this was not going to happen because scan start point
      was always higher than the frame pointer so that they could not
      meet.
      
      Commit 9c0729dc ("x86: Eliminate bp argument from the stack
      tracing routines") delayed bp acquisition point, so the bp was
      read in lower frame, thus all of the entries were marked
      invalid.
      
      This patch fixes this by reverting above commit while retaining
      stack_frame() helper as suggested by Frederic Weisbecker.
      
      End result looks like below:
      
      before:
      
       [    3.508329] Call Trace:
       [    3.508551]  [<ffffffff814f35c9>] ? panic+0x91/0x199
       [    3.508662]  [<ffffffff814f3739>] ? printk+0x68/0x6a
       [    3.508770]  [<ffffffff81a981b2>] ? mount_block_root+0x257/0x26e
       [    3.508876]  [<ffffffff81a9821f>] ? mount_root+0x56/0x5a
       [    3.508975]  [<ffffffff81a98393>] ? prepare_namespace+0x170/0x1a9
       [    3.509216]  [<ffffffff81a9772b>] ? kernel_init+0x1d2/0x1e2
       [    3.509335]  [<ffffffff81003894>] ? kernel_thread_helper+0x4/0x10
       [    3.509442]  [<ffffffff814f6880>] ? restore_args+0x0/0x30
       [    3.509542]  [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2
       [    3.509641]  [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10
      
      after:
      
       [    3.522991] Call Trace:
       [    3.523351]  [<ffffffff814f35b9>] panic+0x91/0x199
       [    3.523468]  [<ffffffff814f3729>] ? printk+0x68/0x6a
       [    3.523576]  [<ffffffff81a981b2>] mount_block_root+0x257/0x26e
       [    3.523681]  [<ffffffff81a9821f>] mount_root+0x56/0x5a
       [    3.523780]  [<ffffffff81a98393>] prepare_namespace+0x170/0x1a9
       [    3.523885]  [<ffffffff81a9772b>] kernel_init+0x1d2/0x1e2
       [    3.523987]  [<ffffffff81003894>] kernel_thread_helper+0x4/0x10
       [    3.524228]  [<ffffffff814f6880>] ? restore_args+0x0/0x30
       [    3.524345]  [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2
       [    3.524445]  [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10
      
       -v5:
         * fix build breakage with oprofile
      
       -v4:
         * use 0 instead of regs->bp
         * separate out printk changes
      
       -v3:
         * apply comment from Frederic
         * add a couple of printk fixes
      Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Soren Sandmann <ssp@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <1300416006-3163-1-git-send-email-namhyung@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e8e999cf
    • L
      x86: Fix common misspellings · 0d2eb44f
      Lucas De Marchi 提交于
      They were generated by 'codespell' and then manually reviewed.
      Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
      Cc: trivial@kernel.org
      LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0d2eb44f
  10. 16 3月, 2011 1 次提交
  11. 05 3月, 2011 1 次提交
  12. 04 3月, 2011 2 次提交
    • A
      perf: Fix LLC-* events on Intel Nehalem/Westmere · e994d7d2
      Andi Kleen 提交于
      On Intel Nehalem and Westmere CPUs the generic perf LLC-* events count the
      L2 caches, not the real L3 LLC - this was inconsistent with behavior on
      other CPUs.
      
      Fixing this requires the use of the special OFFCORE_RESPONSE
      events which need a separate mask register.
      
      This has been implemented by the previous patch, now use this infrastructure
      to set correct events for the LLC-* on Nehalem and Westmere.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1299119690-13991-3-git-send-email-ming.m.lin@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e994d7d2
    • A
      perf: Add support for supplementary event registers · a7e3ed1e
      Andi Kleen 提交于
      Change logs against Andi's original version:
      
      - Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra)
      - Fixed a major event scheduling issue. There cannot be a ref++ on an
        event that has already done ref++ once and without calling
        put_constraint() in between. (Stephane Eranian)
      - Use thread_cpumask for percore allocation. (Lin Ming)
      - Use MSR names in the extra reg lists. (Lin Ming)
      - Remove redundant "c = NULL" in intel_percore_constraints
      - Fix comment of perf_event_attr::config1
      
      Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event
      that can be used to monitor any offcore accesses from a core.
      This is a very useful event for various tunings, and it's
      also needed to implement the generic LLC-* events correctly.
      
      Unfortunately this event requires programming a mask in a separate
      register. And worse this separate register is per core, not per
      CPU thread.
      
      This patch:
      
      - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters.
        The extra parameters are passed by user space in the
        perf_event_attr::config1 field.
      
      - Adds support to the Intel perf_event core to schedule per
        core resources. This adds fairly generic infrastructure that
        can be also used for other per core resources.
        The basic code has is patterned after the similar AMD northbridge
        constraints code.
      
      Thanks to Stephane Eranian who pointed out some problems
      in the original version and suggested improvements.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a7e3ed1e
  13. 02 3月, 2011 1 次提交
    • L
      perf, x86: Add Intel SandyBridge CPU support · b06b3d49
      Lin Ming 提交于
      This patch adds basic SandyBridge support, including hardware
      cache events and PEBS events support.
      
      It has been tested on SandyBridge CPUs with perf stat and also
      with PEBS based profiling - both work fine.
      
      The patch does not affect other models.
      
      v2 -> v3:
       - fix PEBS event 0xd0 with right umask combinations
       - move snb pebs constraint assignment to intel_pmu_init
      
      v1 -> v2:
       - add more raw and PEBS events constraints
       - use offcore events for LLC-* cache events
       - remove the call to Nehalem workaround enable_all function
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      LKML-Reference: <1299072424.2175.24.camel@localhost>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b06b3d49
  14. 16 2月, 2011 4 次提交
  15. 28 1月, 2011 1 次提交
  16. 07 1月, 2011 2 次提交
    • D
      x86, NMI: Remove DIE_NMI_IPI · c410b830
      Don Zickus 提交于
      With priorities in place and no one really understanding the difference between
      DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI.
      
      This also simplifies default_do_nmi() a little bit.  Instead of calling the
      die_notifier in both the if and else part, just pull it out and call it before
      the if-statement.  This has the side benefit of avoiding a call to the ioport
      to see if there is an external NMI sitting around until after the (more frequent)
      internal NMIs are dealt with.
      Patch-Inspired-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c410b830
    • D
      x86, NMI: Add priorities to handlers · 166d7514
      Don Zickus 提交于
      In order to consolidate the NMI die_chain events, we need to setup the priorities
      for the die notifiers.
      
      I started by defining a bunch of common priorities that can be used by the
      notifier blocks.  Then I modified the notifier blocks to use the newly created
      priorities.
      
      Now that the priorities are straightened out, it should be easier to remove the
      event DIE_NMI_IPI.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      166d7514
  17. 30 12月, 2010 1 次提交
  18. 16 12月, 2010 2 次提交
    • P
      perf: Dynamic pmu types · 2e80a82a
      Peter Zijlstra 提交于
      Extend the perf_pmu_register() interface to allow for named and
      dynamic pmu types.
      
      Because we need to support the existing static types we cannot use
      dynamic types for everything, hence provide a type argument.
      
      If we want to enumerate the PMUs they need a name, provide one.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101117222056.259707703@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2e80a82a
    • P
      perf, x86: Detect broken BIOSes that corrupt the PMU · 4407204c
      Peter Zijlstra 提交于
      Some BIOSes use PMU resources, which can cause various bugs:
      
       - Non-working or erratic PMU based statistics - the PMU can end up
         counting the wrong thing, resulting in misleading statistics
      
       - Profiling can stop working or it can profile the wrong thing
      
       - A non-working or erratic NMI watchdog that cannot be relied on
      
       - The kernel may disturb whatever thing the BIOS tries to use the
         PMU for - possibly causing hardware malfunction in extreme cases.
      
       - ... and other forms of potential misbehavior
      
      Various forms of such misbehavior has been observed in practice - there are
      BIOSes that just corrupt the PMU state, consequences be damned.
      
      The PMU is a CPU resource that is handled by the kernel and the BIOS
      stealing+corrupting it is not acceptable nor robust, so we detect it,
      warn about it and further refuse to touch the PMU ourselves.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4407204c
  19. 26 11月, 2010 3 次提交
  20. 18 11月, 2010 2 次提交
    • S
      x86: Eliminate bp argument from the stack tracing routines · 9c0729dc
      Soeren Sandmann Pedersen 提交于
      The various stack tracing routines take a 'bp' argument in which the
      caller is supposed to provide the base pointer to use, or 0 if doesn't
      have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not
      defined, this means all callers in principle should either always pass
      0, or be conditional on CONFIG_FRAME_POINTER.
      
      However, there are only really three use cases for stack tracing:
      
      (a) Trace the current task, including IRQ stack if any
      (b) Trace the current task, but skip IRQ stack
      (c) Trace some other task
      
      In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just
      be 0.  If it _is_ defined, then
      
      - in case (a) bp should be gotten directly from the CPU's register, so
        the caller should pass NULL for regs,
      
      - in case (b) the caller should should pass the IRQ registers to
        dump_trace(),
      
      - in case (c) bp should be gotten from the top of the task's stack, so
        the caller should pass NULL for regs.
      
      Hence, the bp argument is not necessary because the combination of
      task and regs is sufficient to determine an appropriate value for bp.
      
      This patch introduces a new inline function stack_frame(task, regs)
      that computes the desired bp. This function is then called from the
      two versions of dump_stack().
      Signed-off-by: NSoren Sandmann <ssp@redhat.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@infradead.org>,
      Cc: Frederic Weisbecker <fweisbec@gmail.com>,
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>,
      LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      9c0729dc
    • D
      x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog · 072b198a
      Don Zickus 提交于
      Now that the bulk of the old nmi_watchdog is gone, remove all
      the stub variables and hooks associated with it.
      
      This touches lots of files mainly because of how the io_apic
      nmi_watchdog was implemented.  Now that the io_apic nmi_watchdog
      is forever gone, remove all its fingers.
      
      Most of this code was not being exercised by virtue of
      nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to
      risky here.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: fweisbec@gmail.com
      Cc: gorcunov@openvz.org
      LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      072b198a
  21. 27 10月, 2010 1 次提交
  22. 22 10月, 2010 3 次提交
  23. 19 10月, 2010 1 次提交
    • P
      irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra 提交于
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like wakeup a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this get to do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e360adbe
  24. 05 10月, 2010 1 次提交
  25. 24 9月, 2010 1 次提交
    • R
      perf, x86: Catch spurious interrupts after disabling counters · 63e6be6d
      Robert Richter 提交于
      Some cpus still deliver spurious interrupts after disabling a
      counter. This caused 'undelivered NMI' messages. This patch
      fixes this. Introduced by:
      
        4177c42a: perf, x86: Try to handle unknown nmis with an enabled PMU
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: gorcunov@gmail.com <gorcunov@gmail.com>
      Cc: fweisbec@gmail.com <fweisbec@gmail.com>
      Cc: ying.huang@intel.com <ying.huang@intel.com>
      Cc: ming.m.lin@intel.com <ming.m.lin@intel.com>
      Cc: yinghai@kernel.org <yinghai@kernel.org>
      Cc: andi@firstfloor.org <andi@firstfloor.org>
      Cc: eranian@google.com <eranian@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20100915162034.GO13563@erda.amd.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      63e6be6d
  26. 10 9月, 2010 2 次提交
    • P
      perf: Remove the sysfs bits · 15ac9a39
      Peter Zijlstra 提交于
      Neither the overcommit nor the reservation sysfs parameter were
      actually working, remove them as they'll only get in the way.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus <paulus@samba.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      15ac9a39
    • P
      perf: Rework the PMU methods · a4eaf7f1
      Peter Zijlstra 提交于
      Replace pmu::{enable,disable,start,stop,unthrottle} with
      pmu::{add,del,start,stop}, all of which take a flags argument.
      
      The new interface extends the capability to stop a counter while
      keeping it scheduled on the PMU. We replace the throttled state with
      the generic stopped state.
      
      This also allows us to efficiently stop/start counters over certain
      code paths (like IRQ handlers).
      
      It also allows scheduling a counter without it starting, allowing for
      a generic frozen state (useful for rotating stopped counters).
      
      The stopped state is implemented in two different ways, depending on
      how the architecture implemented the throttled state:
      
       1) We disable the counter:
          a) the pmu has per-counter enable bits, we flip that
          b) we program a NOP event, preserving the counter state
      
       2) We store the counter state and ignore all read/overflow events
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus <paulus@samba.org>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Michael Cree <mcree@orcon.net.nz>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4eaf7f1