1. 11 6月, 2009 6 次提交
  2. 06 6月, 2009 1 次提交
    • I
      perf_counter: Separate out attr->type from attr->config · a21ca2ca
      Ingo Molnar 提交于
      Counter type is a frequently used value and we do a lot of
      bit juggling by encoding and decoding it from attr->config.
      
      Clean this up by creating a separate attr->type field.
      
      Also clean up the various similarly complex user-space bits
      all around counter attribute management.
      
      The net improvement is significant, and it will be easier
      to add a new major type (which is what triggered this cleanup).
      
      (This changes the ABI, all tools are adapted.)
      (PowerPC build-tested.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a21ca2ca
  3. 04 6月, 2009 1 次提交
  4. 03 6月, 2009 4 次提交
    • P
      perf_counter: powerpc: Fix race causing "oops trying to read PMC0" errors · dcd945e0
      Paul Mackerras 提交于
      When using interrupting counters and limited (non-interrupting)
      counters at the same time, it's possible that we get an
      interrupt in write_mmcr0() after writing MMCR0 but before we
      have set up the counters using limited PMCs.  What happens then
      is that we get into perf_counter_interrupt() with
      counter->hw.idx = 0 for the limited counters, leading to the
      "oops trying to read PMC0" error message being printed.
      
      This fixes the problem by making perf_counter_interrupt()
      robust against counter->hw.idx being zero (the counter is just
      ignored in that case) and also by changing write_mmcr0() to
      write MMCR0 initially with the counter overflow interrupt
      enable bits masked (set to 0).  If the MMCR0 value requested by
      the caller has either of those bits set, we write MMCR0 again
      with the requested value of those bits after setting up the
      limited counters properly.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Stephane Eranian <eranian@googlemail.com>
      LKML-Reference: <18982.17684.138182.954599@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dcd945e0
    • P
      perf_counter: powerpc: Fix event alternative code generation on POWER5/5+ · 6984efb6
      Paul Mackerras 提交于
      Commit ef923214 ("perf_counter: powerpc: use u64 for event
      codes internally") introduced a bug where the return value from
      function find_alternative_bdecode gets put into a u64 variable
      and later tested to see if it is < 0.  The effect is that we
      get extra, bogus event code alternatives on POWER5 and POWER5+,
      leading to error messages such as "oops compute_mmcr failed"
      being printed and counters not counting properly.
      
      This fixes it by using s64 for the return type of
      find_alternative_bdecode and for the local variable that the
      caller puts the value in.  It also makes the event argument a
      u64 on POWER5+ for consistency.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Stephane Eranian <eranian@googlemail.com>
      LKML-Reference: <18982.17586.666132.90983@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6984efb6
    • P
      perf_counter: Rename perf_counter_hw_event => perf_counter_attr · 0d48696f
      Peter Zijlstra 提交于
      The structure isn't hw only and when I read event, I think about those
      things that fall out the other end. Rename the thing.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Stephane Eranian <eranian@googlemail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0d48696f
    • P
      perf_counter: Rename various fields · b23f3325
      Peter Zijlstra 提交于
      A few renames:
      
        s/irq_period/sample_period/
        s/irq_freq/sample_freq/
        s/PERF_RECORD_/PERF_SAMPLE_/
        s/record_type/sample_type/
      
      And change both the new sample_type and read_format to u64.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b23f3325
  5. 27 5月, 2009 1 次提交
    • B
      powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency. · 8b31e49d
      Benjamin Herrenschmidt 提交于
      The implementation we just revived has issues, such as using a
      Kconfig-defined virtual address area in kernel space that nothing
      actually carves out (and thus will overlap whatever is there),
      or having some dependencies on being self contained in a single
      PTE page which adds unnecessary constraints on the kernel virtual
      address space.
      
      This fixes it by using more classic PTE accessors and automatically
      locating the area for consistent memory, carving an appropriate hole
      in the kernel virtual address space, leaving only the size of that
      area as a Kconfig option. It also brings some dma-mask related fixes
      from the ARM implementation which was almost identical initially but
      grew its own fixes.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8b31e49d
  6. 26 5月, 2009 1 次提交
    • P
      perf_counter: powerpc: Implement interrupt throttling · 8a7b8cb9
      Paul Mackerras 提交于
      This implements interrupt throttling on powerpc.  Since we don't have
      individual count enable/disable or interrupt enable/disable controls
      per counter, this simply sets the hardware counter to 0, meaning that
      it will not interrupt again until it has counted 2^31 counts, which
      will take at least 2^30 cycles assuming a maximum of 2 counts per
      cycle.  Also, we set counter->hw.period_left to the maximum possible
      value (2^63 - 1), so we won't report overflows for this counter for
      the forseeable future.
      
      The unthrottle operation restores counter->hw.period_left and the
      hardware counter so that we will once again report a counter overflow
      after counter->hw.irq_period counts.
      
      [ Impact: new perfcounters robustness feature on PowerPC ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <18971.35823.643362.446774@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8a7b8cb9
  7. 18 5月, 2009 4 次提交
  8. 15 5月, 2009 7 次提交
    • P
      perf_counter: powerpc: supply more precise information on counter overflow events · 0bbd0d4b
      Paul Mackerras 提交于
      This uses values from the MMCRA, SIAR and SDAR registers on
      powerpc to supply more precise information for overflow events,
      including a data address when PERF_RECORD_ADDR is specified.
      
      Since POWER6 uses different bit positions in MMCRA from earlier
      processors, this converts the struct power_pmu limited_pmc5_6
      field, which only had 0/1 values, into a flags field and
      defines bit values for its previous use (PPMU_LIMITED_PMC5_6)
      and a new flag (PPMU_ALT_SIPR) to indicate that the processor
      uses the POWER6 bit positions rather than the earlier
      positions.  It also adds definitions in reg.h for the new and
      old positions of the bit that indicates that the SIAR and SDAR
      values come from the same instruction.
      
      For the data address, the SDAR value is supplied if we are not
      doing instruction sampling.  In that case there is no guarantee
      that the address given in the PERF_RECORD_ADDR subrecord will
      correspond to the instruction whose address is given in the
      PERF_RECORD_IP subrecord.
      
      If instruction sampling is enabled (e.g. because this counter
      is counting a marked instruction event), then we only supply
      the SDAR value for the PERF_RECORD_ADDR subrecord if it
      corresponds to the instruction whose address is in the
      PERF_RECORD_IP subrecord.  Otherwise we supply 0.
      
      [ Impact: support more PMU hardware features on PowerPC ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18955.37028.48861.555309@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0bbd0d4b
    • P
      perf_counter: powerpc: use u64 for event codes internally · ef923214
      Paul Mackerras 提交于
      Although the perf_counter API allows 63-bit raw event codes,
      internally in the powerpc back-end we had been using 32-bit
      event codes.  This expands them to 64 bits so that we can add
      bits for specifying threshold start/stop events and instruction
      sampling modes later.
      
      This also corrects the return value of can_go_on_limited_pmc;
      we were returning an event code rather than just a 0/1 value in
      some circumstances. That didn't particularly matter while event
      codes were 32-bit, but now that event codes are 64-bit it
      might, so this fixes it.
      
      [ Impact: extend PowerPC perfcounter interfaces from u32 to u64 ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18955.36874.472452.353104@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ef923214
    • P
      perf_counter: frequency based adaptive irq_period · 60db5e09
      Peter Zijlstra 提交于
      Instead of specifying the irq_period for a counter, provide a target interrupt
      frequency and dynamically adapt the irq_period to match this frequency.
      
      [ Impact: new perf-counter attribute/feature ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <20090515132018.646195868@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60db5e09
    • P
      perf_counter: Rework the perf counter disable/enable · 9e35ad38
      Peter Zijlstra 提交于
      The current disable/enable mechanism is:
      
      	token = hw_perf_save_disable();
      	...
      	/* do bits */
      	...
      	hw_perf_restore(token);
      
      This works well, provided that the use nests properly. Except we don't.
      
      x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore
      provide a reference counter disable/enable interface, where the first disable
      disables the hardware, and the last enable enables the hardware again.
      
      [ Impact: refactor, simplify the PMU disable/enable logic ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9e35ad38
    • B
      powerpc: Fix PCI ROM access · ad892a63
      Benjamin Herrenschmidt 提交于
      A couple of issues crept in since about 2.6.27 related to accessing PCI
      device ROMs on various powerpc machines.
      
      First, historically, we don't allocate the ROM resource in the resource
      tree. I'm not entirely certain of why, I susepct they often contained
      garbage on x86 but it's hard to tell. This causes the current generic
      code to always call pci_assign_resource() when trying to access the said
      ROM from sysfs, which will try to re-assign some new address regardless
      of what the ROM BAR was already set to at boot time. This can be a
      problem on hypervisor platforms like pSeries where we aren't supposed
      to move PCI devices around (and in fact probably can't).
      
      Second, our code that generates the PCI tree from the OF device-tree
      (instead of doing config space probing) which we mostly use on pseries
      at the moment, didn't set the (new) flag IORESOURCE_SIZEALIGN on any
      resource. That means that any attempt at re-assigning such a resource
      with pci_assign_resource() would fail due to resource_alignment()
      returning 0.
      
      This fixes this by doing these two things:
      
       - The code that calculates resource flags based on the OF device-node
      is improved to set IORESOURCE_SIZEALIGN on any valid BAR, and while at
      it also set IORESOURCE_READONLY for ROMs since we were lacking that too
      
       - We now allocate ROM resources as part of the resource tree. However
      to limit the chances of nasty conflicts due to busted firmwares, we
      only do it on the second pass of our two-passes allocation scheme,
      so that all valid and enabled BARs get precedence.
      
      This brings pSeries back the ability to access PCI ROMs via sysfs (and
      thus initialize various video cards from X etc...).
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ad892a63
    • B
      powerpc/pseries: Really fix the oprofile CPU type on pseries · b173f03d
      Benjamin Herrenschmidt 提交于
      My previous pach for fixing the oprofile CPU type got somewhat mismerged
      (by my fault) when it collided with another related patch. This should
      finally (fingers crossed) fix the whole thing.
      
      We make sure we keep the -old- oprofile type and CPU type whenever
      one of them was specified in the first pass through the function.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b173f03d
    • B
      powerpc: Allow mem=x cmdline to work with 4G+ · 49a84965
      Becky Bruce 提交于
      We're currently choking on mem=4g (and above) due to memory_limit
      being specified as an unsigned long. Make memory_limit
      phys_addr_t to fix this.
      Signed-off-by: NBecky Bruce <beckyb@kernel.crashing.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      49a84965
  9. 01 5月, 2009 2 次提交
  10. 29 4月, 2009 3 次提交
    • P
      perf_counter: powerpc: allow use of limited-function counters · ab7ef2e5
      Paul Mackerras 提交于
      POWER5+ and POWER6 have two hardware counters with limited functionality:
      PMC5 counts instructions completed in run state and PMC6 counts cycles
      in run state.  (Run state is the state when a hardware RUN bit is 1;
      the idle task clears RUN while waiting for work to do and sets it when
      there is work to do.)
      
      These counters can't be written to by the kernel, can't generate
      interrupts, and don't obey the freeze conditions.  That means we can
      only use them for per-task counters (where we know we'll always be in
      run state; we can't put a per-task counter on an idle task), and only
      if we don't want interrupts and we do want to count in all processor
      modes.
      
      Obviously some counters can't go on a limited hardware counter, but there
      are also situations where we can only put a counter on a limited hardware
      counter - if there are already counters on that exclude some processor
      modes and we want to put on a per-task cycle or instruction counter that
      doesn't exclude any processor mode, it could go on if it can use a
      limited hardware counter.
      
      To keep track of these constraints, this adds a flags argument to the
      processor-specific get_alternatives() functions, with three bits defined:
      one to say that we can accept alternative event codes that go on limited
      counters, one to say we only want alternatives on limited counters, and
      one to say that this is a per-task counter and therefore events that are
      gated by run state are equivalent to those that aren't (e.g. a "cycles"
      event is equivalent to a "cycles in run state" event).  These flags
      are computed for each counter and stored in the counter->hw.counter_base
      field (slightly wonky name for what it does, but it was an existing
      unused field).
      
      Since the limited counters don't freeze when we freeze the other counters,
      we need some special handling to avoid getting skew between things counted
      on the limited counters and those counted on normal counters.  To minimize
      this skew, if we are using any limited counters, we read PMC5 and PMC6
      immediately after setting and clearing the freeze bit.  This is done in
      a single asm in the new write_mmcr0() function.
      
      The code here is specific to PMC5 and PMC6 being the limited hardware
      counters.  Being more general (e.g. having a bitmap of limited hardware
      counter numbers) would have meant more complex code to read the limited
      counters when freezing and unfreezing the normal counters, with
      conditional branches, which would have increased the skew.  Since it
      isn't necessary for the code to be more general at this stage, it isn't.
      
      This also extends the back-ends for POWER5+ and POWER6 to be able to
      handle up to 6 counters rather than the 4 they previously handled.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <18936.19035.163066.892208@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ab7ef2e5
    • R
      perfcounters: rename struct hw_perf_counter_ops into struct pmu · 4aeb0b42
      Robert Richter 提交于
      This patch renames struct hw_perf_counter_ops into struct pmu. It
      introduces a structure to describe a cpu specific pmu (performance
      monitoring unit). It may contain ops and data. The new name of the
      structure fits better, is shorter, and thus better to handle. Where it
      was appropriate, names of function and variable have been changed too.
      
      [ Impact: cleanup ]
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1241002046-8832-7-git-send-email-robert.richter@amd.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4aeb0b42
    • T
      powerpc: Revert switch to TEXT_TEXT in linker script · 13beadd9
      Tim Abbott 提交于
      Commit edada399 broke the build on 64-bit powerpc because it moved the
      __ftr_alt_* sections of a file away from the .text section, causing
      link failures due to relative conditional branch targets being too far
      away from the branch instructions.  This happens on pretty much all
      64-bit powerpc configs.
      
      This change reverts commit edada399 while preserving the update from
      the *.refok sections to .ref.text that has happened since.
      Signed-off-by: NTim Abbott <tabbott@mit.edu>
      Requested-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13beadd9
  11. 28 4月, 2009 1 次提交
  12. 27 4月, 2009 1 次提交
  13. 23 4月, 2009 1 次提交
  14. 22 4月, 2009 1 次提交
  15. 21 4月, 2009 1 次提交
  16. 09 4月, 2009 2 次提交
    • P
      perf_counter: powerpc: add nmi_enter/nmi_exit calls · ca8f2d7f
      Paul Mackerras 提交于
      Impact: fix potential deadlocks on powerpc
      
      Now that the core is using in_nmi() (added in e30e08f6, "perf_counter:
      fix NMI race in task clock"), we need the powerpc perf_counter_interrupt
      to call nmi_enter() and nmi_exit() in those cases where the interrupt
      happens when interrupts are soft-disabled.
      
      If interrupts were soft-enabled, we can treat it as a regular interrupt
      and do irq_enter/irq_exit around the whole routine. This lets us get rid
      of the test_perf_counter_pending() call at the end of
      perf_counter_interrupt, thus simplifying things a little.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <18909.31952.873098.336615@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ca8f2d7f
    • P
      perf_counter: allow for data addresses to be recorded · 78f13e95
      Peter Zijlstra 提交于
      Paul suggested we allow for data addresses to be recorded along with
      the traditional IPs as power can provide these.
      
      For now, only the software pagefault events provide data addresses,
      but in the future power might as well for some events.
      
      x86 doesn't seem capable of providing this atm.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <20090408130409.394816925@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      78f13e95
  17. 08 4月, 2009 2 次提交
    • P
      perf_counter: powerpc: set sample enable bit for marked instruction events · f708223d
      Paul Mackerras 提交于
      Impact: enable access to hardware feature
      
      POWER processors have the ability to "mark" a subset of the instructions
      and provide more detailed information on what happens to the marked
      instructions as they flow through the pipeline.  This marking is
      enabled by the "sample enable" bit in MMCRA, and there are
      synchronization requirements around setting and clearing the bit.
      
      This adds logic to the processor-specific back-ends so that they know
      which events relate to marked instructions and set the sampling enable
      bit if any event that we want to put on the PMU is a marked instruction
      event.  It also adds logic to the generic powerpc code to do the
      necessary synchronization if that bit is set.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <18908.31930.1024.228867@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f708223d
    • P
      perf_counter: fix powerpc build · dc66270b
      Paul Mackerras 提交于
      Commit 4af4998b ("perf_counter: rework context time") changed struct
      perf_counter_context to have a 'time' field instead of a 'time_now'
      field, but neglected to fix the place in the powerpc perf_counter.c
      where the time_now field was accessed.  This fixes it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <18908.31922.411398.147810@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dc66270b
  18. 07 4月, 2009 1 次提交