1. 30 Oct 2009, 3 commits
  2. 28 Oct 2009, 1 commit
    • sh: perf events: Add preliminary support for SH-4A counters. · ac44e669
      Authored by Paul Mundt
      This adds in preliminary support for the SH-4A performance counters.
      Presently only the first 2 counters are supported, as these are the ones
      of the most interest to the perf tool and end users. Counter chaining is
      not presently handled, so these are simply implemented as 32-bit
      counters.
      
      This also establishes a perf event support framework for other hardware
      counters, which the existing SH-4 oprofile code will migrate over to as
      the SH-4A support evolves.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      ac44e669
  3. 27 Oct 2009, 3 commits
  4. 26 Oct 2009, 1 commit
  5. 20 Oct 2009, 1 commit
  6. 18 Oct 2009, 1 commit
    • sh: Fix up smp_mb__xxx() memory barriers for SH-4A SMP. · 1c8db713
      Authored by Paul Mundt
      In the past these were simply wrapping to barrier() which was sufficient
      on SH SMP platforms predating SH-4A. Unfortunately due to ll/sc semantics
      an explicit synco is needed in these cases, which is sorted for us by
      just switching these over to smp_mb(). smp_mb() also has the benefit of
      being wrapped to barrier() in the UP and non-SH4A cases, so old behaviour
      is maintained for those parts.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      1c8db713
  7. 17 Oct 2009, 1 commit
  8. 16 Oct 2009, 3 commits
    • sh: Kill off legacy UBC wakeup cruft. · cae19b59
      Authored by Paul Mundt
      This code was added for some ancient SH-4 solution engines with peculiar
      boot ROMs that did silly things to the UBC MSTP bits. None of these have
      been in the wild for years, and these days the clock framework wraps up
      the MSTP bits, meaning that the UBC code is one of the few interfaces
      that is stomping MSTP bits underneath the clock framework. At this point
      the risks far outweigh any benefit this code provided, so just kill it
      off.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      cae19b59
    • sh: Support SCHED_MC for SH-X3 multi-cores. · 896f0c0e
      Authored by Paul Mundt
      This enables SCHED_MC support for SH-X3 multi-cores. Presently this is
      just a simple wrapper around the possible map, but this allows for
      tying in support for some of the more exotic NUMA clusters where we can
      actually do something with the topology.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      896f0c0e
    • sh: Idle loop chainsawing for SMP-based light sleep. · f533c3d3
      Authored by Paul Mundt
      This does a bit of chainsawing of the idle loop code to get light sleep
      working on SMP. Previously this was forcing secondary CPUs into sleep
      mode with them not coming back if they didn't have their own local
      timers. Given that we use clockevents broadcasting by default, the CPU
      managing the clockevents can't have IRQs disabled before entering its
      sleep state.
      
      This unfortunately leaves us with the age-old need_resched() race in
      between local_irq_enable() and cpu_sleep(), but at present this is
      unavoidable. After some more experimentation it may be possible to layer
      on SR.BL bit manipulation over top of this scheme to inhibit the race
      condition, but given the current potential for missing wakeups, this is
      left as a future exercise.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      f533c3d3
  9. 14 Oct 2009, 3 commits
  10. 13 Oct 2009, 2 commits
    • sh: Tidy up the dwarf module helpers. · 5a3abba7
      Authored by Paul Mundt
      This enables us to build the dwarf unwinder both with modules enabled and
      disabled in addition to reducing code size in the latter case. The
      helpers are also consolidated, and modified to resemble the BUG module
      helpers.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      5a3abba7
    • sh: Generalize CALLER_ADDRx support. · ac4fac8c
      Authored by Paul Mundt
      This splits out the unwinder implementation and adds a new
      return_address() abstraction modelled after the ARM code. The DWARF
      unwinder is tied in to this, returning NULL otherwise in the case of
      being unable to support arbitrary depths.
      
      This enables us to get correct behaviour with the unwinder enabled,
      as well as disabling the arbitrary depth support when frame pointers are
      enabled, as arbitrary depths with __builtin_return_address() are not
      supported regardless.
      
      With this abstraction it's also possible to layer on a simplified
      implementation with frame pointers in the event that the unwinder isn't
      enabled, although this is left as a future exercise.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      ac4fac8c
  11. 12 Oct 2009, 2 commits
  12. 11 Oct 2009, 1 commit
  13. 10 Oct 2009, 6 commits
  14. 24 Sep 2009, 2 commits
  15. 21 Sep 2009, 1 commit
    • perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Authored by Ingo Molnar
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks go to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cdd6c482
  16. 16 Sep 2009, 2 commits
    • sched: Disable wakeup balancing · 182a85f8
      Authored by Peter Zijlstra
      Sysbench thinks SD_BALANCE_WAKE is too aggressive and kbuild doesn't
      really mind too much, SD_BALANCE_NEWIDLE picks up most of the
      slack.
      
      On a dual socket, quad core, dual thread nehalem system:
      
      sysbench (--num_threads=16):
      
       SD_BALANCE_WAKE-: 13982 tx/s
       SD_BALANCE_WAKE+: 15688 tx/s
      
      kbuild (-j16):
      
       SD_BALANCE_WAKE-: 47.648295846  seconds time elapsed   ( +-   0.312% )
       SD_BALANCE_WAKE+: 47.608607360  seconds time elapsed   ( +-   0.026% )
      
      (same within noise)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      182a85f8
    • sh: Wire up HAVE_SYSCALL_TRACEPOINTS. · a74f7e04
      Authored by Paul Mundt
      This is necessary to get ftrace syscall tracing working again; a fairly
      trivial and mechanical change. The one benefit is that this can also be
      enabled on sh64, despite not having its own ftrace port.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      a74f7e04
  17. 15 Sep 2009, 5 commits
    • sched: Reduce forkexec_idx · b8a543ea
      Authored by Peter Zijlstra
      If we're looking to place a new task, we might as well find the
      idlest position _now_, not 1 tick ago.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b8a543ea
    • sched: Improve latencies and throughput · 0ec9fab3
      Authored by Mike Galbraith
      Make the idle balancer more aggressive, to improve an
      x264 encoding workload provided by Jason Garrett-Glaser:
      
       NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 252.82 fps, 22096.60 kb/s
       encoded 600 frames, 250.69 fps, 22096.60 kb/s
       encoded 600 frames, 245.76 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY LB_BIAS
       encoded 600 frames, 344.44 fps, 22096.60 kb/s
       encoded 600 frames, 346.66 fps, 22096.60 kb/s
       encoded 600 frames, 352.59 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 425.75 fps, 22096.60 kb/s
       encoded 600 frames, 425.45 fps, 22096.60 kb/s
       encoded 600 frames, 422.49 fps, 22096.60 kb/s
      
      Peter pointed out that this is better done via newidle_idx,
      not via LB_BIAS, newidle balancing should look for where
      there is load _now_, not where there was load 2 ticks ago.
      
      Worst-case latencies are improved as well as no buddies
      means less vruntime spread. (as per prior lkml discussions)
      
      This change improves kbuild-peak parallelism as well.
      Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0ec9fab3
    • sched: Tweak wake_idx · 78e7ed53
      Authored by Peter Zijlstra
      When merging select_task_rq_fair() and sched_balance_self() we lost
      the use of wake_idx, restore that and set them to 0 to make wake
      balancing more aggressive.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      78e7ed53
    • sched: Merge select_task_rq_fair() and sched_balance_self() · c88d5910
      Authored by Peter Zijlstra
      The problem with wake_idle() is that it doesn't respect things like
      cpu_power, which means it doesn't deal well with SMT nor the recent
      RT interaction.
      
      To cure this, it needs to do what sched_balance_self() does, which
      leads to the possibility of merging select_task_rq_fair() and
      sched_balance_self().
      
      Modify sched_balance_self() to:
      
        - update_shares() when walking up the domain tree,
          (it only called it for the top domain, but it should
           have done this anyway), which allows us to remove
          this ugly bit from try_to_wake_up().
      
        - do wake_affine() on the smallest domain that contains
          both this (the waking) and the prev (the wakee) cpu for
          WAKE invocations.
      
      Then use the top-down balance steps it had to replace wake_idle().
      
      This leads to the disappearance of SD_WAKE_BALANCE and
      SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
      
      SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
      
      Touch all topology bits to replace the old with new SD flags --
      platforms might need re-tuning. Enabling SD_BALANCE_WAKE
      conditionally on a NUMA distance seems like a good additional
      feature; Magny-Cours and small Nehalem systems would want this
      enabled, while systems with slow interconnects would not.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c88d5910
    • sh: add kycr2_delay for sh_keysc · 1f85d381
      Authored by Kuninori Morimoto
      After KYCR2 is set, udelay might become necessary if there are only a
      small number of keys attached. This patch introduces an optional delay
      through the platform data to address this problem.
      Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      1f85d381
  18. 11 Sep 2009, 1 commit
  19. 10 Sep 2009, 1 commit