1. 15 September 2009, 2 commits
    • sched: Tweak wake_idx · 78e7ed53
      Committed by Peter Zijlstra
      When merging select_task_rq_fair() and sched_balance_self() we lost
      the use of wake_idx; restore it and set the wake_idx values to 0 to
      make wake balancing more aggressive.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
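      As a side note, a minimal standalone sketch (simplified stand-in types, not the
      kernel's code) of what wake_idx controls, assuming the usual meaning of the
      cpu_load[] indices (slot 0 is the most recent sample, higher slots are more
      damped): an index of 0 makes the wake-balancing path read the freshest load
      value, which is what makes wake balancing react more aggressively.

      #include <stdio.h>

      #define NR_LOAD_IDX 5

      /* Simplified stand-ins for struct rq / struct sched_domain fields. */
      struct runqueue {
          unsigned long cpu_load[NR_LOAD_IDX]; /* 0 = freshest, higher = more damped */
      };

      struct sched_domain {
          int wake_idx;                        /* which cpu_load[] slot wake balancing reads */
      };

      /* Load seen by the wake-balancing path: with wake_idx == 0 this is the
       * least-damped value, so imbalances are noticed sooner. */
      static unsigned long wake_load(const struct runqueue *rq,
                                     const struct sched_domain *sd)
      {
          return rq->cpu_load[sd->wake_idx];
      }

      int main(void)
      {
          struct runqueue rq = { .cpu_load = { 900, 700, 500, 400, 300 } };
          struct sched_domain sd = { .wake_idx = 0 };

          printf("load used for wake balancing: %lu\n", wake_load(&rq, &sd));
          return 0;
      }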
    • sched: Merge select_task_rq_fair() and sched_balance_self() · c88d5910
      Committed by Peter Zijlstra
      The problem with wake_idle() is that it doesn't respect things like
      cpu_power, which means it doesn't deal well with SMT or the recent
      RT interaction.
      
      To cure this, it needs to do what sched_balance_self() does, which
      leads to the possibility of merging select_task_rq_fair() and
      sched_balance_self().
      
      Modify sched_balance_self() to:
      
        - update_shares() when walking up the domain tree,
          (it only called it for the top domain, but it should
           have done this anyway), which allows us to remove
          this ugly bit from try_to_wake_up().
      
        - do wake_affine() on the smallest domain that contains
          both this (the waking) and the prev (the wakee) cpu for
          WAKE invocations.
      
      Then use the top-down balance steps it had to replace wake_idle().
      
      This leads to the disappearance of SD_WAKE_BALANCE and
      SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced by SD_BALANCE_WAKE.
      
      SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
      
      Touch all topology bits to replace the old SD flags with the new
      ones -- platforms might need re-tuning.  Enabling SD_BALANCE_WAKE
      conditionally on NUMA distance seems like a good additional feature:
      Magny-Cours and small Nehalem systems would want it enabled, while
      systems with slow interconnects would not.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
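      Below is a rough standalone model of the selection walk described above (toy
      flags and a two-level domain tree; the field and function names are
      illustrative, not the kernel's): find the smallest domain that has
      SD_WAKE_AFFINE set and spans both the waking CPU and the wakee's previous CPU,
      which is where wake_affine() would run for a WAKE invocation, while
      SD_BALANCE_WAKE marks the domains considered for wake-time balancing.

      #include <stdio.h>

      #define SD_WAKE_AFFINE  0x01
      #define SD_BALANCE_WAKE 0x02   /* replaces SD_WAKE_IDLE, per the commit above */

      /* Toy sched_domain: a span bitmask of CPUs plus flags, linked bottom-up. */
      struct sched_domain {
          unsigned long span;            /* bit i set => CPU i is in this domain */
          int flags;
          struct sched_domain *parent;   /* next larger domain, NULL at the top */
      };

      static int spans(const struct sched_domain *sd, int cpu)
      {
          return (sd->span >> cpu) & 1;
      }

      /* Smallest domain with SD_WAKE_AFFINE that contains both CPUs; this is
       * the domain on which wake_affine() would be run. */
      static struct sched_domain *affine_domain(struct sched_domain *sd,
                                                int waking_cpu, int prev_cpu)
      {
          for (; sd; sd = sd->parent)
              if ((sd->flags & SD_WAKE_AFFINE) &&
                  spans(sd, waking_cpu) && spans(sd, prev_cpu))
                  return sd;
          return NULL;
      }

      int main(void)
      {
          /* Two-level toy topology: core pair {0,1} under a package {0..3}. */
          struct sched_domain pkg  = { .span = 0x0F, .flags = SD_BALANCE_WAKE };
          struct sched_domain core = { .span = 0x03,
                                       .flags = SD_WAKE_AFFINE | SD_BALANCE_WAKE,
                                       .parent = &pkg };

          struct sched_domain *sd = affine_domain(&core, 0, 1);
          printf("affine domain span: 0x%02lx\n", sd ? sd->span : 0UL);
          return 0;
      }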
  2. 06 September 2009, 1 commit
  3. 03 September 2009, 1 commit
  4. 01 September 2009, 1 commit
    • locking, powerpc: Rename __spin_try_lock() and friends · 8307a980
      Committed by Heiko Carstens
      Needed to avoid namespace conflicts when the common code
      function bodies of _spin_try_lock() etc. are moved to a header
      file where the function name would be __spin_try_lock().
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Horst Hartmann <horsth@linux.vnet.ibm.com>
      Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <20090831124415.918799705@de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
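      A schematic illustration of the kind of namespace conflict the rename avoids,
      using hypothetical names (the commit does not quote the new powerpc
      identifiers): once the common _spin_try_lock() body lives in a shared header
      under the name __spin_try_lock(), an architecture keeping its own helper of
      that exact name would collide, so the arch-private helper needs a distinct
      prefix.

      /* Schematic only -- hypothetical layout and names, not actual kernel code. */

      /* Shared header after the consolidation: the common body now owns the
       * __spin_try_lock identifier. */
      static inline int __spin_try_lock(int *lock)
      {
          if (*lock)
              return 0;
          *lock = 1;
          return 1;
      }

      /* Arch-private fast path: renamed (here to arch__spin_try_lock) so it no
       * longer clashes with the common header's __spin_try_lock above. */
      static inline int arch__spin_try_lock(int *lock)
      {
          /* an architecture-specific atomic sequence would go here */
          return __spin_try_lock(lock);
      }

      int main(void)
      {
          int lock = 0;
          return arch__spin_try_lock(&lock) ? 0 : 1;
      }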
  5. 27 August 2009, 2 commits
  6. 18 August 2009, 3 commits
    • perf_counter: powerpc: Add callchain support · 20002ded
      Committed by Paul Mackerras
      This adds support for tracing callchains for powerpc, both 32-bit
      and 64-bit, and both in the kernel and userspace, from PMU interrupt
      context.
      
      The first three entries stored for each callchain are the NIP (next
      instruction pointer), LR (link register), and the contents of the LR
      save area in the second stack frame (the first is ignored because the
      ABI convention on powerpc is that functions save their return address
      in their caller's stack frame).  Because leaf functions don't have to
      save their return address (LR value) and don't have to establish a
      stack frame, it's possible for either or both of LR and the second
      stack frame's LR save area to have valid return addresses in them.
      This is basically impossible to disambiguate without either reading
      the code or looking at auxiliary information such as CFI tables.
      Since we don't want to do either of those things at interrupt time,
      we store both LR and the second stack frame's LR save area.
      
      Once we get past the second stack frame, there is no ambiguity; all
      return addresses we get are reliable.
      
      For kernel traces, we check whether they are valid kernel instruction
      addresses and store zero instead if they are not (rather than
      omitting them, which would make it impossible for userspace to know
      which was which).  We also store zero instead of the second stack
      frame's LR save area value if it is the same as LR.
      
      For kernel traces, we check for interrupt frames, and for user traces,
      we check for signal frames.  In each case, since we're starting a new
      trace, we store a PERF_CONTEXT_KERNEL/USER marker so that userspace
      knows that the next three entries are NIP, LR and the second stack frame
      for the interrupted context.
      
      We read user memory with __get_user_inatomic.  On 64-bit, if this
      PMU interrupt occurred while interrupts are soft-disabled, and
      there is no MMU hash table entry for the page, we will get an
      -EFAULT return from __get_user_inatomic even if there is a valid
      Linux PTE for the page, since hash_page isn't reentrant.  Thus we
      have code here to read the Linux PTE and access the page via the
      kernel linear mapping.  Since 64-bit doesn't use (or need) highmem
      there is no need to do kmap_atomic.  On 32-bit, we don't do soft
      interrupt disabling, so this complication doesn't occur and there
      is no need to fall back to reading the Linux PTE, since hash_page
      (or the TLB miss handler) will get called automatically if necessary.
      
      Note that we cannot get PMU interrupts in the interval during
      context switch between switch_mm (which switches the user address
      space) and switch_to (which actually changes current to the new
      process).  On 64-bit this is because interrupts are hard-disabled
      in switch_mm and stay hard-disabled until they are soft-enabled
      later, after switch_to has returned.  So there is no possibility
      of trying to do a user stack trace when the user address space is
      not current's address space.
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
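      A condensed standalone sketch of the entry layout described above for a
      kernel-mode sample (stub types and helpers, not the actual perf_callchain
      code): a PERF_CONTEXT_KERNEL marker, then NIP, LR and the second stack
      frame's LR save area, with zero substituted for invalid kernel addresses or
      for a save-area value that merely duplicates LR.

      #include <stdio.h>

      #define PERF_CONTEXT_KERNEL ((unsigned long long)-128)

      /* Stub: collect callchain entries into a plain array. */
      struct callchain {
          unsigned long long entries[64];
          int nr;
      };

      static void callchain_store(struct callchain *c, unsigned long long ip)
      {
          if (c->nr < 64)
              c->entries[c->nr++] = ip;
      }

      /* Stub validity test; the real code checks for a kernel text address. */
      static int kernel_text_address(unsigned long long addr)
      {
          return addr >= 0xc000000000000000ULL;
      }

      /* First entries of a kernel-mode callchain, per the commit above: marker,
       * NIP, LR, then the second frame's LR save area.  Invalid addresses, and a
       * save-area value equal to LR, are stored as zero so userspace can still
       * tell which slot is which. */
      static void store_initial_entries(struct callchain *c, unsigned long long nip,
                                        unsigned long long lr,
                                        unsigned long long frame2_lr)
      {
          callchain_store(c, PERF_CONTEXT_KERNEL);
          callchain_store(c, nip);
          callchain_store(c, kernel_text_address(lr) ? lr : 0);
          if (frame2_lr == lr || !kernel_text_address(frame2_lr))
              frame2_lr = 0;
          callchain_store(c, frame2_lr);
      }

      int main(void)
      {
          struct callchain c = { .nr = 0 };

          store_initial_entries(&c, 0xc000000000012345ULL,
                                0xc0000000000abcdeULL, 0xc0000000000abcdeULL);
          for (int i = 0; i < c.nr; i++)
              printf("entry[%d] = %#llx\n", i, c.entries[i]);
          return 0;
      }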
    • powerpc: Allow perf_counters to access user memory at interrupt time · 9c1e1052
      Committed by Paul Mackerras
      This provides a mechanism to allow the perf_counters code to access
      user memory in a PMU interrupt routine.  Such an access can cause
      various kinds of interrupt: SLB miss, MMU hash table miss, segment
      table miss, or TLB miss, depending on the processor.  This commit
      only deals with 64-bit classic/server processors, which use an MMU
      hash table.  32-bit processors are already able to access user memory
      at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
      the possibility of reentering hash_page or the TLB miss handlers,
      because they run with interrupts disabled.
      
      On 64-bit processors, an SLB miss interrupt on a user address will
      update the slb_cache and slb_cache_ptr fields in the paca.  This is
      OK except in the case where a PMU interrupt occurs in switch_slb,
      which also accesses those fields.  To prevent this, we hard-disable
      interrupts in switch_slb.  Interrupts are already soft-disabled at
      this point, and will get hard-enabled when they get soft-enabled
      later.
      
      This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
      and to make sure that it clears the slb_cache_ptr when called from
      callers other than switch_slb, the existing routine is renamed to
      __slb_flush_and_rebolt, which is called by switch_slb and the new
      version of slb_flush_and_rebolt.
      
      Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
      hard_irq_disable() to protect the per-cpu variables used there and
      in ste_allocate.
      
      If a MMU hashtable miss interrupt occurs, normally we would call
      hash_page to look up the Linux PTE for the address and create a HPTE.
      However, hash_page is fairly complex and takes some locks, so to
      avoid the possibility of deadlock, we check the preemption count
      to see if we are in a (pseudo-)NMI handler, and if so, we don't call
      hash_page but instead treat it like a bad access that will get
      reported up through the exception table mechanism.  An interrupt
      whose handler runs even though the interrupt occurred when
      soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
      handler, which should use nmi_enter()/nmi_exit() rather than
      irq_enter()/irq_exit().
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
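      A minimal sketch of the deadlock-avoidance check described above (stubbed
      helpers; the real code lives in the hashtable-miss path): when the handler
      finds itself in a pseudo-NMI context, such as a PMU interrupt taken while
      soft-disabled, it skips hash_page() and lets the access fail so the
      exception-table fixup reports it instead.

      #include <stdio.h>

      /* Stubs standing in for kernel primitives. */
      static int pmu_nmi_active;               /* set by nmi_enter() in the PMU handler */
      static int in_nmi(void) { return pmu_nmi_active; }

      static int hash_page(unsigned long ea)    /* takes locks; unsafe from NMI context */
      {
          printf("hash_page(%#lx): HPTE created\n", ea);
          return 0;
      }

      /* Simplified hashtable-miss path: bail out to the exception-table fixup
       * (modelled as returning -1) rather than calling hash_page() when we are
       * in a pseudo-NMI handler, as the commit above describes. */
      static int do_hash_fault(unsigned long ea)
      {
          if (in_nmi())
              return -1;   /* treated as a bad access, fixed up via __get_user_inatomic */
          return hash_page(ea);
      }

      int main(void)
      {
          printf("normal context: %d\n", do_hash_fault(0x10000));
          pmu_nmi_active = 1;               /* pretend a PMU interrupt is in progress */
          printf("pseudo-NMI context: %d\n", do_hash_fault(0x10000));
          return 0;
      }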
    • powerpc/32: Always order writes to halves of 64-bit PTEs · 1660e9d3
      Committed by Paul Mackerras
      On 32-bit systems with 64-bit PTEs, the PTEs have to be written in two
      32-bit halves.  On SMP we write the higher-order half and then the
      lower-order half, with a write barrier between the two halves, but on
      UP there was no particular ordering of the writes to the two halves.
      
      This extends the ordering that we already do on SMP to the UP case as
      well.  The reason is that with the perf_counter subsystem potentially
      accessing user memory at interrupt time to get stack traces, we have
      to be careful not to create an incorrect but apparently valid PTE even
      on UP.
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
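      An illustrative standalone sketch of the ordering this commit extends to UP
      (a compiler barrier stands in for the real powerpc write barrier): store the
      high-order half of the 64-bit PTE first, then the barrier, then the low-order
      half, so a reader such as the PMU-interrupt stack walker never observes a
      mixed but apparently valid PTE.

      #include <stdint.h>
      #include <stdio.h>

      /* Stand-in for the kernel's write barrier; on powerpc this would be a real
       * memory-barrier instruction, not just a compiler barrier. */
      #define wmb_stub() __asm__ __volatile__("" ::: "memory")

      /* A 64-bit PTE exposed to readers as two 32-bit halves. */
      struct pte64 {
          volatile uint32_t hi;
          volatile uint32_t lo;
      };

      /* Write the high half first, then the low half, with a barrier between,
       * matching the ordering the commit extends from SMP to UP kernels. */
      static void set_pte(struct pte64 *pte, uint64_t val)
      {
          pte->hi = (uint32_t)(val >> 32);
          wmb_stub();
          pte->lo = (uint32_t)val;
      }

      int main(void)
      {
          struct pte64 pte = { 0, 0 };

          set_pte(&pte, 0x0000000180000025ULL);
          printf("pte = %08x%08x\n", pte.hi, pte.lo);
          return 0;
      }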
  7. 10 August 2009, 1 commit
  8. 09 August 2009, 1 commit
    • perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support · f36a1a13
      Committed by Paul Mackerras
      If we have the powerpc perf_counter backend compiled in, but
      the cpu we are running on is one where we don't support the
      PMU, we currently oops in hw_perf_group_sched_in if we try to
      use any counters, because ppmu is NULL in that case, and we
      unconditionally dereference ppmu.
      
      This fixes the problem by adding a check whether ppmu is NULL at the
      beginning of hw_perf_group_sched_in, and also at the beginning
      of the other functions that get called from the perf_counter
      core, i.e. hw_perf_disable, hw_perf_enable, and
      hw_perf_counter_setup.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: benh@kernel.crashing.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
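      A short sketch of the guard this commit adds (parameter types are simplified
      placeholders and the function bodies are elided): every entry point that the
      perf_counter core can call returns early when ppmu is NULL, i.e. when the
      running CPU's PMU is not supported by the backend.

      #include <stddef.h>

      /* Placeholder for the per-CPU-family PMU description; NULL means the
       * running CPU has no supported PMU. */
      struct power_pmu { int n_counter; };

      static struct power_pmu *ppmu;   /* left NULL on unsupported CPUs */

      void hw_perf_disable(void)
      {
          if (!ppmu)
              return;                  /* nothing to disable without a PMU */
          /* ... stop the real PMU ... */
      }

      void hw_perf_enable(void)
      {
          if (!ppmu)
              return;
          /* ... re-enable and reprogram counters ... */
      }

      void hw_perf_counter_setup(int cpu)
      {
          if (!ppmu)
              return;
          (void)cpu;
          /* ... per-cpu setup ... */
      }

      int hw_perf_group_sched_in(void *group, void *cpuctx, void *ctx, int cpu)
      {
          if (!ppmu)
              return 0;                /* previously this dereferenced ppmu and oopsed */
          (void)group; (void)cpuctx; (void)ctx; (void)cpu;   /* body elided in this sketch */
          /* ... schedule the counter group onto hardware ... */
          return 0;
      }

      int main(void)
      {
          /* With ppmu NULL (unsupported CPU), all calls are now harmless no-ops. */
          hw_perf_disable();
          hw_perf_enable();
          hw_perf_counter_setup(0);
          return hw_perf_group_sched_in(NULL, NULL, NULL, 0);
      }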
  9. 06 August 2009, 1 commit
  10. 05 August 2009, 1 commit
  11. 30 July 2009, 7 commits
  12. 28 July 2009, 6 commits
  13. 15 July 2009, 2 commits
  14. 13 July 2009, 1 commit
  15. 11 July 2009, 1 commit
  16. 08 July 2009, 9 commits