1. 05 3月, 2012 6 次提交
  2. 02 3月, 2012 1 次提交
  3. 22 2月, 2012 1 次提交
  4. 21 2月, 2012 2 次提交
    • L
      i387: export 'fpu_owner_task' per-cpu variable · 27e74da9
      Linus Torvalds 提交于
      (And define it properly for x86-32, which had its 'current_task'
      declaration in separate from x86-64)
      
      Bitten by my dislike for modules on the machines I use, and the fact
      that apparently nobody else actually wanted to test the patches I sent
      out.
      
      Snif. Nobody else cares.
      
      Anyway, we probably should uninline the 'kernel_fpu_begin()' function
      that is what modules actually use and that references this, but this is
      the minimal fix for now.
      Reported-by: NJosh Boyer <jwboyer@gmail.com>
      Reported-and-tested-by: NJongman Heo <jongman.heo@samsung.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27e74da9
    • L
      i387: support lazy restore of FPU state · 7e16838d
      Linus Torvalds 提交于
      This makes us recognize when we try to restore FPU state that matches
      what we already have in the FPU on this CPU, and avoids the restore
      entirely if so.
      
      To do this, we add two new data fields:
      
       - a percpu 'fpu_owner_task' variable that gets written any time we
         update the "has_fpu" field, and thus acts as a kind of back-pointer
         to the task that owns the CPU.  The exception is when we save the FPU
         state as part of a context switch - if the save can keep the FPU
         state around, we leave the 'fpu_owner_task' variable pointing at the
         task whose FP state still remains on the CPU.
      
       - a per-thread 'last_cpu' field, that indicates which CPU that thread
         used its FPU on last.  We update this on every context switch
         (writing an invalid CPU number if the last context switch didn't
         leave the FPU in a lazily usable state), so we know that *that*
         thread has done nothing else with the FPU since.
      
      These two fields together can be used when next switching back to the
      task to see if the CPU still matches: if 'fpu_owner_task' matches the
      task we are switching to, we know that no other task (or kernel FPU
      usage) touched the FPU on this CPU in the meantime, and if the current
      CPU number matches the 'last_cpu' field, we know that this thread did no
      other FP work on any other CPU, so the FPU state on the CPU must match
      what was saved on last context switch.
      
      In that case, we can avoid the 'f[x]rstor' entirely, and just clear the
      CR0.TS bit.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e16838d
  5. 09 2月, 2012 1 次提交
  6. 07 2月, 2012 2 次提交
    • S
      perf: Fix double start/stop in x86_pmu_start() · f39d47ff
      Stephane Eranian 提交于
      The following patch fixes a bug introduced by the following
      commit:
      
              e050e3f0 ("perf: Fix broken interrupt rate throttling")
      
      The patch caused the following warning to pop up depending on
      the sampling frequency adjustments:
      
        ------------[ cut here ]------------
        WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
      
      It was caused by the following call sequence:
      
      perf_adjust_freq_unthr_context.part() {
           stop()
           if (delta > 0) {
                perf_adjust_period() {
                    if (period > 8*...) {
                        stop()
                        ...
                        start()
                    }
                }
            }
            start()
      }
      
      Which caused a double start and a double stop, thus triggering
      the assert in x86_pmu_start().
      
      The patch fixes the problem by avoiding the double calls. We
      pass a new argument to perf_adjust_period() to indicate whether
      or not the event is already stopped. We can't just remove the
      start/stop from that function because it's called from
      __perf_event_overflow where the event needs to be reloaded via a
      stop/start back-toback call.
      
      The patch reintroduces the assertion in x86_pmu_start() which
      was removed by commit:
      
      	84f2b9b2 ("perf: Remove deprecated WARN_ON_ONCE()")
      
      In this second version, we've added calls to disable/enable PMU
      during unthrottling or frequency adjustment based on bug report
      of spurious NMI interrupts from Eric Dumazet.
      Reported-and-tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: markus@trippelsdorf.de
      Cc: paulus@samba.org
      Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad
      [ Minor edits to the changelog and to the code ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f39d47ff
    • B
      x86/sched/perf/AMD: Set sched_clock_stable · c98fdeaa
      Borislav Petkov 提交于
      Stephane Eranian reported that doing a scheduler latency
      measurements with perf on AMD doesn't work out as expected due
      to the fact that the sched_clock() granularity is too coarse,
      i.e. done in jiffies due to the sched_clock_stable not set,
      which, if set, would mean that we get to use the TSC as sample
      source which would give us much higher precision.
      
      However, there's no reason not to set sched_clock_stable on AMD
      because all families from F10h and upwards do have an invariant
      TSC and have the CPUID flag to prove (CPUID_8000_0007_EDX[8]).
      
      Make it so, #1.
      Signed-off-by: NBorislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Venki Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Link: http://lkml.kernel.org/r/20120206132546.GA30854@quad
      [ Should any non-standard system break the TSC, we should
        mark them so explicitly, in their platform init handler, or
        in a DMI quirk. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c98fdeaa
  7. 03 2月, 2012 1 次提交
    • S
      perf: Remove deprecated WARN_ON_ONCE() · 84f2b9b2
      Stephane Eranian 提交于
      With the new throttling/unthrottling code introduced with
      commit:
      
        e050e3f0 ("perf: Fix broken interrupt rate throttling")
      
      we occasionally hit two WARN_ON_ONCE() checks in:
      
        - intel_pmu_pebs_enable()
        - intel_pmu_lbr_enable()
        - x86_pmu_start()
      
      The assertions are no longer problematic. There is a valid
      path where they can trigger but it is harmless.
      
      The assertion can be triggered with:
      
        $ perf record -e instructions:pp ....
      
      Leading to paths:
      
        intel_pmu_pebs_enable
        intel_pmu_enable_event
        x86_perf_event_set_period
        x86_pmu_start
        perf_adjust_freq_unthr_context
        perf_event_task_tick
        scheduler_tick
      
      And:
      
        intel_pmu_lbr_enable
        intel_pmu_enable_event
        x86_perf_event_set_period
        x86_pmu_start
        perf_adjust_freq_unthr_context.
        perf_event_task_tick
        scheduler_tick
      
      cpuc->enabled is always on because when we get to
      perf_adjust_freq_unthr_context() the PMU is not totally
      disabled. Furthermore when we need to adjust a period,
      we only stop the event we need to change and not the
      entire PMU. Thus, when we re-enable, cpuc->enabled is
      already set. Note that when we stop the event, both
      pebs and lbr are stopped if necessary (and possible).
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/20120202110401.GA30911@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
      84f2b9b2
  8. 17 1月, 2012 1 次提交
  9. 14 1月, 2012 1 次提交
  10. 24 12月, 2011 1 次提交
  11. 22 12月, 2011 3 次提交
    • K
      cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Kay Sievers 提交于
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      8a25a2fd
    • S
      x86: Add counter when debug stack is used with interrupts enabled · 42181186
      Steven Rostedt 提交于
      Mathieu Desnoyers pointed out a case that can cause issues with
      NMIs running on the debug stack:
      
        int3 -> interrupt -> NMI -> int3
      
      Because the interrupt changes the stack, the NMI will not see that
      it preempted the debug stack. Looking deeper at this case,
      interrupts only happen when the int3 is from userspace or in
      an a location in the exception table (fixup).
      
        userspace -> int3 -> interurpt -> NMI -> int3
      
      All other int3s that happen in the kernel should be processed
      without ever enabling interrupts, as the do_trap() call will
      panic the kernel if it is called to process any other location
      within the kernel.
      
      Adding a counter around the sections that enable interrupts while
      using the debug stack allows the NMI to also check that case.
      If the NMI sees that it either interrupted a task using the debug
      stack or the debug counter is non-zero, then it will have to
      change the IDT table to make the int3 not change stacks (which will
      corrupt the stack if it does).
      
      Note, I had to move the debug_usage functions out of processor.h
      and into debugreg.h because of the static inlined functions to
      inc and dec the debug_usage counter. __get_cpu_var() requires
      smp.h which includes processor.h, and would fail to build.
      
      Link: http://lkml.kernel.org/r/1323976535.23971.112.camel@gandalf.stny.rr.comReported-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul Turner <pjt@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      42181186
    • S
      x86: Keep current stack in NMI breakpoints · 228bdaa9
      Steven Rostedt 提交于
      We want to allow NMI handlers to have breakpoints to be able to
      remove stop_machine from ftrace, kprobes and jump_labels. But if
      an NMI interrupts a current breakpoint, and then it triggers a
      breakpoint itself, it will switch to the breakpoint stack and
      corrupt the data on it for the breakpoint processing that it
      interrupted.
      
      Instead, have the NMI check if it interrupted breakpoint processing
      by checking if the stack that is currently used is a breakpoint
      stack. If it is, then load a special IDT that changes the IST
      for the debug exception to keep the same stack in kernel context.
      When the NMI is done, it puts it back.
      
      This way, if the NMI does trigger a breakpoint, it will keep
      using the same stack and not stomp on the breakpoint data for
      the breakpoint it interrupted.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      228bdaa9
  12. 21 12月, 2011 6 次提交
  13. 17 12月, 2011 1 次提交
  14. 16 12月, 2011 1 次提交
  15. 15 12月, 2011 2 次提交
  16. 14 12月, 2011 2 次提交
  17. 12 12月, 2011 1 次提交
    • F
      x86: Call idle notifier after irq_enter() · 98ad1cc1
      Frederic Weisbecker 提交于
      Interrupts notify the idle exit state before calling irq_enter().
      But the notifier code calls rcu_read_lock() and this is not
      allowed while rcu is in an extended quiescent state. We need
      to wait for irq_enter() -> rcu_idle_exit() to be called before
      doing so otherwise this results in a grumpy RCU:
      
      [    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
      [    0.099991] Hardware name: AMD690VM-FMH
      [    0.099991] Modules linked in:
      [    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
      [    0.099991] Call Trace:
      [    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
      [    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
      [    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
      [    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
      [    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
      [    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
      [    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
      [    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
      [    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
      [    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
      [    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
      [    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
      [    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
      [    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
      [    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
      [    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
      [    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Henroid <andrew.d.henroid@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      98ad1cc1
  18. 09 12月, 2011 1 次提交
  19. 07 12月, 2011 3 次提交
  20. 06 12月, 2011 3 次提交
    • P
      perf, x86: Prefer fixed-purpose counters when scheduling · 4defea85
      Peter Zijlstra 提交于
      This avoids a scheduling failure for cases like:
      
        cycles, cycles, instructions, instructions (on Core2)
      
      Which would end up being programmed like:
      
        PMC0, PMC1, FP-instructions, fail
      
      Because all events will have the same weight.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      4defea85
    • R
      perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
      Robert Richter 提交于
      The current x86 event scheduler fails to resolve scheduling problems
      of certain combinations of events and constraints. This happens if the
      counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight, e.g. constraints
      of the AMD family 15h pmu:
      
                              counter mask    weight
      
       amd_f15_PMC30          0x09            2  <--- overlapping counters
       amd_f15_PMC20          0x07            3
       amd_f15_PMC53          0x38            3
      
      The scheduler does not find then an existing solution. Here is an
      example:
      
       event code     counter         failure         possible solution
      
       0x02E          PMC[3,0]        0               3
       0x043          PMC[2:0]        1               0
       0x045          PMC[2:0]        2               1
       0x046          PMC[2:0]        FAIL            2
      
      The event scheduler may not select the correct counter in the first
      cycle because it needs to know which subsequent events will be
      scheduled. It may fail to schedule the events then.
      
      To solve this, we now save the scheduler state of events with
      overlapping counter counstraints.  If we fail to schedule the events
      we rollback to those states and try to use another free counter.
      
      Constraints with overlapping counters are marked with a new introduced
      overlap flag. We set the overlap flag for such constraints to give the
      scheduler a hint which events to select for counter rescheduling. The
      EVENT_CONSTRAINT_OVERLAP() macro can be used for this.
      
      Care must be taken as the rescheduling algorithm is O(n!) which will
      increase scheduling cycles for an over-commited system dramatically.
      The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter
      masks must be kept at a minimum. Thus, the current stack is limited to
      2 states to limit the number of loops the algorithm takes in the worst
      case.
      
      On systems with no overlapping-counter constraints, this
      implementation does not increase the loop count compared to the
      previous algorithm.
      
      V2:
      * Renamed redo -> overlap.
      * Reimplementation using perf scheduling helper functions.
      
      V3:
      * Added WARN_ON_ONCE() if out of save states.
      * Changed function interface of perf_sched_restore_state() to use bool
        as return value.
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      bc1738f6
    • R
      perf, x86: Implement event scheduler helper functions · 1e2ad28f
      Robert Richter 提交于
      This patch introduces x86 perf scheduler code helper functions. We
      need this to later add more complex functionality to support
      overlapping counter constraints (next patch).
      
      The algorithm is modified so that the range of weight values is now
      generated from the constraints. There shouldn't be other functional
      changes.
      
      With the helper functions the scheduler is controlled. There are
      functions to initialize, traverse the event list, find unused counters
      etc. The scheduler keeps its own state.
      
      V3:
      * Added macro for_each_set_bit_cont().
      * Changed functions interfaces of perf_sched_find_counter() and
        perf_sched_next_event() to use bool as return value.
      * Added some comments to make code better understandable.
      
      V4:
      * Fix broken event assignment if weight of the first event is not
        wmin (perf_sched_init()).
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      1e2ad28f