1. 31 7月, 2008 1 次提交
    • I
      sched clock: revert various sched_clock() changes · e4e4e534
      Ingo Molnar 提交于
      Found an interactivity problem on a quad core test-system - simple
      CPU loops would occasionally delay the system un an unacceptable way.
      
      After much debugging with Peter Zijlstra it turned out that the problem
      is caused by the string of sched_clock() changes - they caused the CPU
      clock to jump backwards a bit - which confuses the scheduler arithmetics.
      
      (which is unsigned for performance reasons)
      
      So revert:
      
       # c300ba25: sched_clock: and multiplier for TSC to gtod drift
       # c0c87734: sched_clock: only update deltas with local reads.
       # af52a90a: sched_clock: stop maximum check on NO HZ
       # f7cce27f: sched_clock: widen the max and min time
      
      This solves the interactivity problems.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NMike Galbraith <efault@gmx.de>
      e4e4e534
  2. 26 7月, 2008 1 次提交
  3. 22 7月, 2008 1 次提交
    • A
      sysdev: Pass the attribute to the low level sysdev show/store function · 4a0b2b4d
      Andi Kleen 提交于
      This allow to dynamically generate attributes and share show/store
      functions between attributes. Right now most attributes are generated
      by special macros and lots of duplicated code. With the attribute
      passed it's instead possible to attach some data to the attribute
      and then use that in shared low level functions to do different things.
      
      I need this for the dynamically generated bank attributes in the x86
      machine check code, but it'll allow some further cleanups.
      
      I converted all users in tree to the new show/store prototype. It's a single
      huge patch to avoid unbisectable sections.
      
      Runtime tested: x86-32, x86-64
      Compiled only: ia64, powerpc
      Not compile tested/only grep converted: sh, arm, avr32
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      4a0b2b4d
  4. 19 7月, 2008 2 次提交
    • M
      cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c · c18a41fb
      Mike Travis 提交于
        * Optimize various places where a pointer to the cpumask_of_cpu value
          will result in reducing stack pressure.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c18a41fb
    • T
      nohz: prevent tick stop outside of the idle loop · b8f8c3cf
      Thomas Gleixner 提交于
      Jack Ren and Eric Miao tracked down the following long standing
      problem in the NOHZ code:
      
      	scheduler switch to idle task
      	enable interrupts
      
      Window starts here
      
      	----> interrupt happens (does not set NEED_RESCHED)
      	      	irq_exit() stops the tick
      
      	----> interrupt happens (does set NEED_RESCHED)
      
      	return from schedule()
      	
      	cpu_idle(): preempt_disable();
      
      Window ends here
      
      The interrupts can happen at any point inside the race window. The
      first interrupt stops the tick, the second one causes the scheduler to
      rerun and switch away from idle again and we end up with the tick
      disabled.
      
      The fact that it needs two interrupts where the first one does not set
      NEED_RESCHED and the second one does made the bug obscure and extremly
      hard to reproduce and analyse. Kudos to Jack and Eric.
      
      Solution: Limit the NOHZ functionality to the idle loop to make sure
      that we can not run into such a situation ever again.
      
      cpu_idle()
      {
      	preempt_disable();
      
      	while(1) {
      		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
      		 			          are in the idle loop
      
      		 while (!need_resched())
      		       halt();
      
      		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
      		 preempt_enable_no_resched();
      		 schedule();
      		 preempt_disable();
      	}
      }
      
      In hindsight we should have done this forever, but ... 
      
      /me grabs a large brown paperbag.
      
      Debugged-by: Jack Ren <jack.ren@marvell.com>, 
      Debugged-by: Neric miao <eric.y.miao@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b8f8c3cf
  5. 11 7月, 2008 2 次提交
    • S
      sched_clock: stop maximum check on NO HZ · af52a90a
      Steven Rostedt 提交于
      Working with ftrace I would get large jumps of 11 millisecs or more with
      the clock tracer. This killed the latencing timings of ftrace and also
      caused the irqoff self tests to fail.
      
      What was happening is with NO_HZ the idle would stop the jiffy counter and
      before the jiffy counter was updated the sched_clock would have a bad
      delta jiffies to compare with the gtod with the maximum.
      
      The jiffies would stop and the last sched_tick would record the last gtod.
      On wakeup, the sched clock update would compare the gtod + delta jiffies
      (which would be zero) and compare it to the TSC. The TSC would have
      correctly (with a stable TSC) moved forward several jiffies. But because the
      jiffies has not been updated yet the clock would be prevented from moving
      forward because it would appear that the TSC jumped too far ahead.
      
      The clock would then virtually stop, until the jiffies are updated. Then
      the next sched clock update would see that the clock was very much behind
      since the delta jiffies is now correct. This would then jump the clock
      forward by several jiffies.
      
      This caused ftrace to report several milliseconds of interrupts off
      latency at every resume from NO_HZ idle.
      
      This patch adds hooks into the nohz code to disable the checking of the
      maximum clock update when nohz is in effect. It resumes the max check
      when nohz has updated the jiffies again.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      af52a90a
    • H
      nohz: don't stop idle tick if softirqs are pending. · 857f3fd7
      Heiko Carstens 提交于
      In case a cpu goes idle but softirqs are pending only an error message is
      printed to the console. It may take a very long time until the pending
      softirqs will finally be executed. Worst case would be a hanging system.
      
      With this patch the timer tick just continues and the softirqs will be
      executed after the next interrupt. Still a delay but better than a
      hanging system.
      
      Currently we have at least two device drivers on s390 which under certain
      circumstances schedule a tasklet from process context. This is a reason
      why we can end up with pending softirqs when going idle. Fixing these
      drivers seems to be non-trivial.
      However there is no question that the drivers should be fixed.
      This patch shouldn't be considered as a bug fix. It just is intended to
      keep a system running even if device drivers are buggy.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Glauber <jan.glauber@de.ibm.com>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      857f3fd7
  6. 08 7月, 2008 1 次提交
    • T
      x86, clockevents: add C1E aware idle function · aa276e1c
      Thomas Gleixner 提交于
      C1E on AMD machines is like C3 but without control from the OS. Up to
      now we disabled the local apic timer for those machines as it stops
      when the CPU goes into C1E. This excludes those machines from high
      resolution timers / dynamic ticks, which hurts especially X2 based
      laptops.
      
      The current boot time C1E detection has another, more serious flaw
      as well: some BIOSes do not enable C1E until the ACPI processor module
      is loaded. This causes systems to stop working after that point.
      
      To work nicely with C1E enabled machines we use a separate idle
      function, which checks on idle entry whether C1E was enabled in the
      Interrupt Pending Message MSR. This allows us to do timer broadcasting
      for C1E and covers the late enablement of C1E as well.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      aa276e1c
  7. 26 6月, 2008 1 次提交
  8. 30 5月, 2008 2 次提交
  9. 24 5月, 2008 1 次提交
  10. 04 5月, 2008 2 次提交
  11. 01 5月, 2008 10 次提交
  12. 29 4月, 2008 1 次提交
  13. 25 4月, 2008 1 次提交
    • I
      softlockup: fix NOHZ wakeup · 126e01bf
      Ingo Molnar 提交于
      David Miller reported:
      
      |--------------->
      the following commit:
      
      | commit 27ec4407
      | Author: Ingo Molnar <mingo@elte.hu>
      | Date:   Thu Feb 28 21:00:21 2008 +0100
      |
      |     sched: make cpu_clock() globally synchronous
      |
      |     Alexey Zaytsev reported (and bisected) that the introduction of
      |     cpu_clock() in printk made the timestamps jump back and forth.
      |
      |     Make cpu_clock() more reliable while still keeping it fast when it's
      |     called frequently.
      |
      |     Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      causes watchdog triggers when a cpu exits NOHZ state when it has been
      there for >= the soft lockup threshold, for example here are some
      messages from a 128 cpu Niagara2 box:
      
      [  168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
      [  168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0]
      [  168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511]
      [  168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0]
      [  169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0]
      [  169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515]
      [  169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0]
      [  169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0]
      [  169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0]
      [  169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239]
      [  169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0]
      [  169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239]
      [  169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565]
      [  169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0]
      [  169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0]
      [  169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0]
      [  169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0]
      
      It's simple enough to trigger this by doing a 10 minute sleep after a
      fresh bootup then starting a parallel kernel build.
      
      I suspect this might be reintroducing a problem we've had and fixed
      before, see the thread:
      
      http://marc.info/?l=linux-kernel&m=119546414004065&w=2
      <---------------|
      
      touch the softlockup watchdog when exiting NOHZ state - we are
      obviously not locked up.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      126e01bf
  14. 21 4月, 2008 1 次提交
  15. 20 4月, 2008 2 次提交
    • P
      sched: rt-group: synchonised bandwidth period · d0b27fa7
      Peter Zijlstra 提交于
      Various SMP balancing algorithms require that the bandwidth period
      run in sync.
      
      Possible improvements are moving the rt_bandwidth thing into root_domain
      and keeping a span per rt_bandwidth which marks throttled cpus.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d0b27fa7
    • T
      x86: tsc prevent time going backwards · d8bb6f4c
      Thomas Gleixner 提交于
      We already catch most of the TSC problems by sanity checks, but there
      is a subtle bug which has been in the code forever. This can cause
      time jumps in the range of hours.
      
      This was reported in:
           http://lkml.org/lkml/2007/8/23/96
      and
           http://lkml.org/lkml/2008/3/31/23
      
      I was able to reproduce the problem with a gettimeofday loop test on a
      dual core and a quad core machine which both have sychronized
      TSCs. The TSCs seems not to be perfectly in sync though, but the
      kernel is not able to detect the slight delta in the sync check. Still
      there exists an extremly small window where this delta can be observed
      with a real big time jump. So far I was only able to reproduce this
      with the vsyscall gettimeofday implementation, but in theory this
      might be observable with the syscall based version as well.
      
      CPU 0 updates the clock source variables under xtime/vyscall lock and
      CPU1, where the TSC is slighty behind CPU0, is reading the time right
      after the seqlock was unlocked.
      
      The clocksource reference data was updated with the TSC from CPU0 and
      the value which is read from TSC on CPU1 is less than the reference
      data. This results in a huge delta value due to the unsigned
      subtraction of the TSC value and the reference value. This algorithm
      can not be changed due to the support of wrapping clock sources like
      pm timer.
      
      The huge delta is converted to nanoseconds and added to xtime, which
      is then observable by the caller. The next gettimeofday call on CPU1
      will show the correct time again as now the TSC has advanced above the
      reference value.
      
      To prevent this TSC specific wreckage we need to compare the TSC value
      against the reference value and return the latter when it is larger
      than the actual TSC value.
      
      I pondered to mark the TSC unstable when the readout is smaller than
      the reference value, but this would render an otherwise good and fast
      clocksource unusable without a real good reason.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d8bb6f4c
  16. 18 4月, 2008 1 次提交
  17. 17 4月, 2008 3 次提交
    • A
      clocksource: make clocksource watchdog cycle through online CPUs · 6993fc5b
      Andi Kleen 提交于
      This way it checks if the clocks are synchronized between CPUs too.
      This might be able to detect slowly drifting TSCs which only
      go wrong over longer time.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6993fc5b
    • K
      clockevents: optimise tick_nohz_stop_sched_tick() a bit · 903b8a8d
      Karsten Wiese 提交于
      Call
      	ts = &per_cpu(tick_cpu_sched, cpu);
      and
      	cpu = smp_processor_id();
      once instead of twice.
      
      No functional change done, as changed code runs with local irq off.
      Reduces source lines and text size (20bytes on x86_64).
      
      [ akpm@linux-foundation.org: Build fix ]
      Signed-off-by: NKarsten Wiese <fzu@wemgehoertderstaat.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      903b8a8d
    • R
      [S390] genirq/clockevents: move irq affinity prototypes/inlines to interrupt.h · d7b90689
      Russell King 提交于
      > Generic code is not supposed to include irq.h. Replace this include
      > by linux/hardirq.h instead and add/replace an include of linux/irq.h
      > in asm header files where necessary.
      > This change should only matter for architectures that make use of
      > GENERIC_CLOCKEVENTS.
      > Architectures in question are mips, x86, arm, sh, powerpc, uml and sparc64.
      >
      > I did some cross compile tests for mips, x86_64, arm, powerpc and sparc64.
      > This patch fixes also build breakages caused by the include replacement in
      > tick-common.h.
      
      I generally dislike adding optional linux/* includes in asm/* includes -
      I'm nervous about this causing include loops.
      
      However, there's a separate point to be discussed here.
      
      That is, what interfaces are expected of every architecture in the kernel.
      If generic code wants to be able to set the affinity of interrupts, then
      that needs to become part of the interfaces listed in linux/interrupt.h
      rather than linux/irq.h.
      
      So what I suggest is this approach instead (against Linus' tree of a
      couple of days ago) - we move irq_set_affinity() and irq_can_set_affinity()
      to linux/interrupt.h, change the linux/irq.h includes to linux/interrupt.h
      and include asm/irq_regs.h where needed (asm/irq_regs.h is supposed to be
      rarely used include since not much touches the stacked parent context
      registers.)
      
      Build tested on ARM PXA family kernels and ARM's Realview platform
      kernels which both use genirq.
      
      [ tglx@linutronix.de: add GENERIC_HARDIRQ dependencies ]
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      d7b90689
  18. 26 3月, 2008 1 次提交
  19. 25 3月, 2008 1 次提交
  20. 20 3月, 2008 1 次提交
  21. 09 3月, 2008 3 次提交
  22. 01 3月, 2008 1 次提交