1. 27 Jul 2010 (3 commits)
    • timekeeping: Fix update_vsyscall to provide wall_to_monotonic offset · 7615856e
      John Stultz authored
      update_vsyscall() did not provide the wall_to_monotonic offset,
      so arch-specific implementations tend to reference wall_to_monotonic
      directly. This limits future cleanups in the timekeeping core, so
      this patch fixes the update_vsyscall interface to provide
      wall_to_monotonic, allowing wall_to_monotonic to be made static
      as planned in Documentation/feature-removal-schedule.txt
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7615856e
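      A sketch of the interface change described above (signature simplified
      from memory; the exact arch hooks vary):

        /* Before: arch code peeked at the global wall_to_monotonic. */
        void update_vsyscall(struct timespec *wall_time,
                             struct clocksource *clock, u32 mult);

        /* After: the offset is passed in explicitly, so the global can
         * eventually be made static in the timekeeping core. */
        void update_vsyscall(struct timespec *wall_time,
                             struct timespec *wtm,   /* wall_to_monotonic */
                             struct clocksource *clock, u32 mult);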
    • time: Kill off CONFIG_GENERIC_TIME · 592913ec
      John Stultz authored
      Now that all arches have been converted over to use generic time via
      clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
      config option and simplify the generic code.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      592913ec
    • time: Implement timespec_add · ce3bf7ab
      John Stultz authored
      After accidentally misusing timespec_add_safe, I wanted to make sure
      we don't accidentally trip over that issue again, so I created a simple
      timespec_add() function which we can use to replace the instances
      of timespec_add_safe() that don't want the overflow detection.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1279068988-21864-3-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ce3bf7ab
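      The helper is simple enough to sketch standalone (a normalization loop
      stands in for the kernel's set_normalized_timespec(); note there is no
      overflow clamping, which is what distinguishes it from
      timespec_add_safe()):

        #include <time.h>

        #define NSEC_PER_SEC 1000000000L

        static struct timespec timespec_add(struct timespec lhs,
                                            struct timespec rhs)
        {
            struct timespec sum;

            sum.tv_sec  = lhs.tv_sec + rhs.tv_sec;
            sum.tv_nsec = lhs.tv_nsec + rhs.tv_nsec;
            /* carry nanoseconds into seconds, keeping tv_nsec in [0, 1s) */
            while (sum.tv_nsec >= NSEC_PER_SEC) {
                sum.tv_nsec -= NSEC_PER_SEC;
                sum.tv_sec++;
            }
            return sum;
        }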
  2. 01 Jul 2010 (1 commit)
    • sched: Cure nr_iowait_cpu() users · 8c215bd3
      Peter Zijlstra authored
      Commit 0224cf4c (sched: Introduce get_cpu_iowait_time_us())
      broke things by not making sure preemption was indeed disabled
      by the callers of nr_iowait_cpu(), which took the iowait value of
      the current cpu.
      
      This resulted in a heap of preempt warnings. Cure this by making
      nr_iowait_cpu() take a cpu number and fix up the callers to pass
      in the right number.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: linux-pm@lists.linux-foundation.org
      LKML-Reference: <1277968037.1868.120.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8c215bd3
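      The shape of the fix, as prototypes only (a sketch, not the full
      patch):

        /* Before: implicitly reads the *current* cpu's runqueue, so it is
         * only safe with preemption disabled - which some callers forgot. */
        unsigned long nr_iowait_cpu(void);

        /* After: the caller names the cpu it is accounting for, e.g. the
         * cpu whose idle statistics it is updating. */
        unsigned long nr_iowait_cpu(int cpu);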
  3. 18 Jun 2010 (1 commit)
  4. 10 May 2010 (7 commits)
  5. 13 Apr 2010 (1 commit)
    • time: Remove xtime_cache · 6a867a39
      John Stultz authored
      With the earlier logarithmic time accumulation patch, xtime will now
      always be within one "tick" of the current time, instead of possibly
      half a second off.
      
      This removes the need for the xtime_cache value, which always stored the
      time at the last interrupt, so this patch cleans that up, removing the
      xtime_cache-related code.
      
      This patch also addresses an issue with an earlier version of this change,
      where xtime_cache was normalizing xtime, which could in some cases be
      invalid (i.e. tv_nsec == NSEC_PER_SEC). This is fixed by handling
      the edge case in update_wall_time().
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Petr Titěra <P.Titera@century.cz>
      LKML-Reference: <1270589451-30773-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6a867a39
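      A sketch of the edge-case handling mentioned above (per the commit
      text; variable names assumed from the timekeeping code of that era):

        /* Rounding up the sub-nanosecond remainder can leave xtime
         * un-normalized (tv_nsec == NSEC_PER_SEC), so carry it into
         * tv_sec instead of storing the invalid value. */
        if (unlikely(xtime.tv_nsec >= NSEC_PER_SEC)) {
            xtime.tv_nsec -= NSEC_PER_SEC;
            xtime.tv_sec++;
            second_overflow();
        }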
  6. 30 Mar 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, while for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
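      As a concrete illustration of the rule the script enforces (a
      hypothetical file, not taken from the patch):

        /* drivers/foo/bar.c - uses kmalloc() and gfp flags, so it must
         * include both headers itself rather than inheriting slab.h
         * implicitly through percpu.h. */
        #include <linux/gfp.h>
        #include <linux/slab.h>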
  7. 24 Mar 2010 (1 commit)
  8. 23 Mar 2010 (1 commit)
    • time: Fix accumulation bug triggered by long delay · 830ec045
      John Stultz authored
      The logarithmic accumulation done in the timekeeping code has some
      overflow protection that limits the max shift value. That means it will
      take more than shift loops to accumulate all of the cycles. This causes
      the shift decrement to underflow, which causes the loop to never exit.
      
      The simplest fix would be simply to do a:
      	if (shift)
      		shift--;
      
      However, that is not optimal: as we know the cycle offset is larger
      than interval << shift, the above would make shift drop to zero, and
      we would then be spinning for quite a while, accumulating one
      interval chunk at a time.
      
      Instead, this patch only decreases shift if the offset is smaller
      than cycle_interval << shift.  This makes sure we accumulate using
      the largest chunks possible without overflowing tick_length, and limits
      the number of iterations through the loop.
      
      This issue was found and reported by Sonic Zhang, who also tested the fix.
      Many thanks for the explanation and testing!
      Reported-by: Sonic Zhang <sonic.adi@gmail.com>
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Tested-by: Sonic Zhang <sonic.adi@gmail.com>
      LKML-Reference: <1268948850-5225-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      830ec045
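      A sketch of the fixed accumulation loop (names assumed per
      kernel/time/timekeeping.c of that era; simplified):

        while (offset >= timekeeper.cycle_interval) {
            offset = logarithmic_accumulation(offset, shift);
            /* Only drop to a smaller chunk once the remaining offset no
             * longer covers a full cycle_interval << shift, instead of
             * decrementing unconditionally (which underflowed). */
            if (offset < timekeeper.cycle_interval << shift)
                shift--;
        }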
  9. 13 Mar 2010 (1 commit)
  10. 12 Mar 2010 (1 commit)
    • sched: Rate-limit nohz · 39c0cbe2
      Mike Galbraith authored
      Entering nohz code on every micro-idle is costing ~10% throughput for netperf
      TCP_RR when scheduling cross-cpu.  Rate limiting entry fixes this, but raises
      ticks a bit.  On my Q6600, an idle box goes from ~85 interrupts/sec to 128.
      
      The higher the context switch rate, the more nohz entry costs.  With this patch
      and some cycle recovery patches in my tree, max cross-cpu context switch rate is
      improved by ~16%, a large portion of which is this ratelimiting.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301003.6785.28.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      39c0cbe2
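      A hedged sketch of the rate limit (field names approximate, assumed
      per the sched/nohz code of that era):

        /* Skip nohz entry if this cpu attempted it less than half a tick
         * ago - a micro-idle not worth the nohz bookkeeping. */
        int nohz_ratelimit(int cpu)
        {
            struct rq *rq = cpu_rq(cpu);
            u64 diff = rq->clock - rq->nohz_stamp;

            rq->nohz_stamp = rq->clock;

            return diff < ((NSEC_PER_SEC / HZ) >> 1);
        }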
  11. 02 Mar 2010 (1 commit)
    • timekeeping: Prevent oops when GENERIC_TIME=n · ad6759fb
      john stultz authored
      Aaro Koskinen reported an issue in kernel.org bugzilla #15366, where
      on non-GENERIC_TIME systems, accessing
      /sys/devices/system/clocksource/clocksource0/current_clocksource
      results in an oops.
      
      It seems the timekeeper/clocksource rework missed initializing the
      curr_clocksource value in the !GENERIC_TIME case.
      
      Thanks to Aaro for reporting and diagnosing the issue as well as
      testing the fix!
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: stable@kernel.org
      LKML-Reference: <1267475683.4216.61.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ad6759fb
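      The fix itself is tiny; a sketch per the commit text, assuming the
      jiffies clocksource as the fallback:

        /* Give curr_clocksource a sane default so the sysfs read does not
         * dereference NULL when GENERIC_TIME=n and the clocksource
         * selection logic never runs. */
        static struct clocksource *curr_clocksource = &clocksource_jiffies;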
  12. 10 Feb 2010 (1 commit)
  13. 05 Feb 2010 (2 commits)
  14. 29 Jan 2010 (2 commits)
  15. 26 Jan 2010 (1 commit)
    • clocksource: Prevent potential kgdb deadlock · 7b7422a5
      Thomas Gleixner authored
      commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume
      logic) introduced a potential kgdb deadlock. When the kernel is
      stopped by kgdb inside code which holds watchdog_lock, kgdb deadlocks
      in clocksource_resume_watchdog().
      
      clocksource_resume_watchdog() is called from kgdb via
      clocksource_touch_watchdog() to prevent the clocksource watchdog from
      marking TSC unstable after the kernel has been stopped.
      
      Solve this by replacing spin_lock with a spin_trylock and just
      returning in case the lock is held. Not resetting the watchdog might
      result in TSC being marked unstable, but that's an acceptable penalty
      for using kgdb.
      
      The timekeeping is easily screwed up by kgdb anyway when the system
      uses either jiffies or a clock source which wraps in short intervals
      (e.g. pm_timer wraps about every 4.6s), so we really do not have to
      worry about that occasional TSC-marked-unstable side effect.
      
      The second caller of clocksource_resume_watchdog() is
      clocksource_resume(). The trylock is safe here as well because the
      system is UP at this point, interrupts are disabled and nothing else
      can hold watchdog_lock.
      Reported-by: Jason Wessel <jason.wessel@windriver.com>
      LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7b7422a5
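      A sketch of the trylock variant, close to the fix as described above:

        static void clocksource_resume_watchdog(void)
        {
            unsigned long flags;

            /* kgdb may stop the kernel while watchdog_lock is held;
             * spinning here would deadlock, so skip the reset and accept
             * that TSC might get marked unstable. */
            if (!spin_trylock_irqsave(&watchdog_lock, flags))
                return;

            clocksource_reset_watchdog();
            spin_unlock_irqrestore(&watchdog_lock, flags);
        }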
  16. 18 Jan 2010 (1 commit)
  17. 23 Dec 2009 (1 commit)
    • Revert "time: Remove xtime_cache" · 83f57a11
      Linus Torvalds authored
      This reverts commit 7bc7d637, as
      requested by John Stultz. Quoting John:
      
       "Petr Titěra reported an issue where he saw odd atime regressions with
        2.6.33 where there were a full second worth of nanoseconds in the
        nanoseconds field.
      
        He also reviewed the time code and narrowed down the problem: unhandled
        overflow of the nanosecond field caused by rounding up the
        sub-nanosecond accumulated time.
      
        Details:
      
         * At the end of update_wall_time(), we currently round up the
        sub-nanosecond portion of accumulated time when storing it into xtime.
        This was added to avoid time inconsistencies caused when the
        sub-nanosecond portion was truncated when storing into xtime.
        Unfortunately we don't handle the possible second overflow caused by
        that rounding.
      
         * Previously the xtime_cache code hid this overflow by normalizing the
        xtime value when storing into the xtime_cache.
      
         * We could try to handle the second overflow after the rounding up, but
        since this affects the timekeeping's internal state, this would further
        complicate the next accumulation cycle, causing small errors in ntp
        steering. As much as I'd like to get rid of it, the xtime_cache code is
        known to work.
      
         * The correct fix is really to include the sub-nanosecond portion in the
        timekeeping accessor function, so we don't need to round up during
        accumulation. This would greatly simplify the accumulation code.
        Unfortunately, we can't do this safely until the last three
        non-GENERIC_TIME arches (sparc32, arm, cris) are converted (those
        patches are in -mm) and we kill off the spots where arches set xtime
        directly. This is all 2.6.34 material, so I think reverting the
        xtime_cache change is the best approach for now.
      
        Many thanks to Petr for both reporting and finding the issue!"
      Reported-by: Petr Titěra <P.Titera@century.cz>
      Requested-by: john stultz <johnstul@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83f57a11
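      The overflow in miniature (values illustrative, not from the patch):

        xtime.tv_nsec = 999999999;   /* just below NSEC_PER_SEC          */
        /* a positive sub-nanosecond remainder gets rounded up ...       */
        xtime.tv_nsec += 1;          /* == NSEC_PER_SEC: un-normalized   */
        /* readers now see a full second worth of nanoseconds - the atime
         * regression reported above. The xtime_cache path normalized
         * this away, which is why the revert restores it. */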
  18. 17 Dec 2009 (1 commit)
  19. 16 Dec 2009 (1 commit)
  20. 15 Dec 2009 (3 commits)
  21. 11 Dec 2009 (1 commit)
  22. 10 Dec 2009 (1 commit)
    • hrtimer: Tune hrtimer_interrupt hang logic · 41d2e494
      Thomas Gleixner authored
      The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
      execution time of the hrtimer callbacks.
      
      This is error-prone for virtual machines, where a guest vcpu can be
      scheduled out during the execution of the callbacks (and the callbacks
      themselves can do operations that translate to blocking operations in
      the hypervisor), which can in turn lead to a large min_delta_ns,
      rendering the system unusable.
      
      Replace the current heuristics with something more reliable. Allow the
      interrupt code to try 3 times to catch up with the lost time. If that
      fails, use the total time spent in the interrupt handler to defer the
      next timer interrupt so the system can catch up with other things
      which got delayed. Limit that deferment to 100ms.
      
      The retry events and the maximum time spent in the interrupt handler
      are recorded and exposed via /proc/timer_list.
      
      Inspired by a patch from Marcelo.
      Reported-by: Michael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      41d2e494
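      A hedged sketch of the new policy in hrtimer_interrupt() (simplified;
      the /proc/timer_list stats bookkeeping is omitted):

        if (++retries < 3)
            goto retry;                 /* try to catch up first          */

        /* Still behind: defer the next event by the time we burned in the
         * handler, capped at 100ms, so the system can catch up. */
        delta = ktime_sub(ktime_get(), entry_time);
        if (delta.tv64 > 100 * NSEC_PER_MSEC)
            expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
        else
            expires_next = ktime_add(now, delta);
        tick_program_event(expires_next, 1);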
  23. 18 Nov 2009 (1 commit)
  24. 17 Nov 2009 (1 commit)
    • timekeeping: Fix clock_gettime vsyscall time warp · 0696b711
      Lin Ming authored
      Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
      to struct timekeeper" the clock multiplier of vsyscall is updated with
      the unmodified clock multiplier of the clock source and not with the
      NTP adjusted multiplier of the timekeeper.
      
      This causes user-space observable time warps:
      new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf
      
      Add a new argument "mult" to update_vsyscall() and hand in the
      timekeeping-internal NTP-adjusted multiplier.
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      0696b711
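      The shape of the fix as a sketch (the actual patch also touches each
      arch's update_vsyscall() implementation):

        /* before: the raw clocksource multiplier leaks into the vsyscall */
        update_vsyscall(&xtime, timekeeper.clock);
        /* after: pass the NTP-adjusted multiplier the timekeeper uses */
        update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);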
  25. 14 Nov 2009 (4 commits)
    • clocksource/events: Fix fallout of generic code changes · a362c638
      Thomas Gleixner authored
      powerpc grew a new warning due to the type change of clockevent->mult.
      
      The architectures which use parts of the generic time keeping
      infrastructure tripped over my wrong assumption that
      clocksource_register is only used when GENERIC_TIME=y.
      
      I should have looked and also I should have known better. These
      renitent Gaul villages are racking my nerves. Some serious deprecating
      is due.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a362c638
    • nohz: Allow 32-bit machines to sleep for more than 2.15 seconds · 97813f2f
      Jon Hunter authored
      In the dynamic tick code, "max_delta_ns" (member of the
      "clock_event_device" structure) represents the maximum sleep time
      that can occur between timer events in nanoseconds.
      
      The variable, "max_delta_ns", is defined as an unsigned long
      which is a 32-bit integer for 32-bit machines and a 64-bit
      integer for 64-bit machines (if the -m64 option is used for gcc).
      The value of max_delta_ns is set by calling the function
      "clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
      For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
      nanoseconds this equates to ~2.15 seconds. Hence, the maximum
      sleep time for a 32-bit machine is ~2.15 seconds, whereas for
      a 64-bit machine it will be many years.
      
      This patch changes the type of max_delta_ns to be "u64" instead of
      "unsigned long" so that this variable is a 64-bit type for both 32-bit
      and 64-bit machines. It also changes the maximum value returned by
      clockevent_delta2ns() to KTIME_MAX.  Hence this allows a 32-bit
      machine to sleep for longer than ~2.15 seconds. Please note that this
      patch also changes "min_delta_ns" to be "u64" too; although this is
      unnecessary, it makes the patch simpler as it avoids having to fix up
      all callers of clockevent_delta2ns().
      
      [ tglx: changed "unsigned long long" to u64 as we use this data type
        throughout the time code ]
      Signed-off-by: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      97813f2f
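      The ~2.15s figure is just LONG_MAX nanoseconds on a 32-bit machine; a
      standalone check of the arithmetic:

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            uint64_t long_max_32 = 0x7fffffff;    /* 32-bit LONG_MAX */

            /* 2147483647 ns = ~2.147 s */
            printf("max sleep: %.3f s\n", long_max_32 / 1e9);
            return 0;
        }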
    • nohz: Track last do_timer() cpu · 27185016
      Thomas Gleixner authored
      The previous patch, which limits the sleep time to the maximum
      deferment time of the timekeeping clocksource, has a limitation on
      SMP machines: if all CPUs are idle, the maximum sleep time is limited
      on all of them.
      
      Solve this by keeping track of which cpu had the do_timer() duty
      assigned last and limit the sleep time only for this cpu.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      27185016
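      A hedged sketch of the bookkeeping (field names approximate, assumed
      per the tick-sched code of that era):

        /* remember whether this cpu ran do_timer() last ... */
        if (cpu == tick_do_timer_cpu) {
            tick_do_timer_cpu = TICK_DO_TIMER_NONE;
            ts->do_timer_last = 1;
        } else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
            ts->do_timer_last = 0;
        }

        /* ... and clamp the sleep to the clocksource's maximum deferment
         * only on that cpu; all other idle cpus may sleep freely. */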
    • nohz: Prevent clocksource wrapping during idle · 98962465
      Jon Hunter authored
      The dynamic tick allows the kernel to sleep for periods longer than a
      single tick, but it currently does not limit the sleep time. In the
      worst case the kernel could sleep longer than the wrap-around time of
      the timekeeping clocksource, which would result in losing track of
      time.
      
      Prevent this by limiting it to the safe maximum sleep time of the
      current timekeeping clocksource. The value is calculated when the
      clocksource is registered.
      
      [ tglx: simplified the code a bit and massaged the commit msg ]
      Signed-off-by: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      98962465
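      A sketch of the calculation done at registration time (close to the
      clocksource_max_deferment() helper this work introduced; treat the
      exact margin as an assumption):

        static u64 clocksource_max_deferment(struct clocksource *cs)
        {
            u64 max_nsecs, max_cycles;

            /* max cycles that fit a 63-bit cyc2ns multiplication ... */
            max_cycles = 1ULL << (63 - (ilog2(cs->mult) + 1));
            /* ... and that the counter can represent before wrapping */
            max_cycles = min_t(u64, max_cycles, (u64) cs->mask);
            max_nsecs = clocksource_cyc2ns(max_cycles, cs->mult, cs->shift);

            /* leave a safety margin so we wake before the wrap */
            return max_nsecs - (max_nsecs >> 5);
        }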