1. 05 2月, 2010 1 次提交
  2. 29 1月, 2010 2 次提交
  3. 18 1月, 2010 1 次提交
  4. 23 12月, 2009 1 次提交
    • L
      Revert "time: Remove xtime_cache" · 83f57a11
      Linus Torvalds 提交于
      This reverts commit 7bc7d637, as
      requested by John Stultz. Quoting John:
      
       "Petr Titěra reported an issue where he saw odd atime regressions with
        2.6.33 where there were a full second worth of nanoseconds in the
        nanoseconds field.
      
        He also reviewed the time code and narrowed down the problem: unhandled
        overflow of the nanosecond field caused by rounding up the
        sub-nanosecond accumulated time.
      
        Details:
      
         * At the end of update_wall_time(), we currently round up the
        sub-nanosecond portion of accumulated time when storing it into xtime.
        This was added to avoid time inconsistencies caused when the
        sub-nanosecond portion was truncated when storing into xtime.
        Unfortunately we don't handle the possible second overflow caused by
        that rounding.
      
         * Previously the xtime_cache code hid this overflow by normalizing the
        xtime value when storing into the xtime_cache.
      
         * We could try to handle the second overflow after the rounding up, but
        since this affects the timekeeping's internal state, this would further
        complicate the next accumulation cycle, causing small errors in ntp
        steering. As much as I'd like to get rid of it, the xtime_cache code is
        known to work.
      
         * The correct fix is really to include the sub-nanosecond portion in the
        timekeeping accessor function, so we don't need to round up at during
        accumulation. This would greatly simplify the accumulation code.
        Unfortunately, we can't do this safely until the last three
        non-GENERIC_TIME arches (sparc32, arm, cris) are converted  (those
        patches are in -mm) and we kill off the spots where arches set xtime
        directly. This is all 2.6.34 material, so I think reverting the
        xtime_cache change is the best approach for now.
      
        Many thanks to Petr for both reporting and finding the issue!"
      Reported-by: NPetr Titěra <P.Titera@century.cz>
      Requested-by: Njohn stultz <johnstul@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83f57a11
  5. 17 12月, 2009 1 次提交
  6. 16 12月, 2009 1 次提交
  7. 15 12月, 2009 3 次提交
  8. 11 12月, 2009 1 次提交
  9. 10 12月, 2009 1 次提交
    • T
      hrtimer: Tune hrtimer_interrupt hang logic · 41d2e494
      Thomas Gleixner 提交于
      The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
      execution time of the hrtimer callbacks.
      
      This is error-prone for virtual machines, where a guest vcpu can be
      scheduled out during the execution of the callbacks (and the callbacks
      themselves can do operations that translate to blocking operations in
      the hypervisor), which in can lead to large min_delta_ns rendering the
      system unusable.
      
      Replace the current heuristics with something more reliable. Allow the
      interrupt code to try 3 times to catch up with the lost time. If that
      fails use the total time spent in the interrupt handler to defer the
      next timer interrupt so the system can catch up with other things
      which got delayed. Limit that deferment to 100ms.
      
      The retry events and the maximum time spent in the interrupt handler
      are recorded and exposed via /proc/timer_list
      
      Inspired by a patch from Marcelo.
      Reported-by: NMichael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Cc: kvm@vger.kernel.org
      41d2e494
  10. 18 11月, 2009 1 次提交
  11. 17 11月, 2009 1 次提交
    • L
      timekeeping: Fix clock_gettime vsyscall time warp · 0696b711
      Lin Ming 提交于
      Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
      to struct timekeeper" the clock multiplier of vsyscall is updated with
      the unmodified clock multiplier of the clock source and not with the
      NTP adjusted multiplier of the timekeeper.
      
      This causes user space observerable time warps:
      new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf
      
      Add a new argument "mult" to update_vsyscall() and hand in the
      timekeeping internal NTP adjusted multiplier.
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      0696b711
  12. 14 11月, 2009 7 次提交
    • T
      clocksource/events: Fix fallout of generic code changes · a362c638
      Thomas Gleixner 提交于
      powerpc grew a new warning due to the type change of clockevent->mult.
      
      The architectures which use parts of the generic time keeping
      infrastructure tripped over my wrong assumption that
      clocksource_register is only used when GENERIC_TIME=y.
      
      I should have looked and also I should have known better. These
      renitent Gaul villages are racking my nerves. Some serious deprecating
      is due.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      a362c638
    • J
      nohz: Allow 32-bit machines to sleep for more than 2.15 seconds · 97813f2f
      Jon Hunter 提交于
      In the dynamic tick code, "max_delta_ns" (member of the
      "clock_event_device" structure) represents the maximum sleep time
      that can occur between timer events in nanoseconds.
      
      The variable, "max_delta_ns", is defined as an unsigned long
      which is a 32-bit integer for 32-bit machines and a 64-bit
      integer for 64-bit machines (if -m64 option is used for gcc).
      The value of max_delta_ns is set by calling the function
      "clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
      For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
      nanoseconds this equates to ~2.15 seconds. Hence, the maximum
      sleep time for a 32-bit machine is ~2.15 seconds, where as for
      a 64-bit machine it will be many years.
      
      This patch changes the type of max_delta_ns to be "u64" instead of
      "unsigned long" so that this variable is a 64-bit type for both 32-bit
      and 64-bit machines. It also changes the maximum value returned by
      clockevent_delta2ns() to KTIME_MAX.  Hence this allows a 32-bit
      machine to sleep for longer than ~2.15 seconds. Please note that this
      patch also changes "min_delta_ns" to be "u64" too and although this is
      unnecessary, it makes the patch simpler as it avoids to fixup all
      callers of clockevent_delta2ns().
      
      [ tglx: changed "unsigned long long" to u64 as we use this data type
        	through out the time code ]
      Signed-off-by: NJon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      97813f2f
    • T
      nohz: Track last do_timer() cpu · 27185016
      Thomas Gleixner 提交于
      The previous patch which limits the sleep time to the maximum
      deferment time of the time keeping clocksource has some limitations on
      SMP machines: if all CPUs are idle then for all CPUs the maximum sleep
      time is limited.
      
      Solve this by keeping track of which cpu had the do_timer() duty
      assigned last and limit the sleep time only for this cpu.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      27185016
    • J
      nohz: Prevent clocksource wrapping during idle · 98962465
      Jon Hunter 提交于
      The dynamic tick allows the kernel to sleep for periods longer than a
      single tick, but it does not limit the sleep time currently. In the
      worst case the kernel could sleep longer than the wrap around time of
      the time keeping clock source which would result in losing track of
      time.
      
      Prevent this by limiting it to the safe maximum sleep time of the
      current time keeping clock source. The value is calculated when the
      clock source is registered.
      
      [ tglx: simplified the code a bit and massaged the commit msg ]
      Signed-off-by: NJon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      98962465
    • T
      nohz: Type cast printk argument · 529eaccd
      Thomas Gleixner 提交于
      On some archs local_softirq_pending() has a data type of unsigned long
      on others its unsigned int. Type cast it to (unsigned int) in the
      printk to avoid the compiler warning.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      529eaccd
    • T
      clocksource: Provide a generic mult/shift factor calculation · 7d2f944a
      Thomas Gleixner 提交于
      MIPS has two functions to calculcate the mult/shift factors for clock
      sources and clock events at run time. ARM needs such functions as
      well.
      
      Implement a function which calculates the mult/shift factors based on
      the frequencies to which and from which is converted. The function
      also has a parameter to specify the minimum conversion range in
      seconds. This range is guaranteed not to produce a 64bit overflow when
      a value is multiplied with the calculated mult factor. The larger the
      conversion range the less becomes the conversion accuracy.
      
      Provide two inline wrappers which handle clock events and clock
      sources. For clock events the "from" frequency is nano seconds per
      second which corresponds to 1GHz and "to" is the device frequency. For
      clock sources "from" is the device frequency and "to" is nano seconds
      per second.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMikael Pettersson <mikpe@it.uu.se>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NLinus Walleij <linus.walleij@stericsson.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20091111134229.766673305@linutronix.de>
      7d2f944a
    • T
      clockevents: Use u32 for mult and shift factors · 23af368e
      Thomas Gleixner 提交于
      The mult and shift factors of clock events differ in their data type
      from those of clock sources for no reason. u32 is sufficient for
      both. shift is always <= 32 and mult is limited to 2^32-1 to avoid
      64bit multiplication overflows in the conversion.
      
      Preparatory patch for a generic mult/shift factor calculation
      function.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMikael Pettersson <mikpe@it.uu.se>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NLinus Walleij <linus.walleij@stericsson.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20091111134229.725664788@linutronix.de>
      23af368e
  13. 12 11月, 2009 1 次提交
  14. 09 11月, 2009 1 次提交
  15. 05 11月, 2009 2 次提交
    • M
      nohz: Introduce arch_needs_cpu · 3c5d92a0
      Martin Schwidefsky 提交于
      Allow the architecture to request a normal jiffy tick when the system
      goes idle and tick_nohz_stop_sched_tick is called . On s390 the hook is
      used to prevent the system going fully idle if there has been an
      interrupt other than a clock comparator interrupt since the last wakeup.
      
      On s390 the HiperSockets response time for 1 connection ping-pong goes
      down from 42 to 34 microseconds. The CPU cost decreases by 27%.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      LKML-Reference: <20090929122533.402715150@de.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      3c5d92a0
    • M
      nohz: Reuse ktime in sub-functions of tick_check_idle. · eed3b9cf
      Martin Schwidefsky 提交于
      On a system with NOHZ=y tick_check_idle calls tick_nohz_stop_idle and
      tick_nohz_update_jiffies. Given the right conditions (ts->idle_active
      and/or ts->tick_stopped) both function get a time stamp with ktime_get.
      The same time stamp can be reused if both function require one.
      
      On s390 this change has the additional benefit that gcc inlines the
      tick_nohz_stop_idle function into tick_check_idle. The number of
      instructions to execute tick_check_idle drops from 225 to 144
      (without the ktime_get optimization it is 367 vs 215 instructions).
      
      before:
      
       0)               |  tick_check_idle() {
       0)               |    tick_nohz_stop_idle() {
       0)               |      ktime_get() {
       0)               |        read_tod_clock() {
       0)   0.601 us    |        }
       0)   1.765 us    |      }
       0)   3.047 us    |    }
       0)               |    ktime_get() {
       0)               |      read_tod_clock() {
       0)   0.570 us    |      }
       0)   1.727 us    |    }
       0)               |    tick_do_update_jiffies64() {
       0)   0.609 us    |    }
       0)   8.055 us    |  }
      
      after:
      
       0)               |  tick_check_idle() {
       0)               |    ktime_get() {
       0)               |      read_tod_clock() {
       0)   0.617 us    |      }
       0)   1.773 us    |    }
       0)               |    tick_do_update_jiffies64() {
       0)   0.593 us    |    }
       0)   4.477 us    |  }
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090929122533.206589318@de.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      eed3b9cf
  16. 29 10月, 2009 1 次提交
    • T
      percpu: make percpu symbols under kernel/ and mm/ unique · 1871e52c
      Tejun Heo 提交于
      This patch updates percpu related symbols under kernel/ and mm/ such
      that percpu symbols are unique and don't clash with local symbols.
      This serves two purposes of decreasing the possibility of global
      percpu symbol collision and allowing dropping per_cpu__ prefix from
      percpu symbols.
      
      * kernel/lockdep.c: s/lock_stats/cpu_lock_stats/
      
      * kernel/sched.c: s/init_rq_rt/init_rt_rq_var/	(any better idea?)
        		  s/sched_group_cpus/sched_groups/
      
      * kernel/softirq.c: s/ksoftirqd/run_ksoftirqd/a
      
      * kernel/softlockup.c: s/(*)_timestamp/softlockup_\1_ts/
        		       s/watchdog_task/softlockup_watchdog/
      		       s/timestamp/ts/ for local variables
      
      * kernel/time/timer_stats: s/lookup_lock/tstats_lookup_lock/
      
      * mm/slab.c: s/reap_work/slab_reap_work/
        	     s/reap_node/slab_reap_node/
      
      * mm/vmstat.c: local variable changed to avoid collision with vmstat_work
      
      Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
      which cause name clashes" patch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: N(slab/vmstat) Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      1871e52c
  17. 12 10月, 2009 1 次提交
  18. 07 10月, 2009 1 次提交
    • E
      NOHZ: update idle state also when NOHZ is inactive · fdc6f192
      Eero Nurkkala 提交于
      Commit f2e21c96 had unfortunate side
      effects with cpufreq governors on some systems.
      
      If the system did not switch into NOHZ mode ts->inidle is not set when
      tick_nohz_stop_sched_tick() is called from the idle routine. Therefor
      all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
      fail to call tick_nohz_start_idle(). This results in bogus idle
      accounting information which is passed to cpufreq governors.
      
      Set the inidle flag unconditionally of the NOHZ active state to keep
      the idle time accounting correct in any case.
      
      [ tglx: Added comment and tweaked the changelog ]
      Reported-by: NSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: NEero Nurkkala <ext-eero.nurkkala@nokia.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: stable@kernel.org
      LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      fdc6f192
  19. 05 10月, 2009 2 次提交
    • J
      time: Remove xtime_cache · 7bc7d637
      john stultz 提交于
      With the prior logarithmic time accumulation patch, xtime will now
      always be within one "tick" of the current time, instead of
      possibly half a second off.
      
      This removes the need for the xtime_cache value, which always
      stored the time at the last interrupt, so this patch cleans that up
      removing the xtime_cache related code.
      
      This is a bit simpler, but still could use some wider testing.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJohn Kacur <jkacur@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1254525855.7741.95.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7bc7d637
    • J
      time: Implement logarithmic time accumulation · a092ff0f
      john stultz 提交于
      Accumulating one tick at a time works well unless we're using NOHZ.
      Then it can be an issue, since we may have to run through the loop
      a few thousand times, which can increase timer interrupt caused
      latency.
      
      The current solution was to accumulate in half-second intervals
      with NOHZ. This kept the number of loops down, however it did
      slightly change how we make NTP adjustments. While not an issue
      with NTPd users, as NTPd makes adjustments over a longer period of
      time, other adjtimex() users have noticed the half-second
      granularity with which we can apply frequency changes to the clock.
      
      For instance, if a application tries to apply a 100ppm frequency
      correction for 20ms to correct a 2us offset, with NOHZ they either
      get no correction, or a 50us correction.
      
      Now, there will always be some granularity error for applying
      frequency corrections. However with users sensitive to this error
      have seen a 50-500x increase with NOHZ compared to running without
      NOHZ.
      
      So I figured I'd try another approach then just simply increasing
      the interval. My approach is to consume the time interval
      logarithmically. This reduces the number of times through the loop
      needed keeping latency down, while still preserving the original
      granularity error for adjtimex() changes.
      
      Further, this change allows us to remove the xtime_cache code
      (patch to follow), as xtime is always within one tick of the
      current time, instead of the half-second updates it saw before.
      
      An earlier version of this patch has been shipping to x86 users in
      the RedHat MRG releases for awhile without issue, but I've reworked
      this version to be even more careful about avoiding possible
      overflows if the shift value gets too large.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJohn Kacur <jkacur@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1254525473.7741.88.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a092ff0f
  20. 02 10月, 2009 1 次提交
  21. 25 9月, 2009 1 次提交
    • M
      clocksource: Resume clocksource without taking the clocksource mutex · 89133f93
      Martin Schwidefsky 提交于
      git commit 75c5158f converted the clocksource spinlock to a
      mutex. This causes the following BUG:
      
      BUG: sleeping function called from invalid context at
      kernel/mutex.c:280 in_atomic(): 0, irqs_disabled(): 1, pid: 2473,
      name: pm-suspend 2 locks held by pm-suspend/2473:
       #0:  (&buffer->mutex){......}, at: [<ffffffff8115ab13>]
      sysfs_write_file+0x3c/0x137
       #1:  (pm_mutex){......}, at: [<ffffffff810865b5>]
      enter_state+0x39/0x130 Pid: 2473, comm: pm-suspend Not tainted 2.6.31
      #1 Call Trace:
       [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24
       [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b
       [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43
       [<ffffffff81073537>] clocksource_resume+0x1c/0x60
       [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8
       [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf
       [<ffffffff812aef79>] sysdev_resume+0x6d/0xae
       [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af
       [<ffffffff8108665b>] enter_state+0xdf/0x130
       [<ffffffff81085dc3>] state_store+0xb6/0xd3
       [<ffffffff81204c73>] kobj_attr_store+0x17/0x19
       [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137
       [<ffffffff811057d2>] vfs_write+0xae/0x10b
       [<ffffffff81208392>] ? __up_read+0x1a/0x7f
       [<ffffffff811058ef>] sys_write+0x4a/0x6e
       [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b
      
      clocksource_resume is called early in the resume process, there is
      only one cpu, no processes are running and the interrupts are
      disabled. It is therefore possible to resume the clocksources
      without taking the clocksource mutex.
      Reported-by: NXiaotian Feng <xtfeng@gmail.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Tested-by: NMichal Schmidt <mschmidt@redhat.com>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      89133f93
  22. 24 9月, 2009 1 次提交
    • Z
      time: add function to convert between calendar time and broken-down time for universal use · 57f1f087
      Zhaolei 提交于
      There are many similar code in kernel for one object: convert time between
      calendar time and broken-down time.
      
      Here is some source I found:
        fs/ncpfs/dir.c
        fs/smbfs/proc.c
        fs/fat/misc.c
        fs/udf/udftime.c
        fs/cifs/netmisc.c
        net/netfilter/xt_time.c
        drivers/scsi/ips.c
        drivers/input/misc/hp_sdc_rtc.c
        drivers/rtc/rtc-lib.c
        arch/ia64/hp/sim/boot/fw-emu.c
        arch/m68k/mac/misc.c
        arch/powerpc/kernel/time.c
        arch/parisc/include/asm/rtc.h
        ...
      
      We can make a common function for this type of conversion, At least we
      can get following benefit:
      
      1: Make kernel simple and unify
      2: Easy to fix bug in converting code
      3: Reduce clone of code in future
         For example, I'm trying to make ftrace display walltime,
         this patch will make me easy.
      
      This code is based on code from glibc-2.6
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57f1f087
  23. 15 9月, 2009 2 次提交
  24. 12 9月, 2009 1 次提交
    • M
      clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash · f79e0258
      Martin Schwidefsky 提交于
      The watchdog timer is started after the watchdog clocksource
      and at least one watched clocksource have been registered. The
      clocksource work element watchdog_work is initialized just
      before the clocksource timer is started. This is too late for
      the clocksource_mark_unstable call from native_cpu_up. To fix
      this use a static initializer for watchdog_work.
      
      This resolves a boot crash reported by multiple people.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090911153305.3fe9a361@skybase>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f79e0258
  25. 29 8月, 2009 1 次提交
    • T
      clocksource: Resolve cpu hotplug dead lock with TSC unstable · 7285dd7f
      Thomas Gleixner 提交于
      Martin Schwidefsky analyzed it:
      To register a clocksource the clocksource_mutex is acquired and if
      necessary timekeeping_notify is called to install the clocksource as
      the timekeeper clock. timekeeping_notify uses stop_machine which needs
      to take cpu_add_remove_lock mutex.
      Starting a new cpu is done with the cpu_add_remove_lock mutex held.
      native_cpu_up checks the tsc of the new cpu and if the tsc is no good
      clocksource_change_rating is called. Which needs the clocksource_mutex
      and the deadlock is complete.
      
      The solution is to replace the TSC via the clocksource watchdog
      mechanism. Mark the TSC as unstable and schedule the watchdog work so
      it gets removed in the watchdog thread context.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      7285dd7f
  26. 25 8月, 2009 1 次提交
  27. 22 8月, 2009 1 次提交
    • J
      time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      john stultz 提交于
      After talking with some application writers who want very fast, but not
      fine-grained timestamps, I decided to try to implement new clock_ids
      to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
      which returns the time at the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade off is you
      only get low-res tick grained time resolution.
      
      This isn't a new idea, I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse grained time when the
      vsyscall64 sysctrl was set to 2. However this affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      da15cfda
  28. 20 8月, 2009 1 次提交
    • S
      clockevent: Prevent dead lock on clockevents_lock · f833bab8
      Suresh Siddha 提交于
      Currently clockevents_notify() is called with interrupts enabled at
      some places and interrupts disabled at some other places.
      
      This results in a deadlock in this scenario.
      
      cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
      cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
      cpu C doing set_mtrr() which will try to rendezvous of all the cpus.
      
      This will result in C and A come to the rendezvous point and waiting
      for B. B is stuck forever waiting for the spinlock and thus not
      reaching the rendezvous point.
      
      Fix the clockevents code so that clockevents_lock is taken with
      interrupts disabled and thus avoid the above deadlock.
      
      Also call lapic_timer_propagate_broadcast() on the destination cpu so
      that we avoid calling smp_call_function() in the clockevents notifier
      chain.
      
      This issue left us wondering if we need to change the MTRR rendezvous
      logic to use stop machine logic (instead of smp_call_function) or add
      a check in spinlock debug code to see if there are other spinlocks
      which gets taken under both interrupts enabled/disabled conditions.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: "Brown Len" <len.brown@intel.com>
      LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      f833bab8