1. 25 Aug, 2009 1 commit
  2. 22 Aug, 2009 1 commit
    • time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      john stultz authored
      After talking with some application writers who want very fast, but not
      fine-grained timestamps, I decided to try to implement new clock_ids
      for clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE,
      which return the time at the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade-off is that
      you only get low-resolution, tick-grained time.
      
      This isn't a new idea: I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse-grained time when the
      vsyscall64 sysctl was set to 2. However, this affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
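From user space, the speed/granularity choice described in this commit could look roughly like the sketch below. It assumes a Linux system whose C library exposes CLOCK_REALTIME_COARSE; the helper names (ts_to_ns, read_clock_ns, clock_resolution_ns) are hypothetical and not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Read the given clock. Returns 0 on success and -1 if the running
 * kernel or C library does not support the clock id.
 */
static int read_clock_ns(clockid_t id, int64_t *out_ns)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    *out_ns = ts_to_ns(&ts);
    return 0;
}

/* Report the resolution of a clock in nanoseconds, or -1 on error. */
static int64_t clock_resolution_ns(clockid_t id)
{
    struct timespec res;

    if (clock_getres(id, &res) != 0)
        return -1;
    return ts_to_ns(&res);
}
```

An application that only needs tick-grained timestamps would pass CLOCK_REALTIME_COARSE (or CLOCK_MONOTONIC_COARSE) to read_clock_ns(), while latency-sensitive measurement code would keep using CLOCK_REALTIME. Comparing clock_resolution_ns() for both ids shows the granularity that is being traded away.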
  3. 19 Aug, 2009 2 commits
    • clocksource: Avoid clocksource watchdog circular locking dependency · 01548f4d
      Martin Schwidefsky authored
      stop_machine from a multithreaded workqueue is not allowed because
      of a circular locking dependency between cpu_down and the workqueue
      execution. Use a kernel thread to do the clocksource downgrade.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090818170942.3ab80c91@skybase>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clocksource: Protect the watchdog rating changes with clocksource_mutex · d0981a1b
      Thomas Gleixner authored
      Martin pointed out that commit 6ea41d2529 (clocksource: Call
      clocksource_change_rating() outside of watchdog_lock) has a
      theoretical reference count problem. The calls to
      clocksource_change_rating() are now done outside of the clocksource
      mutex and outside of the watchdog lock. A concurrent
      clocksource_unregister() could remove the clock.
      
      Split out the code which changes the rating from
      clocksource_change_rating() into __clocksource_change_rating().
      
      Protect the clocksource_watchdog_work() code sequence with
      clocksource_mutex and call __clocksource_change_rating().
      
      LKML-Reference: <alpine.LFD.2.00.0908171038420.2782@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
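The split into a locking wrapper and an unlocked helper can be illustrated with a small user-space analogy. This is only a sketch using pthreads, not the kernel code: struct clock_sketch, clock_mutex and all function names are hypothetical, and the `_locked` suffix stands in for the kernel's `__` prefix convention.

```c
#include <pthread.h>

struct clock_sketch {
    int rating;
    int registered;   /* cleared by unregister_clock() */
};

static pthread_mutex_t clock_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Unlocked helper: the caller must already hold clock_mutex. */
static void change_rating_locked(struct clock_sketch *cs, int rating)
{
    if (cs->registered)
        cs->rating = rating;
}

/* Public entry point: takes the mutex itself. */
static void change_rating(struct clock_sketch *cs, int rating)
{
    pthread_mutex_lock(&clock_mutex);
    change_rating_locked(cs, rating);
    pthread_mutex_unlock(&clock_mutex);
}

/*
 * Watchdog path: holds the mutex across the whole sequence, so a
 * concurrent unregister cannot slip in between check and update.
 */
static void watchdog_work(struct clock_sketch *cs, int unstable)
{
    pthread_mutex_lock(&clock_mutex);
    if (unstable)
        change_rating_locked(cs, 0);
    pthread_mutex_unlock(&clock_mutex);
}

/* Removal is serialized against rating changes by the same mutex. */
static void unregister_clock(struct clock_sketch *cs)
{
    pthread_mutex_lock(&clock_mutex);
    cs->registered = 0;
    pthread_mutex_unlock(&clock_mutex);
}
```

The point of the pattern is that code which already holds the mutex calls the unlocked helper directly instead of re-entering the public function and deadlocking on a non-recursive lock.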
  4. 15 Aug, 2009 16 commits
  5. 19 Jul, 2009 1 commit
  6. 10 Jul, 2009 1 commit
    • hrtimer: Fix migration expiry check · 6ff7041d
      Thomas Gleixner authored
      The timer migration expiry check should prevent the migration of a
      timer to another CPU when the timer expires before the next event is
      scheduled on the other CPU. Migrating the timer might delay it because
      we cannot reprogram the clock event device on the other CPU. But the
      code implementing that check has two flaws:
      
      - for !HIGHRES the check compares the expiry value with the clock
        events device expiry value which is wrong for CLOCK_REALTIME based
        timers.
      
      - the check is racy. It holds the hrtimer base lock of the target CPU,
        but the clock event device expiry value can be modified
        nevertheless, e.g. by a timer interrupt firing.
      
      The !HIGHRES case is easy to fix as we can enqueue the timer on the
      CPU which was selected by the load balancer. It runs the idle
      balancing code once per jiffy anyway. So the maximum delay for the
      timer is the same as when we keep the tick on the current CPU going.
      
      In the HIGHRES case we can get the next expiry value from the hrtimer
      cpu_base of the target CPU and serialize the update with the cpu_base
      lock. This moves the lock section in hrtimer_interrupt() so we can set
      next_event to KTIME_MAX while we are handling the expired timers and
      set it to the next expiry value after we handled the timers under the
      base lock. While the expired timers are processed, timer migration is
      blocked because the expiry time of the timer is always <= KTIME_MAX.
      
      Also remove the now useless clockevents_get_next_event() function.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
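The HIGHRES part of the fix can be pictured with the following simplified model. It is a sketch under stated assumptions, not the kernel implementation: the struct, the function names and the KTIME_MAX_SKETCH constant are hypothetical, and the cpu_base lock that serializes these accesses in the real code is not modeled.

```c
#include <stdint.h>

#define KTIME_MAX_SKETCH INT64_MAX

struct cpu_base_sketch {
    int64_t expires_next;   /* next programmed expiry on this CPU, in ns */
};

/*
 * Returns 1 when the timer must stay where it is: it would expire at
 * or before the target CPU's next event, and we cannot reprogram the
 * target's clock event device from here.
 */
static int migration_blocked(int64_t timer_expires,
                             const struct cpu_base_sketch *target)
{
    return timer_expires <= target->expires_next;
}

/* Entering the interrupt handler: park next_event at KTIME_MAX. */
static void interrupt_begin(struct cpu_base_sketch *base)
{
    base->expires_next = KTIME_MAX_SKETCH;
}

/* Leaving the handler: publish the real next expiry value. */
static void interrupt_end(struct cpu_base_sketch *base, int64_t next)
{
    base->expires_next = next;
}
```

Because every expiry value is <= KTIME_MAX_SKETCH, migration_blocked() returns 1 for any timer between interrupt_begin() and interrupt_end(), which is the blocking behavior the commit message describes.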
  7. 07 Jul, 2009 2 commits
    • timekeeping: Move ktime_get() functions to timekeeping.c · a40f262c
      Thomas Gleixner authored
      The ktime_get() functions for GENERIC_TIME=n are still located in
      hrtimer.c. Move them to time/timekeeping.c where they belong.
      
      LKML-Reference: <new-submission>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y · 951ed4d3
      Martin Schwidefsky authored
      The generic ktime_get function defined in kernel/hrtimer.c is suboptimal
      for GENERIC_TIME=y:
      
       0)               |  ktime_get() {
       0)               |    ktime_get_ts() {
       0)               |      getnstimeofday() {
       0)               |        read_tod_clock() {
       0)   0.601 us    |        }
       0)   1.938 us    |      }
       0)               |      set_normalized_timespec() {
       0)   0.602 us    |      }
       0)   4.375 us    |    }
       0)   5.523 us    |  }
      
      Overall there are two read_seqbegin/read_seqretry loops and a lot of
      unnecessary struct timespec calculations. ktime_get returns a nanosecond
      value which is the sum of xtime, wall_to_monotonic and the nanosecond
      delta from the clock source.
      
      ktime_get can be optimized for GENERIC_TIME=y. The new version only calls
      clocksource_read:
      
       0)               |  ktime_get() {
       0)               |    read_tod_clock() {
       0)   0.610 us    |    }
       0)   1.977 us    |  }
      
      It uses a single read_seqbegin/read_seqretry loop and just adds everything
      to a nanosecond value.
      
      ktime_get_ts is optimized in a similar fashion.
      
      [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ]
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090707112728.3005244d@skybase>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
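The single retry loop that sums everything into one nanosecond value can be sketched in user space with C11 atomics. This is an illustration of the sequence-counter read pattern only: the struct and function names are hypothetical, a single writer is assumed, and the stored cycle_delta_ns stands in for the value the kernel derives from reading the clocksource inside the loop.

```c
#include <stdatomic.h>
#include <stdint.h>

struct timekeeper_sketch {
    atomic_uint seq;                        /* odd while an update is in progress */
    _Atomic int64_t xtime_ns;
    _Atomic int64_t wall_to_monotonic_ns;
    _Atomic int64_t cycle_delta_ns;
};

/* Writer side (single writer assumed): bump seq around the update. */
static void tk_update(struct timekeeper_sketch *tk,
                      int64_t xtime, int64_t wtm, int64_t delta)
{
    unsigned int s = atomic_load_explicit(&tk->seq, memory_order_relaxed);

    atomic_store_explicit(&tk->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&tk->xtime_ns, xtime, memory_order_relaxed);
    atomic_store_explicit(&tk->wall_to_monotonic_ns, wtm, memory_order_relaxed);
    atomic_store_explicit(&tk->cycle_delta_ns, delta, memory_order_relaxed);
    atomic_store_explicit(&tk->seq, s + 2, memory_order_release);
}

/* Reader side: one loop, one sum, retry if a writer interfered. */
static int64_t ktime_get_sketch(struct timekeeper_sketch *tk)
{
    unsigned int begin, end;
    int64_t ns;

    do {
        begin = atomic_load_explicit(&tk->seq, memory_order_acquire);
        ns = atomic_load_explicit(&tk->xtime_ns, memory_order_relaxed)
           + atomic_load_explicit(&tk->wall_to_monotonic_ns, memory_order_relaxed)
           + atomic_load_explicit(&tk->cycle_delta_ns, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&tk->seq, memory_order_relaxed);
    } while ((begin & 1u) || begin != end);

    return ns;
}
```

Compared with nesting two such loops and normalizing an intermediate timespec, the reader does one pass and three additions, which is the saving the function trace in the commit message reflects.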
  8. 24 Jun, 2009 1 commit
    • timer stats: Optimize by adding quick check to avoid function calls · 507e1231
      Heiko Carstens authored
      When the kernel is configured with CONFIG_TIMER_STATS but timer
      stats are runtime disabled we still get calls to
      __timer_stats_timer_set_start_info which initializes some
      fields in the corresponding struct timer_list.
      
      So add some quick checks in the timer stats setup functions
      to avoid function calls to __timer_stats_timer_set_start_info
      when timer stats are disabled.
      
      In an artificial workload that does nothing but play ping-pong
      with a single TCP packet via loopback, this decreases CPU
      consumption by 1 - 1.5%.
      
      This is part of a modified function trace output on SLES11:
      
       perl-2497  [00] 28630647177732388 [+  125]: sk_reset_timer <-tcp_v4_rcv
       perl-2497  [00] 28630647177732513 [+  125]: mod_timer <-sk_reset_timer
       perl-2497  [00] 28630647177732638 [+  125]: __timer_stats_timer_set_start_info <-mod_timer
       perl-2497  [00] 28630647177732763 [+  125]: __mod_timer <-mod_timer
       perl-2497  [00] 28630647177732888 [+  125]: __timer_stats_timer_set_start_info <-__mod_timer
       perl-2497  [00] 28630647177733013 [+   93]: lock_timer_base <-__mod_timer
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mustafa Mesanovic <mustafa.mesanovic@de.ibm.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      LKML-Reference: <20090623153811.GA4641@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
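The quick check amounts to an inline wrapper that tests a runtime flag before paying for the out-of-line call. The sketch below shows the idea in user-space C; struct timer_sketch, the flag, the counter and the function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

struct timer_sketch {
    const void *start_site;   /* who armed the timer, recorded for statistics */
};

static int timer_stats_active;          /* runtime on/off switch */
static unsigned long slow_path_calls;   /* counts out-of-line calls, for the sketch */

/* Out-of-line slow path: only worth calling when stats are enabled. */
static void slow_set_start_info(struct timer_sketch *t, const void *site)
{
    slow_path_calls++;
    t->start_site = site;
}

/* Inline wrapper: a single flag test in the common, disabled case. */
static inline void timer_stats_set_start_info(struct timer_sketch *t,
                                              const void *site)
{
    if (!timer_stats_active)
        return;
    slow_set_start_info(t, site);
}
```

With the flag cleared, hot paths such as the mod_timer sequence in the trace above would skip both __timer_stats_timer_set_start_info calls and only execute the flag test.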
  9. 13 Jun, 2009 1 commit
  10. 11 Jun, 2009 1 commit
  11. 27 May, 2009 1 commit
  12. 15 May, 2009 1 commit
    • sched, timers: move calc_load() to scheduler · dce48a84
      Thomas Gleixner authored
      Dimitri Sivanich noticed that xtime_lock is held write locked across
      calc_load() which iterates over all online CPUs. That can cause long
      latencies for xtime_lock readers on large SMP systems. 
      
      The load average calculation is a rough estimate anyway so there is
      no real need to protect the readers vs. the update. It's not a problem
      when the avenrun array is updated while a reader copies the values.
      
      Instead of iterating over all online CPUs let the scheduler_tick code
      update the number of active tasks shortly before the avenrun update
      happens. The avenrun update itself is handled by the CPU which calls
      do_timer().
      
      [ Impact: reduce xtime_lock write locked section ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
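One way to picture the new scheme: each CPU folds the change in its own active-task count into a shared counter from its tick, and the CPU that runs do_timer() only reads that counter when it updates avenrun. The sketch below models this in user space. The struct and function names are hypothetical; FSHIFT, FIXED_1 and EXP_1 are the fixed-point constants as defined in the kernel headers of that era, and only the 1-minute average is shown.

```c
#include <stdatomic.h>

#define FSHIFT   11                  /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)     /* 1.0 in fixed point */
#define EXP_1    1884UL              /* 1-minute decay factor */

struct rq_sketch {
    long nr_active;          /* runnable + uninterruptible tasks right now */
    long calc_load_active;   /* value this CPU last folded into the total */
};

static atomic_long calc_load_tasks;   /* global count of active tasks */

/* Per-CPU tick path: publish only the delta since the last fold. */
static void fold_active(struct rq_sketch *rq)
{
    long delta = rq->nr_active - rq->calc_load_active;

    if (delta) {
        rq->calc_load_active = rq->nr_active;
        atomic_fetch_add(&calc_load_tasks, delta);
    }
}

/* Exponentially decaying average in fixed-point arithmetic. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Global update: no iteration over CPUs, just one atomic read. */
static unsigned long update_avenrun_1min(unsigned long avenrun)
{
    long active = atomic_load(&calc_load_tasks);
    unsigned long scaled = active > 0 ? (unsigned long)active * FIXED_1 : 0;

    return calc_load(avenrun, EXP_1, scaled);
}
```

Since the global update no longer walks all online CPUs, the time spent in it does not grow with the CPU count, which is what shortens the xtime_lock write-locked section on large SMP systems.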
  13. 13 May, 2009 2 commits
    • timers: Logic to move non pinned timers · eea08f32
      Arun R Bharadwaj authored
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch migrates all non pinned timers and hrtimers to the current
      idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
      are not migrated.
      
      While migrating hrtimers, care should be taken to check whether migrating
      a hrtimer would add latency. So we compare the expiry of the
      hrtimer with the next timer interrupt on the target CPU and migrate the
      hrtimer only if it expires *after* the next interrupt on the target CPU.
      To support this, add a clockevents_get_next_event() helper function that
      returns the next_event on the target CPU's clock_event_device.
      
      [ tglx: cleanups and simplifications ]
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timers: Identifying the existing pinned timers · 5c333864
      Arun R Bharadwaj authored
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      The following pinned hrtimers have been identified and marked:
      1) sched_rt_period_timer
      2) tick_sched_timer
      3) stack_trace_timer_fn
      
      [ tglx: fixup the hrtimer pinned mode ]
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  14. 02 May, 2009 5 commits
    • clockevent: export register_device and delta2ns · c81fc2c3
      Magnus Damm authored
      Export the following symbols using EXPORT_SYMBOL_GPL:
       - clockevent_delta2ns
       - clockevents_register_device
      
      This allows us to build SuperH clockevent and clocksource
      drivers as modules, see drivers/clocksource/sh_*.c
      
      [ Impact: allow modular build of clockevent drivers ]
      Signed-off-by: Magnus Damm <damm@igel.co.jp>
      LKML-Reference: <20090501055247.8286.64067.sendpatchset@rx1.opensource.se>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timekeeping: create arch_gettimeoffset infrastructure · 7d27558c
      john stultz authored
      Some arches don't supply their own clocksource. This is mainly the
      case in architectures that get their inter-tick times by reading the
      counter on their interval timer.  Since these timers wrap every tick,
      they're not really useful as clocksources.  Wrapping them to act like
      one is possible but not very efficient. So we provide a callout these
      arches can implement for use with the jiffies clocksource to provide
      finer than tick-granular time.
      
      [ Impact: ease the migration to generic time keeping ]
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
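Conceptually, the callout adds an intra-tick offset on top of the tick-granular time that the jiffies clocksource provides. The sketch below is a user-space model of that idea only. The HZ value, the assumption that the offset is expressed in nanoseconds, and all names ending in `_sketch` are illustrative choices, not taken from the patch.

```c
#include <stdint.h>

#define HZ_SKETCH       100
#define TICK_NS_SKETCH  (1000000000LL / HZ_SKETCH)

/* Callout type: nanoseconds elapsed since the last tick (assumed unit). */
typedef uint32_t (*gettimeoffset_fn)(void);

/* Default for arches without a usable interval timer: no extra resolution. */
static uint32_t default_gettimeoffset(void)
{
    return 0;
}

/* Hook an architecture would override with its own implementation. */
static gettimeoffset_fn arch_gettimeoffset_sketch = default_gettimeoffset;

/* Tick-granular base time plus the arch-supplied intra-tick offset. */
static int64_t time_ns(uint64_t jiffies)
{
    return (int64_t)jiffies * TICK_NS_SKETCH + arch_gettimeoffset_sketch();
}

/* Example arch hook: pretend the interval timer is 40% through its period. */
static uint32_t fake_timer_offset(void)
{
    return 4000000;
}
```

An architecture whose interval timer wraps every tick would install a hook like fake_timer_offset() that reads the timer's counter, instead of wrapping that counter up as a full clocksource.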
    • clocksource: setup mult_orig in clocksource_enable() · a25cbd04
      Magnus Damm authored
      Set up clocksource mult_orig in clocksource_enable().
      
      Clocksource drivers can save power by keeping the
      device clock disabled while the clocksource is unused.
      
      In practice this means that the enable() and disable()
      callbacks perform clk_enable() and clk_disable().
      
      The enable() callback may also use clk_get_rate() to get
      the clock rate from the clock framework. This information
      can then be used to calculate the shift and mult variables.
      
      Currently the mult_orig variable is set up from mult at
      registration time only. This conflicts with the above
      case since the clock is disabled and the mult variable is
      not yet calculated at the time of registration.
      
      Moving the mult_orig setup code to clocksource_enable()
      allows us to both handle the common case with no enable()
      callback and the mult-changed-after-enable() case.
      
      [ Impact: allow dynamic clock source usage ]
      Signed-off-by: Magnus Damm <damm@igel.co.jp>
      LKML-Reference: <20090501054546.8193.10688.sendpatchset@rx1.opensource.se>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
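The ordering problem and its fix can be shown with a simplified user-space model. This is a sketch, not the kernel structure: the struct layout, the rate_hz field (a stand-in for what clk_get_rate() would return) and the function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

struct clocksource_sketch {
    uint32_t mult;        /* cycles-to-ns multiplier, may be steered later */
    uint32_t mult_orig;   /* pristine copy of mult */
    uint32_t shift;
    uint64_t rate_hz;     /* stand-in for the clock framework's rate */
    int (*enable)(struct clocksource_sketch *cs);
};

/* Driver callback: the rate is only known once the clock is switched on. */
static int driver_enable(struct clocksource_sketch *cs)
{
    cs->mult = (uint32_t)((NSEC_PER_SEC_SKETCH << cs->shift) / cs->rate_hz);
    return 0;
}

/* Save mult_orig after the optional enable() callback has run. */
static int clocksource_enable_sketch(struct clocksource_sketch *cs)
{
    int ret = 0;

    if (cs->enable)
        ret = cs->enable(cs);
    cs->mult_orig = cs->mult;
    return ret;
}

/* Convert a cycle count to nanoseconds using mult and shift. */
static uint64_t cyc2ns(const struct clocksource_sketch *cs, uint64_t cycles)
{
    return (cycles * cs->mult) >> cs->shift;
}
```

Copying mult in the enable path covers both cases named in the commit message: a clocksource without an enable() callback keeps the mult it was registered with, and one that computes mult inside enable() gets the freshly computed value.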
    • clockevents: tick_broadcast_device can become static · a52f5c56
      Dmitri Vorobiev authored
      The variable tick_broadcast_device is not used outside of the
      file where it is defined, so let's make it static.
      Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clockevents: prevent endless loop in tick_handle_periodic() · 74a03b69
      john stultz authored
      tick_handle_periodic() can lock up hard when a one shot clock event
      device is used in combination with jiffies clocksource.
      
      Avoid an endless loop issue by requiring that a highres valid
      clocksource be installed before we call tick_periodic() in a loop when
      using ONESHOT mode. The result is we will only increment jiffies once
      per interrupt until a continuous hardware clocksource is available.
      
      Without this, we can run into an endless loop: on each cycle through
      the loop, jiffies is updated, which increments time by tick_period or
      more (due to clock steering). This can cause the event programming to
      think the next event was before the newly incremented time and fail,
      causing tick_periodic() to be called again, and the whole process loops
      forever.
      
      [ Impact: prevent hard lock up ]
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
  15. 22 Apr, 2009 2 commits
  16. 27 Feb, 2009 1 commit
  17. 26 Feb, 2009 1 commit
    • time: ntp: clean up second_overflow() · 39854fe8
      Ingo Molnar authored
      Impact: cleanup, no functionality changed
      
      The 'time_adj' local variable is named in a very confusing
      way because it almost shadows the 'time_adjust' global
      variable - which is used in this same function.
      
      Rename it to 'delta' - to make them stand apart more clearly.
      
      kernel/time/ntp.o:
      
         text	   data	    bss	    dec	    hex	filename
         2545	    114	    144	   2803	    af3	ntp.o.before
         2545	    114	    144	   2803	    af3	ntp.o.after
      
      md5:
         1bf0b3be564512279ba7cee299d1d2be  ntp.o.before.asm
         1bf0b3be564512279ba7cee299d1d2be  ntp.o.after.asm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>