1. 16 3月, 2013 1 次提交
    • F
      timekeeping: utilize the suspend-nonstop clocksource to count suspended time · e445cf1c
      Feng Tang 提交于
      There are some new processors whose TSC clocksource won't stop during
      suspend. Currently, after system resumes, kernel will use persistent
      clock or RTC to compensate the sleep time, but with these nonstop
      clocksources, we could skip the special compensation from external
      sources, and just use current clocksource for time recounting.
      
      This can solve some time drift bugs caused by some not-so-accurate or
      error-prone RTC devices.
      
      The current way to count suspended time is first try to use the persistent
      clock, and then try the RTC if persistent clock can't be used. This
      patch will change the trying order to:
      	suspend-nonstop clocksource -> persistent clock -> RTC
      
      When counting the sleep time with nonstop clocksource, use an accurate way
      suggested by Jason Gunthorpe to cover very large delta cycles.
      Signed-off-by: NFeng Tang <feng.tang@intel.com>
      [jstultz: Small optimization, avoiding re-reading the clocksource]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      e445cf1c
  2. 22 2月, 2013 2 次提交
    • T
      Revert "nohz: Make tick_nohz_irq_exit() irq safe" · af7bdbaf
      Thomas Gleixner 提交于
      This reverts commit 351429b2e62b6545bb10c756686393f29ba268a1. The
      extra local_irq_save() is not longer needed as the call site now
      always calls with interrupts disabled.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      af7bdbaf
    • F
      nohz: Make tick_nohz_irq_exit() irq safe · e5ab012c
      Frederic Weisbecker 提交于
      As it stands, irq_exit() may or may not be called with
      irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED
      that the arch can define.
      
      It makes tick_nohz_irq_exit() unsafe. For example two
      interrupts can race in tick_nohz_stop_sched_tick(): the inner
      most one computes the expiring time on top of the timer list,
      then it's interrupted right before reprogramming the
      clock. The new interrupt enqueues a new timer list timer,
      it reprogram the clock to take it into account and it exits.
      The CPUs resumes the inner most interrupt and performs the clock
      reprogramming without considering the new timer list timer.
      
      This regression has been introduced by:
           280f0677
           ("nohz: Separate out irq exit and idle loop dyntick logic")
      
      Let's fix it right now with the appropriate protections.
      
      A saner long term solution will be to remove
      __ARCH_IRQ_EXIT_IRQS_DISABLED and mandate that irq_exit() is called
      with interrupts disabled.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Cc: <stable@vger.kernel.org> #v3.2+
      Link: http://lkml.kernel.org/r/1361373336-11337-1-git-send-email-fweisbec@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e5ab012c
  3. 19 2月, 2013 1 次提交
  4. 13 2月, 2013 1 次提交
    • M
      clockevents: Fix generic broadcast for FEAT_C3STOP · 5d1d9a29
      Mark Rutland 提交于
      Commit 12ad1000: "clockevents: Add generic timer broadcast function"
      made tick_device_uses_broadcast set up the generic broadcast function
      for dummy devices (where !tick_device_is_functional(dev)), but neglected
      to set up the broadcast function for devices that stop in low power
      states (with the CLOCK_EVT_FEAT_C3STOP flag).
      
      When these devices enter low power states they will not have the generic
      broadcast function assigned, and will bring down the system when an
      attempt is made to broadcast to them.
      
      This patch ensures that the broadcast function is also assigned for
      devices which require broadcast in low power states.
      Reported-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: nico@linaro.org
      Cc: Marc.Zyngier@arm.com
      Cc: Will.Deacon@arm.com
      Cc: santosh.shilimkar@ti.com
      Cc: john.stultz@linaro.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5d1d9a29
  5. 09 2月, 2013 1 次提交
    • P
      time, Fix setting of hardware clock in NTP code · 84e345e4
      Prarit Bhargava 提交于
      At init time, if the system time is "warped" forward in warp_clock()
      it will differ from the hardware clock by sys_tz.tz_minuteswest.  This time
      difference is not taken into account when ntp updates the hardware clock,
      and this causes the system time to jump forward by this offset every reboot.
      
      The kernel must take this offset into account when writing the system time
      to the hardware clock in the ntp code.  This patch adds
      persistent_clock_is_local which indicates that an offset has been applied
      in warp_clock() and accounts for the "warp" before writing the hardware
      clock.
      
      x86 does not have this problem as rtc writes are software limited to a
      +/-15 minute window relative to the current rtc time.  Other arches, such
      as powerpc, however do a full synchronization of the system time to the
      rtc and will see this problem.
      
      [v2]: generated against tip/timers/core
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      84e345e4
  6. 01 2月, 2013 2 次提交
  7. 30 1月, 2013 1 次提交
  8. 28 1月, 2013 1 次提交
    • F
      cputime: Allow dynamic switch between tick/virtual based cputime accounting · 3f4724ea
      Frederic Weisbecker 提交于
      Allow to dynamically switch between tick and virtual based
      cputime accounting. This way we can provide a kind of "on-demand"
      virtual based cputime accounting. In this mode, the kernel relies
      on the context tracking subsystem to dynamically probe on kernel
      boundaries.
      
      This is in preparation for being able to stop the timer tick in
      more places than just the idle state. Doing so will depend on
      CONFIG_VIRT_CPU_ACCOUNTING_GEN which makes it possible to account
      the cputime without the tick by hooking on kernel/user boundaries.
      
      Depending whether the tick is stopped or not, we can switch between
      tick and vtime based accounting anytime in order to minimize the
      overhead associated to user hooks.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3f4724ea
  9. 17 1月, 2013 1 次提交
  10. 16 1月, 2013 4 次提交
    • F
      timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option · 05ad717c
      Feng Tang 提交于
      Make the persistent clock check a kernel config option, so that some
      platform can explicitely select it, also make CONFIG_RTC_HCTOSYS and
      RTC_SYSTOHC depend on its non-existence, which could prevent the
      persistent clock and RTC code from doing similar thing twice during
      system's init/suspend/resume phases.
      
      If the CONFIG_HAS_PERSISTENT_CLOCK=n, then no change happens for kernel
      which still does the persistent clock check in timekeeping_init().
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: NJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: NFeng Tang <feng.tang@intel.com>
      [jstultz: Added dependency for RTC_SYSTOHC as well]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      05ad717c
    • F
      timekeeping: Add persistent_clock_exist flag · 31ade306
      Feng Tang 提交于
      In current kernel, there are several places which need to check
      whether there is a persistent clock for the platform. Current check
      is done by calling the read_persistent_clock() and validating its
      return value.
      
      So one optimization is to do the check only once in timekeeping_init(),
      and use a flag persistent_clock_exist to record it.
      
      v2: Add a has_persistent_clock() helper function, as suggested by John.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: NFeng Tang <feng.tang@intel.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      31ade306
    • J
      NTP: Add a CONFIG_RTC_SYSTOHC configuration · 023f333a
      Jason Gunthorpe 提交于
      The purpose of this option is to allow ARM/etc systems that rely on the
      class RTC subsystem to have the same kind of automatic NTP based
      synchronization that we have on PC platforms. Today ARM does not
      implement update_persistent_clock and makes extensive use of the class
      RTC system.
      
      When enabled CONFIG_RTC_SYSTOHC will provide a generic
      rtc_update_persistent_clock that stores the current time in the RTC and
      is intended complement the existing CONFIG_RTC_HCTOSYS option that loads
      the RTC at boot.
      
      Like with RTC_HCTOSYS the platform's update_persistent_clock is used
      first, if it works. Platforms with mixed class RTC and non-RTC drivers
      need to return ENODEV when class RTC should be used. Such an update for
      PPC is included in this patch.
      
      Long term, implementations of update_persistent_clock should migrate to
      proper class RTC drivers and use CONFIG_RTC_SYSTOHC instead.
      
      Tested on ARM kirkwood and PPC405
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      023f333a
    • K
      time: create __getnstimeofday for WARNless calls · 1e817fb6
      Kees Cook 提交于
      The pstore RAM backend can get called during resume, and must be defensive
      against a suspended time source. Expose getnstimeofday logic that returns
      an error instead of a WARN. This can be detected and the timestamp can
      be zeroed out.
      Reported-by: NDoug Anderson <dianders@chromium.org>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      1e817fb6
  11. 15 1月, 2013 1 次提交
  12. 25 12月, 2012 1 次提交
    • S
      time: convert arch_gettimeoffset to a pointer · 7b1f6207
      Stephen Warren 提交于
      Currently, whenever CONFIG_ARCH_USES_GETTIMEOFFSET is enabled, each
      arch core provides a single implementation of arch_gettimeoffset(). In
      many cases, different sub-architectures, different machines, or
      different timer providers exist, and so the arch ends up implementing
      arch_gettimeoffset() as a call-through-pointer anyway. Examples are
      ARM, Cris, M68K, and it's arguable that the remaining architectures,
      M32R and Blackfin, should be doing this anyway.
      
      Modify arch_gettimeoffset so that it itself is a function pointer, which
      the arch initializes. This will allow later changes to move the
      initialization of this function into individual machine support or timer
      drivers. This is particularly useful for code in drivers/clocksource
      which should rely on an arch-independant mechanism to register their
      implementation of arch_gettimeoffset().
      
      This patch also converts the Cris architecture to set arch_gettimeoffset
      directly to the final implementation in time_init(), because Cris already
      had separate time_init() functions per sub-architecture. M68K and ARM
      are converted to set arch_gettimeoffset to the final implementation in
      later patches, because they already have function pointers in place for
      this purpose.
      
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: NJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: NStephen Warren <swarren@nvidia.com>
      7b1f6207
  13. 28 11月, 2012 1 次提交
  14. 18 11月, 2012 3 次提交
    • F
      printk: Wake up klogd using irq_work · 74876a98
      Frederic Weisbecker 提交于
      klogd is woken up asynchronously from the tick in order
      to do it safely.
      
      However if printk is called when the tick is stopped, the reader
      won't be woken up until the next interrupt, which might not fire
      for a while. As a result, the user may miss some message.
      
      To fix this, lets implement the printk tick using a lazy irq work.
      This subsystem takes care of the timer tick state and can
      fix up accordingly.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      74876a98
    • F
      irq_work: Don't stop the tick with pending works · 00b42959
      Frederic Weisbecker 提交于
      Don't stop the tick if we have pending irq works on the
      queue, otherwise if the arch can't raise self-IPIs, we may not
      find an opportunity to execute the pending works for a while.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      00b42959
    • F
      nohz: Add API to check tick state · 33a5f626
      Frederic Weisbecker 提交于
      We need some quick way to check if the CPU has stopped
      its tick. This will be useful to implement the printk tick
      using the irq work subsystem.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      33a5f626
  15. 15 11月, 2012 1 次提交
    • Y
      cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea
      Youquan Song 提交于
      The prediction for future is difficult and when the cpuidle governor prediction
      fails and govenor possibly choose the shallower C-state than it should. How to
      quickly notice and find the failure becomes important for power saving.
      
      cpuidle menu governor has a method to predict the repeat pattern if there are 8
      C-states residency which are continuous and the same or very close, so it will
      predict the next C-states residency will keep same residency time.
      
      There is a real case that turbostat utility (tools/power/x86/turbostat)
      at kernel 3.3 or early. turbostat utility will read 10 registers one by one at
      Sandybridge, so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu
       governor will predict it is repeat mode and there is another IPI wake up idle
       CPU soon, so it keeps idle CPU stay at C1 state even though CPU is totally
      idle. However, in the turbostat, following 10 registers reading is sleep 5
      seconds by default, so the idle CPU will keep at C1 for a long time though it is
       idle until break event occurs.
      In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
      C-state dangles between "70% ~ 99%". After patched the kernel, we will notice
      deep C-state stays at >99.98%.
      
      In the patch, a timer is added when menu governor detects a repeat mode and
      choose a shallow C-state. The timer is set to a time out value that greater
      than predicted time, and we conclude repeat mode prediction failure if timer is
      triggered. When repeat mode happens as expected, the timer is not triggered
      and CPU waken up from C-states and it will cancel the timer initiatively.
      When repeat mode does not happen, the timer will be time out and menu governor
      will quickly notice that the repeat mode prediction fails and then re-evaluates
      deeper C-states possibility.
      
      Below is another case which will clearly show the patch much benefit:
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <sys/time.h>
      #include <time.h>
      #include <pthread.h>
      
      volatile int * shutdown;
      volatile long * count;
      int delay = 20;
      int loop = 8;
      
      void usage(void)
      {
      	fprintf(stderr,
      		"Usage: idle_predict [options]\n"
      		"  --help	-h  Print this help\n"
      		"  --thread	-n  Thread number\n"
      		"  --loop     	-l  Loop times in shallow Cstate\n"
      		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
      }
      
      void *simple_loop() {
      	int idle_num = 1;
      	while (!(*shutdown)) {
      		*count = *count + 1;
      
      		if (idle_num % loop)
      			usleep(delay);
      		else {
      			/* sleep 1 second */
      			usleep(1000000);
      			idle_num = 0;
      		}
      		idle_num++;
      	}
      
      }
      
      static void sighand(int sig)
      {
      	*shutdown = 1;
      }
      
      int main(int argc, char *argv[])
      {
      	sigset_t sigset;
      	int signum = SIGALRM;
      	int i, c, er = 0, thread_num = 8;
      	pthread_t pt[1024];
      
      	static char optstr[] = "n:l:t:h:";
      
      	while ((c = getopt(argc, argv, optstr)) != EOF)
      		switch (c) {
      			case 'n':
      				thread_num = atoi(optarg);
      				break;
      			case 'l':
      				loop = atoi(optarg);
      				break;
      			case 't':
      				delay = atoi(optarg);
      				break;
      			case 'h':
      			default:
      				usage();
      				exit(1);
      		}
      
      	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
      	count = malloc(sizeof(long));
      	shutdown = malloc(sizeof(int));
      	*count = 0;
      	*shutdown = 0;
      
      	sigemptyset(&sigset);
      	sigaddset(&sigset, signum);
      	sigprocmask (SIG_BLOCK, &sigset, NULL);
      	signal(SIGINT, sighand);
      	signal(SIGTERM, sighand);
      
      	for(i = 0; i < thread_num ; i++)
      		pthread_create(&pt[i], NULL, simple_loop, NULL);
      
      	for (i = 0; i < thread_num; i++)
      		pthread_join(pt[i], NULL);
      
      	exit(0);
      }
      
      Get powertop V2 from git://github.com/fenrus75/powertop, build powertop.
      After build the above test application, then run it.
      Test plaform can be Intel Sandybridge or other recent platforms.
      #./idle_predict -l 10 &
      #./powertop
      
      We will find that deep C-state will dangle between 40%~100% and much time spent
      on C1 state. It is because menu governor wrongly predict that repeat mode
      is kept, so it will choose the C1 shallow C-state even though it has chance to
      sleep 1 second in deep C-state.
      
      While after patched the kernel, we find that deep C-state will keep >99.6%.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      69a37bea
  16. 14 11月, 2012 2 次提交
  17. 01 11月, 2012 2 次提交
  18. 24 10月, 2012 1 次提交
  19. 16 10月, 2012 3 次提交
  20. 10 10月, 2012 1 次提交
  21. 05 10月, 2012 1 次提交
    • F
      nohz: Fix one jiffy count too far in idle cputime · 2b17c545
      Frederic Weisbecker 提交于
      When we stop the tick in idle, we save the current jiffies value
      in ts->idle_jiffies. This snapshot is substracted from the later
      value of jiffies when the tick is restarted and the resulting
      delta is accounted as idle cputime. This is how we handle the
      idle cputime accounting without the tick.
      
      But sometimes we need to schedule the next tick to some time in
      the future instead of completely stopping it. In this case, a
      tick may happen before we restart the periodic behaviour and
      from that tick we account one jiffy to idle cputime as usual but
      we also increment the ts->idle_jiffies snapshot by one so that
      when we compute the delta to account, we substract the one jiffy
      we just accounted.
      
      To prepare for stopping the tick outside idle, we introduced a
      check that prevents from fixing up that ts->idle_jiffies if we
      are not running the idle task. But we use idle_cpu() for that
      and this is a problem if we run the tick while another CPU
      remotely enqueues a ttwu to our runqueue:
      
      CPU 0:                            CPU 1:
      
      tick_sched_timer() {              ttwu_queue_remote()
             if (idle_cpu(CPU 0))
                 ts->idle_jiffies++;
      }
      
      Here, idle_cpu() notes that &rq->wake_list is not empty and
      hence won't consider the CPU as idle. As a result,
      ts->idle_jiffies won't be incremented. But this is wrong because
      we actually account the current jiffy to idle cputime. And that
      jiffy won't get substracted from the nohz time delta. So in the
      end, this jiffy is accounted twice.
      
      Fix this by changing idle_cpu(smp_processor_id()) with
      is_idle_task(current). This way the jiffy is substracted
      correctly even if a ttwu operation is enqueued on the CPU.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 3.5+
      Link: http://lkml.kernel.org/r/1349308004-3482-1-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2b17c545
  22. 25 9月, 2012 8 次提交
    • J
      time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems · 92bb1fcf
      John Stultz 提交于
      We only do rounding to the next nanosecond so we don't see minor
      1ns inconsistencies in the vsyscall implementations. Since we're
      changing the vsyscall implementations to avoid this, conditionalize
      the rounding only to the GENERIC_TIME_VSYSCALL_OLD architectures.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      92bb1fcf
    • J
      time: Introduce new GENERIC_TIME_VSYSCALL · 576094b7
      John Stultz 提交于
      Now that we moved everyone over to GENERIC_TIME_VSYSCALL_OLD,
      introduce the new declaration and config option for the new
      update_vsyscall method.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      576094b7
    • J
      time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD · 70639421
      John Stultz 提交于
      To help migrate archtectures over to the new update_vsyscall method,
      redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      70639421
    • J
      time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes · d7b4202e
      John Stultz 提交于
      We're going to need to access the timekeeper in update_vsyscall,
      so make the structure available for those who need it.
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      d7b4202e
    • J
      jiffies: Remove compile time assumptions about CLOCK_TICK_RATE · b3c869d3
      John Stultz 提交于
      CLOCK_TICK_RATE is used to accurately caclulate exactly how
      a tick will be at a given HZ.
      
      This is useful, because while we'd expect NSEC_PER_SEC/HZ,
      the underlying hardware will have some granularity limit,
      so we won't be able to have exactly HZ ticks per second.
      
      This slight error can cause timekeeping quality problems
      when using the jiffies or other jiffies driven clocksources.
      Thus we currently use compile time CLOCK_TICK_RATE value to
      generate SHIFTED_HZ and NSEC_PER_JIFFIES, which we then use
      to adjust the jiffies clocksource to correct this error.
      
      Unfortunately though, since CLOCK_TICK_RATE is a compile
      time value, and the jiffies clocksource is registered very
      early during boot, there are a number of cases where there
      are different possible hardware timers that have different
      tick rates. This causes problems in cases like ARM where
      there are numerous different types of hardware, each having
      their own compile-time CLOCK_TICK_RATE, making it hard to
      accurately support different hardware with a single kernel.
      
      For the most part, this doesn't matter all that much, as not
      too many systems actually utilize the jiffies or jiffies driven
      clocksource. Usually there are other highres clocksources
      who's granularity error is negligable.
      
      Even so, we have some complicated calcualtions that we do
      everywhere to handle these edge cases.
      
      This patch removes the compile time SHIFTED_HZ value, and
      introduces a register_refined_jiffies() function. This results
      in the default jiffies clock as being assumed a perfect HZ
      freq, and allows archtectures that care about jiffies accuracy
      to call register_refined_jiffies() with the tick rate, specified
      dynamically at boot.
      
      This allows us, where necessary, to not have a compile time
      CLOCK_TICK_RATE constant, simplifies the jiffies code, and
      still provides a way to have an accurate jiffies clock.
      
      NOTE: Since this patch does not add register_refinied_jiffies()
      calls for every arch, it may cause time quality regressions
      in some cases. Its likely these will not be noticable, but
      if they are an issue, adding the following to the end of
      setup_arch() should resolve the regression:
      	register_refinied_jiffies(CLOCK_TICK_RATE)
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      b3c869d3
    • J
      alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue · a65bcc12
      John Stultz 提交于
      Now that alarmtimer_remove has been simplified, change
      its name to _dequeue to better match its paired _enqueue
      function.
      
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      a65bcc12
    • J
      alarmtimer: Use hrtimer per-alarm instead of per-base · dae373be
      John Stultz 提交于
      Arve Hjønnevåg reported numerous crashes from the
      "BUG_ON(timer->state != HRTIMER_STATE_CALLBACK)" check
      in __run_hrtimer after it called alarmtimer_fired.
      
      It ends up the alarmtimer code was not properly handling
      possible failures of hrtimer_try_to_cancel, and because
      these faulres occur when the underlying base hrtimer is
      being run, this limits the ability to properly handle
      modifications to any alarmtimers on that base.
      
      Because much of the logic duplicates the hrtimer logic,
      it seems that we might as well have a per-alarmtimer
      hrtimer, and avoid the extra complextity of trying to
      multiplex many alarmtimers off of one hrtimer.
      
      Thus this patch moves the hrtimer to the alarm structure
      and simplifies the management logic.
      
      Changelog:
      v2:
      * Includes a fix for double alarm_start calls found by
        Arve
      
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: NArve Hjønnevåg <arve@android.com>
      Tested-by: NArve Hjønnevåg <arve@android.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      dae373be
    • T
      alarmtimer: Implement minimum alarm interval for allowing suspend · 59a93c27
      Todd Poynor 提交于
      alarmtimer suspend return -EBUSY if the next alarm will fire in less
      than 2 seconds.  This allows one RTC seconds tick to occur subsequent
      to this check before the alarm wakeup time is set, ensuring the wakeup
      time is still in the future (assuming the RTC does not tick one more
      second prior to setting the alarm).
      
      If suspend is rejected due to an imminent alarm, hold a wakeup source
      for 2 seconds to process the alarm prior to reattempting suspend.
      
      If setting the alarm incurs an -ETIME for an alarm set in the past,
      or any other problem setting the alarm, abort suspend and hold a
      wakelock for 1 second while the alarm is allowed to be serviced or
      other hopefully transient conditions preventing the alarm clear up.
      Signed-off-by: NTodd Poynor <toddpoynor@google.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      59a93c27