1. 23 1月, 2014 1 次提交
    • P
      sched/clock: Fixup early initialization · d375b4e0
      Peter Zijlstra 提交于
      The code would assume sched_clock_stable() and switch to !stable
      later, this switch brings a discontinuity in time.
      
      The discontinuity on switching from stable to unstable was always
      present, but previously we would set stable/unstable before
      initializing TSC and usually stick to the one we start out with.
      
      So the static_key bits brought an extra switch where there previously
      wasn't one.
      
      Things are further complicated by the fact that we cannot use
      static_key as early as we usually call set_sched_clock_stable().
      
      Fix things by tracking the stable state in a regular variable and only
      set the static_key to the right state on sched_clock_init(), which is
      ran right after late_time_init->tsc_init().
      
      Before this we would not be using the TSC anyway.
      Reported-and-Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Reported-by: dyoung@redhat.com
      Fixes: 35af99e6 ("sched/clock, x86: Use a static_key for sched_clock_stable")
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: paulmck@linux.vnet.ibm.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Cc: rui.zhang@intel.com
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140122115918.GG3694@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d375b4e0
  2. 13 1月, 2014 3 次提交
    • P
      sched/clock: Fix up clear_sched_clock_stable() · 6577e42a
      Peter Zijlstra 提交于
      The below tells us the static_key conversion has a problem; since the
      exact point of clearing that flag isn't too important, delay the flip
      and use a workqueue to process it.
      
      [ ] TSC synchronization [CPU#0 -> CPU#22]:
      [ ] Measured 8 cycles TSC warp between CPUs, turning off TSC clock.
      [ ]
      [ ] ======================================================
      [ ] [ INFO: possible circular locking dependency detected ]
      [ ] 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 Not tainted
      [ ] -------------------------------------------------------
      [ ] swapper/0/1 is trying to acquire lock:
      [ ]  (jump_label_mutex){+.+...}, at: [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]
      [ ] but task is already holding lock:
      [ ]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
      [ ]
      [ ] which lock already depends on the new lock.
      [ ]
      [ ]
      [ ] the existing dependency chain (in reverse order) is:
      [ ]
      [ ] -> #1 (cpu_hotplug.lock){+.+.+.}:
      [ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]        [<ffffffff81093fdc>] get_online_cpus+0x3c/0x60
      [ ]        [<ffffffff8104cc67>] arch_jump_label_transform+0x37/0x130
      [ ]        [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
      [ ]        [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
      [ ]        [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
      [ ]        [<ffffffff810c0f65>] sched_feat_set+0xf5/0x100
      [ ]        [<ffffffff810c5bdc>] set_numabalancing_state+0x2c/0x30
      [ ]        [<ffffffff81d12f3d>] numa_policy_init+0x1af/0x1b7
      [ ]        [<ffffffff81cebdf4>] start_kernel+0x35d/0x41f
      [ ]        [<ffffffff81ceb5a5>] x86_64_start_reservations+0x2a/0x2c
      [ ]        [<ffffffff81ceb6a2>] x86_64_start_kernel+0xfb/0xfe
      [ ]
      [ ] -> #0 (jump_label_mutex){+.+...}:
      [ ]        [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
      [ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]        [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]        [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
      [ ]        [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]        [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]        [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]        [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]        [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]        [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]        [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]        [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]        [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]        [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]
      [ ] other info that might help us debug this:
      [ ]
      [ ]  Possible unsafe locking scenario:
      [ ]
      [ ]        CPU0                    CPU1
      [ ]        ----                    ----
      [ ]   lock(cpu_hotplug.lock);
      [ ]                                lock(jump_label_mutex);
      [ ]                                lock(cpu_hotplug.lock);
      [ ]   lock(jump_label_mutex);
      [ ]
      [ ]  *** DEADLOCK ***
      [ ]
      [ ] 2 locks held by swapper/0/1:
      [ ]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81094037>] cpu_maps_update_begin+0x17/0x20
      [ ]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
      [ ]
      [ ] stack backtrace:
      [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
      [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
      [ ]  ffffffff82c9c270 ffff880236843bb8 ffffffff8165c5f5 ffffffff82c9c270
      [ ]  ffff880236843bf8 ffffffff81658c02 ffff880236843c80 ffff8802368586a0
      [ ]  ffff880236858678 0000000000000001 0000000000000002 ffff880236858000
      [ ] Call Trace:
      [ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
      [ ]  [<ffffffff81658c02>] print_circular_bug+0x1f9/0x207
      [ ]  [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
      [ ]  [<ffffffff816680ff>] ? __atomic_notifier_call_chain+0x8f/0xb0
      [ ]  [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
      [ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ] ------------[ cut here ]------------
      [ ] WARNING: CPU: 0 PID: 1 at /usr/src/linux-2.6/kernel/smp.c:374 smp_call_function_many+0xad/0x300()
      [ ] Modules linked in:
      [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
      [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
      [ ]  0000000000000009 ffff880236843be0 ffffffff8165c5f5 0000000000000000
      [ ]  ffff880236843c18 ffffffff81093d8c 0000000000000000 0000000000000000
      [ ]  ffffffff81ccd1a0 ffffffff810ca951 0000000000000000 ffff880236843c28
      [ ] Call Trace:
      [ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
      [ ]  [<ffffffff81093d8c>] warn_slowpath_common+0x8c/0xc0
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff81093dda>] warn_slowpath_null+0x1a/0x20
      [ ]  [<ffffffff8110b72d>] smp_call_function_many+0xad/0x300
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff8110ba96>] smp_call_function+0x46/0x80
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff8110bb3c>] on_each_cpu+0x3c/0xa0
      [ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff8104f964>] text_poke_bp+0x64/0xd0
      [ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
      [ ]  [<ffffffff8104ccde>] arch_jump_label_transform+0xae/0x130
      [ ]  [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
      [ ]  [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
      [ ]  [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
      [ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ] ---[ end trace 6ff1df5620c49d26 ]---
      [ ] tsc: Marking TSC unstable due to check_tsc_sync_source failed
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-v55fgqj3nnyqnngmvuu8ep6h@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6577e42a
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra 提交于
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35af99e6
    • P
      sched/clock: Remove local_irq_disable() from the clocks · ef08f0ff
      Peter Zijlstra 提交于
      Now that x86 no longer requires IRQs disabled for sched_clock() and
      ia64 never had this requirement (it doesn't seem to do cpufreq at
      all), we can remove the requirement of disabling IRQs.
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     257223     221876
          (cold) local_clock: 301773     309889     234692
          (warm) sched_clock: 38375      25280      25602
          (warm) local_clock: 100371     85268      33265
          (warm) rdtsc:       27340      24247      24214
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     301224     235941
          (cold) local_clock: 396890     399870     297017
          (warm) sched_clock: 38194      25630      25233
          (warm) local_clock: 143452     129629     71234
          (warm) rdtsc:       27345      24307      24245
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-36e5kohiasnr106d077mgubp@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ef08f0ff
  3. 08 4月, 2013 1 次提交
    • T
      sched_clock: Prevent 64bit inatomicity on 32bit systems · a1cbcaa9
      Thomas Gleixner 提交于
      The sched_clock_remote() implementation has the following inatomicity
      problem on 32bit systems when accessing the remote scd->clock, which
      is a 64bit value.
      
      CPU0			CPU1
      
      sched_clock_local()	sched_clock_remote(CPU0)
      ...
      			remote_clock = scd[CPU0]->clock
      			    read_low32bit(scd[CPU0]->clock)
      cmpxchg64(scd->clock,...)
      			    read_high32bit(scd[CPU0]->clock)
      
      While the update of scd->clock is using an atomic64 mechanism, the
      readout on the remote cpu is not, which can cause completely bogus
      readouts.
      
      It is a quite rare problem, because it requires the update to hit the
      narrow race window between the low/high readout and the update must go
      across the 32bit boundary.
      
      The resulting misbehaviour is, that CPU1 will see the sched_clock on
      CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value
      to this bogus timestamp. This stays that way due to the clamping
      implementation for about 4 seconds until the synchronization with
      CLOCK_MONOTONIC undoes the problem.
      
      The issue is hard to observe, because it might only result in a less
      accurate SCHED_OTHER timeslicing behaviour. To create observable
      damage on realtime scheduling classes, it is necessary that the bogus
      update of CPU1 sched_clock happens in the context of an realtime
      thread, which then gets charged 4 seconds of RT runtime, which results
      in the RT throttler mechanism to trigger and prevent scheduling of RT
      tasks for a little less than 4 seconds. So this is quite unlikely as
      well.
      
      The issue was quite hard to decode as the reproduction time is between
      2 days and 3 weeks and intrusive tracing makes it less likely, but the
      following trace recorded with trace_clock=global, which uses
      sched_clock_local(), gave the final hint:
      
        <idle>-0   0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80
        <idle>-0   0d..30 400269.477151: hrtimer_start:  hrtimer=0xf7061e80 ...
      irq/20-S-587 1d..32 400273.772118: sched_wakeup:   comm= ... target_cpu=0
        <idle>-0   0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80
      
      What happens is that CPU0 goes idle and invokes
      sched_clock_idle_sleep_event() which invokes sched_clock_local() and
      CPU1 runs a remote wakeup for CPU0 at the same time, which invokes
      sched_remote_clock(). The time jump gets propagated to CPU0 via
      sched_remote_clock() and stays stale on both cores for ~4 seconds.
      
      There are only two other possibilities, which could cause a stale
      sched clock:
      
      1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic
         wrong value.
      
      2) sched_clock() which reads the TSC returns a sporadic wrong value.
      
      #1 can be excluded because sched_clock would continue to increase for
         one jiffy and then go stale.
      
      #2 can be excluded because it would not make the clock jump
         forward. It would just result in a stale sched_clock for one jiffy.
      
      After quite some brain twisting and finding the same pattern on other
      traces, sched_clock_remote() remained the only place which could cause
      such a problem and as explained above it's indeed racy on 32bit
      systems.
      
      So while on 64bit systems the readout is atomic, we need to verify the
      remote readout on 32bit machines. We need to protect the local->clock
      readout in sched_clock_remote() on 32bit as well because an NMI could
      hit between the low and the high readout, call sched_clock_local() and
      modify local->clock.
      
      Thanks to Siegfried Wulsch for bearing with my debug requests and
      going through the tedious tasks of running a bunch of reproducer
      systems to generate the debug information which let me decode the
      issue.
      Reported-by: NSiegfried Wulsch <Siegfried.Wulsch@rovema.de>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      a1cbcaa9
  4. 17 11月, 2011 1 次提交
  5. 31 10月, 2011 1 次提交
  6. 23 11月, 2010 1 次提交
  7. 09 6月, 2010 1 次提交
  8. 15 4月, 2010 1 次提交
  9. 15 12月, 2009 1 次提交
    • D
      sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK · b9f8fcd5
      David Miller 提交于
      Relax stable-sched-clock architectures to not save/disable/restore
      hardirqs in cpu_clock().
      
      The background is that I was trying to resolve a sparc64 perf
      issue when I discovered this problem.
      
      On sparc64 I implement pseudo NMIs by simply running the kernel
      at IRQ level 14 when local_irq_disable() is called, this allows
      performance counter events to still come in at IRQ level 15.
      
      This doesn't work if any code in an NMI handler does
      local_irq_save() or local_irq_disable() since the "disable" will
      kick us back to cpu IRQ level 14 thus letting NMIs back in and
      we recurse.
      
      The only path which that does that in the perf event IRQ
      handling path is the code supporting frequency based events.  It
      uses cpu_clock().
      
      cpu_clock() simply invokes sched_clock() with IRQs disabled.
      
      And that's a fundamental bug all on it's own, particularly for
      the HAVE_UNSTABLE_SCHED_CLOCK case.  NMIs can thus get into the
      sched_clock() code interrupting the local IRQ disable code
      sections of it.
      
      Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ
      disabling done by cpu_clock() is just pure overhead and
      completely unnecessary.
      
      So the core problem is that sched_clock() is not NMI safe, but
      we are invoking it from NMI contexts in the perf events code
      (via cpu_clock()).
      
      A less important issue is the overhead of IRQ disabling when it
      isn't necessary in cpu_clock().
      
      CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not
      affected by this patch.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20091213.182502.215092085.davem@davemloft.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b9f8fcd5
  10. 01 10月, 2009 1 次提交
  11. 19 9月, 2009 1 次提交
    • P
      sched_clock: Make it NMI safe · def0a9b2
      Peter Zijlstra 提交于
      Arjan complained about the suckyness of TSC on modern machines, and
      asked if we could do something about that for PERF_SAMPLE_TIME.
      
      Make cpu_clock() NMI safe by removing the spinlock and using
      cmpxchg. This also makes it smaller and more robust.
      
      Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
      and x86.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      def0a9b2
  12. 09 5月, 2009 1 次提交
  13. 27 2月, 2009 3 次提交
  14. 31 12月, 2008 1 次提交
    • T
      sched_clock: prevent scd->clock from moving backwards, take #2 · 1c5745aa
      Thomas Gleixner 提交于
      Redo:
      
        5b7dba4f: sched_clock: prevent scd->clock from moving backwards
      
      which had to be reverted due to s2ram hangs:
      
        ca7e716c: Revert "sched_clock: prevent scd->clock from moving backwards"
      
      ... this time with resume restoring GTOD later in the sequence
      taken into account as well.
      
      The "timekeeping_suspended" flag is not very nice but we cannot call into
      GTOD before it has been properly resumed and the scheduler will run very
      early in the resume sequence.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1c5745aa
  15. 15 12月, 2008 1 次提交
  16. 10 10月, 2008 1 次提交
  17. 25 8月, 2008 1 次提交
    • P
      sched_clock: fix cpu_clock() · 354879bb
      Peter Zijlstra 提交于
      This patch fixes 3 issues:
      
      a) it removes the dependency on jiffies, because jiffies are incremented
         by a single CPU, and the tick is not synchronized between CPUs. Therefore
         relying on it to calculate a window to clip whacky TSC values doesn't work
         as it can drift around.
      
         So instead use [GTOD, GTOD+TICK_NSEC) as the window.
      
      b) __update_sched_clock() did (roughly speaking):
      
         delta = sched_clock() - scd->tick_raw;
         clock += delta;
      
         Which gives exponential growth, instead of linear.
      
      c) allows the sched_clock_cpu() value to warp the u64 without breaking.
      
      the results are more reliable sched_clock() deltas:
      
                 before       after   sched_clock
      
      cpu_clock: 15750        51312   51488
      cpu_clock: 59719        51052   50947
      cpu_clock: 15879        51249   51061
      cpu_clock: 1            50933   51198
      cpu_clock: 1            50931   51039
      cpu_clock: 1            51093   50981
      cpu_clock: 1            51043   51040
      cpu_clock: 1            50959   50938
      cpu_clock: 1            50981   51011
      cpu_clock: 1            51364   51212
      cpu_clock: 1            51219   51273
      cpu_clock: 1            51389   51048
      cpu_clock: 1            51285   51611
      cpu_clock: 1            50964   51137
      cpu_clock: 1            50973   50968
      cpu_clock: 1            50967   50972
      cpu_clock: 1            58910   58485
      cpu_clock: 1            51082   51025
      cpu_clock: 1            50957   50958
      cpu_clock: 1            50958   50957
      cpu_clock: 1006128      51128   50971
      cpu_clock: 1            51107   51155
      cpu_clock: 1            51371   51081
      cpu_clock: 1            51104   51365
      cpu_clock: 1            51363   51309
      cpu_clock: 1            51107   51160
      cpu_clock: 1            51139   51100
      cpu_clock: 1            51216   51136
      cpu_clock: 1            51207   51215
      cpu_clock: 1            51087   51263
      cpu_clock: 1            51249   51177
      cpu_clock: 1            51519   51412
      cpu_clock: 1            51416   51255
      cpu_clock: 1            51591   51594
      cpu_clock: 1            50966   51374
      cpu_clock: 1            50966   50966
      cpu_clock: 1            51291   50948
      cpu_clock: 1            50973   50867
      cpu_clock: 1            50970   50970
      cpu_clock: 998306       50970   50971
      cpu_clock: 1            50971   50970
      cpu_clock: 1            50970   50970
      cpu_clock: 1            50971   50971
      cpu_clock: 1            50970   50970
      cpu_clock: 1            51351   50970
      cpu_clock: 1            50970   51352
      cpu_clock: 1            50971   50970
      cpu_clock: 1            50970   50970
      cpu_clock: 1            51321   50971
      cpu_clock: 1            50974   51324
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      354879bb
  18. 11 8月, 2008 1 次提交
  19. 31 7月, 2008 5 次提交
  20. 28 7月, 2008 1 次提交
  21. 11 7月, 2008 7 次提交
    • S
      sched_clock: and multiplier for TSC to gtod drift · c300ba25
      Steven Rostedt 提交于
      The sched_clock code currently tries to keep all CPU clocks of all CPUS
      somewhat in sync. At every clock tick it records the gtod clock and
      uses that and jiffies and the TSC to calculate a CPU clock that tries to
      stay in sync with all the other CPUs.
      
      ftrace depends heavily on this timer and it detects when this timer
      "jumps".  One problem is that the TSC and the gtod also drift.
      When the TSC is 0.1% faster or slower than the gtod it is very noticeable
      in ftrace. To help compensate for this, I've added a multiplier that
      tries to keep the CPU clock updating at the same rate as the gtod.
      
      I've tried various ways to get it to be in sync and this ended up being
      the most reliable. At every scheduler tick we calculate the new multiplier:
      
        multi = delta_gtod / delta_TSC
      
      This means we perform a 64 bit divide at the tick (once a HZ). A shift
      is used to handle the accuracy.
      
      Other methods that failed due to dynamic HZ are:
      
      (not used)  multi += (gtod - tsc) / delta_gtod
      (not used)  multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod
      
      as well as other variants.
      
      This code still allows for a slight drift between TSC and gtod, but
      it keeps the damage down to a minimum.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c300ba25
    • S
      sched_clock: record TSC after gtod · a83bc47c
      Steven Rostedt 提交于
      To read the gtod we need to grab the xtime lock for read. Reading the gtod
      before the TSC can cause a bigger gab if the xtime lock is contended.
      
      This patch simply reverses the order to read the TSC after the gtod.
      The locking in the reading of the gtod handles any barriers one might
      think is needed.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a83bc47c
    • S
      sched_clock: only update deltas with local reads. · c0c87734
      Steven Rostedt 提交于
      Reading the CPU clock should try to stay accurate within the CPU.
      By reading the CPU clock from another CPU and updating the deltas can
      cause unneeded jumps when reading from the local CPU.
      
      This patch changes the code to update the last read TSC only when read
      from the local CPU.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c0c87734
    • S
      sched_clock: fix calculation of other CPU · 2b8a0cf4
      Steven Rostedt 提交于
      The algorithm to calculate the 'now' of another CPU is not correct.
      At each scheduler tick, each CPU records the last sched_clock and
      gtod (tick_raw and tick_gtod respectively). If the TSC is somewhat the
      same in speed between two clocks the algorithm would be:
      
        tick_gtod1 + (now1 - tick_raw1) = tick_gtod2 + (now2 - tick_raw2)
      
      To calculate now2 we would have:
      
        now2 = (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1
      
      Currently the algorithm is:
      
        now2 = (tick_gtod1 - tick_gtod2) + (tick_raw1 - tick_raw2) + now1
      
      This solves most of the rest of the issues I've had with timestamps in
      ftace.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2b8a0cf4
    • S
      sched_clock: stop maximum check on NO HZ · af52a90a
      Steven Rostedt 提交于
      Working with ftrace I would get large jumps of 11 millisecs or more with
      the clock tracer. This killed the latencing timings of ftrace and also
      caused the irqoff self tests to fail.
      
      What was happening is with NO_HZ the idle would stop the jiffy counter and
      before the jiffy counter was updated the sched_clock would have a bad
      delta jiffies to compare with the gtod with the maximum.
      
      The jiffies would stop and the last sched_tick would record the last gtod.
      On wakeup, the sched clock update would compare the gtod + delta jiffies
      (which would be zero) and compare it to the TSC. The TSC would have
      correctly (with a stable TSC) moved forward several jiffies. But because the
      jiffies has not been updated yet the clock would be prevented from moving
      forward because it would appear that the TSC jumped too far ahead.
      
      The clock would then virtually stop, until the jiffies are updated. Then
      the next sched clock update would see that the clock was very much behind
      since the delta jiffies is now correct. This would then jump the clock
      forward by several jiffies.
      
      This caused ftrace to report several milliseconds of interrupts off
      latency at every resume from NO_HZ idle.
      
      This patch adds hooks into the nohz code to disable the checking of the
      maximum clock update when nohz is in effect. It resumes the max check
      when nohz has updated the jiffies again.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      af52a90a
    • S
      sched_clock: widen the max and min time · f7cce27f
      Steven Rostedt 提交于
      With keeping the max and min sched time within one jiffy of the gtod clock
      was too tight. Just before a schedule tick the max could easily be hit, as
      well as just after a schedule_tick the min could be hit. This caused the
      clock to jump around by a jiffy.
      
      This patch widens the minimum to
         last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSECS
      
      and the maximum to
          last gtod + (2 + delta_jiffies) * TICK_NSECS
      
      This keeps the minum to gtod or if one jiffy less than delta jiffies
      and the maxim 2 jiffies ahead of gtod. This may cause unstable TSCs to be
      a bit more sporadic, but it helps keep a clock with a stable TSC working well.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f7cce27f
    • S
      sched_clock: record from last tick · 62c43dd9
      Steven Rostedt 提交于
      The sched_clock code tries to keep within the gtod time by one tick (jiffy).
      The current code mistakenly keeps track of the delta jiffies between
      updates of the clock, where the the delta is used to compare with the
      number of jiffies that have past since an update of the gtod. The gtod is
      updated at each schedule tick not each sched_clock update. After one
      jiffy passes the clock is updated fine. But the delta is taken from the
      last update so if the next update happens before the next tick the delta
      jiffies used will be incorrect.
      
      This patch changes the code to check the delta of jiffies between ticks
      and not updates to match the comparison of the updates with the gtod.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      62c43dd9
  22. 29 6月, 2008 1 次提交
  23. 27 6月, 2008 2 次提交
  24. 29 5月, 2008 1 次提交
  25. 06 5月, 2008 1 次提交
    • P
      sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK · 3e51f33f
      Peter Zijlstra 提交于
      this replaces the rq->clock stuff (and possibly cpu_clock()).
      
       - architectures that have an 'imperfect' hardware clock can set
         CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
      
       - the 'jiffie' window might be superfulous when we update tick_gtod
         before the __update_sched_clock() call in sched_clock_tick()
      
       - cpu_clock() might be implemented as:
      
           sched_clock_cpu(smp_processor_id())
      
         if the accuracy proves good enough - how far can TSC drift in a
         single jiffie when considering the filtering and idle hooks?
      
      [ mingo@elte.hu: various fixes and cleanups ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3e51f33f