1. 20 Jul, 2018 2 commits
  2. 04 Mar, 2018 1 commit
    • I
      sched/headers: Simplify and clean up header usage in the scheduler · 325ea10c
      Ingo Molnar committed
      Do the following cleanups and simplifications:
      
       - sched/sched.h already includes <asm/paravirt.h>, so no need to
         include it in sched/core.c again.
      
       - order the <linux/sched/*.h> headers alphabetically
      
       - add all <linux/sched/*.h> headers to kernel/sched/sched.h
      
       - remove all unnecessary includes from the .c files that
         are already included in kernel/sched/sched.h.
      
      Finally, make all scheduler .c files use a single common header:
      
        #include "sched.h"
      
      ... which now contains a union of the relied upon headers.
      
      This makes the various .c files easier to read and easier to handle.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      325ea10c
  3. 03 Mar, 2018 1 commit
    • I
      sched: Clean up and harmonize the coding style of the scheduler code base · 97fb7a0a
      Ingo Molnar committed
      A good number of small style inconsistencies have accumulated
      in the scheduler core, so do a pass over them to harmonize
      all these details:
      
       - fix spelling in comments,
      
       - use curly braces for multi-line statements,
      
       - remove unnecessary parentheses from integer literals,
      
       - capitalize consistently,
      
       - remove stray newlines,
      
       - add comments where necessary,
      
       - remove invalid/unnecessary comments,
      
       - align structure definitions and other data types vertically,
      
       - add missing newlines for increased readability,
      
       - fix vertical tabulation where it's misaligned,
      
       - harmonize preprocessor conditional block labeling
         and vertical alignment,
      
       - remove line-breaks where they uglify the code,
      
       - add newline after local variable definitions,
      
      No change in functionality:
      
        md5:
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.before.asm
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.after.asm
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      97fb7a0a
  4. 08 Nov, 2017 1 commit
  5. 24 May, 2017 1 commit
  6. 15 May, 2017 7 commits
    • P
      sched/clock: Print a warning recommending 'tsc=unstable' · 7708d5f0
      Peter Zijlstra committed
      With our switch to stable delayed until late_initcall(), the most
      likely cause of hitting mark_tsc_unstable() is the watchdog. The
      watchdog typically only triggers when creative BIOS'es fiddle with the
      TSC to hide SMI latency.
      
      Since the watchdog can only detect TSC fiddling after the fact all TSC
      clocks (including userspace GTOD) can already have reported funny
      values.
      
      The only way to fully avoid this is to manually mark the TSC unstable
      at boot. Suggest people do this on their broken systems.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7708d5f0
    • P
      sched/clock: Use late_initcall() instead of sched_init_smp() · 2e44b7dd
      Peter Zijlstra committed
      Core2 marks its TSC unstable in ACPI Processor Idle, which is probed
      after sched_init_smp(). Luckily it appears both acpi_processor and
      intel_idle (which has a similar check) are mandatory built-in.
      
      This means we can delay switching to stable until after these drivers
      have run (if they were modules, this would be impossible).
      
      Delay the stable switch to late_initcall() to allow these drivers to
      mark TSC unstable and avoid difficult stable->unstable transitions.
      Reported-by: Lofstedt, Marta <marta.lofstedt@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2e44b7dd
    • P
      cpuidle: Fix idle time tracking · f9fccdb9
      Peter Zijlstra committed
      Ville reported that on his Core2, which has TSC stop in idle, we would
      always report very short idle durations. He tracked this down to
      commit:
      
        e93e59ce ("cpuidle: Replace ktime_get() with local_clock()")
      
      which replaces ktime_get() with local_clock().
      
      Add a sched_clock_idle_wakeup_event() call, which will re-sync the
      clock with ktime_get_ns() when TSC is unstable and no-op otherwise.
      Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: e93e59ce ("cpuidle: Replace ktime_get() with local_clock()")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f9fccdb9
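
To see why a counter that stops in idle yields very short idle durations, and what a re-sync on wakeup changes, here is a minimal user-space sketch in C. It is not the kernel's implementation: `struct cpu_clock` and the `model_*` functions are hypothetical names chosen for illustration, and the model leaves out the clamping that the real sched_clock code performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-CPU clock state (not the kernel's struct). */
struct cpu_clock {
	uint64_t tick_raw;   /* raw counter (TSC-like) at the last sync point */
	uint64_t tick_gtod;  /* wall-clock ns (ktime-like) at the last sync point */
};

/* Clock value = last wall-clock sync point + raw counter progress since then. */
static uint64_t model_local_clock(const struct cpu_clock *c, uint64_t raw_now)
{
	assert(raw_now >= c->tick_raw);	/* the raw counter never runs backwards here */
	return c->tick_gtod + (raw_now - c->tick_raw);
}

/* Re-sync both timelines; stands in for the wakeup hook described above. */
static void model_idle_wakeup_event(struct cpu_clock *c, uint64_t raw_now,
				    uint64_t gtod_now)
{
	c->tick_raw = raw_now;
	c->tick_gtod = gtod_now;
}

/*
 * Idle duration as seen by the model when the raw counter stops in idle:
 * the raw value is the same before and after, only wall-clock time moved.
 */
static uint64_t model_idle_duration(struct cpu_clock *c, uint64_t raw,
				    uint64_t gtod_after_idle, int resync)
{
	uint64_t before = model_local_clock(c, raw);

	if (resync)
		model_idle_wakeup_event(c, raw, gtod_after_idle);

	return model_local_clock(c, raw) - before;
}
```

Without the re-sync the model reports an idle duration of zero, because the only input that advanced (wall-clock time) is never consulted; with it, the duration matches the wall-clock delta.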
    • P
      sched/clock: Remove watchdog touching · 3067a33d
      Peter Zijlstra committed
      Commit:
      
        2bacec8c ("sched: touch softlockup watchdog after idling")
      
      introduced the touch_softlockup_watchdog_sched() call without
      justification and I feel sched_clock management is not the right
      place, it should only be concerned with producing semi coherent time.
      
      If this causes watchdog thingies, we can find a better place.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3067a33d
    • P
      sched/clock: Remove unused argument to sched_clock_idle_wakeup_event() · ac1e843f
      Peter Zijlstra committed
      The argument to sched_clock_idle_wakeup_event() has not been used in a
      long time. Remove it.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ac1e843f
    • P
      x86/tsc, sched/clock, clocksource: Use clocksource watchdog to provide stable sync points · b421b22b
      Peter Zijlstra committed
      Currently we keep sched_clock_tick() active for stable TSC in order to
      keep the per-CPU state semi up-to-date. The (obvious) problem is that
      by the time we detect TSC is borked, our per-CPU state is also borked.
      
      So hook into the clocksource watchdog and call a method after we've
      found it to still be stable.
      
      There's the obvious race where the TSC goes wonky between finding it
      stable and us running the callback, but closing that is too much work
      and not really worth it, since we're already detecting TSC wobbles
      after the fact, so we cannot, per definition, fully avoid funny clock
      values.
      
      And since the watchdog runs less often than the tick, this is also an
      optimization.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b421b22b
    • P
      sched/clock: Initialize all per-CPU state before switching (back) to unstable · cf15ca8d
      Peter Zijlstra committed
      In preparation for not keeping the sched_clock_tick() active for
      stable TSC, we need to explicitly initialize all per-CPU state
      before switching back to unstable.
      
      Note: this patch loses the __gtod_offset calculation; it will be
      restored in the next one.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cf15ca8d
  7. 27 Mar, 2017 1 commit
  8. 23 Mar, 2017 2 commits
  9. 02 Mar, 2017 2 commits
  10. 20 Jan, 2017 1 commit
    • P
      sched/clock: Fix hotplug crash · acb04058
      Peter Zijlstra committed
      Mike reported that he could trigger the WARN_ON_ONCE() in
      set_sched_clock_stable() using hotplug.
      
      This exposed a fundamental problem with the interface, we should never
      mark the TSC stable if we ever find it to be unstable. Therefore
      set_sched_clock_stable() is a broken interface.
      
      The reason it existed is that not having it is a pain, it means all
      relevant architecture code needs to call clear_sched_clock_stable()
      where appropriate.
      
      Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64
      and parisc are trivial in that they never called
      set_sched_clock_stable(), so add an unconditional call to
      clear_sched_clock_stable() to them.
      
      For x86 the story is a lot more involved, and what this patch tries to
      do is ensure we preserve the status quo. So even if Cyrix or Transmeta
      have usable TSC they never called set_sched_clock_stable() so they now
      get an explicit mark unstable.
      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9881b024 ("sched/clock: Delay switching sched_clock to stable")
      Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      acb04058
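
The rule stated in this commit message, that a clock once found unstable must never be marked stable again, amounts to a one-way latch. A minimal user-space sketch in C follows; `struct clock_latch` and the `model_*` functions are hypothetical names for illustration and do not mirror the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-way latch: there is deliberately no "set stable" call. */
struct clock_latch {
	bool unstable;	/* once set, never cleared */
};

/* Mark the clock unstable; calling this repeatedly is harmless. */
static void model_clear_stable(struct clock_latch *l)
{
	assert(l != NULL);
	l->unstable = true;
}

/* The clock counts as stable only while nobody has ever cleared it. */
static bool model_is_stable(const struct clock_latch *l)
{
	assert(l != NULL);
	return !l->unstable;
}
```

The design point is the missing setter: callers can only move the state in one direction, so no code path can flip an unstable clock back to stable.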
  11. 14 Jan, 2017 3 commits
    • P
      sched/clock: Provide better clock continuity · 5680d809
      Peter Zijlstra committed
      When switching between the unstable and stable variants it is
      currently possible that clock discontinuities occur.
      
      And while these will mostly be 'small', attempt to do better.
      
      As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
      ktime_get_ns() based timeline at the point of switchover
      (sched_clock_init_late()) after SMP bringup.
      
      Equally, when the TSC is later found to be unstable -- typically
      because SMM tries to hide its SMI latencies by mucking with the TSC --
      we want to avoid large jumps.
      
      Since the clocksource watchdog reports the issue after the fact we
      cannot exactly fix up time, but since SMI latencies are typically
      small (~10ns range), the discontinuity is mainly due to drift between
      sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
      24 days).
      
      I dislike this patch because it adds overhead to the good case in
      favour of dealing with badness. But given the widespread failure of
      TSC stability this is worth it.
      
      Note that in case the TSC makes drastic jumps after SMP bringup we're
      still hosed. There's just not much we can do in that case without
      stupid overhead.
      
      If we were to somehow expose tsc_clocksource_reliable (which is hard
      because this code is also used on ia64 and parisc) we could avoid some
      of the newly introduced overhead.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5680d809
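
One way to picture the continuity goal is an offset captured at switchover, so that the raw counter continues the timeline that was in use before the switch. The C sketch below is a user-space illustration under that assumption only; `struct clock_switch` and the `model_*` functions are hypothetical names, and the kernel's actual bookkeeping is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical switchover state (illustrative, not the kernel's). */
struct clock_switch {
	bool     stable;  /* true once the raw counter is trusted */
	uint64_t offset;  /* added to the raw counter, modulo 2^64 */
};

/*
 * Capture the gap between the raw counter and the timeline in use so far.
 * The subtraction may wrap when the raw counter is ahead; that is intended,
 * because the later addition wraps back.
 */
static void model_switch_to_stable(struct clock_switch *s, uint64_t raw_now,
				   uint64_t timeline_now)
{
	s->offset = timeline_now - raw_now;
	s->stable = true;
}

/* Stable clock: raw counter shifted so it continues the earlier timeline. */
static uint64_t model_stable_clock(const struct clock_switch *s,
				   uint64_t raw_now)
{
	assert(s->stable);
	return raw_now + s->offset;
}
```

With a raw counter that is 1.5 s ahead at switchover, as in the observation quoted in the commit message, the model's first stable reading equals the old timeline's value instead of jumping forward by 1.5 s.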
    • P
      sched/clock: Delay switching sched_clock to stable · 9881b024
      Peter Zijlstra committed
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete. This way we'll avoid switching during the time we detect the
      worst of the TSC offences.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9881b024
    • P
      sched/clock: Update static_key usage · 555570d7
      Peter Zijlstra committed
      sched_clock was still using the deprecated static_key interface.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      555570d7
  12. 13 Apr, 2016 2 commits
  13. 02 Mar, 2016 1 commit
  14. 09 Dec, 2015 1 commit
    • T
      watchdog: introduce touch_softlockup_watchdog_sched() · 03e0d461
      Tejun Heo committed
      touch_softlockup_watchdog() is used to tell watchdog that scheduler
      stall is expected.  One group of usage is from paths where the task
      may not be able to yield for a long time such as performing slow PIO
      to finicky device and coming out of suspend.  The other is to account
      for scheduler and timer going idle.
      
      For scheduler softlockup detection, there's no reason to distinguish
      the two cases; however, workqueue lockup detector is planned and it
      can use the same signals from the former group while the latter would
      spuriously prevent detection.  This patch introduces a new function
      touch_softlockup_watchdog_sched() and converts the latter group to call
      it instead.  For now, it just calls touch_softlockup_watchdog() and
      there's no functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      03e0d461
  15. 23 Nov, 2015 1 commit
    • P
      treewide: Remove old email address · 90eec103
      Peter Zijlstra committed
      There were still a number of references to my old Red Hat email
      address in the kernel source. Remove these while keeping the
      Red Hat copyright notices intact.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      90eec103
  16. 13 Feb, 2015 1 commit
    • C
      kernel/sched/clock.c: add another clock for use with the soft lockup watchdog · 545a2bf7
      Cyril Bur committed
      When the hypervisor pauses a virtualised kernel the kernel will observe a
      jump in timebase, this can cause spurious messages from the softlockup
      detector.
      
      Whilst these messages are harmless, they are accompanied with a stack
      trace which causes undue concern and more problematically the stack trace
      in the guest has nothing to do with the observed problem and can only be
      misleading.
      
      Furthermore, on POWER8 this is completely avoidable with the introduction
      of the Virtual Time Base (VTB) register.
      
      This patch (of 2):
      
      This permits the use of arch specific clocks for which virtualised kernels
      can use their notion of 'running' time, not the elapsed wall time which
      will include host execution time.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andrew Jones <drjones@redhat.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: chai wen <chaiw.fnst@cn.fujitsu.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Ben Zhang <benzh@chromium.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      545a2bf7
  17. 27 Aug, 2014 1 commit
  18. 08 Apr, 2014 1 commit
  19. 11 Mar, 2014 1 commit
  20. 23 Jan, 2014 1 commit
    • P
      sched/clock: Fixup early initialization · d375b4e0
      Peter Zijlstra committed
      The code would assume sched_clock_stable() and switch to !stable
      later, this switch brings a discontinuity in time.
      
      The discontinuity on switching from stable to unstable was always
      present, but previously we would set stable/unstable before
      initializing TSC and usually stick to the one we start out with.
      
      So the static_key bits brought an extra switch where there previously
      wasn't one.
      
      Things are further complicated by the fact that we cannot use
      static_key as early as we usually call set_sched_clock_stable().
      
      Fix things by tracking the stable state in a regular variable and only
      set the static_key to the right state on sched_clock_init(), which is
      run right after late_time_init->tsc_init().
      
      Before this we would not be using the TSC anyway.
      Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: dyoung@redhat.com
      Fixes: 35af99e6 ("sched/clock, x86: Use a static_key for sched_clock_stable")
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: paulmck@linux.vnet.ibm.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Cc: rui.zhang@intel.com
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140122115918.GG3694@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d375b4e0
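
The fix described in this commit message, recording the stable state in a regular variable and committing it to the static_key only at init time, can be pictured with the user-space C sketch below. Everything in it is an assumption made for illustration: `struct clock_init_state` and the `model_*` functions are hypothetical, and a plain `bool` stands in for the static_key.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical early-boot state (illustrative, not the kernel's). */
struct clock_init_state {
	bool early_stable;  /* plain variable: safe to write at any time */
	bool key_enabled;   /* stand-in for the static_key, set only at init */
	bool init_done;
};

/* Early callers only record their verdict; nothing is switched yet. */
static void model_set_stable_early(struct clock_init_state *s, bool stable)
{
	assert(!s->init_done);	/* this sketch only models the pre-init phase */
	s->early_stable = stable;
}

/* At init, commit the recorded verdict in one step at a known point. */
static void model_clock_init(struct clock_init_state *s)
{
	s->key_enabled = s->early_stable;
	s->init_done = true;
}

/* The stable path is taken only after init and only if it was committed. */
static bool model_uses_stable_path(const struct clock_init_state *s)
{
	return s->init_done && s->key_enabled;
}
```

Because early verdicts only touch the plain variable, changing one's mind before init costs nothing and causes no extra switch of the clock path.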
  21. 13 Jan, 2014 3 commits
    • P
      sched/clock: Fix up clear_sched_clock_stable() · 6577e42a
      Peter Zijlstra committed
      The below tells us the static_key conversion has a problem; since the
      exact point of clearing that flag isn't too important, delay the flip
      and use a workqueue to process it.
      
      [ ] TSC synchronization [CPU#0 -> CPU#22]:
      [ ] Measured 8 cycles TSC warp between CPUs, turning off TSC clock.
      [ ]
      [ ] ======================================================
      [ ] [ INFO: possible circular locking dependency detected ]
      [ ] 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 Not tainted
      [ ] -------------------------------------------------------
      [ ] swapper/0/1 is trying to acquire lock:
      [ ]  (jump_label_mutex){+.+...}, at: [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]
      [ ] but task is already holding lock:
      [ ]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
      [ ]
      [ ] which lock already depends on the new lock.
      [ ]
      [ ]
      [ ] the existing dependency chain (in reverse order) is:
      [ ]
      [ ] -> #1 (cpu_hotplug.lock){+.+.+.}:
      [ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]        [<ffffffff81093fdc>] get_online_cpus+0x3c/0x60
      [ ]        [<ffffffff8104cc67>] arch_jump_label_transform+0x37/0x130
      [ ]        [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
      [ ]        [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
      [ ]        [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
      [ ]        [<ffffffff810c0f65>] sched_feat_set+0xf5/0x100
      [ ]        [<ffffffff810c5bdc>] set_numabalancing_state+0x2c/0x30
      [ ]        [<ffffffff81d12f3d>] numa_policy_init+0x1af/0x1b7
      [ ]        [<ffffffff81cebdf4>] start_kernel+0x35d/0x41f
      [ ]        [<ffffffff81ceb5a5>] x86_64_start_reservations+0x2a/0x2c
      [ ]        [<ffffffff81ceb6a2>] x86_64_start_kernel+0xfb/0xfe
      [ ]
      [ ] -> #0 (jump_label_mutex){+.+...}:
      [ ]        [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
      [ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]        [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]        [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
      [ ]        [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]        [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]        [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]        [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]        [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]        [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]        [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]        [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]        [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]        [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]
      [ ] other info that might help us debug this:
      [ ]
      [ ]  Possible unsafe locking scenario:
      [ ]
      [ ]        CPU0                    CPU1
      [ ]        ----                    ----
      [ ]   lock(cpu_hotplug.lock);
      [ ]                                lock(jump_label_mutex);
      [ ]                                lock(cpu_hotplug.lock);
      [ ]   lock(jump_label_mutex);
      [ ]
      [ ]  *** DEADLOCK ***
      [ ]
      [ ] 2 locks held by swapper/0/1:
      [ ]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81094037>] cpu_maps_update_begin+0x17/0x20
      [ ]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
      [ ]
      [ ] stack backtrace:
      [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
      [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
      [ ]  ffffffff82c9c270 ffff880236843bb8 ffffffff8165c5f5 ffffffff82c9c270
      [ ]  ffff880236843bf8 ffffffff81658c02 ffff880236843c80 ffff8802368586a0
      [ ]  ffff880236858678 0000000000000001 0000000000000002 ffff880236858000
      [ ] Call Trace:
      [ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
      [ ]  [<ffffffff81658c02>] print_circular_bug+0x1f9/0x207
      [ ]  [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
      [ ]  [<ffffffff816680ff>] ? __atomic_notifier_call_chain+0x8f/0xb0
      [ ]  [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
      [ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ] ------------[ cut here ]------------
      [ ] WARNING: CPU: 0 PID: 1 at /usr/src/linux-2.6/kernel/smp.c:374 smp_call_function_many+0xad/0x300()
      [ ] Modules linked in:
      [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
      [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
      [ ]  0000000000000009 ffff880236843be0 ffffffff8165c5f5 0000000000000000
      [ ]  ffff880236843c18 ffffffff81093d8c 0000000000000000 0000000000000000
      [ ]  ffffffff81ccd1a0 ffffffff810ca951 0000000000000000 ffff880236843c28
      [ ] Call Trace:
      [ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
      [ ]  [<ffffffff81093d8c>] warn_slowpath_common+0x8c/0xc0
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff81093dda>] warn_slowpath_null+0x1a/0x20
      [ ]  [<ffffffff8110b72d>] smp_call_function_many+0xad/0x300
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff8110ba96>] smp_call_function+0x46/0x80
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff8110bb3c>] on_each_cpu+0x3c/0xa0
      [ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff8104f964>] text_poke_bp+0x64/0xd0
      [ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
      [ ]  [<ffffffff8104ccde>] arch_jump_label_transform+0xae/0x130
      [ ]  [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
      [ ]  [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
      [ ]  [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
      [ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ] ---[ end trace 6ff1df5620c49d26 ]---
      [ ] tsc: Marking TSC unstable due to check_tsc_sync_source failed
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-v55fgqj3nnyqnngmvuu8ep6h@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6577e42a
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra committed
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35af99e6
    • P
      sched/clock: Remove local_irq_disable() from the clocks · ef08f0ff
      Peter Zijlstra committed
      Now that x86 no longer requires IRQs disabled for sched_clock() and
      ia64 never had this requirement (it doesn't seem to do cpufreq at
      all), we can remove the requirement of disabling IRQs.
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     257223     221876
          (cold) local_clock: 301773     309889     234692
          (warm) sched_clock: 38375      25280      25602
          (warm) local_clock: 100371     85268      33265
          (warm) rdtsc:       27340      24247      24214
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     301224     235941
          (cold) local_clock: 396890     399870     297017
          (warm) sched_clock: 38194      25630      25233
          (warm) local_clock: 143452     129629     71234
          (warm) rdtsc:       27345      24307      24245
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-36e5kohiasnr106d077mgubp@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ef08f0ff
  22. 08 Apr, 2013 1 commit
    • T
      sched_clock: Prevent 64bit inatomicity on 32bit systems · a1cbcaa9
      Thomas Gleixner committed
      The sched_clock_remote() implementation has the following inatomicity
      problem on 32bit systems when accessing the remote scd->clock, which
      is a 64bit value.
      
      CPU0			CPU1
      
      sched_clock_local()	sched_clock_remote(CPU0)
      ...
      			remote_clock = scd[CPU0]->clock
      			    read_low32bit(scd[CPU0]->clock)
      cmpxchg64(scd->clock,...)
      			    read_high32bit(scd[CPU0]->clock)
      
      While the update of scd->clock is using an atomic64 mechanism, the
      readout on the remote cpu is not, which can cause completely bogus
      readouts.
      
      It is a quite rare problem, because it requires the update to hit the
      narrow race window between the low/high readout and the update must go
      across the 32bit boundary.
      
      The resulting misbehaviour is that CPU1 will see the sched_clock on
      CPU0 ~4 seconds ahead of its own and update CPU1's sched_clock value
      to this bogus timestamp. This stays that way due to the clamping
      implementation for about 4 seconds until the synchronization with
      CLOCK_MONOTONIC undoes the problem.
      
      The issue is hard to observe, because it might only result in a less
      accurate SCHED_OTHER timeslicing behaviour. To create observable
      damage on realtime scheduling classes, it is necessary that the bogus
      update of CPU1's sched_clock happens in the context of a realtime
      thread, which then gets charged 4 seconds of RT runtime. That causes
      the RT throttler mechanism to trigger and prevent scheduling of RT
      tasks for a little less than 4 seconds. So this is quite unlikely as
      well.
      
      The issue was quite hard to decode as the reproduction time is between
      2 days and 3 weeks and intrusive tracing makes it less likely, but the
      following trace recorded with trace_clock=global, which uses
      sched_clock_local(), gave the final hint:
      
        <idle>-0   0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80
        <idle>-0   0d..30 400269.477151: hrtimer_start:  hrtimer=0xf7061e80 ...
      irq/20-S-587 1d..32 400273.772118: sched_wakeup:   comm= ... target_cpu=0
        <idle>-0   0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80
      
      What happens is that CPU0 goes idle and invokes
      sched_clock_idle_sleep_event() which invokes sched_clock_local() and
      CPU1 runs a remote wakeup for CPU0 at the same time, which invokes
      sched_clock_remote(). The time jump gets propagated to CPU0 via
      sched_clock_remote() and stays stale on both cores for ~4 seconds.
      
      There are only two other possibilities, which could cause a stale
      sched clock:
      
      1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic
         wrong value.
      
      2) sched_clock() which reads the TSC returns a sporadic wrong value.
      
      #1 can be excluded because sched_clock would continue to increase for
         one jiffy and then go stale.
      
      #2 can be excluded because it would not make the clock jump
         forward. It would just result in a stale sched_clock for one jiffy.
      
      After quite some brain twisting and finding the same pattern on other
      traces, sched_clock_remote() remained the only place which could cause
      such a problem and as explained above it's indeed racy on 32bit
      systems.
      
      So while on 64bit systems the readout is atomic, we need to verify the
      remote readout on 32bit machines. We need to protect the local->clock
      readout in sched_clock_remote() on 32bit as well because an NMI could
      hit between the low and the high readout, call sched_clock_local() and
      modify local->clock.
      
      Thanks to Siegfried Wulsch for bearing with my debug requests and
      going through the tedious tasks of running a bunch of reproducer
      systems to generate the debug information which let me decode the
      issue.
      Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      a1cbcaa9
  23. 17 Nov, 2011 1 commit
  24. 31 Oct, 2011 1 commit
  25. 23 Nov, 2010 1 commit
  26. 09 Jun, 2010 1 commit