1. 15 May, 2017 3 commits
    • P
      x86/tsc: Fix sched_clock() sync · 615cd033
      Peter Zijlstra committed
      While looking through the code I noticed that we initialize the cyc2ns
      fields with a different cycle value for each CPU, resulting in a
      slightly different 0 point for each CPU.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      615cd033
    • P
      x86/tsc: Remodel cyc2ns to use seqcount_latch() · 59eaef78
      Peter Zijlstra committed
      Replace the custom multi-value scheme with the more regular
      seqcount_latch() scheme. Along with scrapping a lot of lines, the latch
      scheme is better documented and used in more places.
      
      The immediate benefit, however, is that the update side is no longer
      limited. The current code has a limit at which writers block, and
      future changes would hit that limit.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59eaef78
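
The latch idea behind commit 59eaef78 can be illustrated with a small user-space model. This is only a sketch, not the kernel's seqcount_latch() code: the struct layout and the names `cyc2ns_latch`, `latch_write` and `latch_read` are hypothetical, it assumes a single writer at a time, and it uses sequentially consistent C11 atomics where the kernel uses cheaper barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Plain snapshot handed back to readers (hypothetical layout). */
struct cyc2ns_snapshot {
    uint32_t mul;
    uint32_t shift;
    uint64_t offset;
};

/* One copy of the data; atomic fields keep concurrent access race-free. */
struct cyc2ns_copy {
    _Atomic uint32_t mul;
    _Atomic uint32_t shift;
    _Atomic uint64_t offset;
};

struct cyc2ns_latch {
    atomic_uint seq;              /* even: readers use copy[0]; odd: copy[1] */
    struct cyc2ns_copy copy[2];
};

static void copy_set(struct cyc2ns_copy *c, struct cyc2ns_snapshot v)
{
    atomic_store(&c->mul, v.mul);
    atomic_store(&c->shift, v.shift);
    atomic_store(&c->offset, v.offset);
}

void latch_init(struct cyc2ns_latch *l, struct cyc2ns_snapshot v)
{
    atomic_init(&l->seq, 0);
    for (int i = 0; i < 2; i++) {
        atomic_init(&l->copy[i].mul, v.mul);
        atomic_init(&l->copy[i].shift, v.shift);
        atomic_init(&l->copy[i].offset, v.offset);
    }
}

/* Writer: never waits for readers, it only redirects them. */
void latch_write(struct cyc2ns_latch *l, struct cyc2ns_snapshot v)
{
    assert(v.shift < 64);
    atomic_fetch_add(&l->seq, 1);   /* odd: steer readers to copy[1] */
    copy_set(&l->copy[0], v);
    atomic_fetch_add(&l->seq, 1);   /* even: steer readers to copy[0] */
    copy_set(&l->copy[1], v);
}

/* Reader: retries only if seq changed while it was copying. */
struct cyc2ns_snapshot latch_read(struct cyc2ns_latch *l)
{
    struct cyc2ns_snapshot out;
    unsigned int seq;

    do {
        seq = atomic_load(&l->seq);
        struct cyc2ns_copy *c = &l->copy[seq & 1];
        out.mul = atomic_load(&c->mul);
        out.shift = atomic_load(&c->shift);
        out.offset = atomic_load(&c->offset);
    } while (atomic_load(&l->seq) != seq);

    return out;
}
```

Because the writer always leaves one copy untouched while it modifies the other, a reader can always find a consistent copy, which is why the update side is not limited in this scheme.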
    • P
      x86/tsc: Provide 'tsc=unstable' boot parameter · 8309f86c
      Peter Zijlstra committed
      Since the clocksource watchdog will only detect broken TSC after the
      fact, all TSC based clocks will likely have observed non-continuous
      values before/when switching away from TSC.
      
      Therefore the only way to fully avoid random clock movement when your
      BIOS randomly mucks with TSC values from SMI handlers is to report the
      TSC as unstable at boot.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8309f86c
  2. 23 Mar, 2017 1 commit
  3. 14 Mar, 2017 1 commit
    • P
      x86/tsc: Fix ART for TSC_KNOWN_FREQ · 44fee88c
      Peter Zijlstra committed
      Subhransu reported that convert_art_to_tsc() isn't working for him.
      
      The ART to TSC relation is only set up for systems which use the refined
      TSC calibration. Systems with known TSC frequency (available via CPUID 15)
      are not using the refined calibration and therefore the ART to TSC relation
      is never established.
      
      Add the setup to the known frequency init path which skips ART
      calibration. The init code needs to be duplicated as for systems which use
      refined calibration the ART setup must be delayed until calibration has
      been done.
      
      The problem has been there since the ART support was introduced, but only
      detected now because Subhransu tested the first time on hardware which has
      TSC frequency enumerated via CPUID 15.

      Note for stable: The conditional has changed from TSC_RELIABLE to
      TSC_KNOWN_FREQUENCY.
      
      [ tglx: Rewrote changelog and identified the proper 'Fixes' commit ]
      
      Fixes: f9677e0f ("x86/tsc: Always Running Timer (ART) correlated clocksource")
      Reported-by: "Prusty, Subhransu S" <subhransu.s.prusty@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Cc: christopher.s.hall@intel.com
      Cc: kevin.b.stanton@intel.com
      Cc: john.stultz@linaro.org
      Cc: akataria@vmware.com
      Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      44fee88c
  4. 02 Mar, 2017 2 commits
    • P
      sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface · f94c8d11
      Peter Zijlstra committed
      Wanpeng Li reported that since the following commit:
      
        acb04058 ("sched/clock: Fix hotplug crash")
      
      ... KVM always runs with unstable sched-clock even though KVM's
      kvm_clock _is_ stable.
      
      The problem is that we've tied clear_sched_clock_stable() to the TSC
      state, and overlooked that sched_clock() is a paravirt function.
      
      Solve this by doing two things:
      
       - tie the sched_clock() stable state more clearly to the TSC stable
         state for the normal (!paravirt) case.
      
       - only call clear_sched_clock_stable() when we mark TSC unstable
         when we use native_sched_clock().
      
      The first means we can actually run with stable sched_clock in more
      situations than before, which is good. And since commit:
      
        12907fbb ("sched/clock, clocksource: Add optional cs::mark_unstable() method")
      
      ... this should be reliable, since any detection of TSC failure now
      results in marking the TSC unstable.
      Reported-by: Wanpeng Li <kernellwp@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: acb04058 ("sched/clock: Fix hotplug crash")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f94c8d11
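
The second point of commit f94c8d11 (only clear the stable state when the native clock is in use) can be modelled in a few lines of user-space C. This is a hedged sketch: `clock_model` and the `model_*` functions are hypothetical stand-ins for illustration, not the kernel's data structures or its paravirt interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*sched_clock_fn)(void);

/* Stand-ins for the two possible sched_clock() back ends (hypothetical). */
uint64_t model_native_sched_clock(void)   { return 1; }
uint64_t model_paravirt_sched_clock(void) { return 2; }

struct clock_model {
    sched_clock_fn sched_clock;   /* which back end is installed */
    bool tsc_unstable;
    bool sched_clock_stable;
};

bool model_using_native_sched_clock(const struct clock_model *m)
{
    return m->sched_clock == model_native_sched_clock;
}

/*
 * Marking the TSC unstable only affects sched_clock() stability when
 * sched_clock() is actually backed by the TSC (the native case).
 */
void model_mark_tsc_unstable(struct clock_model *m)
{
    assert(m && m->sched_clock);

    if (m->tsc_unstable)
        return;

    m->tsc_unstable = true;
    if (model_using_native_sched_clock(m))
        m->sched_clock_stable = false;
}
```

In this model a guest using a paravirt clock keeps `sched_clock_stable` set even after the TSC is marked unstable, which is the behaviour Wanpeng Li's report asked for.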
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> · e6017571
      Ingo Molnar committed
      We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and .c files.
      
      Create a trivial placeholder <linux/sched/clock.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e6017571
  5. 10 Feb, 2017 1 commit
    • T
      x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST · f2e04214
      Thomas Gleixner committed
      Olof reported that on a machine which has a BIOS wreckaged TSC the
      timestamps in dmesg are making a large jump because the TSC value is
      jumping forward after resetting the TSC ADJUST register to a sane value.
      
      This can be avoided by calling the TSC ADJUST sanitizing function before
      initializing the per cpu sched clock machinery. That takes the offset into
      account and avoids the time jump.
      
      What cannot be avoided is that the 'Firmware Bug' warnings on the secondary
      CPUs are printed with the large time offsets because it would be too much
      effort and ugly hackery to print those warnings into a buffer and emit them
      after the adjustment on the starting CPUs. It's a firmware bug and should be
      fixed in firmware. The weird timestamps are collateral damage and just
      illustrate the silliness of the BIOS folks:
      
      [    0.397445] smp: Bringing up secondary CPUs ...
      [    0.402100] x86: Booting SMP configuration:
      [    0.406343] .... node  #0, CPUs:      #1
      [1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101
      [1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101
      [    0.508119]  #2
      [1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677
      [1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677
      [    0.607643]  #3
      [1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530
      [1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530
      [    0.707108] smp: Brought up 1 node, 4 CPUs
      [    0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS)
      Reported-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170209151231.411460506@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f2e04214
  6. 14 Jan, 2017 2 commits
  7. 25 Dec, 2016 1 commit
  8. 15 Dec, 2016 2 commits
    • T
      x86/tsc: Force TSC_ADJUST register to value >= zero · 5bae1562
      Thomas Gleixner committed
      Roland reported that his DELL T5810 sports a value add BIOS which
      completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with
      random negative TSC_ADJUST values, different on all CPUs. That renders the
      TSC useless because the synchronization check fails.
      
      Roland tested the new TSC_ADJUST mechanism. While it manages to readjust
      the TSCs he needs to disable the TSC deadline timer, otherwise the machine
      just stops booting.
      
      Deeper investigation unearthed that the TSC deadline timer is sensitive to
      the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an
      interrupt storm caused by the TSC deadline timer.
      
      This does not make any sense and it's hard to imagine what kind of hardware
      wreckage is behind that misfeature, but it's reliably reproducible on other
      systems which have TSC_ADJUST and TSC deadline timer.
      
      While it would be understandable that a big enough negative value which
      moves the resulting TSC readout into the negative space could have the
      described effect, this happens even with an adjust value of -1, which keeps
      the TSC readout definitely in the positive space. The compare register for
      the TSC deadline timer is set to a positive value larger than the TSC, but
      despite not having reached the deadline the interrupt is raised
      immediately. If this happens on the boot CPU, then the machine dies
      silently because this setup happens before the NMI watchdog is armed.
      
      Further experiments showed that any other adjustment of TSC_ADJUST works as
      expected as long as it stays in the positive range. The direction of the
      adjustment has no influence either. See the lkml link for further analysis.
      
      Yet another proof for the theory that timers are designed by janitors and
      the underlying (obviously undocumented) mechanisms which allow BIOSes to
      wreckage them are considered a feature. Well done Intel - NOT!
      
      To address this wreckage add the following sanity measures:
      
      - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0
      
      - If the TSC_ADJUST value on any cpu is negative, set it to 0
      
      - Prevent the cross package synchronization mechanism from setting negative
        TSC_ADJUST values.
      Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Allen Hung <allen_hung@dell.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5bae1562
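
The three sanity measures listed in commit 5bae1562 can be written as two small pure functions. This is a sketch under assumptions: the function names are hypothetical, the values stand for what would be read from or written to the TSC_ADJUST MSR, and clamping to zero in the synchronization path is one plausible way to "prevent negative values", not necessarily the exact policy the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Measures 1 and 2: the boot CPU must end up with 0, and no CPU may
 * keep a negative TSC_ADJUST value.
 */
int64_t sanitize_tsc_adjust(int64_t adjust, bool boot_cpu)
{
    if (boot_cpu && adjust != 0)
        return 0;
    if (adjust < 0)
        return 0;
    return adjust;
}

/*
 * Measure 3: a cross-package synchronization step proposes a delta;
 * the resulting value is never allowed to go below zero (assumed
 * policy: clamp to 0).
 */
int64_t clamp_sync_adjust(int64_t adjust, int64_t delta)
{
    /* Guard the addition against signed overflow. */
    assert(delta <= 0 || adjust <= INT64_MAX - delta);
    assert(delta >= 0 || adjust >= INT64_MIN - delta);

    int64_t next = adjust + delta;
    return next < 0 ? 0 : next;
}
```

For instance, a boot CPU that comes up with a large negative value such as -2978888639075328 (a value that appears in the boot log of commit f2e04214) would be reset to 0 by `sanitize_tsc_adjust()`.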
    • T
      x86/tsc: Validate TSC_ADJUST after resume · 6a369583
      Thomas Gleixner committed
      Some 'feature' BIOSes fiddle with the TSC_ADJUST register during
      suspend/resume which renders the TSC unusable.
      
      Add sanity checks into the resume path and restore the
      original value if it was adjusted.
      Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Allen Hung <allen_hung@dell.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6a369583
  9. 30 Nov, 2016 2 commits
    • T
      x86/tsc: Store and check TSC ADJUST MSR · 8b223bc7
      Thomas Gleixner committed
      The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
      in two respects:
      
      1) It allows to detect BIOS wreckage, where SMM code tries to 'hide' the
         cycles spent by storing the TSC value at SMM entry and restoring it at
         SMM exit. On affected machines the TSCs run slowly out of sync up to the
         point where the clocksource watchdog (if available) detects it.
      
         The TSC_ADJUST MSR allows to detect the TSC modification before that and
         eventually restore it. This is also important for SoCs which have no
         watchdog clocksource and therefore TSC wreckage cannot be detected and
         acted upon.
      
      2) All threads in a package are required to have the same TSC_ADJUST
         value. Broken BIOSes break that and as a result the TSC synchronization
         check fails.
      
         The TSC_ADJUST MSR allows to detect the deviation when a CPU comes
         online. If detected set it to the value of an already online CPU in the
         same package. This also allows to reduce the number of sync tests
         because with that in place the test is only required for the first CPU
         in a package.
      
         In principle all CPUs in a system should have the same TSC_ADJUST value
         even across packages, but with physical CPU hotplug this assumption is
         not true because the TSC starts with power on, so physical hotplug has
         to do some trickery to bring the TSC into sync with already running
         packages, which requires using a TSC_ADJUST value different from CPUs
         which got powered earlier.

         A final enhancement is the opportunity to compensate for unsynced TSCs
         across nodes at boot time and make the TSC usable that way. It won't
         help for TSCs which run apart due to frequency skew between packages,
         but this gets detected by the clocksource watchdog later.
      
      The first step toward this is to store the TSC_ADJUST value of a starting
      CPU and compare it with the value of an already online CPU in the same
      package. If they differ, emit a warning and adjust it to the reference
      value. The !SMP version just stores the boot value for later verification.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8b223bc7
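
The "store and compare" step described at the end of commit 8b223bc7 might look roughly like the following user-space model. Everything here is an assumption made for illustration: the `cpu_adjust` table, the function name and the warning text are hypothetical, and the hardware value is passed in as a parameter instead of being read from the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Per-CPU bookkeeping (hypothetical layout). */
struct cpu_adjust {
    bool    online;
    int     package;
    int64_t adjusted;   /* TSC_ADJUST value this CPU ended up with */
};

/*
 * Record the TSC_ADJUST value of a starting CPU. If an already online
 * CPU in the same package has a different value, warn and take over
 * the reference value. Returns true when a correction was applied.
 */
bool store_and_check_adjust(struct cpu_adjust cpus[], int ncpus,
                            int cpu, int package, int64_t hw_value)
{
    bool corrected = false;
    int64_t value = hw_value;

    assert(cpu >= 0 && cpu < ncpus);

    for (int ref = 0; ref < ncpus; ref++) {
        if (ref == cpu || !cpus[ref].online || cpus[ref].package != package)
            continue;

        if (cpus[ref].adjusted != hw_value) {
            fprintf(stderr,
                    "TSC ADJUST differs: Reference CPU%d: %lld CPU%d: %lld\n",
                    ref, (long long)cpus[ref].adjusted,
                    cpu, (long long)hw_value);
            value = cpus[ref].adjusted;
            corrected = true;
        }
        break;  /* one reference CPU per package is enough */
    }

    cpus[cpu].online = true;
    cpus[cpu].package = package;
    cpus[cpu].adjusted = value;
    return corrected;
}
```

The first CPU of a package has no reference and simply stores its own value, which matches the commit's note that the sync test is then only required for the first CPU in a package.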
    • T
      x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() · 7b3d2f6e
      Thomas Gleixner committed
      The art detection uses rdmsrl_safe() to detect the availability of the
      TSC_ADJUST MSR.
      
      That's pointless because we have a feature bit for this. Use it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161119134017.483561692@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7b3d2f6e
  10. 18 Nov, 2016 4 commits
  11. 20 Sep, 2016 2 commits
  12. 10 Aug, 2016 1 commit
    • N
      x86/timers/apic: Inform TSC deadline clockevent device about recalibration · 6731b0d6
      Nicolai Stange committed
      This patch eliminates a source of imprecise APIC timer interrupts;
      the imprecision may result in double interrupts or even late
      interrupts.
      
      The TSC deadline clockevent devices' configuration and registration
      happens before the TSC frequency calibration is refined in
      tsc_refine_calibration_work().
      
      This results in the TSC clocksource and the TSC deadline clockevent
      devices being configured with slightly different frequencies: the former
      gets the refined one and the latter are configured with the inaccurate
      frequency detected earlier by means of the "Fast TSC calibration using PIT".
      
      Within the APIC code, introduce the notifier function
      lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
      clockevent devices with the current tsc_khz.
      
      Call it from the TSC code after TSC calibration refinement has happened.
      Signed-off-by: Nicolai Stange <nicstange@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christopher S. Hall <christopher.s.hall@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
      [ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6731b0d6
  13. 15 Jul, 2016 1 commit
  14. 14 Jul, 2016 1 commit
    • P
      x86/kernel: Audit and remove any unnecessary uses of module.h · 186f4360
      Paul Gortmaker committed
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  The advantage
      in doing so is that module.h itself sources about 15 other headers;
      adding significantly to what we feed cpp, and it can obscure what
      headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
      for the presence of either and replace as needed.  Build testing
      revealed some implicit header usage that was fixed up accordingly.
      
      Note that some bool/obj-y instances remain since module.h is
      the header for some exception table entry stuff, and for things
      like __init_or_module (code that is tossed when MODULES=n).
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      186f4360
  15. 12 Jul, 2016 3 commits
  16. 13 Apr, 2016 3 commits
  17. 18 Mar, 2016 1 commit
  18. 04 Mar, 2016 1 commit
    • C
      x86/tsc: Always Running Timer (ART) correlated clocksource · f9677e0f
      Christopher S. Hall committed
      On modern Intel systems TSC is derived from the new Always Running Timer
      (ART). ART can be captured simultaneous to the capture of
      audio and network device clocks, allowing a correlation between timebases
      to be constructed. Upon capture, the driver converts the captured ART
      value to the appropriate system clock using the correlated clocksource
      mechanism.
      
      On systems that support ART a new CPUID leaf (0x15) returns parameters
      “m” and “n” such that:
      
      TSC_value = (ART_value * m) / n + k [n >= 1]
      
      [k is an offset that can be adjusted by a privileged agent. The
      IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
      See 17.14.4 of the Intel SDM for more details]
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: kevin.b.stanton@intel.com
      Cc: kevin.j.clarke@intel.com
      Cc: hpa@zytor.com
      Cc: jeffrey.t.kirsher@intel.com
      Cc: netdev@vger.kernel.org
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
      [jstultz: Tweaked to fix build issue, also reworked math for
      64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build
      fixes]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      f9677e0f
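
The relation TSC_value = (ART_value * m) / n + k from commit f9677e0f can be evaluated without overflowing 64 bits by splitting the ART value into quotient and remainder first. The helper below is a hypothetical sketch of that arithmetic, not the kernel's convert_art_to_tsc(); it assumes m and n fit in 32 bits and that the final result fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (art * m) / n + k without forming the full product art * m.
 *
 * With art = quot * n + rem (0 <= rem < n):
 *   (art * m) / n == quot * m + (rem * m) / n
 * and rem * m stays below 2^64 because both factors are below 2^32.
 */
uint64_t art_to_tsc(uint64_t art, uint32_t m, uint32_t n, uint64_t k)
{
    assert(n >= 1);

    uint64_t quot = art / n;
    uint64_t rem  = art % n;

    return quot * m + (rem * m) / n + k;
}
```

With art = 7, m = 3, n = 2 and k = 0 the helper returns 10, the same as the truncated value of 21 / 2, while an input such as art = 2^62 with m = 4 and n = 2 still works even though art * m alone would not fit in 64 bits.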
  19. 24 Feb, 2016 1 commit
  20. 22 Feb, 2016 1 commit
  21. 19 Nov, 2015 1 commit
  22. 20 Oct, 2015 1 commit
    • A
      perf/x86: Fix time_shift in perf_event_mmap_page · b9511cd7
      Adrian Hunter committed
      Commit:
      
        b20112ed ("perf/x86: Improve accuracy of perf/sched clock")
      
      allowed the time_shift value in perf_event_mmap_page to be as much
      as 32.  Unfortunately the documented algorithms for using time_shift
      have it shifting an integer, whereas to work correctly with the value
      32, the type must be u64.
      
      In the case of perf tools, Intel PT decodes correctly but the timestamps
      that are output (for example by perf script) have lost 32-bits of
      granularity so they look like they are not changing at all.
      
      Fix by limiting the shift to 31 and adjusting the multiplier accordingly.
      
      Also update the documentation of perf_event_mmap_page so that new code
      based on it will be more future-proof.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: b20112ed ("perf/x86: Improve accuracy of perf/sched clock")
      Link: http://lkml.kernel.org/r/1445001845-13688-2-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b9511cd7
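
The fix in commit b9511cd7 (limit the shift to 31 and adjust the multiplier accordingly) preserves the conversion ratio, because halving the multiplier compensates for the smaller shift. The sketch below uses a hypothetical `time_conv` struct in place of the real perf_event_mmap_page fields, and its conversion helper follows the quotient/remainder form with a 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the time_mult / time_shift pair. */
struct time_conv {
    uint32_t mult;
    uint16_t shift;
};

/* Limit the shift to 31 and halve the multiplier to keep mult / 2^shift. */
void limit_time_shift(struct time_conv *tc)
{
    if (tc->shift == 32) {
        tc->shift = 31;
        tc->mult >>= 1;
    }
}

/*
 * Convert cycles to nanoseconds. The mask is built from a 64-bit 1 so
 * the expression stays well defined for any shift below 64; shifting a
 * 32-bit integer by 32 would not be.
 */
uint64_t cyc_to_ns(uint64_t cyc, const struct time_conv *tc)
{
    assert(tc->shift < 64);

    uint64_t quot = cyc >> tc->shift;
    uint64_t rem  = cyc & (((uint64_t)1 << tc->shift) - 1);

    return quot * tc->mult + ((rem * tc->mult) >> tc->shift);
}
```

Starting from mult = 0x80000000 and shift = 32 (a ratio of 0.5), `limit_time_shift()` yields mult = 0x40000000 and shift = 31, and 1000 cycles convert to 500 ns both before and after the adjustment.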
  23. 16 Sep, 2015 1 commit
  24. 13 Sep, 2015 1 commit
  25. 04 Aug, 2015 1 commit
  26. 03 Aug, 2015 1 commit