1. 14 Feb 2014, 1 commit
    • tick: Clear broadcast pending bit when switching to oneshot · dd5fd9b9
      Authored by Thomas Gleixner
      AMD systems which use the C1E workaround in the amd_e400_idle routine
      trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.
      
      The reason is that the idle routine of those AMD systems switches the
      cpu into forced broadcast mode early on before the newly brought up
      CPU can switch over to high resolution / NOHZ mode. The timer related
      CPU1 bringup looks like this:
      
        clockevent_register_device(local_apic);
        tick_setup(local_apic);
        ...
        idle()
      	tick_broadcast_on_off(FORCE);
      	tick_broadcast_oneshot_control(ENTER)
      	  cpumask_set(cpu, broadcast_oneshot_mask);
      	halt();
      
      Now the broadcast interrupt on CPU0 sets CPU1 in the
      broadcast_pending_mask and wakes CPU1. So CPU1 continues:
      
      	local_apic_timer_interrupt()
      	   tick_handle_periodic();
      	   softirq()
      	     tick_init_highres();
      	       cpumask_clr(cpu, broadcast_oneshot_mask);
      	
      	tick_broadcast_oneshot_control(ENTER)
      	   WARN_ON(cpumask_test(cpu, broadcast_pending_mask));
      
      So while we remove CPU1 from the broadcast_oneshot_mask when we switch
      over to highres mode, we do not clear the pending bit, which then
      triggers the warning when we go back to idle.
      
      The reason why this is only visible on C1E affected AMD systems is
      that the other machines enter the deep sleep states via
      acpi_idle/intel_idle and exit the broadcast mode before executing the
      remote triggered local_apic_timer_interrupt. So the pending bit is
      already cleared when the switch over to highres mode is clearing the
      oneshot mask.
      
      The solution is simple: Clear the pending bit together with the mask
      bit when we switch over to highres mode.
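
      A minimal sketch of the idea (illustrative only, not necessarily the exact
      diff; mask and helper names are taken from the changelog above):

        /* Called when the CPU switches over to highres/NOHZ mode */
        static void tick_broadcast_clear_oneshot(int cpu)
        {
                cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
                /* Also drop a stale pending bit left by an early broadcast IPI */
                cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
        }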
      
      Stanislaw came up independently with the same patch by enforcing the
      C1E workaround and debugging the fallout. I picked mine, because mine
      has a changelog :)
      Reported-by: poma <pomidorabelisima@gmail.com>
      Debugged-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Justin M. Forbes <jforbes@redhat.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      dd5fd9b9
  2. 07 Feb 2014, 3 commits
    • tick: Introduce hrtimer based broadcast · 5d1638ac
      Authored by Preeti U Murthy
      On some architectures, in certain CPU deep idle states the local timers stop.
      An external clock device is used to wakeup these CPUs. The kernel support for the
      wakeup of these CPUs is provided by the tick broadcast framework by using the
      external clock device as the wakeup source.
      
      However, not all architectures provide such an external clock device. This
      patch adds support in the broadcast framework for handling the wakeup of
      the CPUs in deep idle states on such systems by queuing a hrtimer on one
      of the CPUs, which then takes care of waking the CPUs in deep idle states.
      
      This patchset introduces a pseudo clock device which can be registered by the
      archs as the tick_broadcast_device in the absence of a real external clock
      device. Once registered, the broadcast framework will work as-is for these
      architectures, as long as the archs take care of the BROADCAST_ENTER
      notification failing for one of the CPUs. That CPU is made the standby CPU
      to handle the wakeup of the CPUs in deep idle, and it *must not enter deep
      idle states* itself.
      
      The CPU with the earliest wakeup is chosen as this standby CPU. The standby
      CPU therefore moves around dynamically, and so does the hrtimer, which is
      queued to fire at the next earliest wakeup time. This is consistent with the
      case where an external clock device is present: the smp affinity of that
      clock device is set to the CPU with the earliest wakeup. The patchset also
      handles hotplug of the standby CPU by moving the hrtimer onto the CPU
      handling the CPU_DEAD notification.
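
      A rough sketch of such a pseudo clock event device backed by a hrtimer
      (hypothetical and heavily simplified; initialization and the hrtimer
      callback which invokes the broadcast handler are omitted):

        static struct hrtimer bctimer;

        static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
        {
                /* Arm the hrtimer on the standby CPU; when it fires, the
                 * broadcast handler sends wakeup IPIs to the CPUs in deep idle. */
                hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
                return 0;
        }

        static struct clock_event_device ce_broadcast_hrtimer = {
                .set_next_ktime = bc_set_next,
                .features       = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_KTIME,
                .rating         = 0,    /* lowest rating: only a fallback */
        };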
      
      Originally-from: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: deepthi@linux.vnet.ibm.com
      Cc: paulmck@linux.vnet.ibm.com
      Cc: fweisbec@gmail.com
      Cc: paulus@samba.org
      Cc: srivatsa.bhat@linux.vnet.ibm.com
      Cc: svaidy@linux.vnet.ibm.com
      Cc: peterz@infradead.org
      Cc: benh@kernel.crashing.org
      Cc: rafael.j.wysocki@intel.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20140207080632.17187.80532.stgit@preeti.in.ibm.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5d1638ac
    • time: Change the return type of clockevents_notify() to integer · da7e6f45
      Authored by Preeti U Murthy
      The broadcast framework can potentially also be used by archs which do not
      have an external clock device. In that case one of the CPUs needs to handle
      the broadcasting of wakeup IPIs to the CPUs in deep idle, so its local timer
      must remain functional all the time. For such a CPU, the BROADCAST_ENTER
      notification has to fail, indicating that its clock device cannot be shut
      down. To make way for this support, change the return type of
      tick_broadcast_oneshot_control() and hence clockevents_notify() to an
      integer, so that such scenarios can be indicated.
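
      An idle-path caller can then react to the failure, along these lines
      (hypothetical usage sketch, not taken from this patch):

        if (clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu)) {
                /*
                 * BROADCAST_ENTER failed: this CPU has to keep its local
                 * timer running to wake up the others, so pick a shallower
                 * idle state instead of the deep one.
                 */
        }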
      Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: deepthi@linux.vnet.ibm.com
      Cc: paulmck@linux.vnet.ibm.com
      Cc: fweisbec@gmail.com
      Cc: paulus@samba.org
      Cc: srivatsa.bhat@linux.vnet.ibm.com
      Cc: svaidy@linux.vnet.ibm.com
      Cc: peterz@infradead.org
      Cc: benh@kernel.crashing.org
      Cc: rafael.j.wysocki@intel.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20140207080606.17187.78306.stgit@preeti.in.ibm.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      da7e6f45
    • clockevents: Serialize calls to clockevents_update_freq() in the core · 627ee794
      Authored by Thomas Gleixner
      We can identify the broadcast device in the core and serialize all
      callers including interrupts on a different CPU against the update.
      Also, interrupt disabling is moved into the core, allowing callers to
      leave interrupts enabled when calling clockevents_update_freq().
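
      Roughly, the core wrapper then looks like this (sketch; the unlocked body
      is denoted __clockevents_update_freq() here):

        int clockevents_update_freq(struct clock_event_device *dev, u32 freq)
        {
                unsigned long flags;
                int ret;

                /* Serialize all callers, including interrupts on other CPUs;
                 * callers themselves may now run with interrupts enabled. */
                raw_spin_lock_irqsave(&clockevents_lock, flags);
                ret = __clockevents_update_freq(dev, freq);
                raw_spin_unlock_irqrestore(&clockevents_lock, flags);
                return ret;
        }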
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Soeren Brinkmann <soren.brinkmann@xilinx.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Link: http://lkml.kernel.org/r/1391466877-28908-2-git-send-email-soren.brinkmann@xilinx.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      627ee794
  3. 03 Dec 2013, 1 commit
    • nohz: Convert a few places to use local per cpu accesses · e8fcaa5c
      Authored by Frederic Weisbecker
      A few functions use remote per CPU access APIs when they
      deal with local values.
      
      Just do the right conversion to improve performance, code
      readability and debug checks.
      
      While at it, let's extend some of these function names with a *_this_cpu()
      suffix in order to display their purpose more clearly.
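
      The conversions are of this kind (illustrative before/after):

        /* Before: remote-style access to a value which is always local */
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, smp_processor_id());

        /* After: local access, cheaper and covered by the debug checks */
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);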
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      e8fcaa5c
  4. 02 Oct 2013, 1 commit
    • tick: broadcast: Deny per-cpu clockevents from being broadcast sources · 245a3496
      Authored by Soren Brinkmann
      On most ARM systems the per-cpu clockevents are truly per-cpu in
      the sense that they can't be controlled on any other CPU besides
      the CPU that they interrupt. If one of these clockevents were to
      become a broadcast source we would run into a lot of trouble,
      because the broadcast source is enabled on the first CPU to go
      into deep idle (if that CPU suffers from FEAT_C3_STOP) and that
      could be a different CPU from the one the clockevent interrupts
      (or, even worse, the CPU that the clockevent interrupts could be
      offline).
      
      Theoretically it's possible to support per-cpu clockevents as the
      broadcast source but so far we haven't needed this and supporting
      it is rather complicated. Let's just deny the possibility for now
      until this becomes a reality (let's hope it never does!).
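
      The broadcast device selection then gains a feature check along these
      lines (sketch; the flag marks truly per-cpu clockevents):

        /* In tick_check_broadcast_device(): never pick a per-cpu device */
        if (dev->features & CLOCK_EVT_FEAT_PERCPU)
                return false;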
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Michal Simek <michal.simek@xilinx.com>
      245a3496
  5. 12 Jul 2013, 1 commit
    • tick: broadcast: Check broadcast mode on CPU hotplug · a272dcca
      Authored by Stephen Boyd
      On ARM systems the dummy clockevent is registered with the cpu
      hotplug notifier chain before any other per-cpu clockevent. This
      has the side-effect of causing the dummy clockevent to be
      registered first in every hotplug sequence. Because the dummy is
      first, we'll try to turn the broadcast source on but the code in
      tick_device_uses_broadcast() assumes the broadcast source is in
      periodic mode and calls tick_broadcast_start_periodic()
      unconditionally.
      
      On boot this isn't a problem because we typically haven't
      switched into oneshot mode yet (if at all). During hotplug, if
      the broadcast source isn't in periodic mode we'll replace the
      broadcast oneshot handler with the broadcast periodic handler and
      start emulating oneshot mode when we shouldn't. Due to the way
      the broadcast oneshot handler programs the next_event it's
      possible for it to contain KTIME_MAX and cause us to hang the
      system when the periodic handler tries to program the next tick.
      Fix this by using the appropriate function to start the broadcast
      source.
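
      Conceptually the fix replaces the unconditional periodic start with a
      check of the current broadcast mode (sketch):

        /* In tick_device_uses_broadcast(), instead of calling
         * tick_broadcast_start_periodic(bc) unconditionally: */
        if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
                tick_broadcast_start_periodic(bc);
        else
                tick_broadcast_setup_oneshot(bc);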
      Reported-by: Stephen Warren <swarren@nvidia.com>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Mark Rutland <Mark.Rutland@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: ARM kernel mailing list <linux-arm-kernel@lists.infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joseph Lo <josephl@nvidia.com>
      Link: http://lkml.kernel.org/r/20130711140059.GA27430@codeaurora.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a272dcca
  6. 02 Jul 2013, 3 commits
    • tick: Sanitize broadcast control logic · 07bd1172
      Authored by Thomas Gleixner
      The recent implementation of a generic dummy timer resulted in a
      different registration order of per cpu local timers which made the
      broadcast control logic go belly up.
      
      If the dummy timer is the first clock event device which is registered
      for a CPU, then it is installed, the broadcast timer is initialized
      and the CPU is marked as broadcast target.
      
      If a real clock event device is installed after that, we can fail to
      take the CPU out of the broadcast mask. In the worst case we end up
      with two periodic timer events firing for the same CPU. One from the
      per cpu hardware device and one from the broadcast.
      
      Now the problem is that we have no way to distinguish whether the
      system is in a state which makes broadcasting necessary, or whether the
      broadcast bit was set due to the installation of the nonfunctional
      dummy timer.
      
      To solve this we need to keep track of the system state separately and
      provide more detailed decision logic for whether we keep the CPU in
      broadcast mode or not.
      
      The old decision logic only clears the broadcast mode, if the newly
      installed clock event device is not affected by power states.
      
      The new logic clears the broadcast mode if one of the following is
      true:
      
        - The new device is not affected by power states.
      
        - The system is not in a power state affected mode
      
        - The system has switched to oneshot mode. The oneshot broadcast is
          controlled from the deep idle state. The CPU is not in idle at
          this point, so it's safe to remove it from the mask.
      
      If we clear the broadcast bit for the CPU when a new device is
      installed, we also shutdown the broadcast device when this was the
      last CPU in the broadcast mask.
      
      If the broadcast bit is kept, then we leave the new device in shutdown
      state and rely on the broadcast to deliver the timer interrupts via
      the broadcast ipis.
      Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      07bd1172
    • tick: Prevent uncontrolled switch to oneshot mode · 1f73a980
      Authored by Thomas Gleixner
      When the system switches from periodic to oneshot mode, the broadcast
      logic opens up the possibility that a CPU which has not yet switched to
      oneshot mode puts its own clock event device into oneshot mode without
      updating the state and the timer handler.
      
      CPU0				CPU1
      				per cpu tickdev is in periodic mode
      				and switched to broadcast
      
      Switch to oneshot mode
       tick_broadcast_switch_to_oneshot()
        cpumask_copy(tick_broadcast_oneshot_mask,
      	       tick_broadcast_mask);
      
        broadcast device mode = oneshot
      
      				Timer interrupt
      						
      				irq_enter()
      				 tick_check_oneshot_broadcast()
      				  dev->set_mode(ONESHOT);
      
      				tick_handle_periodic()
      				 if (dev->mode == ONESHOT)
      				   dev->next_event += period;
      				   FAIL.
      
      We fail, because dev->next_event contains KTIME_MAX, if the device was
      in periodic mode before the uncontrolled switch to oneshot happened.
      
      We must copy the broadcast bits over to the oneshot mask, because
      otherwise a CPU which relies on the broadcast would not be woken up
      anymore after the broadcast device switched to oneshot mode.
      
      So we need to verify in tick_check_oneshot_broadcast() whether the CPU
      has already switched to oneshot mode. If not, leave the device
      untouched and let the CPU switch controlled into oneshot mode.
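
      In tick_check_oneshot_broadcast() that verification amounts to roughly
      (sketch):

        struct tick_device *td = &per_cpu(tick_cpu_device, cpu);

        /* Only touch the device if this CPU already runs in oneshot mode */
        if (td->mode == TICKDEV_MODE_ONESHOT)
                clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_ONESHOT);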
      
      This is a long standing bug, which was never noticed, because the main
      user of the broadcast x86 cannot run into that scenario, AFAICT. The
      nonarchitected timer mess of ARM creates a gazillion of differently
      broken abominations which trigger the shortcomings of that broadcast
      code, which better had never been necessary in the first place.
      Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1f73a980
    • tick: Make oneshot broadcast robust vs. CPU offlining · c9b5a266
      Authored by Thomas Gleixner
      In periodic mode we remove offline cpus from the broadcast propagation
      mask. In oneshot mode we fail to do so. This was not a problem so far,
      but the recent changes to the broadcast propagation introduced a
      constellation which can result in a NULL pointer dereference.
      
      What happens is:
      
      CPU0			CPU1
      			idle()
      			  arch_idle()
      			    tick_broadcast_oneshot_control(OFF);
      			      set cpu1 in tick_broadcast_force_mask
      			  if (cpu_offline())
      			     arch_cpu_dead()
      
      cpu_dead_cleanup(cpu1)
       cpu1 tickdevice pointer = NULL
      
      broadcast interrupt
        dereference cpu1 tickdevice pointer -> OOPS
      
      We dereference the pointer because cpu1 is still set in
      tick_broadcast_force_mask, and tick_do_broadcast() expects a valid
      cpumask and therefore lacks any further checks.
      
      Remove the cpu from the tick_broadcast_force_mask before we set the
      tick device pointer to NULL. Also add a sanity check to the oneshot
      broadcast function, so we can detect such issues w/o crashing the
      machine.
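
      The offline path then clears all broadcast masks for the dying cpu before
      its tick device pointer is invalidated, roughly (sketch):

        /* In tick_shutdown_broadcast_oneshot(cpu): */
        cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
        cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
        cpumask_clear_cpu(cpu, tick_broadcast_force_mask);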
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Cc: athorlton@sgi.com
      Cc: CAI Qian <caiqian@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1306261303260.4013@ionos.tec.linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c9b5a266
  7. 21 Jun 2013, 1 commit
    • tick: Fix tick_broadcast_pending_mask not cleared · ea8deb8d
      Authored by Daniel Lezcano
      The recent modification in the cpuidle framework consolidated the
      timer broadcast code across the different drivers by setting a new
      flag in the idle state. It tells the cpuidle core code to enter/exit
      the broadcast mode for the cpu when entering a deep idle state. The
      broadcast timer enter/exit is no longer handled by the back-end
      driver.
      
      This change made the local interrupts be enabled *before* the
      CLOCK_EVT_NOTIFY_BROADCAST_EXIT notification is issued.
      
      On a Tegra114, a four-core system, the following warning appeared once
      the flag had been introduced in the driver:
      
      WARNING: at kernel/time/tick-broadcast.c:578 tick_broadcast_oneshot_control
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-rc3-next-20130529+ #15
      [<c00667f8>] (tick_broadcast_oneshot_control+0x1a4/0x1d0) from [<c0065cd0>] (tick_notify+0x240/0x40c)
      [<c0065cd0>] (tick_notify+0x240/0x40c) from [<c0044724>] (notifier_call_chain+0x44/0x84)
      [<c0044724>] (notifier_call_chain+0x44/0x84) from [<c0044828>] (raw_notifier_call_chain+0x18/0x20)
      [<c0044828>] (raw_notifier_call_chain+0x18/0x20) from [<c00650cc>] (clockevents_notify+0x28/0x170)
      [<c00650cc>] (clockevents_notify+0x28/0x170) from [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168)
      [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168) from [<c000ea94>] (arch_cpu_idle+0x8/0x38)
      [<c000ea94>] (arch_cpu_idle+0x8/0x38) from [<c005ea80>] (cpu_startup_entry+0x60/0x134)
      [<c005ea80>] (cpu_startup_entry+0x60/0x134) from [<804fe9a4>] (0x804fe9a4)
      
      I don't have the hardware, so I wasn't able to reproduce the warning
      but after looking a while at the code, I deduced the following:
      
       1. the CPU2 enters a deep idle state and sets the broadcast timer
      
       2. the timer expires, the tick_handle_oneshot_broadcast function is
          called, setting the tick_broadcast_pending_mask and waking up the
          idle cpu CPU2
      
       3. the CPU2 exits idle, handles the interrupt and then invokes
          tick_broadcast_oneshot_control() with CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
          which runs the following code:
      
          [...]
          if (dev->next_event.tv64 == KTIME_MAX)
                  goto out;
      
          if (cpumask_test_and_clear_cpu(cpu,
                                       tick_broadcast_pending_mask))
                  goto out;
          [...]
      
          So if there is no next event scheduled for CPU2, we fulfil the
          first condition and jump out without clearing the
          tick_broadcast_pending_mask.
      
       4. CPU2 goes to deep idle again and calls
          tick_broadcast_oneshot_control() with CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
          but with the tick_broadcast_pending_mask set for CPU2, triggering the
          warning.
      
      The issue only surfaced due to the modifications of the cpuidle
      framework, which resulted in interrupts being enabled before the call
      to the clockevents code. If the call happens before interrupts have
      been enabled, the warning cannot trigger, because there is still the
      event pending which caused the broadcast timer expiry.
      
      Move the check for the next event below the check for the pending bit,
      so the pending bit gets cleared whether an event is scheduled on the
      cpu or not.
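
      I.e. the two checks from the snippet above end up in the opposite order
      (sketch):

          if (cpumask_test_and_clear_cpu(cpu,
                                       tick_broadcast_pending_mask))
                  goto out;

          if (dev->next_event.tv64 == KTIME_MAX)
                  goto out;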
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reported-and-tested-by: Joseph Lo <josephl@nvidia.com>
      Cc: Stephen Warren <swarren@nvidia.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linaro-kernel@lists.linaro.org
      Link: http://lkml.kernel.org/r/1371485735-31249-1-git-send-email-daniel.lezcano@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ea8deb8d
  8. 31 May 2013, 1 commit
  9. 28 May 2013, 1 commit
    • tick: Cure broadcast false positive pending bit warning · 2938d275
      Authored by Thomas Gleixner
      commit 26517f3e (tick: Avoid programming the local cpu timer if
      broadcast pending) added a warning if the cpu enters broadcast mode
      again while the pending bit is still set. Meelis reported that the
      warning triggers. There are two corner cases which have not been
      considered:
      
      1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
         twice. That can result in the following scenario
      
         CPU0                    CPU1
                                 cpuidle_idle_call()
                                   clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                     set cpu in tick_broadcast_oneshot_mask
      
         broadcast interrupt
           event expired for cpu1
           set pending bit
      
                                   acpi_idle_enter_simple()
                                     clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                       WARN_ON(pending bit)
      
        Move the WARN_ON into the section where we enter broadcast mode so
        it won't provide false positives on the second call.
      
      2) safe_halt() enables interrupts, so a broadcast interrupt can be
         delivered before the broadcast mode is disabled. That sets the
         pending bit for the CPU which receives the broadcast
         interrupt. The interrupt is delivered right away from the
         broadcast handler, though, and leaves the pending bit stale.
      
         Clear the pending bit for the current cpu in the broadcast handler.
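
         A sketch of that clearing (illustrative, not the literal diff):

           /* In tick_handle_oneshot_broadcast(): this cpu's event is handled
            * right here, so do not leave a stale pending bit behind. */
           cpumask_clear_cpu(smp_processor_id(), tick_broadcast_pending_mask);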
      Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      2938d275
  10. 16 May 2013, 3 commits
  11. 05 May 2013, 1 commit
  12. 25 Apr 2013, 1 commit
    • clockevents: Set dummy handler on CPU_DEAD shutdown · 6f7a05d7
      Authored by Thomas Gleixner
      Vitaliy reported that a per cpu HPET timer interrupt crashes the
      system during hibernation. What happens is that the per cpu HPET timer
      gets shut down when the nonboot cpus are stopped. When the nonboot
      cpus are onlined again the HPET code sets up the MSI interrupt which
      fires before the clock event device is registered. The event handler
      is still set to hrtimer_interrupt, which then crashes the machine due
      to highres mode not being active.
      
      See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333
      
      There is no really good way to avoid that in the HPET code. The HPET
      code already has a mechanism to detect spurious interrupts when the
      event handler == NULL, for a similar reason.
      
      We can handle that in the clockevent/tick layer and replace the
      previous functional handler with a dummy handler like we do in
      tick_setup_new_device().
      
      The original clockevents code did this in clockevents_exchange_device(),
      but that got removed by commit 7c1e7689 (clockevents: prevent
      clockevent event_handler ending up handler_noop) which forgot to fix
      it up in tick_shutdown(). Same issue with the broadcast device.
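
      The shutdown paths then install the dummy handler along these lines
      (sketch):

        clockevents_exchange_device(dev, NULL);
        /* Replace the previously functional handler with a dummy one, so a
         * stray early interrupt cannot call into a stale handler. */
        dev->event_handler = clockevents_handle_noop;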
      Reported-by: Vitaliy Fillipov <vitalif@yourcmc.ru>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: stable@vger.kernel.org
      Cc: 700333@bugs.debian.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6f7a05d7
  13. 18 Apr 2013, 1 commit
    • clockevents: Switch into oneshot mode even if broadcast registered late · c038c1c4
      Authored by Stephen Boyd
      tick_oneshot_notify() is used to notify a particular CPU to try
      to switch into oneshot mode after a oneshot capable tick device
      is registered and tick_clock_notify() is used to notify all CPUs
      to try to switch into oneshot mode after a high res clocksource
      is registered. There is one caveat; if the tick devices suffer
      from FEAT_C3_STOP we don't try to switch into oneshot mode unless
      we have a oneshot capable broadcast device already registered.
      
      If the broadcast device is registered after the tick devices that
      have FEAT_C3_STOP we'll never try to switch into oneshot mode
      again, causing us to be stuck in periodic mode forever. Avoid
      this scenario by calling tick_clock_notify() after we register
      the broadcast device so that we try to switch into oneshot mode
      on all CPUs one more time.
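
      In the broadcast device registration path this boils down to (sketch):

        /* After installing the new broadcast device: */
        if (dev->features & CLOCK_EVT_FEAT_ONESHOT)
                tick_clock_notify();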
      
      [ tglx: Adopted to timers/core and added a comment ]
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Link: http://lkml.kernel.org/r/1366219566-29783-1-git-send-email-sboyd@codeaurora.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c038c1c4
  14. 16 Apr 2013, 1 commit
    • nohz: Switch from "extended nohz" to "full nohz" based naming · c5bfece2
      Authored by Frederic Weisbecker
      "Extended nohz" was used as a naming base for the full dynticks
      API and Kconfig symbols. It reflects the fact the system tries
      to stop the tick in more places than just idle.
      
      But that "extended" name is a bit opaque and vague. Rename it to
      "full" makes it clearer what the system tries to do under this
      config: try to shutdown the tick anytime it can. The various
      constraints that prevent that to happen shouldn't be considered
      as fundamental properties of this feature but rather technical
      issues that may be solved in the future.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c5bfece2
  15. 21 Mar 2013, 1 commit
    • nohz: Assign timekeeping duty to a CPU outside the full dynticks range · a382bf93
      Authored by Frederic Weisbecker
      This way the full nohz CPUs can safely run with the tick
      stopped with a guarantee that somebody else is taking
      care of the jiffies and GTOD progression.
      
      Once the duty is attributed to a CPU, it won't change. Also that
      CPU can't enter into dyntick idle mode or be hot unplugged.
      
      This may later be improved from a power consumption POV. At
      least we should be able to share the duty amongst all CPUs
      outside the full dynticks range. Then the duty could even be
      shared with full dynticks CPUs when those can't stop their
      tick for any reason.
      
      But let's start with that very simple approach first.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [fix have_nohz_full_mask offcase]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a382bf93
  16. 13 Mar 2013, 3 commits
    • tick: Provide a check for a forced broadcast pending · eaa907c5
      Authored by Thomas Gleixner
      On the CPU which gets woken along with the target CPU of the broadcast
      the following happens:
      
        deep_idle()
      			<-- spurious wakeup
        broadcast_exit()
          set forced bit
        
        enable interrupts
          
      			<-- Nothing happens
      
        disable interrupts
      
        broadcast_enter()
      			<-- Here we observe the forced bit is set
        deep_idle()
      
      Now after that the target CPU of the broadcast runs the broadcast
      handler and finds the other CPU in both the broadcast and the forced
      mask, sends the IPI and stuff gets back to normal.
      
      So it's not actually harmful, just more evidence for the theory, that
      hardware designers have access to very special drug supplies.
      
      Now there is no point in going back to deep idle just to wake up again
      right away via an IPI. Provide a check which allows the idle code to
      avoid the deep idle transition.
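
      The check provided to the idle code is essentially (sketch):

        bool tick_check_broadcast_expired(void)
        {
                /* A set forced bit means the broadcast IPI is imminent, so
                 * going back into deep idle would be pointless. */
                return cpumask_test_cpu(smp_processor_id(),
                                        tick_broadcast_force_mask);
        }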
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Jason Liu <liu.h.jason@gmail.com>
      Link: http://lkml.kernel.org/r/20130306111537.565418308@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      eaa907c5
    • tick: Handle broadcast wakeup of multiple cpus · 989dcb64
      Authored by Thomas Gleixner
      Some brilliant hardware implementations wake multiple cores when the
      broadcast timer fires. This leads to the following interesting
      problem:
      
      CPU0				CPU1
      wakeup from idle		wakeup from idle
      
      leave broadcast mode		leave broadcast mode
       restart per cpu timer		 restart per cpu timer
       	     	 		go back to idle
      handle broadcast
       (empty mask)			
      				enter broadcast mode
      				program broadcast device
      enter broadcast mode
      program broadcast device
      
      So what happens is that due to the forced reprogramming of the cpu
      local timer, we need to set an event in the future. Now if we manage to
      go back to idle before the timer fires, we switch off the timer and
      arm the broadcast device with an already expired time (covered by
      forced mode). So in the worst case we repeat the above ping pong
      forever.
      					
      Unfortunately we have no information about what caused the wakeup, but
      we can check the current time against the expiry time of the local cpu.
      If the local event is already in the past, we know that the broadcast
      timer is about to fire and send an IPI. So we mark ourselves as an IPI
      target even if we left broadcast mode and avoid the reprogramming of
      the local cpu timer.
      
      This still leaves the possibility that a CPU which is not handling the
      broadcast interrupt is going to reach idle again before the IPI
      arrives. This can't be solved in the core code and will be handled in
      follow up patches.
      Reported-by: Jason Liu <liu.h.jason@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Link: http://lkml.kernel.org/r/20130306111537.492045206@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      989dcb64
    • tick: Avoid programming the local cpu timer if broadcast pending · 26517f3e
      Authored by Thomas Gleixner
      If the local cpu timer stops in deep idle, we arm the broadcast device
      and get woken by an IPI. Now when we return from deep idle we reenable
      the local cpu timer unconditionally before handling the IPI. But
      that's a pointless exercise: the timer is already expired and the IPI
      is on the way. And it's an expensive exercise as we use the forced
      reprogramming mode so that we do not lose a timer event. This forced
      reprogramming will loop at least once in the retry.
      
      To avoid this reprogramming, we mark the cpu in a pending bit mask
      before we send the IPI. Now when the IPI target cpu wakes up, it will
      see the pending bit set and skip the reprogramming. The reprogramming
      of the cpu local timer will happen in the IPI handler which runs the
      cpu local timer interrupt function.
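
      On the idle exit path this becomes roughly (sketch):

        /* The broadcast handler already marked us pending and the IPI is on
         * its way, so skip the forced reprogramming of the local timer. */
        if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_pending_mask))
                goto out;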
      Reported-by: Jason Liu <liu.h.jason@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Link: http://lkml.kernel.org/r/20130306111537.431082074@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      26517f3e
  17. 08 Mar 2013, 1 commit
    • clockevents: Don't allow dummy broadcast timers · a7dc19b8
      Authored by Mark Rutland
      Currently tick_check_broadcast_device doesn't reject clock_event_devices
      with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
      hardware if they have a higher rating value. In this situation, the
      dummy timer is responsible for broadcasting to itself, and the core
      clockevents code may attempt to call non-existent callbacks for
      programming the dummy, eventually leading to a panic.
      
      This patch makes tick_check_broadcast_device always reject dummy timers,
      preventing this problem.
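
      The rejection is a simple feature check in the selection logic (sketch):

        /* In tick_check_broadcast_device(): never pick a dummy device */
        if (dev->features & CLOCK_EVT_FEAT_DUMMY)
                return 0;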
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Jon Medhurst (Tixy) <tixy@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a7dc19b8
  18. 07 Mar 2013, 3 commits
  19. 13 Feb 2013, 1 commit
    • clockevents: Fix generic broadcast for FEAT_C3STOP · 5d1d9a29
      Authored by Mark Rutland
      Commit 12ad1000: "clockevents: Add generic timer broadcast function"
      made tick_device_uses_broadcast set up the generic broadcast function
      for dummy devices (where !tick_device_is_functional(dev)), but neglected
      to set up the broadcast function for devices that stop in low power
      states (with the CLOCK_EVT_FEAT_C3STOP flag).
      
      When these devices enter low power states they will not have the generic
      broadcast function assigned, and will bring down the system when an
      attempt is made to broadcast to them.
      
      This patch ensures that the broadcast function is also assigned for
      devices which require broadcast in low power states.
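
      The assignment is shared between the dummy and the C3STOP cases, roughly
      (sketch):

        /* Make sure devices which stop in low power states also get the
         * generic broadcast function assigned: */
        if (!dev->broadcast)
                dev->broadcast = tick_broadcast;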
      Reported-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: nico@linaro.org
      Cc: Marc.Zyngier@arm.com
      Cc: Will.Deacon@arm.com
      Cc: santosh.shilimkar@ti.com
      Cc: john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5d1d9a29
  20. 01 Feb 2013, 2 commits
  21. 20 Apr 2012, 2 commits
    • tick: Fix the spurious broadcast timer ticks after resume · a6371f80
      Authored by Suresh Siddha
      During resume, tick_resume_broadcast() programs the broadcast timer in
      oneshot mode unconditionally. On the platforms where broadcast timer
      is not really required, this will generate spurious broadcast timer
      ticks upon resume. For example, on the always-running APIC timer
      platforms with HPET, I see a spurious HPET tick once every ~5 minutes
      (which is the 32-bit HPET counter wraparound time).
      
      Similar to boot time, during resume make the oneshot mode setting of
      the broadcast clock event device conditional on the state of active
      broadcast users.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: svenjoac@gmx.de
      Cc: torvalds@linux-foundation.org
      Cc: rjw@sisk.pl
      Link: http://lkml.kernel.org/r/1334802459.28674.209.camel@sbsiddha-desk.sc.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a6371f80
    • tick: Ensure that the broadcast device is initialized · b9a6a235
      Authored by Thomas Gleixner
      Santosh found another trap when we avoid to initialize the broadcast
      device in the switch_to_oneshot code. The broadcast device might be
      still in SHUTDOWN state when we actually need to use it. That
      obviously breaks, as set_next_event() is called on a shutdown
      device. This did not break on x86, but Suresh analyzed it:
      
      From the review, most likely on Sven's system we are force enabling
      the hpet using the pci quirk's method very late. And in this case,
      hpet_clockevent (which will be global_clock_event) handler can be
      null, specifically as this platform might not be using deeper c-states
      and using the reliable APIC timer.
      
      Prior to commit 'fa4da365', that handler will be set to
      'tick_handle_oneshot_broadcast' when we switch the broadcast timer to
      oneshot mode, even though we don't use it. Post commit
      'fa4da365', we stopped switching the broadcast mode to oneshot
      as this is not really needed and his platform's global_clock_event's
      handler will remain null. While on my SNB laptop, same is set to
      'clockevents_handle_noop' because hpet gets enabled very early. (noop
      handler on my platform set when the early enabled hpet timer gets
      replaced by the lapic timer).
      
      But the commit 'fa4da365' tracked the broadcast timer mode in
      the SW as oneshot, even though it didn't touch the HW timer. During
      resume however, tick_resume_broadcast() saw the SW broadcast mode as
      oneshot and actually programmed the broadcast device also into oneshot
      mode. So this triggered the null pointer de-reference after the hpet
      wraps around and depending on what the hpet counter is set to. On the
      normal platforms where hpet gets enabled early we should be seeing a
      spurious interrupt (in my SNB laptop I see one spurious interrupt
      after around 5 minutes ;) which is 32-bit hpet counter wraparound
      time), but that's a separate issue.
      
      Enforce the mode setting when trying to set an event.
      Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: torvalds@linux-foundation.org
      Cc: svenjoac@gmx.de
      Cc: rjw@sisk.pl
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181723350.2542@ionos
      b9a6a235
  22. 18 Apr 2012, 1 commit
    • tick: Fix oneshot broadcast setup really · b435092f
      Authored by Thomas Gleixner
      Sven Joachim reported, that suspend/resume on rc3 trips over a NULL
      pointer dereference. Linus spotted the clockevent handler being NULL.
      
      commit fa4da365(clockevents: tTack broadcast device mode change in
      tick_broadcast_switch_to_oneshot()) tried to fix a problem with the
      broadcast device setup, which was introduced in commit 77b0d60c(
      clockevents: Leave the broadcast device in shutdown mode when not
      needed).
      
      The initial commit avoided setting up the broadcast device when no
      broadcast request bits were set, but that left the broadcast device
      dysfunctional. In consequence, deep idle states which needed the
      broadcast device were not woken up.
      
      commit fa4da365 tried to fix that by initializing the state of the
      broadcast facility, but that missed the fact, that nothing initializes
      the event handler and some other state of the underlying clock event
      device.
      
      The fix is to revert both commits and make only the mode setting of
      the clock event device conditional on the state of active broadcast
      users. 
      
      That initializes everything except the low level device mode, but this
      happens when the broadcast functionality is invoked by deep idle.
      Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181205540.2542@ionos
      b435092f
  23. 10 Apr 2012, 1 commit
  24. 15 Feb 2012, 1 commit
  25. 02 Dec 2011, 1 commit
  26. 08 Sep 2011, 1 commit
  27. 17 May 2011, 1 commit
    • tick: Clear broadcast active bit when switching to oneshot · 07f4beb0
      Authored by Thomas Gleixner
      The first cpu which switches from periodic to oneshot mode switches
      also the broadcast device into oneshot mode. The broadcast device
      serves as a backup for per cpu timers which stop in deeper
      C-states. To avoid starvation of the cpus which might be in idle and
      depend on broadcast mode, it marks the other cpus as broadcast active
      and sets the broadcast expiry value of those cpus to the next tick.
      
      The oneshot mode broadcast bit for the other cpus is sticky and only
      gets cleared when those cpus exit idle. If a cpu was not idle while
      the bit got set, the bit consequently prevents the broadcast
      device from being armed on behalf of that cpu when it enters idle for
      the first time after it switched to oneshot mode.
      
      In most cases that goes unnoticed, as one of the other cpus usually has
      a timer pending which keeps the broadcast device armed with a short
      timeout. Now if the only cpu which has a short timer active has the
      bit set, then the broadcast device will not be armed on behalf of that
      cpu and will fire way after the expected timer expiry. In the case of
      Christian's bug report it took ~145 seconds, which is about half of the
      wraparound time of the HPET (the limit for that device), due to the
      fact that all other cpus had no timers armed which expired before the
      145 seconds timeframe.
      
      The solution is simply to clear the broadcast active bit
      unconditionally when a cpu switches to oneshot mode after the first
      cpu switched the broadcast device over. It's not idle at that point
      otherwise it would not be executing that code.
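
      The fix is essentially one unconditional clear in that switch-over path
      (sketch):

        /* The cpu is not idle here, so its active bit in the oneshot
         * broadcast mask cannot be legitimately set: clear it. */
        cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);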
      
      [ I fundamentally hate that broadcast crap. Why the heck thought some
        folks that when going into deep idle it's a brilliant concept to
        switch off the last device which brings the cpu back from that
        state? ]
      
      Thanks to Christian for providing all the valuable debug information!
      Reported-and-tested-by: Christian Hoffmann <email@christianhoffmann.info>
      Cc: John Stultz <johnstul@us.ibm.com>
      Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      07f4beb0
  28. 05 May 2011, 1 commit