1. 15 2月, 2012 2 次提交
  2. 12 12月, 2011 4 次提交
    • F
      nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7
      Frederic Weisbecker 提交于
      Those two APIs were provided to optimize the calls of
      tick_nohz_idle_enter() and rcu_idle_enter() into a single
      irq disabled section. This way no interrupt happening in-between would
      needlessly process any RCU job.
      
      Now we are talking about an optimization for which benefits
      have yet to be measured. Let's start simple and completely decouple
      idle rcu and dyntick idle logics to simplify.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1268fbc7
    • F
      nohz: Allow rcu extended quiescent state handling seperately from tick stop · 2bbb6817
      Frederic Weisbecker 提交于
      It is assumed that rcu won't be used once we switch to tickless
      mode and until we restart the tick. However this is not always
      true, as in x86-64 where we dereference the idle notifiers after
      the tick is stopped.
      
      To prepare for fixing this, add two new APIs:
      tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
      
      If no use of RCU is made in the idle loop between
      tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
      must instead call the new *_norcu() version such that the arch doesn't
      need to call rcu_idle_enter() and rcu_idle_exit().
      
      Otherwise the arch must call tick_nohz_enter_idle() and
      tick_nohz_exit_idle() and also call explicitly:
      
      - rcu_idle_enter() after its last use of RCU before the CPU is put
      to sleep.
      - rcu_idle_exit() before the first use of RCU after the CPU is woken
      up.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      2bbb6817
    • F
      nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Frederic Weisbecker 提交于
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop to start the dytick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only few minor differences between both that
      are handled by that function, driven by the ts->inidle
      cpu variable and the inidle parameter. The whole guarantees
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter in RCU extended
      quiescent state from idle loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      into tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimize the irq exit path (no need
      for local_irq_save there). This also prepares for the split between
      dynticks and rcu extended quiescent state logics. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      280f0677
    • P
      rcu: Track idleness independent of idle tasks · 9b2e4f18
      Paul E. McKenney 提交于
      Earlier versions of RCU used the scheduling-clock tick to detect idleness
      by checking for the idle task, but handled idleness differently for
      CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
      critical sections in the idle task, for example, for tracing.  A more
      fine-grained detection of idleness is therefore required.
      
      This commit presses the old dyntick-idle code into full-time service,
      so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
      always invoked at the beginning of an idle loop iteration.  Similarly,
      rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
      at the end of an idle-loop iteration.  This allows the idle task to
      use RCU everywhere except between consecutive rcu_idle_enter() and
      rcu_idle_exit() calls, in turn allowing architecture maintainers to
      specify exactly where in the idle loop that RCU may be used.
      
      Because some of the userspace upcall uses can result in what looks
      to RCU like half of an interrupt, it is not possible to expect that
      the irq_enter() and irq_exit() hooks will give exact counts.  This
      patch therefore expands the ->dynticks_nesting counter to 64 bits
      and uses two separate bitfields to count process/idle transitions
      and interrupt entry/exit transitions.  It is presumed that userspace
      upcalls do not happen in the idle loop or from usermode execution
      (though usermode might do a system call that results in an upcall).
      The counter is hard-reset on each process/idle transition, which
      avoids the interrupt entry/exit error from accumulating.  Overflow
      is avoided by the 64-bitness of the ->dyntick_nesting counter.
      
      This commit also adds warnings if a non-idle task asks RCU to enter
      idle state (and these checks will need some adjustment before applying
      Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
      In addition, validation of ->dynticks and ->dynticks_nesting is added.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      9b2e4f18
  3. 06 12月, 2011 1 次提交
  4. 29 9月, 2011 1 次提交
  5. 08 9月, 2011 3 次提交
  6. 01 2月, 2011 1 次提交
  7. 20 1月, 2011 1 次提交
    • S
      hrtimers: Notify hrtimer users of switches to NOHZ mode · 2d0640b4
      Stephen Boyd 提交于
      When NOHZ=y and high res timers are disabled (via cmdline or
      Kconfig) tick_nohz_switch_to_nohz() will notify the user about
      switching into NOHZ mode. Nothing is printed for the case where
      HIGH_RES_TIMERS=y. Fix this for the HIGH_RES_TIMERS=y case by
      duplicating the printk from the low res NOHZ path in the high
      res NOHZ path.
      
      This confused me since I was thinking 'dmesg | grep -i NOHZ' would
      tell me if NOHZ was enabled, but if I have hrtimers there is
      nothing.
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1295419594-13085-1-git-send-email-sboyd@codeaurora.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2d0640b4
  8. 03 8月, 2010 1 次提交
  9. 17 7月, 2010 1 次提交
  10. 01 7月, 2010 1 次提交
    • P
      sched: Cure nr_iowait_cpu() users · 8c215bd3
      Peter Zijlstra 提交于
      Commit 0224cf4c (sched: Intoduce get_cpu_iowait_time_us())
      broke things by not making sure preemption was indeed disabled
      by the callers of nr_iowait_cpu() which took the iowait value of
      the current cpu.
      
      This resulted in a heap of preempt warnings. Cure this by making
      nr_iowait_cpu() take a cpu number and fix up the callers to pass
      in the right number.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: linux-pm@lists.linux-foundation.org
      LKML-Reference: <1277968037.1868.120.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8c215bd3
  11. 18 6月, 2010 1 次提交
  12. 09 6月, 2010 1 次提交
    • V
      sched: Change nohz idle load balancing logic to push model · 83cd4fe2
      Venkatesh Pallipadi 提交于
      In the new push model, all idle CPUs indeed go into nohz mode. There is
      still the concept of idle load balancer (performing the load balancing
      on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz
      balancer when any of the nohz CPUs need idle load balancing.
      The kickee CPU does the idle load balancing on behalf of all idle CPUs
      instead of the normal idle balance.
      
      This addresses the below two problems with the current nohz ilb logic:
      * the idle load balancer continued to have periodic ticks during idle and
        wokeup frequently, even though it did not have any rebalancing to do on
        behalf of any of the idle CPUs.
      * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this
        periodic wakeup can result in a periodic additional interrupt on a CPU
        doing the timer broadcast.
      
      Also currently we are migrating the unpinned timers from an idle to the cpu
      doing idle load balancing (when all the cpus in the system are idle,
      there is no idle load balancing cpu and timers get added to the same idle cpu
      where the request was made. So the existing optimization works only on semi idle
      system).
      
      And In semi idle system, we no longer have periodic ticks on the idle load
      balancer CPU. Using that cpu will add more delays to the timers than intended
      (as that cpu's timer base may not be uptodate wrt jiffies etc). This was
      causing mysterious slowdowns during boot etc.
      
      For now, in the semi idle case, use the nearest busy cpu for migrating timers
      from an idle cpu.  This is good for power-savings anyway.
      Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <1274486981.2840.46.camel@sbs-t61.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      83cd4fe2
  13. 10 5月, 2010 6 次提交
  14. 12 3月, 2010 1 次提交
    • M
      sched: Rate-limit nohz · 39c0cbe2
      Mike Galbraith 提交于
      Entering nohz code on every micro-idle is costing ~10% throughput for netperf
      TCP_RR when scheduling cross-cpu.  Rate limiting entry fixes this, but raises
      ticks a bit.  On my Q6600, an idle box goes from ~85 interrupts/sec to 128.
      
      The higher the context switch rate, the more nohz entry costs.  With this patch
      and some cycle recovery patches in my tree, max cross cpu context switch rate is
      improved by ~16%, a large portion of which of which is this ratelimiting.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301003.6785.28.camel@marge.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      39c0cbe2
  15. 14 11月, 2009 3 次提交
    • T
      nohz: Track last do_timer() cpu · 27185016
      Thomas Gleixner 提交于
      The previous patch which limits the sleep time to the maximum
      deferment time of the time keeping clocksource has some limitations on
      SMP machines: if all CPUs are idle then for all CPUs the maximum sleep
      time is limited.
      
      Solve this by keeping track of which cpu had the do_timer() duty
      assigned last and limit the sleep time only for this cpu.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      27185016
    • J
      nohz: Prevent clocksource wrapping during idle · 98962465
      Jon Hunter 提交于
      The dynamic tick allows the kernel to sleep for periods longer than a
      single tick, but it does not limit the sleep time currently. In the
      worst case the kernel could sleep longer than the wrap around time of
      the time keeping clock source which would result in losing track of
      time.
      
      Prevent this by limiting it to the safe maximum sleep time of the
      current time keeping clock source. The value is calculated when the
      clock source is registered.
      
      [ tglx: simplified the code a bit and massaged the commit msg ]
      Signed-off-by: NJon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      98962465
    • T
      nohz: Type cast printk argument · 529eaccd
      Thomas Gleixner 提交于
      On some archs local_softirq_pending() has a data type of unsigned long
      on others its unsigned int. Type cast it to (unsigned int) in the
      printk to avoid the compiler warning.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      529eaccd
  16. 05 11月, 2009 2 次提交
    • M
      nohz: Introduce arch_needs_cpu · 3c5d92a0
      Martin Schwidefsky 提交于
      Allow the architecture to request a normal jiffy tick when the system
      goes idle and tick_nohz_stop_sched_tick is called . On s390 the hook is
      used to prevent the system going fully idle if there has been an
      interrupt other than a clock comparator interrupt since the last wakeup.
      
      On s390 the HiperSockets response time for 1 connection ping-pong goes
      down from 42 to 34 microseconds. The CPU cost decreases by 27%.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      LKML-Reference: <20090929122533.402715150@de.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      3c5d92a0
    • M
      nohz: Reuse ktime in sub-functions of tick_check_idle. · eed3b9cf
      Martin Schwidefsky 提交于
      On a system with NOHZ=y tick_check_idle calls tick_nohz_stop_idle and
      tick_nohz_update_jiffies. Given the right conditions (ts->idle_active
      and/or ts->tick_stopped) both function get a time stamp with ktime_get.
      The same time stamp can be reused if both function require one.
      
      On s390 this change has the additional benefit that gcc inlines the
      tick_nohz_stop_idle function into tick_check_idle. The number of
      instructions to execute tick_check_idle drops from 225 to 144
      (without the ktime_get optimization it is 367 vs 215 instructions).
      
      before:
      
       0)               |  tick_check_idle() {
       0)               |    tick_nohz_stop_idle() {
       0)               |      ktime_get() {
       0)               |        read_tod_clock() {
       0)   0.601 us    |        }
       0)   1.765 us    |      }
       0)   3.047 us    |    }
       0)               |    ktime_get() {
       0)               |      read_tod_clock() {
       0)   0.570 us    |      }
       0)   1.727 us    |    }
       0)               |    tick_do_update_jiffies64() {
       0)   0.609 us    |    }
       0)   8.055 us    |  }
      
      after:
      
       0)               |  tick_check_idle() {
       0)               |    ktime_get() {
       0)               |      read_tod_clock() {
       0)   0.617 us    |      }
       0)   1.773 us    |    }
       0)               |    tick_do_update_jiffies64() {
       0)   0.593 us    |    }
       0)   4.477 us    |  }
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090929122533.206589318@de.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      eed3b9cf
  17. 07 10月, 2009 1 次提交
    • E
      NOHZ: update idle state also when NOHZ is inactive · fdc6f192
      Eero Nurkkala 提交于
      Commit f2e21c96 had unfortunate side
      effects with cpufreq governors on some systems.
      
      If the system did not switch into NOHZ mode ts->inidle is not set when
      tick_nohz_stop_sched_tick() is called from the idle routine. Therefor
      all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
      fail to call tick_nohz_start_idle(). This results in bogus idle
      accounting information which is passed to cpufreq governors.
      
      Set the inidle flag unconditionally of the NOHZ active state to keep
      the idle time accounting correct in any case.
      
      [ tglx: Added comment and tweaked the changelog ]
      Reported-by: NSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: NEero Nurkkala <ext-eero.nurkkala@nokia.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: stable@kernel.org
      LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      fdc6f192
  18. 27 5月, 2009 1 次提交
  19. 13 5月, 2009 1 次提交
  20. 15 1月, 2009 1 次提交
  21. 31 12月, 2008 2 次提交
    • M
      [PATCH] idle cputime accounting · 79741dd3
      Martin Schwidefsky 提交于
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong, the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition idle time is no more added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      79741dd3
    • M
      [PATCH] fix scaled & unscaled cputime accounting · 457533a7
      Martin Schwidefsky 提交于
      The utimescaled / stimescaled fields in the task structure and the
      global cpustat should be set on all architectures. On s390 the calls
      to account_user_time_scaled and account_system_time_scaled never have
      been added. In addition system time that is accounted as guest time
      to the user time of a process is accounted to the scaled system time
      instead of the scaled user time.
      To fix the bugs and to prevent future forgetfulness this patch merges
      account_system_time_scaled into account_system_time and
      account_user_time_scaled into account_user_time.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      457533a7
  22. 12 12月, 2008 2 次提交
    • W
      nohz: suppress needless timer reprogramming · 00147449
      Woodruff, Richard 提交于
      In my device I get many interrupts from a high speed USB device in a very
      short period of time.  The system spends a lot of time reprogramming the
      hardware timer which is in a slower timing domain as compared to the CPU. 
      This results in the CPU spending a huge amount of time waiting for the
      timer posting to be done.  All of this reprogramming is useless as the
      wake up time has not changed.
      
      As measured using ETM trace this drops my reprogramming penalty from
      almost 60% CPU load down to 15% during high interrupt rate.  I can send
      traces to show this.
      
      Suppress setting of duplicate timer event when timer already stopped. 
      Timer programming can be very costly and can result in long cpu stall/wait
      times.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [tglx@linutronix.de: move the check to the right place and avoid raising
      		     the softirq for nothing]
      Signed-off-by: NRichard Woodruff <r-woodruff2@ti.com>
      Cc: johnstul@us.ibm.com
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      00147449
    • H
      nohz: no softirq pending warnings for offline cpus · fa116ea3
      Heiko Carstens 提交于
      Impact: remove false positive warning
      
      After a cpu was taken down during cpu hotplug (read: disabled for interrupts)
      it still might have pending softirqs. However take_cpu_down makes sure
      that the idle task will run next instead of ksoftirqd on the taken down cpu.
      The idle task will call tick_nohz_stop_sched_tick which might warn about
      pending softirqs just before the cpu kills itself completely.
      
      However the pending softirqs on the dead cpu aren't a problem because they
      will be moved to an online cpu during CPU_DEAD handling.
      
      So make sure we warn only for online cpus.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa116ea3
  23. 25 11月, 2008 2 次提交
    • P
      hrtimer: removing all ur callback modes · ca109491
      Peter Zijlstra 提交于
      Impact: cleanup, move all hrtimer processing into hardirq context
      
      This is an attempt at removing some of the hrtimer complexity by
      reducing the number of callback modes to 1.
      
      This means that all hrtimer callback functions will be ran from HARD-irq
      context.
      
      I went through all the 30 odd hrtimer callback functions in the kernel
      and saw only one that I'm not quite sure of, which is the one in
      net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.
      
      Furthermore, the hrtimer core now calls callbacks directly with IRQs
      disabled in case you try to enqueue an expired timer. If this timer is a
      periodic timer (which should use hrtimer_forward() to advance its time)
      then it might be possible to end up in an inf. recursive loop due to the
      fact that hrtimer_forward() doesn't round up to the next timer
      granularity, and therefore keeps on calling the callback - obviously
      this needs a fix.
      
      Aside from that, this seems to compile and actually boot on my dual core
      test box - although I'm sure there are some bugs in, me not hitting any
      makes me certain :-)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ca109491
    • R
      sched: convert nohz_cpu_mask to cpumask_var_t. · 6a7b3dc3
      Rusty Russell 提交于
      Impact: (future) size reduction for large NR_CPUS.
      
      Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
      space for small nr_cpu_ids but big CONFIG_NR_CPUS.  cpumask_var_t
      is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6a7b3dc3