1. 19 Jul, 2009 1 commit
  2. 24 Jun, 2009 1 commit
    • timer stats: Optimize by adding quick check to avoid function calls · 507e1231
      Committed by Heiko Carstens
      When the kernel is configured with CONFIG_TIMER_STATS but timer
      stats are runtime disabled we still get calls to
      __timer_stats_timer_set_start_info which initializes some
      fields in the corresponding struct timer_list.
      
      So add some quick checks in the timer stats setup functions
      to avoid function calls to __timer_stats_timer_set_start_info
      when timer stats are disabled.
      
      In an artificial workload that does nothing but play ping-pong
      with a single TCP packet via loopback, this decreases CPU
      consumption by 1 - 1.5%.
      
      This is part of a modified function trace output on SLES11:
      
       perl-2497  [00] 28630647177732388 [+  125]: sk_reset_timer <-tcp_v4_rcv
       perl-2497  [00] 28630647177732513 [+  125]: mod_timer <-sk_reset_timer
       perl-2497  [00] 28630647177732638 [+  125]: __timer_stats_timer_set_start_info <-mod_timer
       perl-2497  [00] 28630647177732763 [+  125]: __mod_timer <-mod_timer
       perl-2497  [00] 28630647177732888 [+  125]: __timer_stats_timer_set_start_info <-__mod_timer
       perl-2497  [00] 28630647177733013 [+   93]: lock_timer_base <-__mod_timer
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mustafa Mesanovic <mustafa.mesanovic@de.ibm.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      LKML-Reference: <20090623153811.GA4641@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
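The quick check from the timer stats commit can be sketched in plain C. This is a simplified userspace model, not the kernel source: `struct timer_list` is cut down to two fields, `slow_set_start_info()` is a hypothetical stand-in for `__timer_stats_timer_set_start_info()`, and `slow_path_calls` exists only so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct timer_list (assumption). */
struct timer_list {
    void *start_site;   /* address of the code that armed the timer */
    int   start_pid;    /* pid of the arming task */
};

/* Runtime switch: non-zero while timer stats are being collected. */
static int timer_stats_active;

/* Hypothetical counter, present only to make the slow path observable. */
static unsigned long slow_path_calls;

/* Out-of-line slow path; stands in for __timer_stats_timer_set_start_info(). */
void slow_set_start_info(struct timer_list *timer, void *addr, int pid)
{
    assert(timer != NULL);
    slow_path_calls++;
    if (timer->start_site)
        return;                 /* start info already recorded */
    timer->start_site = addr;
    timer->start_pid = pid;
}

/* Inline wrapper: one flag test, and no function call when stats are off. */
static inline void timer_stats_timer_set_start_info(struct timer_list *timer,
                                                    void *addr, int pid)
{
    if (!timer_stats_active)
        return;
    slow_set_start_info(timer, addr, pid);
}
```

With `timer_stats_active` left at zero, callers pay for a load and a branch instead of a call into the stats code, which is the saving the commit message reports.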
  3. 29 May, 2009 1 commit
  4. 15 May, 2009 2 commits
    • sched, timers: cleanup avenrun users · 2d02494f
      Committed by Thomas Gleixner
      avenrun is a rough estimate so we don't have to worry about
      consistency of the three avenrun values. Remove the xtime lock
      dependency and provide a function to scale the values. Clean up the
      users.
      
      [ Impact: cleanup ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
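A scaling helper of the kind this commit describes might look like the sketch below. The fixed-point constants (`FSHIFT` of 11) and the `LOAD_INT`/`LOAD_FRAC` macros are written from memory of the kernel headers and should be treated as assumptions; `avenrun` here is an ordinary static array rather than the kernel's global.

```c
#include <assert.h>

#define FSHIFT   11                 /* bits of fractional precision (assumed) */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */

/* Split a fixed-point load value into integer part and two-digit fraction. */
#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* 1-, 5- and 15-minute load averages; read without taking any lock. */
static unsigned long avenrun[3];

/* Copy out the three values with a rounding offset and a shift applied. */
void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
{
    assert(shift >= 0);
    for (int i = 0; i < 3; i++)
        loads[i] = (avenrun[i] + offset) << shift;
}
```

A caller that wants two decimal places would pass an offset of `FIXED_1 / 200` (half of 0.01) and a shift of 0, then print `LOAD_INT` and `LOAD_FRAC` of each entry.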
    • sched, timers: move calc_load() to scheduler · dce48a84
      Committed by Thomas Gleixner
      Dimitri Sivanich noticed that xtime_lock is held write locked across
      calc_load() which iterates over all online CPUs. That can cause long
      latencies for xtime_lock readers on large SMP systems. 
      
      The load average calculation is a rough estimate anyway so there is
      no real need to protect the readers vs. the update. It's not a problem
      when the avenrun array is updated while a reader copies the values.
      
      Instead of iterating over all online CPUs let the scheduler_tick code
      update the number of active tasks shortly before the avenrun update
      happens. The avenrun update itself is handled by the CPU which calls
      do_timer().
      
      [ Impact: reduce xtime_lock write locked section ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
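The split described in this commit, per-CPU accounting of active tasks followed by a single global average update, can be modeled roughly as follows. This is a single-threaded userspace sketch: `struct rq_model` and the plain `long` counter are assumptions (the kernel would use its runqueue structure and an atomic counter), and the constants are recalled from the kernel's load-average code.

```c
#include <assert.h>

#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)
#define EXP_1    1884               /* 1/exp(5s/1min) in fixed point (assumed) */

static unsigned long avenrun_1min;  /* 1-minute average only, for brevity */
static long calc_load_tasks;        /* global count of active tasks */

/* Hypothetical per-CPU state: current and last-reported active task counts. */
struct rq_model {
    long nr_active;
    long calc_load_active;
};

/* Tick path on each CPU: fold only the change since the last report. */
void calc_load_account_active(struct rq_model *rq)
{
    long delta = rq->nr_active - rq->calc_load_active;

    rq->calc_load_active = rq->nr_active;
    calc_load_tasks += delta;       /* an atomic add in real SMP code */
}

/* One step of the exponentially decaying average. */
unsigned long calc_load(unsigned long load, unsigned long exp,
                        unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Run by the one CPU that handles the global timer tick. */
void calc_global_load(void)
{
    unsigned long active = calc_load_tasks > 0
        ? (unsigned long)calc_load_tasks * FIXED_1 : 0;

    avenrun_1min = calc_load(avenrun_1min, EXP_1, active);
}
```

The point of the design is that no code path walks all online CPUs while holding a lock: each CPU reports its own delta, and the global update reads one counter.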
  5. 13 May, 2009 2 commits
    • timers: Logic to move non pinned timers · eea08f32
      Committed by Arun R Bharadwaj
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch migrates all non pinned timers and hrtimers to the current
      idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
      are not migrated.
      
      While migrating hrtimers, care must be taken to check whether migrating
      an hrtimer would add latency. So we compare the expiry of the
      hrtimer with the next timer interrupt on the target cpu and migrate the
      hrtimer only if it expires *after* the next interrupt on the target cpu.
      To support this, add a clockevents_get_next_event() helper function that
      returns the next_event on the target cpu's clock_event_device.
      
      [ tglx: cleanups and simplifications ]
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
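The migration rule in this commit message reduces to a small predicate. The sketch below is an illustration only: `hrtimer_may_migrate()` is a hypothetical helper, times are plain nanosecond counts, and `target_next_event_ns` stands for the value that `clockevents_get_next_event()` would return for the target CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether an hrtimer queued on one CPU may be moved to the idle
 * load balancer CPU. Hypothetical helper; all times are nanoseconds on
 * the same clock.
 */
bool hrtimer_may_migrate(bool pinned, bool source_cpu_idle,
                         int64_t expires_ns, int64_t target_next_event_ns)
{
    if (pinned)
        return false;   /* pinned timers must fire where they were queued */
    if (!source_cpu_idle)
        return false;   /* timers on busy CPUs are left alone */

    /*
     * Migrate only if the timer expires after the target's next programmed
     * event; otherwise the target would fire it late.
     */
    return expires_ns > target_next_event_ns;
}
```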
    • timers: Framework for identifying pinned timers · 597d0275
      Committed by Arun R Bharadwaj
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch creates a new framework for identifying cpu-pinned timers
      and hrtimers.
      
      This framework is needed because pinned timers are expected to fire on
      the same CPU on which they are queued. So it is essential to identify
      these and not migrate them, in case there are any.
      
      For regular timers, the currently existing add_timer_on() can be used
      to queue pinned timers and subsequently mod_timer_pinned() can be used
      to modify the 'expires' field.
      
      For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
      added to queue cpu-pinned hrtimers.
      
      [ tglx: use .._PINNED mode argument instead of creating tons of new
      functions ]
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
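The "mode argument instead of new functions" approach noted by tglx can be pictured as one extra flag bit in the hrtimer mode. The enum below is modeled on the kernel's `HRTIMER_MODE_*` naming; the exact names and bit values are assumptions here (the commit message itself writes `HRTIMER_ABS_PINNED`), and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode bits: bit 0 selects relative expiry, bit 1 marks the timer pinned. */
enum hrtimer_mode {
    HRTIMER_MODE_ABS        = 0x0,
    HRTIMER_MODE_REL        = 0x1,
    HRTIMER_MODE_PINNED     = 0x2,
    HRTIMER_MODE_ABS_PINNED = HRTIMER_MODE_ABS | HRTIMER_MODE_PINNED,
    HRTIMER_MODE_REL_PINNED = HRTIMER_MODE_REL | HRTIMER_MODE_PINNED,
};

/* Hypothetical helper: true if the timer must stay on its queueing CPU. */
static inline bool mode_is_pinned(enum hrtimer_mode mode)
{
    return (mode & HRTIMER_MODE_PINNED) != 0;
}

/* Hypothetical helper: true if the expiry is relative to "now". */
static inline bool mode_is_relative(enum hrtimer_mode mode)
{
    return (mode & HRTIMER_MODE_REL) != 0;
}
```

Encoding pinning as a flag keeps the existing start functions unchanged; callers opt in by passing a different mode value.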
  6. 02 May, 2009 1 commit
  7. 06 Apr, 2009 1 commit
    • perf_counter: unify and fix delayed counter wakeup · 925d519a
      Committed by Peter Zijlstra
      While going over the wakeup code I noticed delayed wakeups only work
      for hardware counters but basically all software counters rely on
      them.
      
      This patch unifies and generalizes the delayed wakeup to fix this
      issue.
      
      Since we're dealing with NMI context bits here, use a cmpxchg() based
      singly linked list implementation to track counters that have pending
      wakeups.
      
      [ This should really be generic code for delayed wakeups, but since we
        cannot use cmpxchg()/xchg() in generic code, I've let it live in the
        perf_counter code. -- Eric Dumazet could use it to aggregate the
        network wakeups. ]
      
      Furthermore, the x86 method of using TIF flags was flawed in that it's
      quite possible to end up setting the bit on the idle task, losing the
      wakeup.
      
      The powerpc method uses per-cpu storage and does appear to be
      sufficient.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
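A compare-and-swap based singly linked list of pending wakeups, as this commit describes, can be sketched with C11 atomics in userspace. This is not the perf_counter code: the type and function names are hypothetical, a `queued` flag replaces whatever claim mechanism the kernel uses, and "delivering" a wakeup is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-counter record of a deferred wakeup. */
struct pending_entry {
    struct pending_entry *next;
    atomic_bool queued;     /* set while the entry sits on the list */
    int wakeups;            /* how many times the wakeup was delivered */
};

/* Head of the lock-free pending list; NULL means empty. */
static _Atomic(struct pending_entry *) pending_head;

/* Safe to call from a context that cannot take locks (NMI in the kernel). */
void pending_queue(struct pending_entry *entry)
{
    /* Claim the entry; if it is already queued there is nothing to do. */
    if (atomic_exchange(&entry->queued, true))
        return;

    struct pending_entry *old = atomic_load(&pending_head);
    do {
        entry->next = old;  /* link in front of the current head */
    } while (!atomic_compare_exchange_weak(&pending_head, &old, entry));
}

/* Called later from a safe context: detach the whole list and drain it. */
int pending_run(void)
{
    struct pending_entry *list =
        atomic_exchange(&pending_head, (struct pending_entry *)NULL);
    int handled = 0;

    while (list) {
        struct pending_entry *next = list->next;

        list->next = NULL;
        atomic_store(&list->queued, false); /* entry may be queued again */
        list->wakeups++;                    /* stands in for the real wakeup */
        handled++;
        list = next;
    }
    return handled;
}
```

Queueing never blocks and never spins on a lock, which is what makes the pattern usable where ordinary wakeup calls are not allowed.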
  8. 02 Apr, 2009 1 commit
    • timers: add missing kernel-doc · 633fe795
      Committed by Randy Dunlap
      Add missing kernel-doc parameter notation and change function
      name to its new name:
      
        Warning(kernel/timer.c:543): No description found for parameter 'name'
        Warning(kernel/timer.c:543): No description found for parameter 'key'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: akpm <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      LKML-Reference: <20090401174723.f0bea0eb.randy.dunlap@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 19 Feb, 2009 1 commit
    • timers: add mod_timer_pending() · 74019224
      Committed by Ingo Molnar
      Impact: new timer API
      
      Based on an idea from Martin Josefsson with the help of
      Patrick McHardy and Stephen Hemminger:
      
      introduce the mod_timer_pending() API, a mod_timer()
      offspring that leaves already removed timers untouched.
      
      (regular mod_timer() re-activates non-pending timers.)
      
      This is useful for the networking code in that it can
      allow unserialized mod_timer_pending() timer-forwarding
      calls, but a single del_timer*() will stop the timer
      from being reactivated again.
      
      Also while at it:
      
      - optimize the regular mod_timer() path some more, the
        timer-stat and a debug check were needlessly duplicated
        in __mod_timer().
      
      - make the exports come straight after the function, as
        most other exports in timer.c already did.
      
      - eliminate __mod_timer() as an external API, change the
        users to mod_timer().
      
      The regular mod_timer() code path is not impacted
      significantly, due to inlining optimizations and due to
      the simplifications.
      
      Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: netdev@vger.kernel.org
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
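The difference between `mod_timer()` and `mod_timer_pending()` is easiest to see in a toy model. The code below is a userspace sketch with a two-field `struct timer_list` and no locking or timer wheel; `mod_timer_common()` is a hypothetical name for the shared helper, and the return convention (1 if the timer was pending, 0 otherwise) follows the usual description of `mod_timer()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timer: just a pending flag and an expiry (assumption). */
struct timer_list {
    bool pending;
    unsigned long expires;
};

/* Shared helper; pending_only selects the mod_timer_pending() behaviour. */
static int mod_timer_common(struct timer_list *timer, unsigned long expires,
                            bool pending_only)
{
    int was_pending = timer->pending;

    if (!was_pending && pending_only)
        return 0;           /* removed timers stay removed */

    timer->expires = expires;
    timer->pending = true;
    return was_pending;
}

/* Re-arms the timer whether or not it was pending. */
int mod_timer(struct timer_list *timer, unsigned long expires)
{
    return mod_timer_common(timer, expires, false);
}

/* Moves the expiry only if the timer is still pending. */
int mod_timer_pending(struct timer_list *timer, unsigned long expires)
{
    return mod_timer_common(timer, expires, true);
}

/* Deactivates the timer; returns whether it had been pending. */
int del_timer(struct timer_list *timer)
{
    int was_pending = timer->pending;

    timer->pending = false;
    return was_pending;
}
```

This is why a single `del_timer()` is enough to stop a timer that other code paths keep forwarding with `mod_timer_pending()`: none of those later calls can bring it back.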
  10. 15 Feb, 2009 1 commit
  11. 14 Jan, 2009 4 commits
  12. 31 Dec, 2008 2 commits
    • [PATCH] idle cputime accounting · 79741dd3
      Committed by Martin Schwidefsky
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong, the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition, idle time is no longer added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [PATCH] fix scaled & unscaled cputime accounting · 457533a7
      Committed by Martin Schwidefsky
      The utimescaled / stimescaled fields in the task structure and the
      global cpustat should be set on all architectures. On s390 the calls
      to account_user_time_scaled and account_system_time_scaled have never
      been added. In addition system time that is accounted as guest time
      to the user time of a process is accounted to the scaled system time
      instead of the scaled user time.
      To fix the bugs and to prevent future forgetfulness this patch merges
      account_system_time_scaled into account_system_time and
      account_user_time_scaled into account_user_time.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  13. 14 Nov, 2008 1 commit
  14. 06 Nov, 2008 1 commit
  15. 21 Aug, 2008 1 commit
  16. 11 Aug, 2008 1 commit
  17. 25 May, 2008 1 commit
    • Remove argument from open_softirq which is always NULL · 962cf36c
      Committed by Carlos R. Mafra
      As git-grep shows, open_softirq() is always called with the last argument
      being NULL
      
      block/blk-core.c:       open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
      kernel/hrtimer.c:       open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
      kernel/rcuclassic.c:    open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
      kernel/rcupreempt.c:    open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
      kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
      kernel/softirq.c:       open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
      kernel/softirq.c:       open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
      kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
      net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
      net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
      
      This observation has already been made by Matthew Wilcox in June 2002
      (http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html)
      
      "I notice that none of the current softirq routines use the data element
      passed to them."
      
      and the situation hasn't changed since then. So it appears we can safely
      remove that extra argument to save 128 (54) bytes of kernel data (text).
      Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
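After this change, registering a softirq handler takes only the softirq number and the action. The sketch below models that two-argument form in userspace; the shortened enum, `raise_softirq_now()`, and the `timer_runs` counter are assumptions made so the example is self-contained, and the handler body is a placeholder.

```c
#include <assert.h>

/* Shortened softirq list for illustration (assumption). */
enum { HI_SOFTIRQ, TIMER_SOFTIRQ, NR_SOFTIRQS };

/* With the data pointer gone, only the handler needs to be stored. */
struct softirq_action {
    void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];

/* Two-argument registration: softirq number and handler. */
void open_softirq(int nr, void (*action)(struct softirq_action *))
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_vec[nr].action = action;
}

/* Hypothetical counter so the handler's effect can be observed. */
static int timer_runs;

/* Placeholder handler; the real one would expire timers. */
static void run_timer_softirq(struct softirq_action *h)
{
    (void)h;
    timer_runs++;
}

/* Hypothetical dispatcher: run the handler for one softirq immediately. */
void raise_softirq_now(int nr)
{
    struct softirq_action *h = &softirq_vec[nr];

    if (h->action)
        h->action(h);
}
```

A call site then reads `open_softirq(TIMER_SOFTIRQ, run_timer_softirq);`, with no trailing `NULL`.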
  18. 13 May, 2008 1 commit
  19. 30 Apr, 2008 1 commit
  20. 17 Apr, 2008 1 commit
  21. 26 Mar, 2008 1 commit
    • NOHZ: reevaluate idle sleep length after add_timer_on() · 06d8308c
      Committed by Thomas Gleixner
      add_timer_on() can add a timer on a CPU which is currently in a long
      idle sleep, but the timer wheel is not reevaluated by the nohz code on
      that CPU. So a timer can be delayed for quite a long time. This
      triggered a false positive in the clocksource watchdog code.
      
      To avoid this we need to wake up the idle CPU and enforce the
      reevaluation of the timer wheel for the next timer event.
      
      Add a function, which checks a given CPU for idle state, marks the
      idle task with NEED_RESCHED and sends a reschedule IPI to notify the
      other CPU of the change in the timer wheel.
      
      Call this function from add_timer_on().
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      
      --
       include/linux/sched.h |    6 ++++++
       kernel/sched.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
       kernel/timer.c        |   10 +++++++++-
       3 files changed, 58 insertions(+), 1 deletion(-)
  22. 09 Feb, 2008 2 commits
  23. 07 Feb, 2008 1 commit
    • taskstats scaled time cleanup · 06b8e878
      Committed by Michael Neuling
      This moves the ability to scale cputime into generic code.  This allows us
      to fix the issue in kernel/timer.c (noticed by Balbir) where we could only
      add an unscaled value to the scaled utime/stime.
      
      This adds a cputime_to_scaled function.  As before, the POWERPC version
      does the scaling based on the last SPURR/PURR ratio calculated.  The
      generic and s390 (only other arch to implement asm/cputime.h) versions are
      both NOPs.
      
      Also moves the SPURR and PURR snapshots closer.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 30 Jan, 2008 2 commits
  25. 26 Jan, 2008 1 commit
  26. 22 Jan, 2008 1 commit
    • timer: fix section mismatch · 48ccf3da
      Committed by Randy Dunlap
      The caller is __cpuinit.
      Also, this code block and its caller are inside #ifdef CONFIG_HOTPLUG_CPU
      blocks, so this code should reflect that config symbol's usage.
      
      WARNING: vmlinux.o(.text+0x4252f): Section mismatch: reference to .init.text: (between 'timer_cpu_notify' and 'msleep')
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 14 Jan, 2008 1 commit
    • remove task_ppid_nr_ns · 84427eae
      Committed by Roland McGrath
      task_ppid_nr_ns is called in three places.  One of these should never
      have called it.  In the other two, using it broke the existing
      semantics.  This was presumably accidental.  If the function had not
      been there, it would have been much more obvious to the eye that those
      patches were changing the behavior.  We don't need this function.
      
      In task_state, the pid of the ptracer is not the ppid of the ptracer.
      
      In do_task_stat, ppid is the tgid of the real_parent, not its pid.
      I also moved the call outside of lock_task_sighand, since it doesn't
      need it.
      
      In sys_getppid, ppid is the tgid of the real_parent, not its pid.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  28. 19 Dec, 2007 1 commit
  29. 07 Dec, 2007 1 commit
  30. 10 Nov, 2007 1 commit
    • sched: restore deterministic CPU accounting on powerpc · fa13a5a1
      Committed by Paul Mackerras
      Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
      deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
      broken on powerpc, because we end up counting user time twice: once in
      timer_interrupt() and once in update_process_times().
      
      This fixes the problem by pulling the code in update_process_times
      that updates utime and stime into a separate function called
      account_process_tick.  If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
      there is a version of account_process_tick in kernel/timer.c that
      simply accounts a whole tick to either utime or stime as before.  If
      CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
      implement account_process_tick.
      
      This also lets us simplify the s390 code a bit; it means that the s390
      timer interrupt can now call update_process_times even when
      CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
      suitable account_process_tick().
      
      account_process_tick() now takes the task_struct * as an argument.
      Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
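The generic fallback this commit describes, where a whole tick goes to either user or system time, is small enough to model directly. In the sketch below `struct task_model` and the unit of "one tick" are assumptions; in the kernel the function takes a `task_struct *` and works in cputime units, and an architecture with CONFIG_VIRT_CPU_ACCOUNTING would supply its own `account_process_tick()` instead.

```c
#include <assert.h>

/* Hypothetical stand-in for the accounting fields of task_struct. */
struct task_model {
    unsigned long utime;    /* ticks spent in user mode */
    unsigned long stime;    /* ticks spent in kernel mode */
};

/* Generic version: charge one whole tick to utime or stime. */
void account_process_tick(struct task_model *p, int user_tick)
{
    if (user_tick)
        p->utime += 1;
    else
        p->stime += 1;
}

/* Tick handler: accounting is now a single, replaceable call. */
void update_process_times(struct task_model *p, int user_tick)
{
    account_process_tick(p, user_tick);
    /* timer and scheduler tick work would follow here (elided) */
}
```

Isolating the accounting in one function is what lets architecture code account time precisely without the generic tick path counting the same time again.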
  31. 06 Nov, 2007 1 commit
  32. 20 Oct, 2007 1 commit
    • pid namespaces: changes to show virtual ids to user · b488893a
      Committed by Pavel Emelyanov
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or obtained from the user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used; however, when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one needs to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
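The global versus virtual pid distinction in the list above can be illustrated with a reduced data model. The structures below are simplified assumptions (a fixed-size `numbers` array, namespaces that carry only a level); `pid_nr()` returns the global number and `pid_nr_ns()` returns the number as seen from a given namespace, or 0 if the pid is not visible there.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PID_NS_LEVEL 4          /* illustrative nesting limit */

/* A namespace is identified here only by its nesting depth. */
struct pid_namespace {
    unsigned int level;             /* 0 for the initial namespace */
};

/* One number per namespace level the pid is visible in. */
struct upid {
    int nr;
    struct pid_namespace *ns;
};

struct pid {
    unsigned int level;             /* deepest level this pid lives in */
    struct upid numbers[MAX_PID_NS_LEVEL];
};

/* Global id: what in-kernel data structures should store. */
int pid_nr(const struct pid *pid)
{
    return pid ? pid->numbers[0].nr : 0;
}

/* Virtual id: what a user inside namespace ns should be shown. */
int pid_nr_ns(const struct pid *pid, const struct pid_namespace *ns)
{
    if (pid && ns->level <= pid->level) {
        const struct upid *upid = &pid->numbers[ns->level];

        if (upid->ns == ns)
            return upid->nr;
    }
    return 0;                       /* not visible from this namespace */
}
```

A task created inside a child namespace therefore has two numbers: a large global one for the kernel and the parent namespace, and a small virtual one for readers inside the child.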