1. 19 Oct, 2010 · 1 commit
    • P
      irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra committed
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like waking up a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this have to make do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
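The enqueue-then-run pattern described above can be sketched in userspace C11. This is only an illustrative model under stated assumptions, not the kernel's irq_work code: `struct work_item`, `work_queue()`, `work_run()` and `count_run()` are hypothetical names, and the self-IPI (or decrementer/timer-tick fallback) that would trigger the run is reduced to a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; names are illustrative, not the kernel API. */
struct work_item {
    struct work_item *next;
    atomic_flag queued;                 /* set while the item is on the list */
    void (*func)(struct work_item *);   /* callback to run later */
};

/* Head of a lock-free LIFO list of pending work. */
static _Atomic(struct work_item *) work_list;

/* Enqueue without taking locks, so the caller never blocks. */
static bool work_queue(struct work_item *w)
{
    if (atomic_flag_test_and_set(&w->queued))
        return false;                   /* already queued, nothing to do */

    struct work_item *old = atomic_load(&work_list);
    do {
        w->next = old;                  /* a failed CAS refreshes 'old' */
    } while (!atomic_compare_exchange_weak(&work_list, &old, w));

    /* A real implementation would now raise a self-interrupt. */
    return true;
}

/* Detach the whole list in one exchange and run every callback. */
static void work_run(void)
{
    struct work_item *w = atomic_exchange(&work_list, NULL);
    while (w) {
        struct work_item *next = w->next;
        atomic_flag_clear(&w->queued);  /* item may be queued again */
        w->func(w);
        w = next;
    }
}

/* Example callback used to observe that work ran. */
static int runs;
static void count_run(struct work_item *w) { (void)w; runs++; }
```

Queueing the same item twice before `work_run()` is a no-op the second time, which is the property a caller in NMI-like context would rely on.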
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e360adbe
  2. 11 Aug, 2010 · 1 commit
  3. 04 Aug, 2010 · 2 commits
    • P
      timer: Added usleep_range timer · 5e7f5a17
      Patrick Pannuto committed
      usleep_range is a finer precision implementation of msleep
      and is designed to be a drop-in replacement for udelay where
      a precise sleep / busy-wait is unnecessary.
      
      Since an easy interface to hrtimers could lead to an undesired
      proliferation of interrupts, we provide only a "range" API,
      forcing the caller to think about an acceptable tolerance on
      both ends and hopefully avoiding introducing another interrupt.
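Why a range can avoid an extra interrupt can be illustrated with a small userspace model. This is a deliberately simplified sketch, not the patch's code: `pick_wakeup()` is a hypothetical helper, times are plain microsecond offsets, and the assumption is that a sleeper may piggyback on any interrupt already scheduled inside its [min, max] window, falling back to a new interrupt at max otherwise.

```c
#include <stddef.h>

/*
 * Hypothetical model: given interrupts that are already scheduled,
 * return the time at which a sleeper with window [min_us, max_us]
 * wakes. The earliest existing interrupt inside the window is reused;
 * if there is none, a new interrupt at max_us is needed.
 */
static unsigned long pick_wakeup(const unsigned long *irqs, size_t n,
                                 unsigned long min_us, unsigned long max_us)
{
    unsigned long best = max_us;

    for (size_t i = 0; i < n; i++)
        if (irqs[i] >= min_us && irqs[i] < best)
            best = irqs[i];
    return best;
}
```

With interrupts already due at 50, 150 and 400 us, a window of [100, 200] wakes at 150 without adding an interrupt, while [200, 300] has to schedule its own at 300.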
      
      INTRO
      
      As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
      precise enough for many drivers (yes, sleep precision is an unfair notion,
      but consistently sleeping for ~an order of magnitude greater than requested
      is worth fixing). This patch adds a usleep API so that udelay does not have
      to be used. Obviously not every udelay can be replaced (those in atomic
      contexts or being used for simple bitbanging come to mind), but there are
      many, many examples of
      
      mydriver_write(...)
      /* Wait for hardware to latch */
      udelay(100)
      
      in various drivers where a busy-wait loop is neither beneficial nor
      necessary, but msleep simply does not provide enough precision and people
      are using a busy-wait loop instead.
      
      CONCERNS FROM THE RFC
      
      Why is udelay a problem / necessary? Most callers of udelay are in device/
      driver initialization code, which is serial...
      
      	As I see it, there is only benefit to sleeping over a delay; the
      	notion of "refactoring" areas that use udelay was presented, but
      	I see usleep as the refactoring. Consider i2c, if the bus is busy,
      	you need to wait a bit (say 100us) before trying again, your
      	current options are:
      
      		* udelay(100)
      		* msleep(1) <-- As noted above, actually as high as ~20ms
      				on some platforms, so not really an option
      		* Manually set up an hrtimer to try again in 100us (which
      		  is what usleep does anyway...)
      
      	People choose the udelay route because it is EASY; we need to
      	provide a better easy route.
      
      	Device / driver / boot code is *currently* serial, but every few
      	months someone makes noise about parallelizing boot, and IMHO, a
      	little forward-thinking now is one less thing to worry about
      	if/when that ever happens
      
      udelay's could be preempted
      
      	Sure, but if udelay plans on looping 1000 times, and it gets
      	preempted on loop 200, whenever it's scheduled again, it is
      	going to do the next 800 loops.
      
      Is the interruptible case needed?
      
      	Probably not, but I see usleep as a very logical parallel to msleep,
      	so it made sense to include the "full" API. Processors are getting
      	faster (albeit not as quickly as they are becoming more parallel),
      	so if someone wanted to be interruptible for a few usecs, why not
      	let them? If this is a contentious point, I'm happy to remove it.
      
      OTHER THOUGHTS
      
      I believe there is also value in exposing the usleep_range option; it gives
      the scheduler a lot more flexibility and allows the programmer to express
      his intent much more clearly; it's something I would hope future driver
      writers will take advantage of.
      
      To get the results in the NUMBERS section below, I literally s/udelay/usleep
      the kernel tree; I had to go in and undo the changes to the USB drivers, but
      everything else booted successfully; I find that extremely telling in and
      of itself -- many people are using a delay API where a sleep will suit them
      just fine.
      
      SOME ATTEMPTS AT NUMBERS
      
      It turns out that calculating quantifiable benefit on this is challenging,
      so instead I will simply present the current state of things, and I hope
      this to be sufficient:
      
      How many udelay calls are there in 2.6.35-rc5?
      
      	udelay(ARG) >=	| COUNT
      	1000		| 319
      	500		| 414
      	100		| 1146
      	20		| 1832
      
      I am working on Android, so that is my focus for this. The following table
      is a modified usleep that simply printk's the amount of time requested to
      sleep; these tests were run on a kernel with udelay >= 20 --> usleep
      
      "boot" is power-on to lock screen
      "power collapse" is when the power button is pushed and the device suspends
      "resume" is when the power button is pushed and the lock screen is displayed
               (no touchscreen events or anything, just turning on the display)
      "use device" is from the unlock swipe to clicking around a bit; there is no
      	sd card in this phone, so fail loading music, video, camera
      
      	ACTION		| TOTAL NUMBER OF USLEEP CALLS	| NET TIME (us)
      	boot		| 22				| 1250
      	power-collapse	| 9				| 1200
      	resume		| 5				| 500
      	use device	| 59				| 7700
      
      The most interesting category to me is the "use device" field; 7700us of
      busy-wait time that could be put towards better responsiveness, or at the
      least less power usage.
      Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
      Cc: apw@canonical.com
      Cc: corbet@lwn.net
      Cc: arjan@linux.intel.com
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5e7f5a17
    • T
      Revert "timer: Added usleep[_range] timer" · e1b004c3
      Thomas Gleixner committed
      This reverts commit 22b8f15c to merge
      an advanced version.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e1b004c3
  4. 03 Aug, 2010 · 1 commit
  5. 23 Jul, 2010 · 2 commits
  6. 09 Jun, 2010 · 1 commit
    • V
      sched: Change nohz idle load balancing logic to push model · 83cd4fe2
      Venkatesh Pallipadi committed
      In the new push model, all idle CPUs indeed go into nohz mode. There is
      still the concept of idle load balancer (performing the load balancing
      on behalf of all the idle CPUs in the system). A busy CPU kicks the nohz
      balancer when any of the nohz CPUs need idle load balancing.
      The kickee CPU does the idle load balancing on behalf of all idle CPUs
      instead of the normal idle balance.
      
      This addresses the below two problems with the current nohz ilb logic:
      * the idle load balancer continued to have periodic ticks during idle and
        woke up frequently, even though it did not have any rebalancing to do on
        behalf of any of the idle CPUs.
      * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this
        periodic wakeup can result in a periodic additional interrupt on a CPU
        doing the timer broadcast.
      
      Also currently we are migrating the unpinned timers from an idle cpu to the cpu
      doing idle load balancing (when all the cpus in the system are idle,
      there is no idle load balancing cpu and timers get added to the same idle cpu
      where the request was made. So the existing optimization works only on a semi-idle
      system).
      
      And in a semi-idle system, we no longer have periodic ticks on the idle load
      balancer CPU. Using that cpu will add more delays to the timers than intended
      (as that cpu's timer base may not be up to date wrt jiffies etc). This was
      causing mysterious slowdowns during boot etc.
      
      For now, in the semi idle case, use the nearest busy cpu for migrating timers
      from an idle cpu.  This is good for power-savings anyway.
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <1274486981.2840.46.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      83cd4fe2
  7. 05 Jun, 2010 · 1 commit
  8. 28 May, 2010 · 1 commit
  9. 26 May, 2010 · 2 commits
    • T
      timers: Move local variable into else section · 2abfb9e1
      Thomas Gleixner committed
      Fix nit-picking coding style detail.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      2abfb9e1
    • T
      timers: Fix slack calculation really · 8e63d779
      Thomas Gleixner committed
      commit f00e047e (timers: Fix slack calculation for expired timers)
      fixed the issue of slack on expired timers only partially. Linus
      noticed that jiffies is volatile so it is reloaded twice, which
      generates bad code.
      
      But it's worse. This can defeat the time_after() check if jiffies are
      incremented between time_after() and the slack calculation.
      
      Fix it by reading jiffies into a local variable, which prevents the
      compiler from loading it twice. While at it make the > -1 check into
      >= 0 which is easier to read.
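The snapshot pattern described above can be sketched as follows. This is a userspace illustration, not the kernel function: `toy_jiffies` stands in for the volatile `jiffies` counter and `ticks_until()` is a hypothetical helper. The point is that the expiry check and the subtraction both use the same local copy, so a concurrent increment cannot slip in between them.

```c
/* Stand-in for the kernel's volatile tick counter (hypothetical name). */
static volatile unsigned long toy_jiffies;

/*
 * Return how many ticks remain until 'expires', or 0 if it has passed.
 * The counter is read exactly once into 'now'; reading the volatile
 * twice could yield two different values for the check and the math.
 */
static unsigned long ticks_until(unsigned long expires)
{
    unsigned long now = toy_jiffies;

    /* Signed difference keeps the comparison correct across wraparound. */
    if ((long)(expires - now) <= 0)
        return 0;
    return expires - now;
}
```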
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      8e63d779
  10. 24 May, 2010 · 1 commit
  11. 13 May, 2010 · 1 commit
  12. 07 Apr, 2010 · 1 commit
    • A
      timers: Introduce the concept of timer slack for legacy timers · 3bbb9ec9
      Arjan van de Ven committed
      While HR timers have had the concept of timer slack for quite some time
      now, the legacy timers lacked this concept, and had to make do with
      round_jiffies() and friends.
      
      Timer slack is important for power management; grouping timers reduces the
      number of wakeups which in turn reduces power consumption.
      
      This patch introduces timer slack to the legacy timers using the following
      pieces:
      * A slack field in the timer struct
      * An api (set_timer_slack) that callers can use to set explicit timer slack
      * A default slack of 0.4% of the requested delay for callers that do not set
        any explicit slack
      * Rounding code that is part of mod_timer() that tries to
        group timers around jiffies values every 'power of two'
        (so quick timers will group around every 2, but longer timers
        will group around every 4, 8, 16, 32 etc)
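The rounding described in the list above can be sketched in userspace C. This is a model written from the commit description, not the patch itself: `apply_slack()` here takes `now` as a parameter instead of reading jiffies, a negative `slack` is assumed to mean "no explicit slack set", and the default of 0.4% is approximated as 1/256 of the remaining delay.

```c
/*
 * Model of slack rounding (assumptions: slack < 0 means "use the
 * default", and the default is delay/256, roughly 0.4%).
 * Returns an expiry in [expires, expires + slack] with as many low
 * bits cleared as the window allows, so nearby timers coincide.
 */
static unsigned long apply_slack(long slack, unsigned long now,
                                 unsigned long expires)
{
    unsigned long limit = expires;

    if (slack >= 0)
        limit = expires + (unsigned long)slack;
    else if ((long)(now - expires) < 0)          /* not yet expired */
        limit = expires + (expires - now) / 256;

    unsigned long diff = expires ^ limit;
    if (diff == 0)
        return expires;                          /* no room to round */

    /* Index of the highest bit in which expires and limit differ. */
    int bit = 0;
    while (diff >>= 1)
        bit++;

    /* Clear every bit below it: the result stays above 'expires'. */
    unsigned long mask = (1UL << bit) - 1;
    return limit & ~mask;
}
```

For example, with `now = 1000` and `expires = 2000`, the default slack gives a limit of 2003 and a rounded expiry of 2002, while an explicit slack of 100 allows rounding all the way to 2048.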
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: johnstul@us.ibm.com
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      3bbb9ec9
  13. 30 Mar, 2010 · 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  14. 13 Mar, 2010 · 4 commits
  15. 21 Jan, 2010 · 1 commit
  16. 17 Dec, 2009 · 1 commit
  17. 21 Sep, 2009 · 1 commit
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar committed
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out of its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cdd6c482
  18. 29 Aug, 2009 · 1 commit
  19. 26 Aug, 2009 · 1 commit
  20. 23 Aug, 2009 · 1 commit
    • P
      rcu: Simplify rcu_pending()/rcu_check_callbacks() API · a157229c
      Paul E. McKenney committed
      All calls from outside RCU are of the form:
      
      	if (rcu_pending(cpu))
      		rcu_check_callbacks(cpu, user);
      
      This is silly, instead we put a call to rcu_pending() in
      rcu_check_callbacks(), and then make the outside calls be to
      rcu_check_callbacks().  This cuts down on the code a bit and
      also gives the compiler a better chance of optimizing.
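The shape of this refactoring can be shown with a toy model. The names below (`toy_rcu_pending()`, `toy_rcu_check_callbacks()`, `callbacks_ready`, `callbacks_checked`) are hypothetical stand-ins and no real RCU state is involved; the sketch only shows the guard moving from every call site into the callee.

```c
/* Hypothetical stand-ins for RCU state, for illustration only. */
static int callbacks_ready;     /* nonzero when there is work to do */
static int callbacks_checked;   /* counts how often real work ran */

static int toy_rcu_pending(int cpu)
{
    (void)cpu;
    return callbacks_ready;
}

/*
 * After the change: the pending test lives inside the function, so
 * outside callers invoke it unconditionally instead of repeating
 * "if (pending) check" at every call site.
 */
static void toy_rcu_check_callbacks(int cpu, int user)
{
    (void)user;
    if (!toy_rcu_pending(cpu))
        return;
    callbacks_checked++;
}
```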
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <125097461311-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a157229c
  21. 05 Aug, 2009 · 1 commit
    • M
      timers: Cache __next_timer_interrupt result · 97fd9ed4
      Martin Schwidefsky committed
      Each time a cpu goes to sleep on a NOHZ=y system the timer
      wheel is searched for the next timer interrupt. It can take
      quite a few cycles to find the next pending timer.
      
      This patch adds a field to tvec_base that caches the result of
      __next_timer_interrupt.
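The caching idea can be modelled in a few lines of userspace C. This is a sketch under simplifying assumptions, not the tvec_base code: `struct toy_base` holds timers in a flat array rather than a timer wheel, timers are never removed, jiffies wraparound is ignored, and all names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TIMERS 8

/* Hypothetical, simplified timer base with a cached "next expiry". */
struct toy_base {
    unsigned long expires[TOY_MAX_TIMERS];
    size_t count;
    unsigned long next_timer;   /* cached earliest expiry */
    bool next_valid;            /* false until the first full scan */
    unsigned int scans;         /* how many expensive scans happened */
};

/* The expensive path: walk every timer to find the earliest one. */
static unsigned long toy_scan(struct toy_base *b)
{
    unsigned long best = ULONG_MAX;

    b->scans++;
    for (size_t i = 0; i < b->count; i++)
        if (b->expires[i] < best)
            best = b->expires[i];
    return best;
}

/* Adding an earlier timer updates the cache instead of invalidating it. */
static bool toy_add_timer(struct toy_base *b, unsigned long expires)
{
    if (b->count == TOY_MAX_TIMERS)
        return false;
    b->expires[b->count++] = expires;
    if (b->next_valid && expires < b->next_timer)
        b->next_timer = expires;
    return true;
}

/* The cheap path: return the cached value when it is valid. */
static unsigned long toy_next_timer(struct toy_base *b)
{
    if (!b->next_valid) {
        b->next_timer = toy_scan(b);
        b->next_valid = true;
    }
    return b->next_timer;
}
```

In this model a cache hit is any call to `toy_next_timer()` that does not increment `scans`; a fuller version would also have to refresh the cache once the cached timer expires or is deleted.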
      
      The hit ratio is around 80% on my thinkpad under normal use, on
      a server I've seen hit ratios from 5% to 95% dependent on the
      workload.
      
      -v2: jiffies wrap fixes
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <20090721202505.7d56a079@skybase>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      97fd9ed4
  22. 19 Jul, 2009 · 1 commit
  23. 24 Jun, 2009 · 1 commit
    • H
      timer stats: Optimize by adding quick check to avoid function calls · 507e1231
      Heiko Carstens committed
      When the kernel is configured with CONFIG_TIMER_STATS but timer
      stats are runtime disabled we still get calls to
      __timer_stats_timer_set_start_info which initializes some
      fields in the corresponding struct timer_list.
      
      So add some quick checks in the timer stats setup functions
      to avoid function calls to __timer_stats_timer_set_start_info
      when timer stats are disabled.
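The quick-check pattern looks roughly like this in a userspace model. The names are hypothetical (`stats_active`, `slow_set_start_info()`, `set_start_info()`); the sketch only shows an inline wrapper testing a flag so that the out-of-line function is not called at all when statistics are disabled.

```c
/* Hypothetical runtime switch and call counter, for illustration. */
static int stats_active;
static int slow_calls;

/* Out-of-line slow path: only worth calling when stats are enabled. */
static void slow_set_start_info(int *start_info)
{
    slow_calls++;
    *start_info = 1;
}

/* Inline fast path: one flag test, no function call when disabled. */
static inline void set_start_info(int *start_info)
{
    if (!stats_active)
        return;
    slow_set_start_info(start_info);
}
```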
      
      In an artificial workload that does nothing but play ping
      pong with a single tcp packet via loopback this decreases cpu
      consumption by 1 - 1.5%.
      
      This is part of a modified function trace output on SLES11:
      
       perl-2497  [00] 28630647177732388 [+  125]: sk_reset_timer <-tcp_v4_rcv
       perl-2497  [00] 28630647177732513 [+  125]: mod_timer <-sk_reset_timer
       perl-2497  [00] 28630647177732638 [+  125]: __timer_stats_timer_set_start_info <-mod_timer
       perl-2497  [00] 28630647177732763 [+  125]: __mod_timer <-mod_timer
       perl-2497  [00] 28630647177732888 [+  125]: __timer_stats_timer_set_start_info <-__mod_timer
       perl-2497  [00] 28630647177733013 [+   93]: lock_timer_base <-__mod_timer
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mustafa Mesanovic <mustafa.mesanovic@de.ibm.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      LKML-Reference: <20090623153811.GA4641@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      507e1231
  24. 29 May, 2009 · 1 commit
  25. 15 May, 2009 · 2 commits
    • T
      sched, timers: cleanup avenrun users · 2d02494f
      Thomas Gleixner committed
      avenrun is a rough estimate so we don't have to worry about
      consistency of the three avenrun values. Remove the xtime lock
      dependency and provide a function to scale the values. Cleanup the
      users.
      
      [ Impact: cleanup ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      2d02494f
    • T
      sched, timers: move calc_load() to scheduler · dce48a84
      Thomas Gleixner committed
      Dimitri Sivanich noticed that xtime_lock is held write locked across
      calc_load() which iterates over all online CPUs. That can cause long
      latencies for xtime_lock readers on large SMP systems. 
      
      The load average calculation is a rough estimate anyway so there is
      no real need to protect the readers vs. the update. It's not a problem
      when the avenrun array is updated while a reader copies the values.
      
      Instead of iterating over all online CPUs let the scheduler_tick code
      update the number of active tasks shortly before the avenrun update
      happens. The avenrun update itself is handled by the CPU which calls
      do_timer().
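For context, the avenrun update itself is a fixed-point exponential moving average. The sketch below is a userspace rendering; the constants (FSHIFT of 11 and the EXP_1, EXP_5, EXP_15 decay factors) are quoted from memory of the kernel headers of that era and should be treated as assumptions, and `update_avenrun()` is a hypothetical wrapper that takes the active task count as a parameter.

```c
#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884               /* decay factor, 1-minute average */
#define EXP_5    2014               /* decay factor, 5-minute average */
#define EXP_15   2037               /* decay factor, 15-minute average */

/* One step: new = old * exp + active * (1 - exp), all in fixed point. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Hypothetical wrapper: fold the current active count into all three. */
static void update_avenrun(unsigned long avenrun[3], unsigned long nr_active)
{
    unsigned long active = nr_active * FIXED_1;

    avenrun[0] = calc_load(avenrun[0], EXP_1, active);
    avenrun[1] = calc_load(avenrun[1], EXP_5, active);
    avenrun[2] = calc_load(avenrun[2], EXP_15, active);
}
```

Because each of the three values is updated independently, a reader that copies the array mid-update sees at worst a mix of old and new estimates, which is the reason the commit considers locking unnecessary.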
      
      [ Impact: reduce xtime_lock write locked section ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      dce48a84
  26. 13 May, 2009 · 2 commits
    • A
      timers: Logic to move non pinned timers · eea08f32
      Arun R Bharadwaj committed
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch migrates all non pinned timers and hrtimers to the current
      idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
      are not migrated.
      
      While migrating hrtimers, care should be taken to check if migrating
      a hrtimer would result in a latency or not. So we compare the expiry of the
      hrtimer with the next timer interrupt on the target cpu and migrate the
      hrtimer only if it expires *after* the next interrupt on the target cpu.
      So, added a clockevents_get_next_event() helper function to return the
      next_event on the target cpu's clock_event_device.
      
      [ tglx: cleanups and simplifications ]
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      eea08f32
    • A
      timers: Framework for identifying pinned timers · 597d0275
      Arun R Bharadwaj committed
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch creates a new framework for identifying cpu-pinned timers
      and hrtimers.
      
      This framework is needed because pinned timers are expected to fire on
      the same CPU on which they are queued. So it is essential to identify
      these and not migrate them, in case there are any.
      
      For regular timers, the currently existing add_timer_on() can be used
      to queue pinned timers and subsequently mod_timer_pinned() can be used
      to modify the 'expires' field.
      
      For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
      added to queue cpu-pinned hrtimers.
      
      [ tglx: use .._PINNED mode argument instead of creating tons of new
      functions ]
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      597d0275
  27. 02 May, 2009 · 1 commit
  28. 06 Apr, 2009 · 1 commit
    • P
      perf_counter: unify and fix delayed counter wakeup · 925d519a
      Peter Zijlstra committed
      While going over the wakeup code I noticed delayed wakeups only work
      for hardware counters but basically all software counters rely on
      them.
      
      This patch unifies and generalizes the delayed wakeup to fix this
      issue.
      
      Since we're dealing with NMI context bits here, use a cmpxchg() based
      single link list implementation to track counters that have pending
      wakeups.
      
      [ This should really be generic code for delayed wakeups, but since we
        cannot use cmpxchg()/xchg() in generic code, I've let it live in the
        perf_counter code. -- Eric Dumazet could use it to aggregate the
        network wakeups. ]
      
      Furthermore, the x86 method of using TIF flags was flawed in that it's
      quite possible to end up setting the bit on the idle task, losing the
      wakeup.
      
      The powerpc method uses per-cpu storage and does appear to be
      sufficient.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      925d519a
  29. 02 Apr, 2009 · 1 commit
    • R
      timers: add missing kernel-doc · 633fe795
      Randy Dunlap committed
      Add missing kernel-doc parameter notation and change function
      name to its new name:
      
        Warning(kernel/timer.c:543): No description found for parameter 'name'
        Warning(kernel/timer.c:543): No description found for parameter 'key'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: akpm <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      LKML-Reference: <20090401174723.f0bea0eb.randy.dunlap@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      633fe795
  30. 19 Feb, 2009 · 1 commit
    • I
      timers: add mod_timer_pending() · 74019224
      Ingo Molnar committed
      Impact: new timer API
      
      Based on an idea from Martin Josefsson with the help of
      Patrick McHardy and Stephen Hemminger:
      
      introduce the mod_timer_pending() API which is a mod_timer()
      offspring that is a no-op on already removed timers.
      
      (regular mod_timer() re-activates non-pending timers.)
      
      This is useful for the networking code in that it can
      allow unserialized mod_timer_pending() timer-forwarding
      calls, but a single del_timer*() will stop the timer
      from being reactivated again.
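The difference between the two calls can be shown with a toy timer. This is a behavioural model only, with hypothetical names (`struct toy_timer`, `toy_mod_timer()`, `toy_mod_timer_pending()`, `toy_del_timer()`); locking and the timer wheel are left out entirely.

```c
#include <stdbool.h>

/* Hypothetical minimal timer: just a pending flag and an expiry. */
struct toy_timer {
    bool pending;
    unsigned long expires;
};

/* Regular modify: always (re)arms the timer. Returns 1 if it was pending. */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
    int was_pending = t->pending;

    t->expires = expires;
    t->pending = true;
    return was_pending;
}

/* Pending-only modify: leaves a removed timer untouched and returns 0. */
static int toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
    if (!t->pending)
        return 0;
    t->expires = expires;
    return 1;
}

/* Delete: after this, only toy_mod_timer() can bring the timer back. */
static int toy_del_timer(struct toy_timer *t)
{
    int was_pending = t->pending;

    t->pending = false;
    return was_pending;
}
```

Once `toy_del_timer()` has run, any number of late `toy_mod_timer_pending()` calls are harmless, which is the property the networking code is described as wanting.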
      
      Also while at it:
      
      - optimize the regular mod_timer() path some more, the
        timer-stat and a debug check was needlessly duplicated
        in __mod_timer().
      
      - make the exports come straight after the function, as
        most other exports in timer.c already did.
      
      - eliminate __mod_timer() as an external API, change the
        users to mod_timer().
      
      The regular mod_timer() code path is not impacted
      significantly, due to inlining optimizations and due to
      the simplifications.
      
      Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: netdev@vger.kernel.org
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      74019224
  31. 15 Feb, 2009 · 1 commit
  32. 14 Jan, 2009 · 1 commit