1. 12 October 2009, 1 commit
  2. 07 October 2009, 1 commit
    • NOHZ: update idle state also when NOHZ is inactive · fdc6f192
      Eero Nurkkala authored
      Commit f2e21c96 had unfortunate side
      effects with cpufreq governors on some systems.
      
      If the system did not switch into NOHZ mode, ts->inidle is not set when
      tick_nohz_stop_sched_tick() is called from the idle routine. Therefore
      all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
      fail to call tick_nohz_start_idle(). This results in bogus idle
      accounting information being passed to cpufreq governors.
      
      Set the inidle flag unconditionally, regardless of the NOHZ state, to
      keep the idle time accounting correct in any case (see the sketch
      after this entry).
      
      [ tglx: Added comment and tweaked the changelog ]
      Reported-by: Steven Noonan <steven@uplinklabs.net>
      Signed-off-by: Eero Nurkkala <ext-eero.nurkkala@nokia.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: stable@kernel.org
      LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      fdc6f192
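      A condensed sketch of the control flow described above (paraphrased
      from kernel/time/tick-sched.c of that era, not the verbatim diff; the
      remaining NOHZ stop logic is elided):

      void tick_nohz_stop_sched_tick(int inidle)
      {
              struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
              ktime_t now;

              /*
               * Set ts->inidle unconditionally. Even if the system did not
               * switch into NOHZ mode, the cpufreq governors rely on the
               * idle accounting started by tick_nohz_start_idle().
               */
              if (inidle)
                      ts->inidle = 1;

              now = tick_nohz_start_idle(ts); /* always account idle time */

              if (!ts->inidle)
                      return;         /* irq_exit() path, cpu not idle */

              /* ... existing tick-stop logic continues here ... */
      }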
  3. 02 October 2009, 1 commit
  4. 25 September 2009, 1 commit
    • clocksource: Resume clocksource without taking the clocksource mutex · 89133f93
      Martin Schwidefsky authored
      git commit 75c5158f converted the clocksource spinlock to a
      mutex. This causes the following BUG:
      
      BUG: sleeping function called from invalid context at kernel/mutex.c:280
      in_atomic(): 0, irqs_disabled(): 1, pid: 2473, name: pm-suspend
      2 locks held by pm-suspend/2473:
       #0:  (&buffer->mutex){......}, at: [<ffffffff8115ab13>] sysfs_write_file+0x3c/0x137
       #1:  (pm_mutex){......}, at: [<ffffffff810865b5>] enter_state+0x39/0x130
      Pid: 2473, comm: pm-suspend Not tainted 2.6.31 #1
      Call Trace:
       [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24
       [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b
       [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43
       [<ffffffff81073537>] clocksource_resume+0x1c/0x60
       [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8
       [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf
       [<ffffffff812aef79>] sysdev_resume+0x6d/0xae
       [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af
       [<ffffffff8108665b>] enter_state+0xdf/0x130
       [<ffffffff81085dc3>] state_store+0xb6/0xd3
       [<ffffffff81204c73>] kobj_attr_store+0x17/0x19
       [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137
       [<ffffffff811057d2>] vfs_write+0xae/0x10b
       [<ffffffff81208392>] ? __up_read+0x1a/0x7f
       [<ffffffff811058ef>] sys_write+0x4a/0x6e
       [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b
      
      clocksource_resume is called early in the resume process: there is
      only one cpu, no processes are running, and interrupts are disabled.
      It is therefore possible to resume the clocksources without taking
      the clocksource mutex (sketched after this entry).
      Reported-by: Xiaotian Feng <xtfeng@gmail.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Tested-by: Michal Schmidt <mschmidt@redhat.com>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      89133f93
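      The resulting function is roughly the following (condensed; the
      watchdog resume step is elided):

      void clocksource_resume(void)
      {
              struct clocksource *cs;

              /*
               * No mutex_lock(&clocksource_mutex) here: we run on a single
               * cpu with irqs disabled, so nothing can mutate the list.
               */
              list_for_each_entry(cs, &clocksource_list, list)
                      if (cs->resume)
                              cs->resume();
      }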
  5. 24 September 2009, 1 commit
    • time: add function to convert between calendar time and broken-down time for universal use · 57f1f087
      Zhaolei authored
      There is a lot of similar code in the kernel for one purpose:
      converting time between calendar time and broken-down time.
      
      Here are some of the places I found:
        fs/ncpfs/dir.c
        fs/smbfs/proc.c
        fs/fat/misc.c
        fs/udf/udftime.c
        fs/cifs/netmisc.c
        net/netfilter/xt_time.c
        drivers/scsi/ips.c
        drivers/input/misc/hp_sdc_rtc.c
        drivers/rtc/rtc-lib.c
        arch/ia64/hp/sim/boot/fw-emu.c
        arch/m68k/mac/misc.c
        arch/powerpc/kernel/time.c
        arch/parisc/include/asm/rtc.h
        ...
      
      We can make a common function for this type of conversion. At the
      least, we get the following benefits:
      
      1: Makes the kernel simpler and more unified.
      2: Makes bugs in the conversion code easier to fix in one place.
      3: Reduces duplication of this code in the future.
         For example, I'm trying to make ftrace display wall time;
         this patch will make that easy.
      
      This code is based on code from glibc-2.6 (an illustrative userspace
      sketch of the conversion follows this entry).
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57f1f087
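      As an illustration of the conversion being unified, here is a
      self-contained userspace sketch of a calendar-time to broken-down-time
      routine in the spirit of the new helper. It is an independent
      reimplementation built on the well-known days-to-civil-date algorithm,
      not the kernel's exact code:

      #include <stdio.h>

      struct tm_simple { int year, mon, mday, hour, min, sec, wday; };

      static void secs_to_tm(long long t, struct tm_simple *r)
      {
              long long days = t / 86400;
              int rem = (int)(t % 86400);
              long long era, y;
              unsigned doe, yoe, doy, mp;

              if (rem < 0) {                  /* times before the epoch */
                      rem += 86400;
                      days--;
              }
              r->hour = rem / 3600;
              r->min  = (rem % 3600) / 60;
              r->sec  = rem % 60;
              r->wday = (int)((4 + days % 7) % 7); /* 1970-01-01: Thursday */
              if (r->wday < 0)
                      r->wday += 7;

              /* proleptic Gregorian date from the day count, era by era */
              days += 719468;                 /* rebase epoch to 0000-03-01 */
              era = (days >= 0 ? days : days - 146096) / 146097;
              doe = (unsigned)(days - era * 146097);            /* [0,146096] */
              yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
              y   = (long long)yoe + era * 400;
              doy = doe - (365 * yoe + yoe / 4 - yoe / 100);    /* [0,365] */
              mp  = (5 * doy + 2) / 153;                        /* [0,11]  */
              r->mday = (int)(doy - (153 * mp + 2) / 5 + 1);    /* [1,31]  */
              r->mon  = (int)(mp < 10 ? mp + 3 : mp - 9);       /* [1,12]  */
              r->year = (int)(y + (r->mon <= 2));
      }

      int main(void)
      {
              struct tm_simple tm;

              secs_to_tm(1254907901LL, &tm);  /* arbitrary 2009 timestamp */
              /* prints: 2009-10-07 09:31:41 UTC (wday 3) */
              printf("%04d-%02d-%02d %02d:%02d:%02d UTC (wday %d)\n",
                     tm.year, tm.mon, tm.mday, tm.hour, tm.min, tm.sec,
                     tm.wday);
              return 0;
      }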
  6. 15 September 2009, 2 commits
  7. 12 September 2009, 1 commit
    • clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash · f79e0258
      Martin Schwidefsky authored
      The watchdog timer is started after the watchdog clocksource
      and at least one watched clocksource have been registered. The
      clocksource work element watchdog_work is initialized just
      before the clocksource timer is started. This is too late for
      the clocksource_mark_unstable call from native_cpu_up. To fix
      this, use a static initializer for watchdog_work (shown below).
      
      This resolves a boot crash reported by multiple people.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090911153305.3fe9a361@skybase>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f79e0258
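      The fix is essentially a one-liner: a static initializer makes the
      work item valid from boot, before the watchdog timer has ever run
      (sketch):

      /*
       * Before: INIT_WORK(&watchdog_work, ...) ran only when the watchdog
       * timer was started, too late for clocksource_mark_unstable() called
       * from native_cpu_up().
       */
      static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);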
  8. 29 August 2009, 1 commit
    • clocksource: Resolve cpu hotplug dead lock with TSC unstable · 7285dd7f
      Thomas Gleixner authored
      Martin Schwidefsky analyzed it:
      To register a clocksource the clocksource_mutex is acquired and if
      necessary timekeeping_notify is called to install the clocksource as
      the timekeeper clock. timekeeping_notify uses stop_machine which needs
      to take cpu_add_remove_lock mutex.
      Starting a new cpu is done with the cpu_add_remove_lock mutex held.
      native_cpu_up checks the tsc of the new cpu, and if the tsc is no
      good, clocksource_change_rating is called, which needs the
      clocksource_mutex. The deadlock is complete.
      
      The solution is to replace the TSC via the clocksource watchdog
      mechanism: mark the TSC as unstable and schedule the watchdog work so
      it gets removed in watchdog thread context (sketched after this entry).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      7285dd7f
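      A condensed sketch of the mechanism: the hotplug path only takes a
      spinlock and defers the heavy work (which needs clocksource_mutex and
      stop_machine) to thread context:

      void clocksource_mark_unstable(struct clocksource *cs)
      {
              unsigned long flags;

              spin_lock_irqsave(&watchdog_lock, flags);
              if (!(cs->flags & CLOCK_SOURCE_UNSTABLE)) {
                      cs->flags |= CLOCK_SOURCE_UNSTABLE;
                      /* the rating change runs later, in thread context */
                      schedule_work(&watchdog_work);
              }
              spin_unlock_irqrestore(&watchdog_lock, flags);
      }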
  9. 25 August 2009, 1 commit
  10. 22 August 2009, 1 commit
    • time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      john stultz authored
      After talking with some application writers who want very fast, but not
      fine-grained, timestamps, I decided to try to implement new clock_ids
      for clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE,
      which return the time as of the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade-off is that
      you only get low-res, tick-grained time resolution.
      
      This isn't a new idea; I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse-grained time when the
      vsyscall64 sysctl was set to 2. However, that affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves (see the usage example after this entry).
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      da15cfda
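      From userspace the new clock id is used like any other. A minimal,
      runnable example (on older glibc, link with -lrt):

      #include <stdio.h>
      #include <time.h>

      int main(void)
      {
              struct timespec fine, coarse, res;

              clock_gettime(CLOCK_REALTIME, &fine);          /* reads hw clocksource */
              clock_gettime(CLOCK_REALTIME_COARSE, &coarse); /* last tick, no hw access */
              clock_getres(CLOCK_REALTIME_COARSE, &res);     /* tick-grained resolution */

              printf("fine:   %ld.%09ld\n", (long)fine.tv_sec, fine.tv_nsec);
              printf("coarse: %ld.%09ld (res %ld ns)\n",
                     (long)coarse.tv_sec, coarse.tv_nsec, res.tv_nsec);
              return 0;
      }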
  11. 20 August 2009, 1 commit
    • clockevent: Prevent dead lock on clockevents_lock · f833bab8
      Suresh Siddha authored
      Currently clockevents_notify() is called with interrupts enabled at
      some places and interrupts disabled at some other places.
      
      This results in a deadlock in this scenario.
      
      cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
      cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
      cpu C is doing set_mtrr(), which tries to rendezvous all the cpus.
      
      This results in C and A coming to the rendezvous point and waiting
      for B, while B is stuck forever waiting for the spinlock and thus
      never reaches the rendezvous point.
      
      Fix the clockevents code so that clockevents_lock is taken with
      interrupts disabled, avoiding the above deadlock (sketched after
      this entry).
      
      Also call lapic_timer_propagate_broadcast() on the destination cpu so
      that we avoid calling smp_call_function() in the clockevents notifier
      chain.
      
      This issue left us wondering if we need to change the MTRR rendezvous
      logic to use stop machine logic (instead of smp_call_function) or add
      a check in the spinlock debug code for spinlocks which get taken
      under both interrupts-enabled and interrupts-disabled conditions.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: "Brown Len" <len.brown@intel.com>
      LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f833bab8
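      Condensed shape of the fix (the reason-specific handling is elided):
      every acquisition of clockevents_lock now disables interrupts, so the
      holder always runs with irqs off and releases the lock before it can
      be caught in an smp rendezvous:

      void clockevents_notify(unsigned long reason, void *arg)
      {
              unsigned long flags;

              spin_lock_irqsave(&clockevents_lock, flags); /* was spin_lock() */
              clockevents_do_notify(reason, arg);
              /* ... reason-specific cleanup elided ... */
              spin_unlock_irqrestore(&clockevents_lock, flags);
      }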
  12. 19 August 2009, 2 commits
    • clocksource: Avoid clocksource watchdog circular locking dependency · 01548f4d
      Martin Schwidefsky authored
      stop_machine from a multithreaded workqueue is not allowed because
      of a circular locking dependency between cpu_down and the workqueue
      execution. Use a kernel thread to do the clocksource downgrade
      (sketched below).
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090818170942.3ab80c91@skybase>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      01548f4d
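      Condensed sketch: the work callback no longer does the downgrade
      itself, it just hands off to a kernel thread, which can use
      stop_machine() safely:

      static void clocksource_watchdog_work(struct work_struct *work)
      {
              /*
               * The downgrade ends up in timekeeping_notify(), which uses
               * stop_machine(); that must not run from a multithreaded
               * workqueue, so hand the work to a dedicated kernel thread.
               */
              kthread_run(clocksource_watchdog_kthread, NULL, "kwatchdog");
      }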
    • clocksource: Protect the watchdog rating changes with clocksource_mutex · d0981a1b
      Thomas Gleixner authored
      Martin pointed out that commit 6ea41d2529 (clocksource: Call
      clocksource_change_rating() outside of watchdog_lock) has a
      theoretical reference count problem. The calls to
      clocksource_change_rating() are now done outside of the clocksource
      mutex and outside of the watchdog lock. A concurrent
      clocksource_unregister() could remove the clock.
      
      Split out the code which changes the rating from
      clocksource_change_rating() into __clocksource_change_rating().
      
      Protect the clocksource_watchdog_work() code sequence with the
      clocksource_mutex and call __clocksource_change_rating() (the split
      is sketched below).
      
      LKML-Reference: <alpine.LFD.2.00.0908171038420.2782@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      d0981a1b
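      The resulting split, roughly:

      /* caller must hold clocksource_mutex */
      static void __clocksource_change_rating(struct clocksource *cs,
                                              int rating)
      {
              list_del(&cs->list);
              cs->rating = rating;
              clocksource_enqueue(cs);
              clocksource_select();
      }

      void clocksource_change_rating(struct clocksource *cs, int rating)
      {
              mutex_lock(&clocksource_mutex);
              __clocksource_change_rating(cs, rating);
              mutex_unlock(&clocksource_mutex);
      }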
  13. 17 August 2009, 1 commit
    • timers: Drop write permission on /proc/timer_list · de809347
      Amerigo Wang authored
      /proc/timer_list and /proc/slabinfo are not supposed to be
      written to, so there should be no write permission on them (see
      the sketch after this entry).
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: linux-mm@kvack.org
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Amerigo Wang <amwang@redhat.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      LKML-Reference: <20090817094525.6355.88682.sendpatchset@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      de809347
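      The change amounts to a mode fix in the procfs registration; roughly:

      static int __init init_timer_list_procfs(void)
      {
              struct proc_dir_entry *pe;

              /* mode was 0644; there is no write handler, so drop write bits */
              pe = proc_create("timer_list", 0444, NULL, &timer_list_fops);
              if (!pe)
                      return -ENOMEM;
              return 0;
      }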
  14. 15 August 2009, 16 commits
  15. 19 July 2009, 1 commit
  16. 10 July 2009, 1 commit
    • hrtimer: Fix migration expiry check · 6ff7041d
      Thomas Gleixner authored
      The timer migration expiry check should prevent the migration of a
      timer to another CPU when the timer expires before the next event is
      scheduled on the other CPU. Migrating the timer might delay it because
      we can not reprogram the clock event device on the other CPU. But the
      code implementing that check has two flaws:
      
      - for !HIGHRES, the check compares the expiry value with the clock
        events device expiry value, which is wrong for CLOCK_REALTIME based
        timers.
      
      - the check is racy. It holds the hrtimer base lock of the target CPU,
        but the clock event device expiry value can be modified
        nevertheless, e.g. by a timer interrupt firing.
      
      The !HIGHRES case is easy to fix as we can enqueue the timer on the
      cpu which was selected by the load balancer. It runs the idle
      balancing code once per jiffy anyway. So the maximum delay for the
      timer is the same as when we keep the tick on the current cpu going.
      
      In the HIGHRES case we can get the next expiry value from the hrtimer
      cpu_base of the target CPU and serialize the update with the cpu_base
      lock. This moves the lock section in hrtimer_interrupt() so we can set
      next_event to KTIME_MAX while we are handling the expired timers and
      set it to the next expiry value after we handled the timers under the
      base lock. While the expired timers are processed, timer migration is
      blocked because the expiry time of the timer is always <= KTIME_MAX
      (the check is sketched after this entry).
      
      Also remove the now useless clockevents_get_next_event() function.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6ff7041d
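      A condensed sketch of the HIGHRES-side check described above (the
      helper name is illustrative; in kernels of this era ktime_t exposes
      its raw 64-bit value as .tv64):

      /*
       * Do not migrate the timer if it would expire before the target
       * cpu's next programmed event. Reads of expires_next are serialized
       * by the target's cpu_base lock, held by the caller.
       */
      static int hrtimer_expires_too_early(struct hrtimer *timer,
                                           struct hrtimer_clock_base *new_base)
      {
              ktime_t expires;

              /* ->offset folds CLOCK_REALTIME timers into monotonic time */
              expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
              return expires.tv64 <= new_base->cpu_base->expires_next.tv64;
      }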
  17. 07 July 2009, 2 commits
    • timekeeping: Move ktime_get() functions to timekeeping.c · a40f262c
      Thomas Gleixner authored
      The ktime_get() functions for GENERIC_TIME=n are still located in
      hrtimer.c. Move them to time/timekeeping.c where they belong.
      
      LKML-Reference: <new-submission>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a40f262c
    • timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y · 951ed4d3
      Martin Schwidefsky authored
      The generic ktime_get function defined in kernel/hrtimer.c is suboptimal
      for GENERIC_TIME=y:
      
       0)               |  ktime_get() {
       0)               |    ktime_get_ts() {
       0)               |      getnstimeofday() {
       0)               |        read_tod_clock() {
       0)   0.601 us    |        }
       0)   1.938 us    |      }
       0)               |      set_normalized_timespec() {
       0)   0.602 us    |      }
       0)   4.375 us    |    }
       0)   5.523 us    |  }
      
      Overall there are two read_seqbegin/read_seqretry loops and a lot of
      unnecessary struct timespec calculations. ktime_get returns a nanosecond
      value which is the sum of xtime, wall_to_monotonic and the nanosecond
      delta from the clock source.
      
      ktime_get can be optimized for GENERIC_TIME=y. The new version only calls
      clocksource_read:
      
       0)               |  ktime_get() {
       0)               |    read_tod_clock() {
       0)   0.610 us    |    }
       0)   1.977 us    |  }
      
      It uses a single read_seqbegin/read_seqretry loop and just adds everything
      up into a nanosecond value (the optimized loop is sketched after this
      entry).
      
      ktime_get_ts is optimized in a similar fashion.
      
      [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ]
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090707112728.3005244d@skybase>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      951ed4d3
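      Roughly the shape of the optimized function (condensed; "clock" is
      the current timekeeping clocksource):

      ktime_t ktime_get(void)
      {
              cycle_t cycle_now, cycle_delta;
              unsigned int seq;
              s64 secs, nsecs;

              WARN_ON(timekeeping_suspended);

              do {    /* the single read_seqbegin/read_seqretry loop */
                      seq = read_seqbegin(&xtime_lock);
                      secs = xtime.tv_sec + wall_to_monotonic.tv_sec;
                      nsecs = xtime.tv_nsec + wall_to_monotonic.tv_nsec;

                      /* nanosecond delta since the last update_wall_time: */
                      cycle_now = clocksource_read(clock);
                      cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
                      nsecs += cyc2ns(clock, cycle_delta);
              } while (read_seqretry(&xtime_lock, seq));

              /* ktime_add_ns() folds the nanosecond overflow into seconds */
              return ktime_add_ns(ktime_set(secs, 0), nsecs);
      }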
  18. 24 June 2009, 1 commit
    • timer stats: Optimize by adding quick check to avoid function calls · 507e1231
      Heiko Carstens authored
      When the kernel is configured with CONFIG_TIMER_STATS but timer
      stats are runtime disabled, we still get calls to
      __timer_stats_timer_set_start_info, which initializes some
      fields in the corresponding struct timer_list.
      
      So add some quick checks in the timer stats setup functions
      to avoid function calls to __timer_stats_timer_set_start_info
      when timer stats are disabled (sketched after this entry).
      
      In an artificial workload that does nothing but play ping-pong
      with a single tcp packet via loopback, this decreases cpu
      consumption by 1 - 1.5%.
      
      This is part of a modified function trace output on SLES11:
      
       perl-2497  [00] 28630647177732388 [+  125]: sk_reset_timer <-tcp_v4_rcv
       perl-2497  [00] 28630647177732513 [+  125]: mod_timer <-sk_reset_timer
       perl-2497  [00] 28630647177732638 [+  125]: __timer_stats_timer_set_start_info <-mod_timer
       perl-2497  [00] 28630647177732763 [+  125]: __mod_timer <-mod_timer
       perl-2497  [00] 28630647177732888 [+  125]: __timer_stats_timer_set_start_info <-__mod_timer
       perl-2497  [00] 28630647177733013 [+   93]: lock_timer_base <-__mod_timer
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mustafa Mesanovic <mustafa.mesanovic@de.ibm.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      LKML-Reference: <20090623153811.GA4641@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      507e1231
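      The quick check amounts to a guard in the inline wrapper, roughly:

      static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
      {
              if (likely(!timer_stats_active))  /* runtime-disabled: no call */
                      return;
              __timer_stats_timer_set_start_info(timer,
                                                 __builtin_return_address(0));
      }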
  19. 13 June 2009, 1 commit
  20. 11 June 2009, 1 commit
  21. 27 May 2009, 1 commit
  22. 15 May 2009, 1 commit
    • sched, timers: move calc_load() to scheduler · dce48a84
      Thomas Gleixner authored
      Dimitri Sivanich noticed that xtime_lock is held write-locked across
      calc_load(), which iterates over all online CPUs. That can cause long
      latencies for xtime_lock readers on large SMP systems.
      
      The load average calculation is a rough estimate anyway, so there is
      no real need to protect the readers vs. the update. It's not a problem
      when the avenrun array is updated while a reader copies the values.
      
      Instead of iterating over all online CPUs, let the scheduler_tick code
      update the number of active tasks shortly before the avenrun update
      happens. The avenrun update itself is handled by the CPU which calls
      do_timer() (sketched after this entry).
      
      [ Impact: reduce xtime_lock write locked section ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      dce48a84
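      A condensed sketch of the per-cpu side: each scheduler_tick() folds
      its delta of active tasks into a global atomic counter, and only the
      do_timer() cpu later computes avenrun from calc_load_tasks:

      static void calc_load_account_active(struct rq *this_rq)
      {
              long nr_active, delta;

              nr_active = this_rq->nr_running;
              nr_active += (long)this_rq->nr_uninterruptible;

              if (nr_active != this_rq->calc_load_active) {
                      delta = nr_active - this_rq->calc_load_active;
                      this_rq->calc_load_active = nr_active;
                      atomic_long_add(delta, &calc_load_tasks);
              }
      }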