1. 05 2月, 2015 1 次提交
  2. 24 1月, 2015 2 次提交
  3. 23 1月, 2015 1 次提交
    • T
      hrtimer: Prevent stale expiry time in hrtimer_interrupt() · 9bc74919
      Thomas Gleixner 提交于
      hrtimer_interrupt() has the following subtle issue:
      
      hrtimer_interrupt()
        lock(cpu_base);
        expires_next = KTIME_MAX;
      
        expire_timers(CLOCK_MONOTONIC);
        expires = get_next_timer(CLOCK_MONOTONIC);
        if (expires < expires_next)
          expires_next = expires;
      
        expire_timers(CLOCK_REALTIME);
          unlock(cpu_base);
          wakeup()
          hrtimer_start(CLOCK_MONOTONIC, newtimer);
          lock(cpu_base();  
        expires = get_next_timer(CLOCK_REALTIME);
        if (expires < expires_next)
          expires_next = expires;
      
      So because we already evaluated the next expiring timer of
      CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
      earlier than the overall next expiry time in hrtimer_interrupt().
      
      To solve this, remove the caching of the next expiry value from
      hrtimer_interrupt() and reevaluate all active clock bases for the next
      expiry value. To avoid another code duplication, create a shared
      evaluation function and use it for hrtimer_get_next_event(),
      hrtimer_force_reprogram() and hrtimer_interrupt().
      
      There is another subtlety in this mechanism:
      
      While hrtimer_interrupt() is running, we want to avoid to touch the
      hardware device because we will reprogram it anyway at the end of
      hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
      via the HRTIMER_RESTART mechanism, because we drop out when the
      callback on that CPU is running. But that fails, if a new timer gets
      enqueued like in the example above.
      
      This has another implication: While hrtimer_interrupt() is running we
      refuse remote enqueueing of timers - see hrtimer_interrupt() and
      hrtimer_check_target().
      
      hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
      to KTIME_MAX, but that fails if a new timer gets queued.
      
      Prevent both the hardware access and the remote enqueue
      explicitely. We can loosen the restriction on the remote enqueue now
      due to reevaluation of the next expiry value, but that needs a
      seperate patch.
      
      Folded in a fix from Vignesh Radhakrishnan.
      Reported-and-tested-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Based-on-patch-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: vigneshr@codeaurora.org
      Cc: john.stultz@linaro.org
      Cc: viresh.kumar@linaro.org
      Cc: fweisbec@gmail.com
      Cc: cl@linux.com
      Cc: stuart.w.hayes@gmail.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      9bc74919
  4. 19 9月, 2014 1 次提交
    • K
      sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule() · f139caf2
      Kirill Tkhai 提交于
      schedule(), io_schedule() and schedule_timeout() always return
      with TASK_RUNNING state set, so one more setting is unnecessary.
      
      (All places in patch are visible good, only exception is
       kiblnd_scheduler() from:
      
            drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
      
       Its schedule() is one line above standard 3 lines of unified diff)
      
      No places where set_current_state() is used for mb().
      Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Anil Belur <askb23@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dmitry Eremin <dmitry.eremin@intel.com>
      Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Isaac Huang <he.huang@intel.com>
      Cc: James E.J. Bottomley <JBottomley@parallels.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Liang Zhen <liang.zhen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masaru Nomura <massa.nomura@gmail.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleg Drokin <green@linuxhacker.ru>
      Cc: Peng Tao <bergwolf@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Robert Love <robert.w.love@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Ursula Braun <ursula.braun@de.ibm.com>
      Cc: Zi Shen Lim <zlim.lnx@gmail.com>
      Cc: devel@driverdev.osuosl.org
      Cc: dm-devel@redhat.com
      Cc: dri-devel@lists.freedesktop.org
      Cc: fcoe-devel@open-fcoe.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-afs@lists.infradead.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: qla2xxx-upstream@qlogic.com
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f139caf2
  5. 27 8月, 2014 2 次提交
  6. 24 7月, 2014 4 次提交
  7. 23 6月, 2014 4 次提交
  8. 12 5月, 2014 1 次提交
  9. 30 4月, 2014 2 次提交
    • L
      hrtimer: Prevent remote enqueue of leftmost timers · 012a45e3
      Leon Ma 提交于
      If a cpu is idle and starts an hrtimer which is not pinned on that
      same cpu, the nohz code might target the timer to a different cpu.
      
      In the case that we switch the cpu base of the timer we already have a
      sanity check in place, which determines whether the timer is earlier
      than the current leftmost timer on the target cpu. In that case we
      enqueue the timer on the current cpu because we cannot reprogram the
      clock event device on the target.
      
      If the timers base is already the target CPU we do not have this
      sanity check in place so we enqueue the timer as the leftmost timer in
      the target cpus rb tree, but we cannot reprogram the clock event
      device on the target cpu. So the timer expires late and subsequently
      prevents the reprogramming of the target cpu clock event device until
      the previously programmed event fires or a timer with an earlier
      expiry time gets enqueued on the target cpu itself.
      
      Add the same target check as we have for the switch base case and
      start the timer on the current cpu if it would become the leftmost
      timer on the target.
      
      [ tglx: Rewrote subject and changelog ]
      Signed-off-by: NLeon Ma <xindong.ma@intel.com>
      Link: http://lkml.kernel.org/r/1398847391-5994-1-git-send-email-xindong.ma@intel.com
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      012a45e3
    • S
      hrtimer: Prevent all reprogramming if hang detected · 6c6c0d5a
      Stuart Hayes 提交于
      If the last hrtimer interrupt detected a hang it sets hang_detected=1
      and programs the clock event device with a delay to let the system
      make progress.
      
      If hang_detected == 1, we prevent reprogramming of the clock event
      device in hrtimer_reprogram() but not in hrtimer_force_reprogram().
      
      This can lead to the following situation:
      
      hrtimer_interrupt()
         hang_detected = 1;
         program ce device to Xms from now (hang delay)
      
      We have two timers pending:
         T1 expires 50ms from now
         T2 expires 5s from now
      
      Now T1 gets canceled, which causes hrtimer_force_reprogram() to be
      invoked, which in turn programs the clock event device to T2 (5
      seconds from now).
      
      Any hrtimer_start after that will not reprogram the hardware due to
      hang_detected still being set. So we effectivly block all timers until
      the T2 event fires and cleans up the hang situation.
      
      Add a check for hang_detected to hrtimer_force_reprogram() which
      prevents the reprogramming of the hang delay in the hardware
      timer. The subsequent hrtimer_interrupt will resolve all outstanding
      issues.
      
      [ tglx: Rewrote subject and changelog and fixed up the comment in
        	hrtimer_force_reprogram() ]
      Signed-off-by: NStuart Hayes <stuart.w.hayes@gmail.com>
      Link: http://lkml.kernel.org/r/53602DC6.2060101@gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6c6c0d5a
  10. 18 4月, 2014 1 次提交
  11. 20 3月, 2014 1 次提交
  12. 13 1月, 2014 1 次提交
    • D
      sched/deadline: Add SCHED_DEADLINE structures & implementation · aab03e05
      Dario Faggioli 提交于
      Introduces the data structures, constants and symbols needed for
      SCHED_DEADLINE implementation.
      
      Core data structure of SCHED_DEADLINE are defined, along with their
      initializers. Hooks for checking if a task belong to the new policy
      are also added where they are needed.
      
      Adds a scheduling class, in sched/dl.c and a new policy called
      SCHED_DEADLINE. It is an implementation of the Earliest Deadline
      First (EDF) scheduling algorithm, augmented with a mechanism (called
      Constant Bandwidth Server, CBS) that makes it possible to isolate
      the behaviour of tasks between each other.
      
      The typical -deadline task will be made up of a computation phase
      (instance) which is activated on a periodic or sporadic fashion. The
      expected (maximum) duration of such computation is called the task's
      runtime; the time interval by which each instance need to be completed
      is called the task's relative deadline. The task's absolute deadline
      is dynamically calculated as the time instant a task (better, an
      instance) activates plus the relative deadline.
      
      The EDF algorithms selects the task with the smallest absolute
      deadline as the one to be executed first, while the CBS ensures each
      task to run for at most its runtime every (relative) deadline
      length time interval, avoiding any interference between different
      tasks (bandwidth isolation).
      Thanks to this feature, also tasks that do not strictly comply with
      the computational model sketched above can effectively use the new
      policy.
      
      To summarize, this patch:
       - introduces the data structures, constants and symbols needed;
       - implements the core logic of the scheduling algorithm in the new
         scheduling class file;
       - provides all the glue code between the new scheduling class and
         the core scheduler and refines the interactions between sched/dl
         and the other existing scheduling classes.
      Signed-off-by: NDario Faggioli <raistlin@linux.it>
      Signed-off-by: NMichael Trimarchi <michael@amarulasolutions.com>
      Signed-off-by: NFabio Checconi <fchecconi@gmail.com>
      Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aab03e05
  13. 15 7月, 2013 1 次提交
    • P
      kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  14. 06 7月, 2013 1 次提交
  15. 05 7月, 2013 1 次提交
    • T
      hrtimers: Move SMP function call to thread context · 5ec2481b
      Thomas Gleixner 提交于
      smp_call_function_* must not be called from softirq context.
      
      But clock_was_set() which calls on_each_cpu() is called from softirq
      context to implement a delayed clock_was_set() for the timer interrupt
      handler. Though that almost never gets invoked. A recent change in the
      resume code uses the softirq based delayed clock_was_set to support
      Xens resume mechanism.
      
      linux-next contains a new warning which warns if smp_call_function_*
      is called from softirq context which gets triggered by that Xen
      change.
      
      Fix this by moving the delayed clock_was_set() call to a work context.
      Reported-and-tested-by: NArtem Savkov <artem.savkov@gmail.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>,
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: xen-devel@lists.xen.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5ec2481b
  16. 29 6月, 2013 1 次提交
    • D
      hrtimers: Support resuming with two or more CPUs online (but stopped) · 7c4c3a0f
      David Vrabel 提交于
      hrtimers_resume() only reprograms the timers for the current CPU as it
      assumes that all other CPUs are offline at this point in the resume
      process. If other CPUs are online then their timers will not be
      corrected and they may fire at the wrong time.
      
      When running as a Xen guest, this assumption is not true.  Non-boot
      CPUs are only stopped with IRQs disabled instead of offlining them.
      This is a performance optimization as disabling the CPUs would add an
      unacceptable amount of additional downtime during a live migration (>
      200 ms for a 4 VCPU guest).
      
      hrtimers_resume() cannot call on_each_cpu(retrigger_next_event,...)
      as the other CPUs will be stopped with IRQs disabled.  Instead, defer
      the call to the next softirq.
      
      [ tglx: Separated the xen change out ]
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk  <konrad.wilk@oracle.com>
      Cc: John Stultz  <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-2-git-send-email-david.vrabel@citrix.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      7c4c3a0f
  17. 12 5月, 2013 1 次提交
  18. 09 4月, 2013 2 次提交
    • D
      hrtimer: Fix ktime_add_ns() overflow on 32bit architectures · 51fd36f3
      David Engraf 提交于
      One can trigger an overflow when using ktime_add_ns() on a 32bit
      architecture not supporting CONFIG_KTIME_SCALAR.
      
      When passing a very high value for u64 nsec, e.g. 7881299347898368000
      the do_div() function converts this value to seconds (7881299347) which
      is still to high to pass to the ktime_set() function as long. The result
      in is a negative value.
      
      The problem on my system occurs in the tick-sched.c,
      tick_nohz_stop_sched_tick() when time_delta is set to
      timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
      valid, thus ktime_add_ns() is called with a too large value resulting in
      a negative expire value. This leads to an endless loop in the ticker code:
      
      time_delta: 7881299347898368000
      expires = ktime_add_ns(last_update, time_delta)
      expires: negative value
      
      This fix caps the value to KTIME_MAX.
      
      This error doesn't occurs on 64bit or architectures supporting
      CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NDavid Engraf <david.engraf@sysgo.com>
      [jstultz: Minor tweaks to commit message & header]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      51fd36f3
    • P
      hrtimer: Add expiry time overflow check in hrtimer_interrupt · 8f294b5a
      Prarit Bhargava 提交于
      The settimeofday01 test in the LTP testsuite effectively does
      
              gettimeofday(current time);
              settimeofday(Jan 1, 1970 + 100 seconds);
              settimeofday(current time);
      
      This test causes a stack trace to be displayed on the console during the
      setting of timeofday to Jan 1, 1970 + 100 seconds:
      
      [  131.066751] ------------[ cut here ]------------
      [  131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
      [  131.104935] Hardware name: Dinar
      [  131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
      [  131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
      [  131.182248] Call Trace:
      [  131.184684]  <IRQ>  [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
      [  131.191312]  [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
      [  131.197131]  [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
      [  131.203721]  [<ffffffff810bb584>] tick_program_event+0x24/0x30
      [  131.209534]  [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
      [  131.215437]  [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
      [  131.221509]  [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
      [  131.227839]  [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
      [  131.233816]  <EOI>  [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
      [  131.240267]  [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
      [  131.246252]  [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
      [  131.252238]  [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
      [  131.257877]  [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
      [  131.263692]  [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
      [  131.268727]  [<ffffffff815f8971>] start_secondary+0x255/0x257
      [  131.274449] ---[ end trace 1151a50552231615 ]---
      
      When we change the system time to a low value like this, the value of
      timekeeper->offs_real will be a negative value.
      
      It seems that the WARN occurs because an hrtimer has been started in the time
      between the releasing of the timekeeper lock and the IPI call (via a call to
      on_each_cpu) in clock_was_set() in the do_settimeofday() code.  The end result
      is that a REALTIME_CLOCK timer has been added with softexpires = expires =
      KTIME_MAX.  The hrtimer_interrupt() fires/is called and the loop at
      kernel/hrtimer.c:1289 is executed.  In this loop the code subtracts the
      clock base's offset (which was set to timekeeper->offs_real in
      do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
      was KTIME_MAX):
      
      	KTIME_MAX - (a negative value) = overflow
      
      A simple check for an overflow can resolve this problem.  Using KTIME_MAX
      instead of the overflow value will result in the hrtimer function being run,
      and the reprogramming of the timer after that.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      [jstultz: Tweaked commit subject]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      8f294b5a
  19. 03 4月, 2013 1 次提交
    • F
      nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024
      Frederic Weisbecker 提交于
      We are planning to convert the dynticks Kconfig options layout
      into a choice menu. The user must be able to easily pick
      any of the following implementations: constant periodic tick,
      idle dynticks, full dynticks.
      
      As this implies a mutual exclusion, the two dynticks implementions
      need to converge on the selection of a common Kconfig option in order
      to ease the sharing of a common infrastructure.
      
      It would thus seem pretty natural to reuse CONFIG_NO_HZ to
      that end. It already implements all the idle dynticks code
      and the full dynticks depends on all that code for now.
      So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
      CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
      
      On the other hand we want to stay backward compatible: if
      CONFIG_NO_HZ is set in an older config file, we want to
      enable CONFIG_NO_HZ_IDLE by default.
      
      But we can't afford both at the same time or we run into
      a circular dependency:
      
      1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
         CONFIG_NO_HZ
      2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
      
      We might be able to support that from Kconfig/Kbuild but it
      may not be wise to introduce such a confusing behaviour.
      
      So to solve this, create a new CONFIG_NO_HZ_COMMON option
      which gathers the common code between idle and full dynticks
      (that common code for now is simply the idle dynticks code)
      and select it from their referring Kconfig.
      
      Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
      to it for backward compatibility.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3451d024
  20. 27 3月, 2013 1 次提交
    • M
      hrtimer: Don't reinitialize a cpu_base lock on CPU_UP · 84cc8fd2
      Michael Bohan 提交于
      The current code makes the assumption that a cpu_base lock won't be
      held if the CPU corresponding to that cpu_base is offline, which isn't
      always true.
      
      If a hrtimer is not queued, then it will not be migrated by
      migrate_hrtimers() when a CPU is offlined. Therefore, the hrtimer's
      cpu_base may still point to a CPU which has subsequently gone offline
      if the timer wasn't enqueued at the time the CPU went down.
      
      Normally this wouldn't be a problem, but a cpu_base's lock is blindly
      reinitialized each time a CPU is brought up. If a CPU is brought
      online during the period that another thread is performing a hrtimer
      operation on a stale hrtimer, then the lock will be reinitialized
      under its feet, and a SPIN_BUG() like the following will be observed:
      
      <0>[   28.082085] BUG: spinlock already unlocked on CPU#0, swapper/0/0
      <0>[   28.087078]  lock: 0xc4780b40, value 0x0 .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
      <4>[   42.451150] [<c0014398>] (unwind_backtrace+0x0/0x120) from [<c0269220>] (do_raw_spin_unlock+0x44/0xdc)
      <4>[   42.460430] [<c0269220>] (do_raw_spin_unlock+0x44/0xdc) from [<c071b5bc>] (_raw_spin_unlock+0x8/0x30)
      <4>[   42.469632] [<c071b5bc>] (_raw_spin_unlock+0x8/0x30) from [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8)
      <4>[   42.479521] [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8) from [<c00aa014>] (hrtimer_start+0x20/0x28)
      <4>[   42.489247] [<c00aa014>] (hrtimer_start+0x20/0x28) from [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320)
      <4>[   42.498709] [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320) from [<c00e6440>] (rcu_idle_enter+0xa0/0xb8)
      <4>[   42.508259] [<c00e6440>] (rcu_idle_enter+0xa0/0xb8) from [<c000f268>] (cpu_idle+0x24/0xf0)
      <4>[   42.516503] [<c000f268>] (cpu_idle+0x24/0xf0) from [<c06ed3c0>] (rest_init+0x88/0xa0)
      <4>[   42.524319] [<c06ed3c0>] (rest_init+0x88/0xa0) from [<c0c00978>] (start_kernel+0x3d0/0x434)
      
      As an example, this particular crash occurred when hrtimer_start() was
      executed on CPU #0. The code locked the hrtimer's current cpu_base
      corresponding to CPU #1. CPU #0 then tried to switch the hrtimer's
      cpu_base to an optimal CPU which was online. In this case, it selected
      the cpu_base corresponding to CPU #3.
      
      Before it could proceed, CPU #1 came online and reinitialized the
      spinlock corresponding to its cpu_base. Thus now CPU #0 held a lock
      which was reinitialized. When CPU #0 finally ended up unlocking the
      old cpu_base corresponding to CPU #1 so that it could switch to CPU
      #3, we hit this SPIN_BUG() above while in switch_hrtimer_base().
      
      CPU #0                            CPU #1
      ----                              ----
      ...                               <offline>
      hrtimer_start()
      lock_hrtimer_base(base #1)
      ...                               init_hrtimers_cpu()
      switch_hrtimer_base()             ...
      ...                               raw_spin_lock_init(&cpu_base->lock)
      raw_spin_unlock(&cpu_base->lock)  ...
      <spin_bug>
      
      Solve this by statically initializing the lock.
      Signed-off-by: NMichael Bohan <mbohan@codeaurora.org>
      Link: http://lkml.kernel.org/r/1363745965-23475-1-git-send-email-mbohan@codeaurora.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      84cc8fd2
  21. 25 3月, 2013 1 次提交
  22. 23 3月, 2013 1 次提交
  23. 08 2月, 2013 2 次提交
  24. 05 2月, 2013 1 次提交
  25. 12 7月, 2012 3 次提交
  26. 19 11月, 2011 1 次提交
    • J
      hrtimer: Fix extra wakeups from __remove_hrtimer() · 27c9cd7e
      Jeff Ohlstein 提交于
      __remove_hrtimer() attempts to reprogram the clockevent device when
      the timer being removed is the next to expire. However,
      __remove_hrtimer() reprograms the clockevent *before* removing the
      timer from the timerqueue and thus when hrtimer_force_reprogram()
      finds the next timer to expire it finds the timer we're trying to
      remove.
      
      This is especially noticeable when the system switches to NOHz mode
      and the system tick is removed. The timer tick is removed from the
      system but the clockevent is programmed to wakeup in another HZ
      anyway.
      
      Silence the extra wakeup by removing the timer from the timerqueue
      before calling hrtimer_force_reprogram() so that we actually program
      the clockevent for the next timer to expire.
      
      This was broken by 998adc3d "hrtimers: Convert hrtimers to use
      timerlist infrastructure".
      Signed-off-by: NJeff Ohlstein <johlstei@codeaurora.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      27c9cd7e
  27. 31 10月, 2011 1 次提交