1. 02 3月, 2017 3 次提交
  2. 18 2月, 2017 1 次提交
  3. 17 2月, 2017 1 次提交
    • L
      Revert "nohz: Fix collision between tick and other hrtimers" · 558e8e27
      Linus Torvalds 提交于
      This reverts commit 24b91e36 and commit
      7bdb59f1 ("tick/nohz: Fix possible missing clock reprog after tick
      soft restart") that depends on it,
      
      Pavel reports that it causes occasional boot hangs for him that seem to
      depend on just how the machine was booted.  In particular, his machine
      hangs at around the PCI fixups of the EHCI USB host controller, but only
      hangs from cold boot, not from a warm boot.
      
      Thomas Gleixner suspecs it's a CPU hotplug interaction, particularly
      since Pavel also saw suspend/resume issues that seem to be related.
      We're reverting for now while trying to figure out the root cause.
      Reported-bisected-and-tested-by: NPavel Machek <pavel@ucw.cz>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org  # reverted commits were marked for stable
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      558e8e27
  4. 15 2月, 2017 1 次提交
    • S
      timekeeping: Use deferred printk() in debug code · f222449c
      Sergey Senozhatsky 提交于
      We cannot do printk() from tk_debug_account_sleep_time(), because
      tk_debug_account_sleep_time() is called under tk_core seq lock.
      The reason why printk() is unsafe there is that console_sem may
      invoke scheduler (up()->wake_up_process()->activate_task()), which,
      in turn, can return back to timekeeping code, for instance, via
      get_time()->ktime_get(), deadlocking the system on tk_core seq lock.
      
      [   48.950592] ======================================================
      [   48.950622] [ INFO: possible circular locking dependency detected ]
      [   48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
      [   48.950622] -------------------------------------------------------
      [   48.950622] kworker/0:0/3 is trying to acquire lock:
      [   48.950653]  (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
      [   48.950683]
                     but task is already holding lock:
      [   48.950683]  (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
      [   48.950714]
                     which lock already depends on the new lock.
      
      [   48.950714]
                     the existing dependency chain (in reverse order) is:
      [   48.950714]
                     -> #5 (hrtimer_bases.lock){-.-...}:
      [   48.950744]        _raw_spin_lock_irqsave+0x50/0x64
      [   48.950775]        lock_hrtimer_base+0x28/0x58
      [   48.950775]        hrtimer_start_range_ns+0x20/0x5c8
      [   48.950775]        __enqueue_rt_entity+0x320/0x360
      [   48.950805]        enqueue_rt_entity+0x2c/0x44
      [   48.950805]        enqueue_task_rt+0x24/0x94
      [   48.950836]        ttwu_do_activate+0x54/0xc0
      [   48.950836]        try_to_wake_up+0x248/0x5c8
      [   48.950836]        __setup_irq+0x420/0x5f0
      [   48.950836]        request_threaded_irq+0xdc/0x184
      [   48.950866]        devm_request_threaded_irq+0x58/0xa4
      [   48.950866]        omap_i2c_probe+0x530/0x6a0
      [   48.950897]        platform_drv_probe+0x50/0xb0
      [   48.950897]        driver_probe_device+0x1f8/0x2cc
      [   48.950897]        __driver_attach+0xc0/0xc4
      [   48.950927]        bus_for_each_dev+0x6c/0xa0
      [   48.950927]        bus_add_driver+0x100/0x210
      [   48.950927]        driver_register+0x78/0xf4
      [   48.950958]        do_one_initcall+0x3c/0x16c
      [   48.950958]        kernel_init_freeable+0x20c/0x2d8
      [   48.950958]        kernel_init+0x8/0x110
      [   48.950988]        ret_from_fork+0x14/0x24
      [   48.950988]
                     -> #4 (&rt_b->rt_runtime_lock){-.-...}:
      [   48.951019]        _raw_spin_lock+0x40/0x50
      [   48.951019]        rq_offline_rt+0x9c/0x2bc
      [   48.951019]        set_rq_offline.part.2+0x2c/0x58
      [   48.951049]        rq_attach_root+0x134/0x144
      [   48.951049]        cpu_attach_domain+0x18c/0x6f4
      [   48.951049]        build_sched_domains+0xba4/0xd80
      [   48.951080]        sched_init_smp+0x68/0x10c
      [   48.951080]        kernel_init_freeable+0x160/0x2d8
      [   48.951080]        kernel_init+0x8/0x110
      [   48.951080]        ret_from_fork+0x14/0x24
      [   48.951110]
                     -> #3 (&rq->lock){-.-.-.}:
      [   48.951110]        _raw_spin_lock+0x40/0x50
      [   48.951141]        task_fork_fair+0x30/0x124
      [   48.951141]        sched_fork+0x194/0x2e0
      [   48.951141]        copy_process.part.5+0x448/0x1a20
      [   48.951171]        _do_fork+0x98/0x7e8
      [   48.951171]        kernel_thread+0x2c/0x34
      [   48.951171]        rest_init+0x1c/0x18c
      [   48.951202]        start_kernel+0x35c/0x3d4
      [   48.951202]        0x8000807c
      [   48.951202]
                     -> #2 (&p->pi_lock){-.-.-.}:
      [   48.951232]        _raw_spin_lock_irqsave+0x50/0x64
      [   48.951232]        try_to_wake_up+0x30/0x5c8
      [   48.951232]        up+0x4c/0x60
      [   48.951263]        __up_console_sem+0x2c/0x58
      [   48.951263]        console_unlock+0x3b4/0x650
      [   48.951263]        vprintk_emit+0x270/0x474
      [   48.951293]        vprintk_default+0x20/0x28
      [   48.951293]        printk+0x20/0x30
      [   48.951324]        kauditd_hold_skb+0x94/0xb8
      [   48.951324]        kauditd_thread+0x1a4/0x56c
      [   48.951324]        kthread+0x104/0x148
      [   48.951354]        ret_from_fork+0x14/0x24
      [   48.951354]
                     -> #1 ((console_sem).lock){-.....}:
      [   48.951385]        _raw_spin_lock_irqsave+0x50/0x64
      [   48.951385]        down_trylock+0xc/0x2c
      [   48.951385]        __down_trylock_console_sem+0x24/0x80
      [   48.951385]        console_trylock+0x10/0x8c
      [   48.951416]        vprintk_emit+0x264/0x474
      [   48.951416]        vprintk_default+0x20/0x28
      [   48.951416]        printk+0x20/0x30
      [   48.951446]        tk_debug_account_sleep_time+0x5c/0x70
      [   48.951446]        __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
      [   48.951446]        timekeeping_resume+0x218/0x23c
      [   48.951477]        syscore_resume+0x94/0x42c
      [   48.951477]        suspend_enter+0x554/0x9b4
      [   48.951477]        suspend_devices_and_enter+0xd8/0x4b4
      [   48.951507]        enter_state+0x934/0xbd4
      [   48.951507]        pm_suspend+0x14/0x70
      [   48.951507]        state_store+0x68/0xc8
      [   48.951538]        kernfs_fop_write+0xf4/0x1f8
      [   48.951538]        __vfs_write+0x1c/0x114
      [   48.951538]        vfs_write+0xa0/0x168
      [   48.951568]        SyS_write+0x3c/0x90
      [   48.951568]        __sys_trace_return+0x0/0x10
      [   48.951568]
                     -> #0 (tk_core){----..}:
      [   48.951599]        lock_acquire+0xe0/0x294
      [   48.951599]        ktime_get_update_offsets_now+0x5c/0x1d4
      [   48.951629]        retrigger_next_event+0x4c/0x90
      [   48.951629]        on_each_cpu+0x40/0x7c
      [   48.951629]        clock_was_set_work+0x14/0x20
      [   48.951660]        process_one_work+0x2b4/0x808
      [   48.951660]        worker_thread+0x3c/0x550
      [   48.951660]        kthread+0x104/0x148
      [   48.951690]        ret_from_fork+0x14/0x24
      [   48.951690]
                     other info that might help us debug this:
      
      [   48.951690] Chain exists of:
                       tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
      
      [   48.951721]  Possible unsafe locking scenario:
      
      [   48.951721]        CPU0                    CPU1
      [   48.951721]        ----                    ----
      [   48.951721]   lock(hrtimer_bases.lock);
      [   48.951751]                                lock(&rt_b->rt_runtime_lock);
      [   48.951751]                                lock(hrtimer_bases.lock);
      [   48.951751]   lock(tk_core);
      [   48.951782]
                      *** DEADLOCK ***
      
      [   48.951782] 3 locks held by kworker/0:0/3:
      [   48.951782]  #0:  ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
      [   48.951812]  #1:  (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
      [   48.951843]  #2:  (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
      [   48.951843]   stack backtrace:
      [   48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
      [   48.951904] Workqueue: events clock_was_set_work
      [   48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
      [   48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
      [   48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
      [   48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
      [   48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
      [   48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
      [   48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
      [   48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
      [   48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
      [   48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
      [   48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
      [   48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
      [   48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
      [   48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)
      
      Replace printk() with printk_deferred(), which does not call into
      the scheduler.
      
      Fixes: 0bf43f15 ("timekeeping: Prints the amounts of time spent during suspend")
      Reported-and-tested-by: NTony Lindgren <tony@atomide.com>
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: "[4.9+]" <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      f222449c
  5. 13 2月, 2017 1 次提交
    • M
      tick/broadcast: Prevent deadlock on tick_broadcast_lock · 202461e2
      Mike Galbraith 提交于
      tick_broadcast_lock is taken from interrupt context, but the following call
      chain takes the lock without disabling interrupts:
      
      [   12.703736]  _raw_spin_lock+0x3b/0x50
      [   12.703738]  tick_broadcast_control+0x5a/0x1a0
      [   12.703742]  intel_idle_cpu_online+0x22/0x100
      [   12.703744]  cpuhp_invoke_callback+0x245/0x9d0
      [   12.703752]  cpuhp_thread_fun+0x52/0x110
      [   12.703754]  smpboot_thread_fn+0x276/0x320
      
      So the following deadlock can happen:
      
         lock(tick_broadcast_lock);
         <Interrupt>
            lock(tick_broadcast_lock);
      
      intel_idle_cpu_online() is the only place which violates the calling
      convention of tick_broadcast_control(). This was caused by the removal of
      the smp function call in course of the cpu hotplug rework.
      
      Instead of slapping local_irq_disable/enable() at the call site, we can
      relax the calling convention and handle it in the core code, which makes
      the whole machinery more robust.
      
      Fixes: 29d7bbad ("intel_idle: Remove superfluous SMP fuction call")
      Reported-by: NGabriel C <nix.or.die@gmail.com>
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Ruslan Ruslichenko <rruslich@cisco.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: lwn@lwn.net
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1486953115.5912.4.camel@gmx.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      202461e2
  6. 10 2月, 2017 3 次提交
    • M
      timer_list: Remove useless cast when printing · 7551b02b
      Mars Cheng 提交于
      hrtimer_resolution is already unsigned int, not necessary to cast
      it when printing.
      Signed-off-by: NMars Cheng <mars.cheng@mediatek.com>
      Cc: CC Hwang <cc.hwang@mediatek.com>
      Cc: wsd_upstream@mediatek.com
      Cc: Loda Chou <loda.chou@mediatek.com>
      Cc: Jades Shih <jades.shih@mediatek.com>
      Cc: Miles Chen <miles.chen@mediatek.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: My Chuang <my.chuang@mediatek.com>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
      Link: http://lkml.kernel.org/r/1486626615-5879-1-git-send-email-mars.cheng@mediatek.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      7551b02b
    • K
      time: Remove CONFIG_TIMER_STATS · dfb4357d
      Kees Cook 提交于
      Currently CONFIG_TIMER_STATS exposes process information across namespaces:
      
      kernel/time/timer_list.c print_timer():
      
              SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
      
      /proc/timer_list:
      
       #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570
      
      Given that the tracer can give the same information, this patch entirely
      removes CONFIG_TIMER_STATS.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: linux-doc@vger.kernel.org
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Xing Gao <xgao01@email.wm.edu>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jessica Frazelle <me@jessfraz.com>
      Cc: kernel-hardening@lists.openwall.com
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-api@vger.kernel.org
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170208192659.GA32582@beastSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      dfb4357d
    • F
      tick/nohz: Fix possible missing clock reprog after tick soft restart · 7bdb59f1
      Frederic Weisbecker 提交于
      ts->next_tick keeps track of the next tick deadline in order to optimize
      clock programmation on irq exit and avoid redundant clock device writes.
      
      Now if ts->next_tick missed an update, we may spuriously miss a clock
      reprog later as the nohz code is fooled by an obsolete next_tick value.
      
      This is what happens here on a specific path: when we observe an
      expired timer from the nohz update code on irq exit, we perform a soft
      tick restart which simply fires the closest possible tick without
      actually exiting the nohz mode and restoring a periodic state. But we
      forget to update ts->next_tick accordingly.
      
      As a result, after the next tick resulting from such soft tick restart,
      the nohz code sees a stale value on ts->next_tick which doesn't match
      the clock deadline that just expired. If that obsolete ts->next_tick
      value happens to collide with the actual next tick deadline to be
      scheduled, we may spuriously bypass the clock reprogramming. In the
      worst case, the tick may never fire again.
      
      Fix this with a ts->next_tick reset on soft tick restart.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed: Wanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1486485894-29173-1-git-send-email-fweisbec@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      7bdb59f1
  7. 04 2月, 2017 1 次提交
    • W
      tick/broadcast: Reduce lock cacheline contention · 668802c2
      Waiman Long 提交于
      It was observed that on an Intel x86 system without the ARAT (Always
      running APIC timer) feature and with fairly large number of CPUs as
      well as CPUs coming in and out of intel_idle frequently, the lock
      contention on the tick_broadcast_lock can become significant.
      
      To reduce contention, the lock is put into its own cacheline and all
      the cpumask_var_t variables are put into the __read_mostly section.
      
      Running the SP benchmark of the NAS Parallel Benchmarks on a 4-socket
      16-core 32-thread Nehalam system, the performance number improved
      from 3353.94 Mop/s to 3469.31 Mop/s when this patch was applied on
      a 4.9.6 kernel.  This is a 3.4% improvement.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1485799063-20857-1-git-send-email-longman@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      668802c2
  8. 01 2月, 2017 6 次提交
  9. 14 1月, 2017 1 次提交
  10. 11 1月, 2017 1 次提交
    • F
      nohz: Fix collision between tick and other hrtimers · 24b91e36
      Frederic Weisbecker 提交于
      When the tick is stopped and an interrupt occurs afterward, we check on
      that interrupt exit if the next tick needs to be rescheduled. If it
      doesn't need any update, we don't want to do anything.
      
      In order to check if the tick needs an update, we compare it against the
      clockevent device deadline. Now that's a problem because the clockevent
      device is at a lower level than the tick itself if it is implemented
      on top of hrtimer.
      
      Every hrtimer share this clockevent device. So comparing the next tick
      deadline against the clockevent device deadline is wrong because the
      device may be programmed for another hrtimer whose deadline collides
      with the tick. As a result we may end up not reprogramming the tick
      accidentally.
      
      In a worst case scenario under full dynticks mode, the tick stops firing
      as it is supposed to every 1hz, leaving /proc/stat stalled:
      
            Task in a full dynticks CPU
            ----------------------------
      
            * hrtimer A is queued 2 seconds ahead
            * the tick is stopped, scheduled 1 second ahead
            * tick fires 1 second later
            * on tick exit, nohz schedules the tick 1 second ahead but sees
              the clockevent device is already programmed to that deadline,
              fooled by hrtimer A, the tick isn't rescheduled.
            * hrtimer A is cancelled before its deadline
            * tick never fires again until an interrupt happens...
      
      In order to fix this, store the next tick deadline to the tick_sched
      local structure and reuse that value later to check whether we need to
      reprogram the clock after an interrupt.
      
      On the other hand, ts->sleep_length still wants to know about the next
      clock event and not just the tick, so we want to improve the related
      comment to avoid confusion.
      Reported-by: NJames Hartsock <hartsjc@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Link: http://lkml.kernel.org/r/1483539124-5693-1-git-send-email-fweisbec@gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      24b91e36
  11. 07 1月, 2017 1 次提交
  12. 26 12月, 2016 2 次提交
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
    • T
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner 提交于
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
      variant for 32bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      become completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
  13. 25 12月, 2016 2 次提交
  14. 15 12月, 2016 2 次提交
  15. 09 12月, 2016 4 次提交
    • T
      timekeeping: Use mul_u64_u32_shr() instead of open coding it · c029a2be
      Thomas Gleixner 提交于
      The resume code must deal with a clocksource delta which is potentially big
      enough to overflow the 64bit mult.
      
      Replace the open coded handling with the proper function.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Parit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Liav Rehana <liavr@mellanox.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      c029a2be
    • T
      timekeeping: Get rid of pointless typecasts · cbd99e3b
      Thomas Gleixner 提交于
      cycle_t is defined as u64, so casting it to u64 is a pointless and
      confusing exercise. cycle_t should simply go away and be replaced with a
      plain u64 to avoid further confusion.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Parit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Liav Rehana <liavr@mellanox.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      cbd99e3b
    • T
      timekeeping: Make the conversion call chain consistently unsigned · acc89612
      Thomas Gleixner 提交于
      Propagating a unsigned value through signed variables and functions makes
      absolutely no sense and is just prone to (re)introduce subtle signed
      vs. unsigned issues as happened recently.
      
      Clean it up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Parit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Liav Rehana <liavr@mellanox.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      acc89612
    • T
      timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion · 9c164572
      Thomas Gleixner 提交于
      The clocksource delta to nanoseconds conversion is using signed math, but
      the delta is unsigned. This makes the conversion space smaller than
      necessary and in case of a multiplication overflow the conversion can
      become negative. The conversion is done with scaled math:
      
          s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
      
      Shifting a signed integer right obvioulsy preserves the sign, which has
      interesting consequences:
       
       - Time jumps backwards
       
       - __iter_div_u64_rem() which is used in one of the calling code pathes
         will take forever to piecewise calculate the seconds/nanoseconds part.
      
      This has been reported by several people with different scenarios:
      
      David observed that when stopping a VM with a debugger:
      
       "It was essentially the stopped by debugger case.  I forget exactly why,
        but the guest was being explicitly stopped from outside, it wasn't just
        scheduling lag.  I think it was something in the vicinity of 10 minutes
        stopped."
      
       When lifting the stop the machine went dead.
      
      The stopped by debugger case is not really interesting, but nevertheless it
      would be a good thing not to die completely.
      
      But this was also observed on a live system by Liav:
      
       "When the OS is too overloaded, delta will get a high enough value for the
        msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
        after the shift the nsec variable will gain a value similar to
        0xffffffffff000000."
      
      Unfortunately this has been reintroduced recently with commit 6bd58f09
      ("time: Add cycles to nanoseconds translation"). It had been fixed a year
      ago already in commit 35a4933a ("time: Avoid signed overflow in
      timekeeping_get_ns()").
      
      Though it's not surprising that the issue has been reintroduced because the
      function itself and the whole call chain uses s64 for the result and the
      propagation of it. The change in this recent commit is subtle:
      
         s64 nsec;
      
      -  nsec = (d * m + n) >> s:
      +  nsec = d * m + n;
      +  nsec >>= s;
      
      d being type of cycle_t adds another level of obfuscation.
      
      This wouldn't have happened if the previous change to unsigned computation
      would have made the 'nsec' variable u64 right away and a follow up patch
      had cleaned up the whole call chain.
      
      There have been patches submitted which basically did a revert of the above
      patch leaving everything else unchanged as signed. Back to square one. This
      spawned a admittedly pointless discussion about potential users which rely
      on the unsigned behaviour until someone pointed out that it had been fixed
      before. The changelogs of said patches added further confusion as they made
      finally false claims about the consequences for eventual users which expect
      signed results.
      
      Despite delta being cycle_t, aka. u64, it's very well possible to hand in
      a signed negative value and the signed computation will happily return the
      correct result. But nobody actually sat down and analyzed the code which
      was added as user after the propably unintended signed conversion.
      
      Though in sensitive code like this it's better to analyze it proper and
      make sure that nothing relies on this than hunting the subtle wreckage half
      a year later. After analyzing all call chains it stands that no caller can
      hand in a negative value (which actually would work due to the s64 cast)
      and rely on the signed math to do the right thing.
      
      Change the conversion function to unsigned math. The conversion of all call
      chains is done in a follow up patch.
      
      This solves the starvation issue, which was caused by the negative result,
      but it does not solve the underlying problem. It merily procrastinates
      it. When the timekeeper update is deferred long enough that the unsigned
      multiplication overflows, then time going backwards is observable again.
      
      It does neither solve the issue of clocksources with a small counter width
      which will wrap around possibly several times and cause random time stamps
      to be generated. But those are usually not found on systems used for
      virtualization, so this is likely a non issue.
      
      I took the liberty to claim authorship for this simply because
      analyzing all callsites and writing the changelog took substantially
      more time than just making the simple s/s64/u64/ change and ignore the
      rest.
      
      Fixes: 6bd58f09 ("time: Add cycles to nanoseconds translation")
      Reported-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reported-by: NLiav Rehana <liavr@mellanox.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Parit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      9c164572
  16. 08 12月, 2016 1 次提交
  17. 01 12月, 2016 1 次提交
    • B
      alarmtimer: Add tracepoints for alarm timers · 4a057549
      Baolin Wang 提交于
      Alarm timers are one of the mechanisms to wake up a system from suspend,
      but there exist no tracepoints to analyse which process/thread armed an
      alarmtimer.
      
      Add tracepoints for start/cancel/expire of individual alarm timers and one
      for tracing the suspend time decision when to resume the system.
      
      The following trace excerpt illustrates the new mechanism:
      
      Binder:3292_2-3304  [000] d..2   149.981123: alarmtimer_cancel:
      alarmtimer:ffffffc1319a7800 type:REALTIME
      expires:1325463120000000000 now:1325376810370370245
      
      Binder:3292_2-3304  [000] d..2   149.981136: alarmtimer_start:
      alarmtimer:ffffffc1319a7800 type:REALTIME
      expires:1325376840000000000 now:1325376810370384591
      
      Binder:3292_9-3953  [000] d..2   150.212991: alarmtimer_cancel:
      alarmtimer:ffffffc1319a5a00 type:BOOTTIME
      expires:179552000000 now:150154008122
      
      Binder:3292_9-3953  [000] d..2   150.213006: alarmtimer_start:
      alarmtimer:ffffffc1319a5a00 type:BOOTTIME
      expires:179551000000 now:150154025622
      
      system_server-3000  [002] ...1  162.701940: alarmtimer_suspend:
      alarmtimer type:REALTIME expires:1325376840000000000
      
      The wakeup time which is selected at suspend time allows to map it back to
      the task arming the timer: Binder:3292_2.
      
      [ tglx: Store alarm timer expiry time instead of some useless RTC relative
        	information, add proper type information for wakeups which are
        	handled via the clock_nanosleep/freezer and massage the changelog. ]
      Signed-off-by: NBaolin Wang <baolin.wang@linaro.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      4a057549
  18. 30 11月, 2016 1 次提交
    • J
      timekeeping: Add a fast and NMI safe boot clock · 948a5312
      Joel Fernandes 提交于
      This boot clock can be used as a tracing clock and will account for
      suspend time.
      
      To keep it NMI safe since we're accessing from tracing, we're not using a
      separate timekeeper with updates to monotonic clock and boot offset
      protected with seqlocks. This has the following minor side effects:
      
      (1) Its possible that a timestamp be taken after the boot offset is updated
      but before the timekeeper is updated. If this happens, the new boot offset
      is added to the old timekeeping making the clock appear to update slightly
      earlier:
         CPU 0                                        CPU 1
         timekeeping_inject_sleeptime64()
         __timekeeping_inject_sleeptime(tk, delta);
                                                      timestamp();
         timekeeping_update(tk, TK_CLEAR_NTP...);
      
      (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
      partially updated.  Since the tk->offs_boot update is a rare event, this
      should be a rare occurrence which postprocessing should be able to handle.
      Signed-off-by: NJoel Fernandes <joelaf@google.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1480372524-15181-6-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      948a5312
  19. 23 11月, 2016 1 次提交
  20. 16 11月, 2016 3 次提交
  21. 15 11月, 2016 1 次提交
  22. 26 10月, 2016 2 次提交
    • D
      timers: Fix documentation for schedule_timeout() and similar · 4b7e9cf9
      Douglas Anderson 提交于
      The documentation for schedule_timeout(), schedule_hrtimeout(), and
      schedule_hrtimeout_range() all claim that the routines couldn't possibly
      return early if the task state was TASK_UNINTERRUPTIBLE. This is simply
      not true since wake_up_process() will cause those routines to exit early.
      
      We cannot make schedule_[hr]timeout() loop until the timeout expires if the
      task state is uninterruptible because we have users which rely on the
      existing and designed behaviour.
      
      Make the documentation match the (correct) implementation.
      
      schedule_hrtimeout() returns -EINTR even when a uninterruptible task was
      woken up. This might look strange, but making the return code depend on the
      state is too much of an effort as it would affect all the call sites. There
      is no value in doing so, but we spell it out clearly in the documentation.
      Suggested-by: NDaniel Kurtz <djkurtz@chromium.org>
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Cc: huangtao@rock-chips.com
      Cc: heiko@sntech.de
      Cc: broonie@kernel.org
      Cc: briannorris@chromium.org
      Cc: Andreas Mohr <andi@lisas.de>
      Cc: linux-rockchip@lists.infradead.org
      Cc: tony.xie@rock-chips.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: linux@roeck-us.net
      Cc: tskd08@gmail.com
      Link: http://lkml.kernel.org/r/1477065531-30342-2-git-send-email-dianders@chromium.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      4b7e9cf9
    • D
      timers: Fix usleep_range() in the context of wake_up_process() · 6c5e9059
      Douglas Anderson 提交于
      Users of usleep_range() expect that it will _never_ return in less time
      than the minimum passed parameter. However, nothing in the code ensures
      this, when the sleeping task is woken by wake_up_process() or any other
      mechanism which can wake a task from uninterruptible state.
      
      Neither usleep_range() nor schedule_hrtimeout_range*() have any protection
      against wakeups. schedule_hrtimeout_range*() is designed this way despite
      the fact that the API documentation does not mention it.
      
      msleep() already has code to handle this case since it will loop as long
      as there was still time left.  usleep_range() has no such loop, add it.
      
      Presumably this problem was not detected before because usleep_range() is
      only used in a few places and the function is mostly used in contexts which
      are not exposed to wakeups of any form.
      
      An effort was made to look for users relying on the old behavior by
      looking for usleep_range() in the same file as wake_up_process().
      No problems were found by this search, though it is conceivable that
      someone could have put the sleep and wakeup in two different files.
      
      An effort was made to ask several upstream maintainers if they were aware
      of people relying on wake_up_process() to wake up usleep_range(). No
      maintainers were aware of that but they were aware of many people relying
      on usleep_range() never returning before the minimum.
      Reported-by: NTao Huang <huangtao@rock-chips.com>
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Cc: heiko@sntech.de
      Cc: broonie@kernel.org
      Cc: briannorris@chromium.org
      Cc: Andreas Mohr <andi@lisas.de>
      Cc: linux-rockchip@lists.infradead.org
      Cc: tony.xie@rock-chips.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: djkurtz@chromium.org
      Cc: linux@roeck-us.net
      Cc: tskd08@gmail.com
      Link: http://lkml.kernel.org/r/1477065531-30342-1-git-send-email-dianders@chromium.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      6c5e9059