1. 11 12月, 2015 1 次提交
  2. 08 12月, 2015 2 次提交
    • S
      clocksource: Add CPU info to clocksource watchdog reporting · 390dd67c
      Seiichi Ikarashi 提交于
      The clocksource watchdog reporting was improved by 0b046b21.
      I want to add the info of CPU where the watchdog detects a
      deviation because it is necessary to identify the trouble spot
      if the clocksource is TSC.
      Signed-off-by: NSeiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
      [jstultz: Tweaked commit message]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      390dd67c
    • D
      time: Avoid signed overflow in timekeeping_get_ns() · 35a4933a
      David Gibson 提交于
      1e75fa8b "time: Condense timekeeper.xtime into xtime_sec" replaced a call to
      clocksource_cyc2ns() from timekeeping_get_ns() with an open-coded version
      of the same logic to avoid keeping a semi-redundant struct timespec
      in struct timekeeper.
      
      However, the commit also introduced a subtle semantic change - where
      clocksource_cyc2ns() uses purely unsigned math, the new version introduces
      a signed temporary, meaning that if (delta * tk->mult) has a 63-bit
      overflow the following shift will still give a negative result.  The
      choice of 'maxsec' in __clocksource_updatefreq_scale() means this will
      generally happen if there's a ~10 minute pause in examining the
      clocksource.
      
      This can be triggered on a powerpc KVM guest by stopping it from qemu for
      a bit over 10 minutes.  After resuming time has jumped backwards several
      minutes causing numerous problems (jiffies does not advance, msleep()s can
      be extended by minutes..).  It doesn't happen on x86 KVM guests, because
      the guest TSC is effectively frozen while the guest is stopped, which is
      not the case for the powerpc timebase.
      
      Obviously an unsigned (64 bit) overflow will only take twice as long as a
      signed, 63-bit overflow.  I don't know the time code well enough to know
      if that will still cause incorrect calculations, or if a 64-bit overflow
      is avoided elsewhere.
      
      Still, an incorrect forwards clock adjustment will cause less trouble than
      time going backwards.  So, this patch removes the potential for
      intermediate signed overflow.
      
      Cc: stable@vger.kernel.org  (3.7+)
      Suggested-by: NLaurent Vivier <lvivier@redhat.com>
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      35a4933a
  3. 04 12月, 2015 1 次提交
    • Z
      alarmtimer: Avoid unexpected rtc interrupt when system resume from S3 · a0e3213f
      zhuo-hao 提交于
      Before the system go to suspend (S3), if user create a timer
      with clockid CLOCK_REALTIME_ALARM/CLOCK_BOOTTIME_ALARM and set a
      "large" timeout value to this timer. The function
      alarmtimer_suspend will be called to setup a timeout value to
      RTC timer to avoid the system sleep over time. However, if the
      system wakeup early than RTC timeout, the RTC timer will not be
      cleared. And this will cause the hpet_rtc_interrupt come
      unexpectedly until the RTC timeout. To fix this problem, just
      adding alarmtimer_resume to cancel the RTC timer.
      
      This was noticed because the HPET RTC emulation fires an
      interrupt every 16ms(=1/2^DEFAULT_RTC_SHIFT) up to the point
      where the alarm time is reached.
      
      This program always hits this situation
      (https://lkml.org/lkml/2015/11/8/326), if system wake up earlier
      than alarm time.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: NZhuo-hao Lee <zhuo-hao.lee@intel.com>
      [jstultz: Tweak commit subject & formatting slightly]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      a0e3213f
  4. 10 11月, 2015 1 次提交
    • A
      remove abs64() · 79211c8e
      Andrew Morton 提交于
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79211c8e
  5. 05 11月, 2015 1 次提交
    • T
      timers: Use proper base migration in add_timer_on() · 22b886dd
      Tejun Heo 提交于
      Regardless of the previous CPU a timer was on, add_timer_on()
      currently simply sets timer->flags to the new CPU.  As the caller must
      be seeing the timer as idle, this is locally fine, but the timer
      leaving the old base while unlocked can lead to race conditions as
      follows.
      
      Let's say timer was on cpu 0.
      
        cpu 0					cpu 1
        -----------------------------------------------------------------------------
        del_timer(timer) succeeds
      					del_timer(timer)
      					  lock_timer_base(timer) locks cpu_0_base
        add_timer_on(timer, 1)
          spin_lock(&cpu_1_base->lock)
          timer->flags set to cpu_1_base
          operates on @timer			  operates on @timer
      
      This triggered with mod_delayed_work_on() which contains
      "if (del_timer()) add_timer_on()" sequence eventually leading to the
      following oops.
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
        ...
        Workqueue: wqthrash wqthrash_workfunc [wqthrash]
        task: ffff8800172ca680 ti: ffff8800172d0000 task.ti: ffff8800172d0000
        RIP: 0010:[<ffffffff810ca6e9>]  [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
        ...
        Call Trace:
         [<ffffffff810cb0b4>] del_timer+0x44/0x60
         [<ffffffff8106e836>] try_to_grab_pending+0xb6/0x160
         [<ffffffff8106e913>] mod_delayed_work_on+0x33/0x80
         [<ffffffffa0000081>] wqthrash_workfunc+0x61/0x90 [wqthrash]
         [<ffffffff8106dba8>] process_one_work+0x1e8/0x650
         [<ffffffff8106e05e>] worker_thread+0x4e/0x450
         [<ffffffff810746af>] kthread+0xef/0x110
         [<ffffffff8185980f>] ret_from_fork+0x3f/0x70
      
      Fix it by updating add_timer_on() to perform proper migration as
      __mod_timer() does.
      Reported-and-tested-by: NJeff Layton <jlayton@poochiereds.net>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Chris Worley <chris.worley@primarydata.com>
      Cc: bfields@fieldses.org
      Cc: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: kernel-team@fb.com
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20151029103113.2f893924@tlielax.poochiereds.net
      Link: http://lkml.kernel.org/r/20151104171533.GI5749@mtj.duckdns.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      22b886dd
  6. 26 10月, 2015 1 次提交
  7. 16 10月, 2015 1 次提交
  8. 15 10月, 2015 4 次提交
  9. 12 10月, 2015 2 次提交
  10. 03 10月, 2015 1 次提交
  11. 02 10月, 2015 3 次提交
  12. 22 9月, 2015 2 次提交
  13. 14 9月, 2015 1 次提交
  14. 13 9月, 2015 1 次提交
    • J
      time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() · 2619d7e9
      John Stultz 提交于
      The internal clocksteering done for fine-grained error
      correction uses a logarithmic approximation, so any time
      adjtimex() adjusts the clock steering, timekeeping_freqadjust()
      quickly approximates the correct clock frequency over a series
      of ticks.
      
      Unfortunately, the logic in timekeeping_freqadjust(), introduced
      in commit:
      
        dc491596 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
      
      used the abs() function with a s64 error value to calculate the
      size of the approximated adjustment to be made.
      
      Per include/linux/kernel.h:
      
        "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
      
      Thus on 32-bit platforms, this resulted in the clocksteering to
      take a quite dampended random walk trying to converge on the
      proper frequency, which caused the adjustments to be made much
      slower then intended (most easily observed when large
      adjustments are made).
      
      This patch fixes the issue by using abs64() instead.
      Reported-by: NNuno Gonçalves <nunojpg@gmail.com>
      Tested-by: NNuno Goncalves <nunojpg@gmail.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: <stable@vger.kernel.org> # v3.17+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2619d7e9
  15. 02 9月, 2015 1 次提交
  16. 22 8月, 2015 1 次提交
  17. 19 8月, 2015 2 次提交
  18. 18 8月, 2015 8 次提交
    • E
      timer: Write timer->flags atomically · d0023a14
      Eric Dumazet 提交于
      lock_timer_base() cannot prevent the following :
      
      CPU1 ( in __mod_timer()
      timer->flags |= TIMER_MIGRATING;
      spin_unlock(&base->lock);
      base = new_base;
      spin_lock(&base->lock);
      // The next line clears TIMER_MIGRATING
      timer->flags &= ~TIMER_BASEMASK;
                                        CPU2 (in lock_timer_base())
                                        see timer base is cpu0 base
                                        spin_lock_irqsave(&base->lock, *flags);
                                        if (timer->flags == tf)
                                             return base; // oops, wrong base
      timer->flags |= base->cpu // too late
      
      We must write timer->flags in one go, otherwise we can fool other cpus.
      
      Fixes: bc7a34b8 ("timer: Reduce timer migration overhead if disabled")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jon Christopherson <jon@jons.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: xen-devel@lists.xen.org
      Cc: david.vrabel@citrix.com
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Link: http://lkml.kernel.org/r/1439831928.32680.11.camel@edumazet-glaptop2.roam.corp.google.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d0023a14
    • L
    • B
      time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64() · 9ca30850
      Baolin Wang 提交于
      The conversion between struct timespec and jiffies is not year 2038
      safe on 32bit systems. Introduce timespec64_to_jiffies() and
      jiffies_to_timespec64() functions which use struct timespec64 to
      make it ready for 2038 issue.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBaolin Wang <baolin.wang@linaro.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      9ca30850
    • B
      time: Introduce current_kernel_time64() · 8758a240
      Baolin Wang 提交于
      The current_kernel_time() is not year 2038 safe on 32bit systems
      since it returns a timespec value. Introduce current_kernel_time64()
      which returns a timespec64 value.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBaolin Wang <baolin.wang@linaro.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      8758a240
    • X
      time: Add the common weak version of update_persistent_clock() · 7494e9ee
      Xunlei Pang 提交于
      The weak update_persistent_clock64() calls update_persistent_clock(),
      if the architecture defines an update_persistent_clock64() to replace
      and remove its update_persistent_clock() version, when building the
      kernel the linker will throw an undefined symbol error, that is, any
      arch that switches to update_persistent_clock64() will have this issue.
      
      To solve the issue, we add the common weak update_persistent_clock().
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      7494e9ee
    • W
      time: Always make sure wall_to_monotonic isn't positive · e1d7ba87
      Wang YanQing 提交于
      Two issues were found on an IMX6 development board without an
      enabled RTC device(resulting in the boot time and monotonic
      time being initialized to 0).
      
      Issue 1:exportfs -a generate:
             "exportfs: /opt/nfs/arm does not support NFS export"
      Issue 2:cat /proc/stat:
             "btime 4294967236"
      
      The same issues can be reproduced on x86 after running the
      following code:
      	int main(void)
      	{
      	    struct timeval val;
      	    int ret;
      
      	    val.tv_sec = 0;
      	    val.tv_usec = 0;
      	    ret = settimeofday(&val, NULL);
      	    return 0;
      	}
      
      Two issues are different symptoms of same problem:
      The reason is a positive wall_to_monotonic pushes boot time back
      to the time before Epoch, and getboottime will return negative
      value.
      
      In symptom 1:
                negative boot time cause get_expiry() to overflow time_t
                when input expire time is 2147483647, then cache_flush()
                always clears entries just added in ip_map_parse.
      In symptom 2:
                show_stat() uses "unsigned long" to print negative btime
                value returned by getboottime.
      
      This patch fix the problem by prohibiting time from being set to a value which
      would cause a negative boot time. As a result one can't set the CLOCK_REALTIME
      time prior to (1970 + system uptime).
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NWang YanQing <udknight@gmail.com>
      [jstultz: reworded commit message]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      e1d7ba87
    • K
      time: Fix nanosecond file time rounding in timespec_trunc() · de4a95fa
      Karsten Blees 提交于
      timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffie
      (or TICK_NSEC). This optimization assumes that:
      
       1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
          with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
          This is no longer true (probably since hrtimers introduced in 2.6.16).
      
       2. TICK_NSEC is evenly divisible by all possible granularities. This may
          be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
          TICK_NSEC=3333333 (introduced in 2.6.20).
      
      Thus, sub-second portions of in-core file times are not rounded to on-disk
      granularity. I.e. file times may change when the inode is re-read from disk
      or when the file system is remounted.
      
      This affects all file systems with file time granularities > 1 ns and < 1s,
      e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
      (configurable from user mode via struct fuse_init_out.time_gran).
      
      Steps to reproduce with e.g. UDF:
      
        $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
        $ mkdir udf && mount udfdisk udf
        $ touch udf/test && stat -c %y udf/test
        2015-06-09 10:22:56.130006767 +0200
        $ umount udf && mount udfdisk udf
        $ stat -c %y udf/test
        2015-06-09 10:22:56.130006000 +0200
      
      Remounting truncates the mtime to 1 µs.
      
      Fix the rounding in timespec_trunc() and update the documentation.
      
      timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
      via current_fs_time()), and always with super_block.s_time_gran as second
      argument. So this can safely be changed without side effects.
      
      Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
      as super_block.s_time_gran isn't prepared to handle different ctime /
      mtime / atime resolutions nor resolutions > 1 second.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NKarsten Blees <blees@dcon.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      de4a95fa
    • J
      timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers · 38bf985b
      John Stultz 提交于
      I noticed for non-monotonic timers in timer_list, some of the
      output looked a little confusing.
      
      For example:
       #1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
       # expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]
      
      You'll note the relative time till the expiration "[in xxx to
      yyy nsecs]" is incorrect. This is because its printing the delta
      between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.
      
      This patch fixes this issue by adding the clock offset to the
      "now" time which we use to calculate the delta.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      38bf985b
  19. 10 8月, 2015 1 次提交
  20. 01 8月, 2015 1 次提交
  21. 29 7月, 2015 4 次提交
    • F
      nohz: Remove useless argument on tick_nohz_task_switch() · de734f89
      Frederic Weisbecker 提交于
      Leftover from early code.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      de734f89
    • F
      nohz: Move tick_nohz_restart_sched_tick() above its users · 59d2c7ca
      Frederic Weisbecker 提交于
      Fix the function declaration/definition dance.
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      59d2c7ca
    • F
      nohz: Restart nohz full tick from irq exit · 73738a95
      Frederic Weisbecker 提交于
      Restart the tick when necessary from the irq exit path. It makes nohz
      full more flexible, simplify the related IPIs and doesn't bring
      significant overhead on irq exit.
      
      In a longer term view, it will allow us to piggyback the nohz kick
      on the scheduler IPI in the future instead of sending a dedicated IPI
      that often doubles the scheduler IPI on task wakeup. This will require
      more changes though including careful review of resched_curr() callers
      to include nohz full needs.
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      73738a95
    • F
      nohz: Remove idle task special case · 59449359
      Frederic Weisbecker 提交于
      On nohz full early days, idle dynticks and full dynticks weren't well
      integrated and we couldn't risk full dynticks calls on idle without
      risking messing up tick idle statistics. This is why we prevented such
      thing to happen.
      
      Nowadays full dynticks and idle dynticks are better integrated and
      interact without known issue.
      
      So lets remove that.
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      59449359