提交 · 56fd16cabac9cd8f15e2902898a9d0cc96e2fa70 · openanolis / cloud-kernel

16 10月, 2015 1 次提交

timekeeping: Increment clock_was_set_seq in timekeeping_init() · 56fd16ca

由 Thomas Gleixner 提交于 10月 16, 2015

timekeeping_init() can set the wall time offset, so we need to
increment the clock_was_set_seq counter. That way hrtimers will pick
up the early offset immediately. Otherwise on a machine which does not
set wall time later in the boot process the hrtimer offset is stale at
0 and wall time timers are going to expire with a delay of 45 years.

Fixes: 868a3e91 "hrtimer: Make offset update smarter"
Reported-and-tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Stefan Liebler <stli@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>

56fd16ca

03 10月, 2015 1 次提交

clocksource: Fix abs() usage w/ 64bit values · 67dfae0c

由 John Stultz 提交于 9月 14, 2015

This patch fixes one cases where abs() was being used with 64-bit
nanosecond values, where the result may be capped at 32-bits.

This potentially could cause watchdog false negatives on 32-bit
systems, so this patch addresses the issue by using abs64().
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

67dfae0c

14 9月, 2015 1 次提交

clockevents: Remove unused set_mode() callback · eef7635a

由 Viresh Kumar 提交于 9月 11, 2015

All users are migrated to the per-state callbacks, get rid of the
unused interface and the core support code.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: linaro-kernel@lists.linaro.org
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/fd60de14cf6d125489c031207567bb255ad946f6.1441943991.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

eef7635a

13 9月, 2015 1 次提交

time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() · 2619d7e9

由 John Stultz 提交于 9月 09, 2015

The internal clocksteering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.

Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:

  dc491596 ("timekeeping: Rework frequency adjustments to work better w/ nohz")

used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.

Per include/linux/kernel.h:

  "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".

Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large
adjustments are made).

This patch fixes the issue by using abs64() instead.
Reported-by: NNuno Gonçalves <nunojpg@gmail.com>
Tested-by: NNuno Goncalves <nunojpg@gmail.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

2619d7e9

02 9月, 2015 1 次提交

nohz: Assert existing housekeepers when nohz full enabled · 7c8bb6cb

由 Frederic Weisbecker 提交于 9月 01, 2015

The code ensures that when nohz full is running, at least the
boot CPU serves as a housekeeper and it can't be later offlined.

Let's assert this assumption to make sure that we have CPUs to
handle unbound jobs like workqueues and timers while nohz full
CPUs run undisturbed.

Also improve the comments on housekeeper offlining prevention.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Vatika Harlalka <vatikaharlalka@gmail.com>
Link: http://lkml.kernel.org/r/1441119060-2230-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7c8bb6cb

22 8月, 2015 1 次提交

hrtimer: Handle failure of tick_init_highres() gracefully · 85e1cd6e

由 Guenter Roeck 提交于 8月 22, 2015

Commit 75e3b37d ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
drops the return code of hrtimer_switch_to_hres(). While doing so, it also
drops the return statement itself on failure. This may cause a system hang.
Seen when running arm:multi_v7_defconfig in qemu with devicetree file
vexpress-v2p-ca9.

Fixes: 75e3b37d ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/1440231047-16256-1-git-send-email-linux@roeck-us.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

85e1cd6e

19 8月, 2015 2 次提交

hrtimer: Unconfuse switch_hrtimer_base() a bit · b48362d8

由 Frederic Weisbecker 提交于 8月 18, 2015

The variable called "this_base" is confusing because its name suggests
it's of "struct hrtimer_clock_base" type, along with "base" and "new_base"
which doesn't help understanding this complicated function.

Make its name clearer and fix the misleading comment while at it.

[ tglx: Fixed the comment for real ]
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1439907509-9553-3-git-send-email-fweisbec@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

b48362d8

hrtimer: Simplify get_target_base() by returning current base · 662b3e19

由 Frederic Weisbecker 提交于 8月 18, 2015

Instead of fetching again the current cpu base, just take it from the
parameter.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1439907509-9553-2-git-send-email-fweisbec@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

662b3e19

18 8月, 2015 8 次提交

timer: Write timer->flags atomically · d0023a14

由 Eric Dumazet 提交于 8月 17, 2015

lock_timer_base() cannot prevent the following :

CPU1 ( in __mod_timer()
timer->flags |= TIMER_MIGRATING;
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
// The next line clears TIMER_MIGRATING
timer->flags &= ~TIMER_BASEMASK;
                                  CPU2 (in lock_timer_base())
                                  see timer base is cpu0 base
                                  spin_lock_irqsave(&base->lock, *flags);
                                  if (timer->flags == tf)
                                       return base; // oops, wrong base
timer->flags |= base->cpu // too late

We must write timer->flags in one go, otherwise we can fool other cpus.

Fixes: bc7a34b8 ("timer: Reduce timer migration overhead if disabled")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jon Christopherson <jon@jons.org>
Cc: David Miller <davem@davemloft.net>
Cc: xen-devel@lists.xen.org
Cc: david.vrabel@citrix.com
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Link: http://lkml.kernel.org/r/1439831928.32680.11.camel@edumazet-glaptop2.roam.corp.google.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>

d0023a14

hrtimer: Drop return code of hrtimer_switch_to_hres() · 75e3b37d

由 Luiz Capitulino 提交于 8月 11, 2015

It's not checked by the caller.
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Link: http://lkml.kernel.org/r/20150811164043.538241ef@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

75e3b37d

time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64() · 9ca30850

由 Baolin Wang 提交于 7月 29, 2015

The conversion between struct timespec and jiffies is not year 2038
safe on 32bit systems. Introduce timespec64_to_jiffies() and
jiffies_to_timespec64() functions which use struct timespec64 to
make it ready for 2038 issue.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBaolin Wang <baolin.wang@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

9ca30850

time: Introduce current_kernel_time64() · 8758a240

由 Baolin Wang 提交于 7月 29, 2015

The current_kernel_time() is not year 2038 safe on 32bit systems
since it returns a timespec value. Introduce current_kernel_time64()
which returns a timespec64 value.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBaolin Wang <baolin.wang@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

8758a240

time: Add the common weak version of update_persistent_clock() · 7494e9ee

由 Xunlei Pang 提交于 7月 26, 2015

The weak update_persistent_clock64() calls update_persistent_clock(),
if the architecture defines an update_persistent_clock64() to replace
and remove its update_persistent_clock() version, when building the
kernel the linker will throw an undefined symbol error, that is, any
arch that switches to update_persistent_clock64() will have this issue.

To solve the issue, we add the common weak update_persistent_clock().

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

7494e9ee

time: Always make sure wall_to_monotonic isn't positive · e1d7ba87

由 Wang YanQing 提交于 6月 23, 2015

Two issues were found on an IMX6 development board without an
enabled RTC device(resulting in the boot time and monotonic
time being initialized to 0).

Issue 1:exportfs -a generate:
       "exportfs: /opt/nfs/arm does not support NFS export"
Issue 2:cat /proc/stat:
       "btime 4294967236"

The same issues can be reproduced on x86 after running the
following code:
	int main(void)
	{
	    struct timeval val;
	    int ret;

	    val.tv_sec = 0;
	    val.tv_usec = 0;
	    ret = settimeofday(&val, NULL);
	    return 0;
	}

Two issues are different symptoms of same problem:
The reason is a positive wall_to_monotonic pushes boot time back
to the time before Epoch, and getboottime will return negative
value.

In symptom 1:
          negative boot time cause get_expiry() to overflow time_t
          when input expire time is 2147483647, then cache_flush()
          always clears entries just added in ip_map_parse.
In symptom 2:
          show_stat() uses "unsigned long" to print negative btime
          value returned by getboottime.

This patch fix the problem by prohibiting time from being set to a value which
would cause a negative boot time. As a result one can't set the CLOCK_REALTIME
time prior to (1970 + system uptime).

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NWang YanQing <udknight@gmail.com>
[jstultz: reworded commit message]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

e1d7ba87

time: Fix nanosecond file time rounding in timespec_trunc() · de4a95fa

由 Karsten Blees 提交于 6月 25, 2015

timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffie
(or TICK_NSEC). This optimization assumes that:

 1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
    with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
    This is no longer true (probably since hrtimers introduced in 2.6.16).

 2. TICK_NSEC is evenly divisible by all possible granularities. This may
    be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
    TICK_NSEC=3333333 (introduced in 2.6.20).

Thus, sub-second portions of in-core file times are not rounded to on-disk
granularity. I.e. file times may change when the inode is re-read from disk
or when the file system is remounted.

This affects all file systems with file time granularities > 1 ns and < 1s,
e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
(configurable from user mode via struct fuse_init_out.time_gran).

Steps to reproduce with e.g. UDF:

  $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
  $ mkdir udf && mount udfdisk udf
  $ touch udf/test && stat -c %y udf/test
  2015-06-09 10:22:56.130006767 +0200
  $ umount udf && mount udfdisk udf
  $ stat -c %y udf/test
  2015-06-09 10:22:56.130006000 +0200

Remounting truncates the mtime to 1 µs.

Fix the rounding in timespec_trunc() and update the documentation.

timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
via current_fs_time()), and always with super_block.s_time_gran as second
argument. So this can safely be changed without side effects.

Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
as super_block.s_time_gran isn't prepared to handle different ctime /
mtime / atime resolutions nor resolutions > 1 second.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NKarsten Blees <blees@dcon.de>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

de4a95fa

timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers · 38bf985b

由 John Stultz 提交于 5月 27, 2015

I noticed for non-monotonic timers in timer_list, some of the
output looked a little confusing.

For example:
 #1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
 # expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]

You'll note the relative time till the expiration "[in xxx to
yyy nsecs]" is incorrect. This is because its printing the delta
between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.

This patch fixes this issue by adding the clock offset to the
"now" time which we use to calculate the delta.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

38bf985b

10 8月, 2015 1 次提交

kernel: broadcast-hrtimer: Migrate to new 'set-state' interface · ecbebcb8

由 Viresh Kumar 提交于 7月 16, 2015

Migrate broadcast-hrtimer driver to the new 'set-state' interface
provided by clockevents core, the earlier 'set-mode' interface is marked
obsolete now.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>

ecbebcb8

01 8月, 2015 1 次提交

clockevents: Drop redundant cpumask check in tick_check_new_device() · d74892c5

由 Luiz Capitulino 提交于 7月 29, 2015

The same check is performed by tick_check_percpu().
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
Link: http://lkml.kernel.org/r/20150729151417.069d1bb0@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

d74892c5

29 7月, 2015 5 次提交

nohz: Remove useless argument on tick_nohz_task_switch() · de734f89

由 Frederic Weisbecker 提交于 6月 11, 2015

Leftover from early code.

Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

de734f89

nohz: Move tick_nohz_restart_sched_tick() above its users · 59d2c7ca

由 Frederic Weisbecker 提交于 5月 29, 2015

Fix the function declaration/definition dance.
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

59d2c7ca

nohz: Restart nohz full tick from irq exit · 73738a95

由 Frederic Weisbecker 提交于 5月 27, 2015

Restart the tick when necessary from the irq exit path. It makes nohz
full more flexible, simplify the related IPIs and doesn't bring
significant overhead on irq exit.

In a longer term view, it will allow us to piggyback the nohz kick
on the scheduler IPI in the future instead of sending a dedicated IPI
that often doubles the scheduler IPI on task wakeup. This will require
more changes though including careful review of resched_curr() callers
to include nohz full needs.
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

73738a95

nohz: Remove idle task special case · 59449359

由 Frederic Weisbecker 提交于 5月 27, 2015

On nohz full early days, idle dynticks and full dynticks weren't well
integrated and we couldn't risk full dynticks calls on idle without
risking messing up tick idle statistics. This is why we prevented such
thing to happen.

Nowadays full dynticks and idle dynticks are better integrated and
interact without known issue.

So lets remove that.
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

59449359

jiffies: Remove HZ > USEC_PER_SEC special case · e0758676

由 Frederic Weisbecker 提交于 10月 10, 2014

HZ never goes much further 1000 and a bit. And if we ever reach one tick
per microsecond, we might be having a problem.

Lets stop maintaining this special case, just leave a paranoid check.
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc; John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

e0758676

14 7月, 2015 1 次提交

tick: Move the export of tick_broadcast_oneshot_control to the proper place · 0f447051

由 Thomas Gleixner 提交于 7月 14, 2015

tick_broadcast_oneshot_control got moved from tick-broadcast to
tick-common, but the export stayed in the old place. Fix it up.

Fixes: f32dd117 'tick/broadcast: Make idle check independent from mode and config'
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

0f447051

11 7月, 2015 1 次提交

tick/broadcast: Prevent NULL pointer dereference · c4d029f2

由 Thomas Gleixner 提交于 7月 11, 2015

Dan reported that the recent changes to the broadcast code introduced
a potential NULL dereference.

Add the proper check.

Fixes: e0454311 "tick/broadcast: Sanity check the shutdown of the local clock_event"
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

c4d029f2

08 7月, 2015 9 次提交

tick/broadcast: Handle spurious interrupts gracefully · c4288334

由 Thomas Gleixner 提交于 7月 05, 2015

Andriy reported that on a virtual machine the warning about negative
expiry time in the clock events programming code triggered:

hpet: hpet0 irq 40 for MSI
hpet: hpet1 irq 41 for MSI
Switching to clocksource hpet
WARNING: at kernel/time/clockevents.c:239

[<ffffffff810ce6eb>] clockevents_program_event+0xdb/0xf0
[<ffffffff810cf211>] tick_handle_periodic_broadcast+0x41/0x50
[<ffffffff81016525>] timer_interrupt+0x15/0x20

When the second hpet is installed as a per cpu timer the broadcast
event is not longer required and stopped, which sets the next_evt of
the broadcast device to KTIME_MAX.

If after that a spurious interrupt happens on the broadcast device,
then the current code blindly handles it and tries to reprogram the
broadcast device afterwards, which adds the period to
next_evt. KTIME_MAX + period results in a negative expiry value
causing the WARN_ON in the clockevents code to trigger.

Add a proper check for the state of the broadcast device into the
interrupt handler and return if the interrupt is spurious.

[ Folded in pointer fix from Sudeep ]
Reported-by: NAndriy Gapon <avg@FreeBSD.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20150705205221.802094647@linutronix.de

c4288334

tick/broadcast: Check for hrtimer broadcast active early · d5113e13

由 Thomas Gleixner 提交于 7月 07, 2015

If the current cpu is the one which has the hrtimer based broadcast
queued then we better return busy immediately instead of going through
loops and hoops to figure that out.

[ Split out from a larger combo patch ]
Tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

d5113e13

tick/broadcast: Return busy when IPI is pending · 0cc5281a

由 Thomas Gleixner 提交于 7月 07, 2015

Tell the idle code not to go deep if the broadcast IPI is about to
arrive.

[ Split out from a larger combo patch ]
Tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

0cc5281a

tick/broadcast: Return busy if periodic mode and hrtimer broadcast · d3325726

由 Thomas Gleixner 提交于 7月 07, 2015

If the system is in periodic mode and the broadcast device is hrtimer
based, return busy as we have no proper handling for this.

[ Split out from a larger combo patch ]
Tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

d3325726

tick/broadcast: Move the check for periodic mode inside state handling · e3ac79e0

由 Thomas Gleixner 提交于 7月 07, 2015

We need to check more than the periodic mode for proper operation in
all runtime combinations. To avoid code duplication move the check
into the enter state handling.

No functional change.

[ Split out from a larger combo patch ]
Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

e3ac79e0

tick/broadcast: Prevent deep idle if no broadcast device available · b78f3f3c

由 Thomas Gleixner 提交于 7月 07, 2015

Add a check for a installed broadcast device to the oneshot control
function and return busy if not.

[ Split out from a larger combo patch ]
Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

b78f3f3c

tick/broadcast: Make idle check independent from mode and config · f32dd117

由 Thomas Gleixner 提交于 7月 07, 2015

Currently the broadcast busy check, which prevents the idle code from
going into deep idle, works only in one shot mode.

If NOHZ and HIGHRES are off (config or command line) there is no
sanity check at all, so under certain conditions cpus are allowed to
go into deep idle, where the local timer stops, and are not woken up
again because there is no broadcast timer installed or a hrtimer based
broadcast device is not evaluated.

Move tick_broadcast_oneshot_control() into the common code and provide
proper subfunctions for the various config combinations.

The common check in tick_broadcast_oneshot_control() is for the C3STOP
misfeature flag of the local clock event device. If its not set, idle
can proceed. If set, further checks are necessary.

Provide checks for the trivial cases:

 - If broadcast is disabled in the config, then return busy

 - If oneshot mode (NOHZ/HIGHES) is disabled in the config, return
   busy if the broadcast device is hrtimer based.

 - If oneshot mode is enabled in the config call the original
   tick_broadcast_oneshot_control() function. That function needs
   extra checks which will be implemented in seperate patches.

[ Split out from a larger combo patch ]
Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

f32dd117

tick/broadcast: Sanity check the shutdown of the local clock_event · e0454311

由 Thomas Gleixner 提交于 7月 07, 2015

The broadcast code shuts down the local clock event unconditionally
even if no broadcast device is installed or if the broadcast device is
hrtimer based.

Add proper sanity checks.

[ Split out from a larger combo patch ]
Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

e0454311

tick/broadcast: Prevent hrtimer recursion · 8eb23126

由 Thomas Gleixner 提交于 7月 07, 2015

The hrtimer based broadcast vehicle can cause a hrtimer recursion
which went unnoticed until we changed the hrtimer expiry code to keep
track of the currently running timer.

local_timer_interrupt()
  local_handler()
    hrtimer_interrupt()
      expire_hrtimers()
        broadcast_hrtimer()
	  send_ipis()
	  local_handler()
	    hrtimer_interrupt()
	     ....

Solution is simple: Prevent the local handler call from the broadcast
code when the broadcast 'device' is hrtimer based.

[ Split out from a larger combo patch ]
Tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos

8eb23126

07 7月, 2015 2 次提交

clockevents: Allow set-state callbacks to be optional · 7c4a976c

由 Viresh Kumar 提交于 7月 07, 2015

Its mandatory for the drivers to provide set_state_{oneshot|periodic}()
(only if related modes are supported) and set_state_shutdown() callbacks
today, if they are implementing the new set-state interface.

But this leads to unnecessary noop callbacks for drivers which don't
want to implement them. Over that, it will lead to a full function call
for nothing really useful.

Lets make all set-state callbacks optional.
Suggested-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1436256875-15562-1-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

7c4a976c

rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL · d1ec4c34

由 Paul E. McKenney 提交于 5月 13, 2015

The RCU_USER_QS Kconfig parameter is now just a synonym for NO_HZ_FULL,
so this commit eliminates RCU_USER_QS, replacing all uses with NO_HZ_FULL.
Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>

d1ec4c34

01 7月, 2015 1 次提交

time: Remove development rules from Kbuild/Makefile · 65f26062

由 Thomas Gleixner 提交于 7月 01, 2015

time.o gets rebuilt unconditionally due to a leftover Makefile rule
which was placed there for development purposes.

Remove it along with the commented out always rule in the toplevel
Kbuild file.

Fixes: 0a227985 'time: Move timeconst.h into include/generated'
Reported-by; Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Mc Guire <der.herr@hofr.at>

65f26062

27 6月, 2015 1 次提交

timer: Fix hotplug regression · 24bfcb10

由 Thomas Gleixner 提交于 6月 26, 2015

The recent timer wheel rework removed the get/put_cpu_var() pair in
the hotplug migration code, which results in:

BUG: using smp_processor_id() in preemptible [00000000] code: hib.sh/2845
...
[<ffffffff810d4fa3>] timer_cpu_notify+0x53/0x12

That hunk is a leftover from an earlier iteration and went unnoticed
so far.

Restore the previous code which was obviously correct.

Fixes: 0eeda71b 'timer: Replace timer base by a cpu index'
Reported-and_tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

24bfcb10

19 6月, 2015 2 次提交

timer: Minimize nohz off overhead · 683be13a

由 Thomas Gleixner 提交于 5月 26, 2015

If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.

Before:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.73%  hog       [.] main
  15.36%  [kernel]  [k] _raw_spin_lock_irqsave
   9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.61%  [kernel]  [k] lock_timer_base.isra.38
   6.42%  [kernel]  [k] mod_timer
   3.90%  [kernel]  [k] detach_if_pending
   3.76%  [kernel]  [k] del_timer
   2.41%  [kernel]  [k] internal_add_timer
   1.39%  [kernel]  [k] __internal_add_timer
   0.76%  [kernel]  [k] timerfn

We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

683be13a

timer: Reduce timer migration overhead if disabled · bc7a34b8

由 Thomas Gleixner 提交于 5月 26, 2015

Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.

We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.

The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.

With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.

Before:
  47.76%  hog       [.] main
  14.84%  [kernel]  [k] _raw_spin_lock_irqsave
   9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.71%  [kernel]  [k] mod_timer
   6.24%  [kernel]  [k] lock_timer_base.isra.38
   3.76%  [kernel]  [k] detach_if_pending
   3.71%  [kernel]  [k] del_timer
   2.50%  [kernel]  [k] internal_add_timer
   1.51%  [kernel]  [k] get_nohz_timer_target
   1.28%  [kernel]  [k] __internal_add_timer
   0.78%  [kernel]  [k] timerfn
   0.48%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

bc7a34b8

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功