提交 · a5a1d1c2914b5316924c7893eb683a5420ebd3be · openanolis / cloud-kernel

25 12月, 2016 2 次提交

clocksource: Use a plain u64 instead of cycle_t · a5a1d1c2

由 Thomas Gleixner 提交于 12月 21, 2016

There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>

a5a1d1c2

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

由 Linus Torvalds 提交于 12月 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

15 12月, 2016 2 次提交

tick/broadcast: Prevent NULL pointer dereference · c1a9eeb9

由 Thomas Gleixner 提交于 12月 15, 2016

When a disfunctional timer, e.g. dummy timer, is installed, the tick core
tries to setup the broadcast timer.

If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.
Reported-by: NMason <slash.tmp@free.fr>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr

c1a9eeb9

posix-timers: give lazy compilers some help optimizing code away · b6f8a92c

由 Nicolas Pitre 提交于 12月 14, 2016

The OpenRISC compiler (so far) fails to optimize away a large portion of
code containing a reference to posix_timer_event in alarmtimer.c when
CONFIG_POSIX_TIMERS is unset.  Let's give it a direct clue to let the
build succeed.

This fixes
[linux-next:master 6682/7183] alarmtimer.c:undefined reference to `posix_timer_event'
reported by kbuild test robot.
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b6f8a92c

09 12月, 2016 4 次提交

timekeeping: Use mul_u64_u32_shr() instead of open coding it · c029a2be

由 Thomas Gleixner 提交于 12月 08, 2016

The resume code must deal with a clocksource delta which is potentially big
enough to overflow the 64bit mult.

Replace the open coded handling with the proper function.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Liav Rehana <liavr@mellanox.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

c029a2be

timekeeping: Get rid of pointless typecasts · cbd99e3b

由 Thomas Gleixner 提交于 12月 08, 2016

cycle_t is defined as u64, so casting it to u64 is a pointless and
confusing exercise. cycle_t should simply go away and be replaced with a
plain u64 to avoid further confusion.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Liav Rehana <liavr@mellanox.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

cbd99e3b

timekeeping: Make the conversion call chain consistently unsigned · acc89612

由 Thomas Gleixner 提交于 12月 08, 2016

Propagating a unsigned value through signed variables and functions makes
absolutely no sense and is just prone to (re)introduce subtle signed
vs. unsigned issues as happened recently.

Clean it up.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Liav Rehana <liavr@mellanox.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

acc89612

timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion · 9c164572

由 Thomas Gleixner 提交于 12月 08, 2016

The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:
 
 - Time jumps backwards
 
 - __iter_div_u64_rem() which is used in one of the calling code pathes
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case.  I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag.  I think it was something in the vicinity of 10 minutes
  stopped."

 When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:

   s64 nsec;

-  nsec = (d * m + n) >> s:
+  nsec = d * m + n;
+  nsec >>= s;

d being type of cycle_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.

There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.

Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.

Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.

Fixes: 6bd58f09 ("time: Add cycles to nanoseconds translation")
Reported-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reported-by: NLiav Rehana <liavr@mellanox.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

9c164572

08 12月, 2016 1 次提交

clocksource: export the clocks_calc_mult_shift to use by timestamp code · 5304121a

由 Murali Karicheri 提交于 12月 06, 2016

The CPSW CPTS driver is capable of doing timestamping on tx/rx packets and
requires to know mult and shift factors for timestamp conversion from raw
value to nanoseconds (ptp clock). Now these mult and shift factors are
calculated manually and provided through DT, which makes very hard to
support of a lot number of platforms, especially if CPTS refclk is not the
same for some kind of boards and depends on efuse settings (Keystone 2
platforms). Hence, export clocks_calc_mult_shift() to allow drivers like
CPSW CPTS (and other ptp drivesr) to benefit from automaitc calculation of
mult and shift factors.

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: NMurali Karicheri <m-karicheri2@ti.com>
Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5304121a

01 12月, 2016 1 次提交

alarmtimer: Add tracepoints for alarm timers · 4a057549

由 Baolin Wang 提交于 11月 28, 2016

Alarm timers are one of the mechanisms to wake up a system from suspend,
but there exist no tracepoints to analyse which process/thread armed an
alarmtimer.

Add tracepoints for start/cancel/expire of individual alarm timers and one
for tracing the suspend time decision when to resume the system.

The following trace excerpt illustrates the new mechanism:

Binder:3292_2-3304  [000] d..2   149.981123: alarmtimer_cancel:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325463120000000000 now:1325376810370370245

Binder:3292_2-3304  [000] d..2   149.981136: alarmtimer_start:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325376840000000000 now:1325376810370384591

Binder:3292_9-3953  [000] d..2   150.212991: alarmtimer_cancel:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179552000000 now:150154008122

Binder:3292_9-3953  [000] d..2   150.213006: alarmtimer_start:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179551000000 now:150154025622

system_server-3000  [002] ...1  162.701940: alarmtimer_suspend:
alarmtimer type:REALTIME expires:1325376840000000000

The wakeup time which is selected at suspend time allows to map it back to
the task arming the timer: Binder:3292_2.

[ tglx: Store alarm timer expiry time instead of some useless RTC relative
  	information, add proper type information for wakeups which are
  	handled via the clock_nanosleep/freezer and massage the changelog. ]
Signed-off-by: NBaolin Wang <baolin.wang@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

4a057549

30 11月, 2016 1 次提交

timekeeping: Add a fast and NMI safe boot clock · 948a5312

由 Joel Fernandes 提交于 11月 28, 2016

This boot clock can be used as a tracing clock and will account for
suspend time.

To keep it NMI safe since we're accessing from tracing, we're not using a
separate timekeeper with updates to monotonic clock and boot offset
protected with seqlocks. This has the following minor side effects:

(1) Its possible that a timestamp be taken after the boot offset is updated
but before the timekeeper is updated. If this happens, the new boot offset
is added to the old timekeeping making the clock appear to update slightly
earlier:
   CPU 0                                        CPU 1
   timekeeping_inject_sleeptime64()
   __timekeeping_inject_sleeptime(tk, delta);
                                                timestamp();
   timekeeping_update(tk, TK_CLEAR_NTP...);

(2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
partially updated.  Since the tk->offs_boot update is a rare event, this
should be a rare occurrence which postprocessing should be able to handle.
Signed-off-by: NJoel Fernandes <joelaf@google.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1480372524-15181-6-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

948a5312

23 11月, 2016 1 次提交

sched/nohz: Convert to hotplug state machine · 31eff243

由 Sebastian Andrzej Siewior 提交于 11月 17, 2016

Install the callbacks via the state machine.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: rt@linuxtronix.de
Link: http://lkml.kernel.org/r/20161117183541.8588-14-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

31eff243

16 11月, 2016 3 次提交

posix-timers: Make them configurable · baa73d9e

由 Nicolas Pitre 提交于 11月 11, 2016

Some embedded systems have no use for them.  This removes about
25KB from the kernel binary size when configured out.

Corresponding syscalls are routed to a stub logging the attempt to
use those syscalls which should be enough of a clue if they were
disabled without proper consideration. They are: timer_create,
timer_gettime: timer_getoverrun, timer_settime, timer_delete,
clock_adjtime, setitimer, getitimer, alarm.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

baa73d9e

posix_cpu_timers: Move the add_device_randomness() call to a proper place · 53d3eaa3

由 Nicolas Pitre 提交于 11月 11, 2016

There is no logical relation between add_device_randomness() and
posix_cpu_timers_exit(). Let's move the former to where the later
is called. This way, when posix-cpu-timers.c is compiled out, there
is no need to worry about not losing a call to add_device_randomness().
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-6-git-send-email-nicolas.pitre@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

53d3eaa3

timer: Move sys_alarm from timer.c to itimer.c · 74ba181e

由 Nicolas Pitre 提交于 11月 11, 2016

Move the only user of alarm_setitimer to itimer.c where it is defined.
This allows for making alarm_setitimer static, and dropping it from the
build when __ARCH_WANT_SYS_ALARM is not defined.
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-5-git-send-email-nicolas.pitre@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

74ba181e

15 11月, 2016 1 次提交

sched/cputime: Simplify task_cputime() · 353c50eb

由 Stanislaw Gruszka 提交于 11月 15, 2016

Now since fetch_task_cputime() has no other users than task_cputime(),
its code could be used directly in task_cputime().

Moreover since only 2 task_cputime() calls of 17 use a NULL argument,
we can add dummy variables to those calls and remove NULL checks from
task_cputimes().

Also remove NULL checks from task_cputimes_scaled().
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1479175612-14718-5-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

353c50eb

26 10月, 2016 2 次提交

timers: Fix documentation for schedule_timeout() and similar · 4b7e9cf9

由 Douglas Anderson 提交于 10月 21, 2016

The documentation for schedule_timeout(), schedule_hrtimeout(), and
schedule_hrtimeout_range() all claim that the routines couldn't possibly
return early if the task state was TASK_UNINTERRUPTIBLE. This is simply
not true since wake_up_process() will cause those routines to exit early.

We cannot make schedule_[hr]timeout() loop until the timeout expires if the
task state is uninterruptible because we have users which rely on the
existing and designed behaviour.

Make the documentation match the (correct) implementation.

schedule_hrtimeout() returns -EINTR even when a uninterruptible task was
woken up. This might look strange, but making the return code depend on the
state is too much of an effort as it would affect all the call sites. There
is no value in doing so, but we spell it out clearly in the documentation.
Suggested-by: NDaniel Kurtz <djkurtz@chromium.org>
Signed-off-by: NDouglas Anderson <dianders@chromium.org>
Cc: huangtao@rock-chips.com
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr <andi@lisas.de>
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-2-git-send-email-dianders@chromium.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

4b7e9cf9

timers: Fix usleep_range() in the context of wake_up_process() · 6c5e9059

由 Douglas Anderson 提交于 10月 21, 2016

Users of usleep_range() expect that it will _never_ return in less time
than the minimum passed parameter. However, nothing in the code ensures
this, when the sleeping task is woken by wake_up_process() or any other
mechanism which can wake a task from uninterruptible state.

Neither usleep_range() nor schedule_hrtimeout_range*() have any protection
against wakeups. schedule_hrtimeout_range*() is designed this way despite
the fact that the API documentation does not mention it.

msleep() already has code to handle this case since it will loop as long
as there was still time left.  usleep_range() has no such loop, add it.

Presumably this problem was not detected before because usleep_range() is
only used in a few places and the function is mostly used in contexts which
are not exposed to wakeups of any form.

An effort was made to look for users relying on the old behavior by
looking for usleep_range() in the same file as wake_up_process().
No problems were found by this search, though it is conceivable that
someone could have put the sleep and wakeup in two different files.

An effort was made to ask several upstream maintainers if they were aware
of people relying on wake_up_process() to wake up usleep_range(). No
maintainers were aware of that but they were aware of many people relying
on usleep_range() never returning before the minimum.
Reported-by: NTao Huang <huangtao@rock-chips.com>
Signed-off-by: NDouglas Anderson <dianders@chromium.org>
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr <andi@lisas.de>
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: djkurtz@chromium.org
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-1-git-send-email-dianders@chromium.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

6c5e9059

25 10月, 2016 4 次提交

timers: Prevent base clock corruption when forwarding · 6bad6bcc

由 Thomas Gleixner 提交于 10月 22, 2016

When a timer is enqueued we try to forward the timer base clock. This
mechanism has two issues:

1) Forwarding a remote base unlocked

The forwarding function is called from get_target_base() with the current
timer base lock held. But if the new target base is a different base than
the current base (can happen with NOHZ, sigh!) then the forwarding is done
on an unlocked base. This can lead to corruption of base->clk.

Solution is simple: Invoke the forwarding after the target base is locked.

2) Possible corruption due to jiffies advancing

This is similar to the issue in get_net_timer_interrupt() which was fixed
in the previous patch. jiffies can advance between check and assignement
and therefore advancing base->clk beyond the next expiry value.

So we need to read jiffies into a local variable once and do the checks and
assignment with the local copy.

Fixes: a683f390("timers: Forward the wheel clock whenever possible")
Reported-by: NAshton Holmes <scoopta@gmail.com>
Reported-by: NMichael Thayer <michael.thayer@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.253640125@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

6bad6bcc

timers: Prevent base clock rewind when forwarding clock · 041ad7bc

由 Thomas Gleixner 提交于 10月 22, 2016

Ashton and Michael reported, that kernel versions 4.8 and later suffer from
USB timeouts which are caused by the timer wheel rework.

This is caused by a bug in the base clock forwarding mechanism, which leads
to timers expiring early. The scenario which leads to this is:

run_timers()
  while (jiffies >= base->clk) {
    collect_expired_timers();
    base->clk++;
    expire_timers();
  }          

So base->clk = jiffies + 1. Now the cpu goes idle:

idle()
  get_next_timer_interrupt()
    nextevt = __next_time_interrupt();
    if (time_after(nextevt, base->clk))
       	base->clk = jiffies;

jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.

base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():

 jiffies = 0
 base->clk = 1

A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:

  base->clk + delta = 1 + 63 = 64

The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.

If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.

Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:

  if (time_after(jiffies, base->clk) {
    if (time_after(nextevt, base->clk))
       base->clk = jiffies;

jiffies can increment between the check and the assigment far enough to
advance beyond nextevt. So we need to use a stable value for checking.

get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just that.

Thanks to Ashton for bisecting and providing trace data!

Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
Reported-by: NAshton Holmes <scoopta@gmail.com>
Reported-by: NMichael Thayer <michael.thayer@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

041ad7bc

timers: Lock base for same bucket optimization · 4da9152a

由 Thomas Gleixner 提交于 10月 24, 2016

Linus stumbled over the unlocked modification of the timer expiry value in
mod_timer() which is an optimization for timers which stay in the same
bucket - due to the bucket granularity - despite their expiry time getting
updated.

The optimization itself still makes sense even if we take the lock, because
in case that the bucket stays the same, we avoid the pointless
queue/enqueue dance.

Make the check and the modification of timer->expires protected by the base
lock and shuffle the remaining code around so we can keep the lock held
when we actually have to requeue the timer to a different bucket.

Fixes: f00c0afd ("timers: Implement optimization for same expiry time in mod_timer()")
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>

4da9152a

timers: Plug locking race vs. timer migration · b831275a

由 Thomas Gleixner 提交于 10月 24, 2016

Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATION and the following timer base
computation and the spin lock of the base.

While this has not been observed (yet), we need to make sure that it never
happens.

Fixes: 0eeda71b ("timer: Replace timer base by a cpu index")
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>

b831275a

17 10月, 2016 1 次提交

alarmtimer: Remove unused but set variable · 54e23845

由 Tobias Klauser 提交于 10月 17, 2016

Remove the set but unused variable base in alarm_clock_get to fix the
following warning when building with 'W=1':

kernel/time/alarmtimer.c: In function ‘alarm_timer_create’:
kernel/time/alarmtimer.c:545:21: warning: variable ‘base’ set but not used [-Wunused-but-set-variable]
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161017094702.10873-1-tklauser@distanz.chSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

54e23845

11 10月, 2016 1 次提交

latent_entropy: Mark functions with __latent_entropy · 0766f788

由 Emese Revfy 提交于 6月 20, 2016

The __latent_entropy gcc attribute can be used only on functions and
variables.  If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents.  The variable must
be an integer, an integer array type or a structure with integer fields.

These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or they have variable loops, each of which provide some level of
latent entropy.
Signed-off-by: NEmese Revfy <re.emese@gmail.com>
[kees: expanded commit message]
Signed-off-by: NKees Cook <keescook@chromium.org>

0766f788

05 10月, 2016 1 次提交

timekeeping: Fix __ktime_get_fast_ns() regression · 58bfea95

由 John Stultz 提交于 10月 04, 2016

In commit 27727df2 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df2 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: NBrendan Gregg <bgregg@netflix.com>
Reported-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-and-reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

58bfea95

13 9月, 2016 1 次提交

tick/nohz: Prevent stopping the tick on an offline CPU · 57ccdf44

由 Wanpeng Li 提交于 9月 07, 2016

can_stop_full_tick() has no check for offline cpus. So it allows to stop
the tick on an offline cpu from the interrupt return path, which is wrong
and subsequently makes irq_work_needs_cpu() warn about being called for an
offline cpu.

Commit f7ea0fd6 ("tick: Don't invoke tick_nohz_stop_sched_tick() if
the cpu is offline") added prevention for can_stop_idle_tick(), but forgot
to do the same in can_stop_full_tick(). Add it.

[ tglx: Massaged changelog ]
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1473245473-4463-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

57ccdf44

02 9月, 2016 1 次提交

tick/nohz: Fix softlockup on scheduler stalls in kvm guest · 08d07259

由 Wanpeng Li 提交于 9月 02, 2016

tick_nohz_start_idle() is prevented to be called if the idle tick can't 
be stopped since commit 1f3b0f82 ("tick/nohz: Optimize nohz idle 
enter"). As a result, after suspend/resume the host machine, full dynticks 
kvm guest will softlockup:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]
 Call Trace:
  default_idle+0x31/0x1a0
  arch_cpu_idle+0xf/0x20
  default_idle_call+0x2a/0x50
  cpu_startup_entry+0x39b/0x4d0
  rest_init+0x138/0x140
  ? rest_init+0x5/0x140
  start_kernel+0x4c1/0x4ce
  ? set_init_arg+0x55/0x55
  ? early_idt_handler_array+0x120/0x120
  x86_64_start_reservations+0x24/0x26
  x86_64_start_kernel+0x142/0x14f

In addition, cat /proc/stat | grep cpu in guest or host:

cpu  398 16 5049 15754 5490 0 1 46 0 0
cpu0 206 5 450 0 0 0 1 14 0 0
cpu1 81 0 3937 3149 1514 0 0 9 0 0
cpu2 45 6 332 6052 2243 0 0 11 0 0
cpu3 65 2 328 6552 1732 0 0 11 0 0

The idle and iowait states are weird 0 for cpu0(housekeeping). 

The bug is present in both guest and host kernels, and they both have 
cpu0's idle and iowait states issue, however, host kernel's suspend/resume 
path etc will touch watchdog to avoid the softlockup.

- The watchdog will not be touched in tick_nohz_stop_idle path (need be 
  touched since the scheduler stall is expected) if idle_active flags are 
  not detected.
- The idle and iowait states will not be accounted when exit idle loop 
  (resched or interrupt) if idle start time and idle_active flags are 
  not set. 

This patch fixes it by reverting commit 1f3b0f82 since can't stop 
idle tick doesn't mean can't be idle.

Fixes: 1f3b0f82 ("tick/nohz: Optimize nohz idle enter")
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Cc: Sanjeev Yadav<sanjeev.yadav@spreadtrum.com>
Cc: Gaurav Jindal<gaurav.jindal@spreadtrum.com>
Cc: stable@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1472798303-4154-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

08d07259

01 9月, 2016 5 次提交

time: Avoid undefined behaviour in ktime_add_safe() · 979515c5

由 Vegard Nossum 提交于 8月 13, 2016

I ran into this:

    ================================================================================
    UBSAN: Undefined behaviour in kernel/time/hrtimer.c:310:16
    signed integer overflow:
    9223372036854775807 + 50000 cannot be represented in type 'long long int'
    CPU: 2 PID: 4798 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #91
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
     0000000000000000 ffff88010ce6fb88 ffffffff82344740 0000000041b58ab3
     ffffffff84f97a20 ffffffff82344694 ffff88010ce6fbb0 ffff88010ce6fb60
     000000000000c350 ffff88010ce6f968 dffffc0000000000 ffffffff857bc320
    Call Trace:
     [<ffffffff82344740>] dump_stack+0xac/0xfc
     [<ffffffff82344694>] ? _atomic_dec_and_lock+0xc4/0xc4
     [<ffffffff8242df78>] ubsan_epilogue+0xd/0x8a
     [<ffffffff8242e6b4>] handle_overflow+0x202/0x23d
     [<ffffffff8242e4b2>] ? val_to_string.constprop.6+0x11e/0x11e
     [<ffffffff8236df71>] ? timerqueue_add+0x151/0x410
     [<ffffffff81485c48>] ? hrtimer_start_range_ns+0x3b8/0x1380
     [<ffffffff81795631>] ? memset+0x31/0x40
     [<ffffffff8242e6fd>] __ubsan_handle_add_overflow+0xe/0x10
     [<ffffffff81488ac9>] hrtimer_nanosleep+0x5d9/0x790
     [<ffffffff814884f0>] ? hrtimer_init_sleeper+0x80/0x80
     [<ffffffff813a9ffb>] ? __might_sleep+0x5b/0x260
     [<ffffffff8148be10>] common_nsleep+0x20/0x30
     [<ffffffff814906c7>] SyS_clock_nanosleep+0x197/0x210
     [<ffffffff81490530>] ? SyS_clock_getres+0x150/0x150
     [<ffffffff823c7113>] ? __this_cpu_preempt_check+0x13/0x20
     [<ffffffff8162ef60>] ? __context_tracking_exit.part.3+0x30/0x1b0
     [<ffffffff81490530>] ? SyS_clock_getres+0x150/0x150
     [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
     [<ffffffff845f85aa>] entry_SYSCALL64_slow_path+0x25/0x25
    ================================================================================

Add a new ktime_add_unsafe() helper which doesn't check for overflow, but
doesn't throw a UBSAN warning when it does overflow either.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

979515c5

time: Avoid undefined behaviour in timespec64_add_safe() · 469e857f

由 Vegard Nossum 提交于 8月 12, 2016

I ran into this:

    ================================================================================
    UBSAN: Undefined behaviour in kernel/time/time.c:783:2
    signed integer overflow:
    5273 + 9223372036854771711 cannot be represented in type 'long int'
    CPU: 0 PID: 17363 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #88
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org
    04/01/2014
     0000000000000000 ffff88011457f8f0 ffffffff82344f50 0000000041b58ab3
     ffffffff84f98080 ffffffff82344ea4 ffff88011457f918 ffff88011457f8c8
     ffff88011457f8e0 7fffffffffffefff ffff88011457f6d8 dffffc0000000000
    Call Trace:
     [<ffffffff82344f50>] dump_stack+0xac/0xfc
     [<ffffffff82344ea4>] ? _atomic_dec_and_lock+0xc4/0xc4
     [<ffffffff8242f4c8>] ubsan_epilogue+0xd/0x8a
     [<ffffffff8242fc04>] handle_overflow+0x202/0x23d
     [<ffffffff8242fa02>] ? val_to_string.constprop.6+0x11e/0x11e
     [<ffffffff823c7837>] ? debug_smp_processor_id+0x17/0x20
     [<ffffffff8131b581>] ? __sigqueue_free.part.13+0x51/0x70
     [<ffffffff8146d4e0>] ? rcu_is_watching+0x110/0x110
     [<ffffffff8242fc4d>] __ubsan_handle_add_overflow+0xe/0x10
     [<ffffffff81476ef8>] timespec64_add_safe+0x298/0x340
     [<ffffffff81476c60>] ? timespec_add_safe+0x330/0x330
     [<ffffffff812f7990>] ? wait_noreap_copyout+0x1d0/0x1d0
     [<ffffffff8184bf18>] poll_select_set_timeout+0xf8/0x170
     [<ffffffff8184be20>] ? poll_schedule_timeout+0x2b0/0x2b0
     [<ffffffff813aa9bb>] ? __might_sleep+0x5b/0x260
     [<ffffffff833c8a87>] __sys_recvmmsg+0x107/0x790
     [<ffffffff833c8980>] ? SyS_recvmsg+0x20/0x20
     [<ffffffff81486378>] ? hrtimer_start_range_ns+0x3b8/0x1380
     [<ffffffff845f8bfb>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
     [<ffffffff8148bcea>] ? do_setitimer+0x39a/0x8e0
     [<ffffffff813aa9bb>] ? __might_sleep+0x5b/0x260
     [<ffffffff833c9110>] ? __sys_recvmmsg+0x790/0x790
     [<ffffffff833c91e9>] SyS_recvmmsg+0xd9/0x160
     [<ffffffff833c9110>] ? __sys_recvmmsg+0x790/0x790
     [<ffffffff823c7853>] ? __this_cpu_preempt_check+0x13/0x20
     [<ffffffff8162f680>] ? __context_tracking_exit.part.3+0x30/0x1b0
     [<ffffffff833c9110>] ? __sys_recvmmsg+0x790/0x790
     [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
     [<ffffffff845f936a>] entry_SYSCALL64_slow_path+0x25/0x25
    ================================================================================

Line 783 is this:

783         set_normalized_timespec64(&res, lhs.tv_sec + rhs.tv_sec,
784                         lhs.tv_nsec + rhs.tv_nsec);

In other words, since lhs.tv_sec and rhs.tv_sec are both time64_t, this
is a signed addition which will cause undefined behaviour on overflow.

Note that this is not currently a huge concern since the kernel should be
built with -fno-strict-overflow by default, but could be a problem in the
future, a problem with older compilers, or other compilers than gcc.

The easiest way to avoid the overflow is to cast one of the arguments to
unsigned (so the addition will be done using unsigned arithmetic).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

469e857f

timekeeping: Prints the amounts of time spent during suspend · 0bf43f15

由 Ruchi Kandoi 提交于 8月 11, 2016

In addition to keeping a histogram of suspend times, also
print out the time spent in suspend to dmesg.

This helps to keep track of suspend time while debugging using
kernel logs.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: NRuchi Kandoi <kandoiruchi@google.com>
[jstultz: Tweaked commit message]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

0bf43f15

clocksource: Defer override invalidation unless clock is unstable · 36374583

由 Kyle Walker 提交于 8月 06, 2016

Clocksources don't get the VALID_FOR_HRES flag until they have been
checked by a watchdog. However, when using an override, the
clocksource_select logic will clear the override value if the
clocksource is not marked VALID_FOR_HRES during that inititial check.
When using the boot arguments clocksource=<foo>, this selection can
run before the watchdog, and can cause the override to be incorrectly
cleared.

To address this condition, the override_name is only invalidated for
unstable clocksources. Otherwise, the override is left intact until after
the watchdog has validated the clocksource as stable/unstable.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NKyle Walker <kwalker@redhat.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

36374583

hrtimer: Spelling fixes · b4d90e9f

由 Pratyush Patel 提交于 6月 23, 2016

Fix a minor spelling error.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: NPratyush Patel <pratyushpatel.1995@gmail.com>
[jstultz: Added commit message]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

b4d90e9f

24 8月, 2016 2 次提交

timekeeping: Cap array access in timekeeping_debug · a4f8f666

由 John Stultz 提交于 8月 23, 2016

It was reported that hibernation could fail on the 2nd attempt, where the
system hangs at hibernate() -> syscore_resume() -> i8237A_resume() ->
claim_dma_lock(), because the lock has already been taken.

However there is actually no other process would like to grab this lock on
that problematic platform.

Further investigation showed that the problem is triggered by setting
/sys/power/pm_trace to 1 before the 1st hibernation.

Since once pm_trace is enabled, the rtc becomes unmeaningful after suspend,
and meanwhile some BIOSes would like to adjust the 'invalid' RTC (e.g, smaller
than 1970) to the release date of that motherboard during POST stage, thus
after resumed, it may seem that the system had a significant long sleep time
which is a completely meaningless value.

Then in timekeeping_resume -> tk_debug_account_sleep_time, if the bit31 of the
sleep time happened to be set to 1, fls() returns 32 and we add 1 to
sleep_time_bin[32], which causes an out of bounds array access and therefor
memory being overwritten.

As depicted by System.map:
0xffffffff81c9d080 b sleep_time_bin
0xffffffff81c9d100 B dma_spin_lock
the dma_spin_lock.val is set to 1, which caused this problem.

This patch adds a sanity check in tk_debug_account_sleep_time()
to ensure we don't index past the sleep_time_bin array.

[jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the
 issue slightly differently, but borrowed his excelent explanation of the
 issue here.]

Fixes: 5c83545f "power: Add option to log time spent in suspend"
Reported-by: NJanek Kozicki <cosurgi@gmail.com>
Reported-by: NChen Yu <yu.c.chen@intel.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: stable <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

a4f8f666

timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING · 27727df2

由 John Stultz 提交于 8月 23, 2016

When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
method was using timekeeping_get_ns().

Thus the locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().

This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so can avoid any deadlocks in NMI.

Fixes: 4ca22c26 "timekeeping: Add warnings when overflows or underflows are observed"
Reported-by: NSteven Rostedt <rostedt@goodmis.org>
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

27727df2

09 8月, 2016 1 次提交

timers: Fix get_next_timer_interrupt() computation · 46c8f0b0

由 Chris Metcalf 提交于 8月 08, 2016

The tick_nohz_stop_sched_tick() routine is not properly
canceling the sched timer when nothing is pending, because
get_next_timer_interrupt() is no longer returning KTIME_MAX in
that case.  This causes periodic interrupts when none are needed.

When determining the next interrupt time, we first use
__next_timer_interrupt() to get the first expiring timer in the
timer wheel.  If no timer is found, we return the base clock value
plus NEXT_TIMER_MAX_DELTA to indicate there is no timer in the
timer wheel.

Back in get_next_timer_interrupt(), we set the "expires" value
by converting the timer wheel expiry (in ticks) to a nsec value.
But we don't want to do this if the timer wheel expiry value
indicates no timer; we want to return KTIME_MAX.

Prior to commit 500462a9 ("timers: Switch to a non-cascading
wheel") we checked base->active_timers to see if any timers
were active, and if not, we didn't touch the expiry value and so
properly returned KTIME_MAX.  Now we don't have active_timers.

To fix this, we now just check the timer wheel expiry value to
see if it is "now + NEXT_TIMER_MAX_DELTA", and if it is, we don't
try to compute a new value based on it, but instead simply let the
KTIME_MAX value in expires remain.

Fixes: 500462a9 "timers: Switch to a non-cascading wheel"
Signed-off-by: NChris Metcalf <cmetcalf@mellanox.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1470688147-22287-1-git-send-email-cmetcalf@mellanox.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

46c8f0b0

19 7月, 2016 2 次提交

tick/nohz: Optimize nohz idle enter · 1f3b0f82

由 Gaurav Jindal 提交于 7月 14, 2016

tick_nohz_start_idle is called before checking whether the idle tick can be
stopped. If the tick cannot be stopped, calling tick_nohz_start_idle() is
pointless and just wasting CPU cycles.

Only invoke tick_nohz_start_idle() when can_stop_idle_tick() returns true. A
short one minute observation of the effect on ARM64 shows a reduction of calls
by 1.5% thus optimizing the idle entry sequence.

[tglx: Massaged changelog ]

Co-developed-by: Sanjeev Yadav<sanjeev.yadav@spreadtrum.com>
Signed-off-by: Gaurav Jindal<gaurav.jindal@spreadtrum.com>
Link: http://lkml.kernel.org/r/20160714120416.GB21099@gaurav.jindal@spreadtrum.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

1f3b0f82

clockevents: Make clockevents_subsys static · 775be506

由 Ben Dooks 提交于 6月 17, 2016

The clockevents_subsys struct is used for sysfs support and
is not declared or used outside the file it is defined in.
Fix the following warning by making it static:

kernel/time/clockevents.c:648:17: warning: symbol 'clockevents_subsys' was not declared. Should it be static?
Signed-off-by: NBen Dooks <ben.dooks@codethink.co.uk>
Cc: linux-kernel@lists.codethink.co.uk
Link: http://lkml.kernel.org/r/1466178974-7105-1-git-send-email-ben.dooks@codethink.co.ukSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

775be506

15 7月, 2016 2 次提交

timers/core: Convert to hotplug state machine · 24f73b99

由 Richard Cochran 提交于 7月 13, 2016

When tearing down, call timers_dead_cpu() before notify_dead().
There is a hidden dependency between:

 - timers
 - block multiqueue
 - rcutree

If timers_dead_cpu() comes later than blk_mq_queue_reinit_notify()
that latter function causes a RCU stall.
Signed-off-by: NRichard Cochran <rcochran@linutronix.de>
Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.566790058@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

24f73b99

hrtimer: Convert to hotplug state machine · 27590dc1

由 Thomas Gleixner 提交于 7月 15, 2016

Split out the clockevents callbacks instead of piggybacking them on
hrtimers.

This gets rid of a POST_DEAD user. See commit:

  54e88fad ("sched: Make sure timers have migrated before killing the migration_thread")

We just move the callback state to the proper place in the state machine.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.485419196@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

27590dc1

11 7月, 2016 1 次提交

posix_cpu_timer: Exit early when process has been reaped · 2c13ce8f

由 Alexey Dobriyan 提交于 7月 08, 2016

Variable "now" seems to be genuinely used unintialized
if branch

	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {

is not taken and branch

	if (unlikely(sighand == NULL)) {

is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.bySigned-off-by: NThomas Gleixner <tglx@linutronix.de>

2c13ce8f

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功