- 14 7月, 2016 3 次提交
-
-
由 Thomas Gleixner 提交于
Get rid of the prio ordering of the separate notifiers and use a proper state callback pair. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: NTejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153335.197083890@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Actually a nice symmetric startup/teardown pair which fits properly into the state machine concept. In the long run we should be able to invoke the startup callback for the boot CPU via the state machine and get rid of the init function which invokes it on the boot CPU. Note: This comes actually before the perf hardware callbacks. In the notifier model the hardware callbacks have a higher priority than the core callback. But that's solely for CPU offline so that hardware migration of events happens before the core is notified about the outgoing CPU. With the symetric state array model we have the following ordering: UP: core -> hardware DOWN: hardware -> core Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: NSebastian Siewior <bigeasy@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153333.587514098@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
We switched the hotplug machinery to smpboot threads. Early registration of hotplug callbacks, i.e. from do_pre_smp_initcalls(), happens before the threads are initialized. Instead of moving the thread init, we simply handle it in the hotplug code itself and invoke the function directly. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153332.896450738@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 7月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
Xiaolong Ye reported lock debug warnings triggered by the following commit: 8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine") The bug is the following: the cpuhp_bp_states[] array is cut short when CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless and happily scribble outside of the array bounds... We need to store them in case that the state is unregistered so we can invoke the teardown function. That's independent of CONFIG_SMP. Make sure the array is large enough. Reported-by: Nkernel test robot <xiaolong.ye@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: lkp@01.org Cc: stable@vger.kernel.org Cc: tipbuild@zytor.com Fixes: cff7d378 "cpu/hotplug: Convert to a state machine for the control processor" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanosSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 7月, 2016 14 次提交
-
-
由 Anna-Maria Gleixner 提交于
The existing optimization for same expiry time in mod_timer() checks whether the timer expiry time is the same as the new requested expiry time. In the old timer wheel implementation this does not take the slack batching into account, neither does the new implementation evaluate whether the new expiry time will requeue the timer to the same bucket. To optimize that, we can calculate the resulting bucket and check if the new expiry time is different from the current expiry time. This calculation happens outside the base lock held region. If the resulting bucket is the same we can avoid taking the base lock and requeueing the timer. If the timer needs to be requeued then we have to check under the base lock whether the base time has changed between the lockless calculation and taking the lock. If it has changed we need to recalculate under the lock. This optimization takes effect for timers which are enqueued into the less granular wheel levels (1 and above). With a simple test case the functionality has been verified: Before After Match: 5.5% 86.6% Requeue: 94.5% 13.4% Recalc: <0.01% In the non optimized case the timer is requeued in 94.5% of the cases. With the index optimization in place the requeue rate drops to 13.4%. The case where the lockless index calculation has to be redone is less than 0.01%. With a real world test case (networking) we observed the following changes: Before After Match: 97.8% 99.7% Requeue: 2.2% 0.3% Recalc: <0.001% That means two percent fewer lock/requeue/unlock operations done in one of the hot path use cases of timers. Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.778527749@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org> -
由 Anna-Maria Gleixner 提交于
For further optimizations we need to seperate index calculation from queueing. No functional change. Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.691159619@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
With the wheel forwading in place and with the HZ=1000 4ms folding we can avoid running the softirq at all. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.607650550@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The wheel clock is stale when a CPU goes into a long idle sleep. This has the side effect that timers which are queued end up in the outer wheel levels. That results in coarser granularity. To solve this, we keep track of the idle state and forward the wheel clock whenever possible. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.512039360@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
This was a failed attempt to optimize the timer expiry in idle, which was disabled and never revisited. Remove the cruft. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.431073782@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Anna-Maria Gleixner 提交于
After a NOHZ idle sleep the timer wheel must be forwarded to current jiffies. There might be expired timers so the current code loops and checks the expired buckets for timers. This can take quite some time for long NOHZ idle periods. The pending bitmask in the timer base allows us to do a quick search for the next expiring timer and therefore a fast forward of the base time which prevents pointless long lasting loops. For a 3 seconds idle sleep this reduces the catchup time from ~1ms to 5us. Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.351296290@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Anna-Maria Gleixner 提交于
Move __run_timers() below __next_timer_interrupt() and next_pending_bucket() in preparation for __run_timers() NOHZ optimization. No functional change. Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.271872665@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
We now have implicit batching in the timer wheel. The slack API is no longer used, so remove it. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrew F. Davis <afd@ti.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jaehoon Chung <jh80.chung@samsung.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Stultz <john.stultz@linaro.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@intel.com> Cc: Pali Rohár <pali.rohar@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The current timer wheel has some drawbacks: 1) Cascading: Cascading can be an unbound operation and is completely pointless in most cases because the vast majority of the timer wheel timers are canceled or rearmed before expiration. (They are used as timeout safeguards, not as real timers to measure time.) 2) No fast lookup of the next expiring timer: In NOHZ scenarios the first timer soft interrupt after a long NOHZ period must fast forward the base time to the current value of jiffies. As we have no way to find the next expiring timer fast, the code loops linearly and increments the base time one by one and checks for expired timers in each step. This causes unbound overhead spikes exactly in the moment when we should wake up as fast as possible. After a thorough analysis of real world data gathered on laptops, workstations, webservers and other machines (thanks Chris!) I came to the conclusion that the current 'classic' timer wheel implementation can be modified to address the above issues. The vast majority of timer wheel timers is canceled or rearmed before expiry. Most of them are timeouts for networking and other I/O tasks. The nature of timeouts is to catch the exception from normal operation (TCP ack timed out, disk does not respond, etc.). For these kinds of timeouts the accuracy of the timeout is not really a concern. Timeouts are very often approximate worst-case values and in case the timeout fires, we already waited for a long time and performance is down the drain already. The few timers which actually expire can be split into two categories: 1) Short expiry times which expect halfways accurate expiry 2) Long term expiry times are inaccurate today already due to the batching which is done for NOHZ automatically and also via the set_timer_slack() API. So for long term expiry timers we can avoid the cascading property and just leave them in the less granular outer wheels until expiry or cancelation. Timers which are armed with a timeout larger than the wheel capacity are no longer cascaded. We expire them with the longest possible timeout (6+ days). We have not observed such timeouts in our data collection, but at least we handle them, applying the rule of the least surprise. To avoid extending the wheel levels for HZ=1000 so we can accomodate the longest observed timeouts (5 days in the network conntrack code) we reduce the first level granularity on HZ=1000 to 4ms, which effectively is the same as the HZ=250 behaviour. From our data analysis there is nothing which relies on that 1ms granularity and as a side effect we get better batching and timer locality for the networking code as well. Contrary to the classic wheel the granularity of the next wheel is not the capacity of the first wheel. The granularities of the wheels are in the currently chosen setting 8 times the granularity of the previous wheel. So for HZ=250 we end up with the following granularity levels: Level Offset Granularity Range 0 0 4 ms 0 ms - 252 ms 1 64 32 ms 256 ms - 2044 ms (256ms - ~2s) 2 128 256 ms 2048 ms - 16380 ms (~2s - ~16s) 3 192 2048 ms (~2s) 16384 ms - 131068 ms (~16s - ~2m) 4 256 16384 ms (~16s) 131072 ms - 1048572 ms (~2m - ~17m) 5 320 131072 ms (~2m) 1048576 ms - 8388604 ms (~17m - ~2h) 6 384 1048576 ms (~17m) 8388608 ms - 67108863 ms (~2h - ~18h) 7 448 8388608 ms (~2h) 67108864 ms - 536870911 ms (~18h - ~6d) That's a worst case inaccuracy of 12.5% for the timers which are queued at the beginning of a level. So the new wheel concept addresses the old issues: 1) Cascading is avoided completely 2) By keeping the timers in the bucket until expiry/cancelation we can track the buckets which have timers enqueued in a bucket bitmap and therefore can look up the next expiring timer very fast and O(1). A further benefit of the concept is that the slack calculation which is done on every timer start is no longer necessary because the granularity levels provide natural batching already. Our extensive testing with various loads did not show any performance degradation vs. the current wheel implementation. This patch does not address the 'fast lookup' issue as we wanted to make sure that there is no regression introduced by the wheel redesign. The optimizations are in follow up patches. This patch contains fixes from Anna-Maria Gleixner and Richard Cochran. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org> -
由 Thomas Gleixner 提交于
Some of the names in the internal implementation of the timer code are not longer correct and others are simply too long to type. Clean it up before we switch the wheel implementation over to the new scheme. No functional change. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094341.948752516@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
We've converted most timeout related syscalls to hrtimers, but sigtimedwait() did not get this treatment. Convert it so we get a reasonable accuracy and remove the user space exposure to the timer wheel properties. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Cyril Hrubis <chrubis@suse.cz> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094341.787164909@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
We switched all users to initialize the timers as pinned and call mod_timer(). Remove the now unused timer API function. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
We want to move the timer migration logic from a 'push' to a 'pull' model. Under the current 'push' model pinned timers are handled via a runtime API variant: mod_timer_pinned(). The 'pull' model requires us to store the pinned attribute of a timer in the timer_list structure itself, as a new TIMER_PINNED bit in timer->flags. This flag must be set at initialization time and the timer APIs recognize the flag. This patch: - Implements the new flag and associated new-style initialization methods - makes mod_timer() recognize new-style pinned timers, - and adds some migration helper facility to allow step by step conversion of old-style to new-style pinned timers. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mark Rutland 提交于
The following commit: 66eb579e ("perf: allow for PMU-specific event filtering") added the pmu::filter_match() callback. This was intended to avoid HW constraints on events from resulting in extremely pessimistic scheduling. However, pmu::filter_match() is only called for the leader of each event group. When the leader is a SW event, we do not filter the groups, and may fail at pmu::add() time, and when this happens we'll give up on scheduling any event groups later in the list until they are rotated ahead of the failing group. This can result in extremely sub-optimal event scheduling behaviour, e.g. if running the following on a big.LITTLE platform: $ taskset -c 0 ./perf stat \ -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \ -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \ ls <not counted> context-switches (0.00%) <not counted> armv8_cortex_a57/config=0x11/ (0.00%) 24 context-switches (37.36%) 57589154 armv8_cortex_a53/config=0x11/ (37.36%) Here the 'a53' event group was always eligible to be scheduled, but the 'a57' group never eligible to be scheduled, as the task was always affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57' group was eligible, but the HW event failed at pmu::add() time, resulting in ctx_flexible_sched_in giving up on scheduling further groups with HW events. One way of avoiding this is to check pmu::filter_match() on siblings as well as the group leader. If any of these fail their pmu::filter_match() call, we must skip the entire group before attempting to add any events. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: 66eb579e ("perf: allow for PMU-specific event filtering") Link: http://lkml.kernel.org/r/1465917041-15339-1-git-send-email-mark.rutland@arm.com [ Small readability edits. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 05 7月, 2016 1 次提交
-
-
由 Jisheng Zhang 提交于
This is to avoid the "null" name when we either ~ # cat /sys/devices/system/clockevents/broadcast/current_device (null) or ~ # cat /proc/timer_list ... Tick Device: mode: 1 Broadcast device Clock Event Device: (null) ... Signed-off-by: NJisheng Zhang <jszhang@marvell.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1467709071-3667-1-git-send-email-jszhang@marvell.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 01 7月, 2016 2 次提交
-
-
由 Ingo Molnar 提交于
While reviewing another patch I noticed that kernel/time/tick-sched.c had a charmingly (confusingly, annoyingly) rich set of variants for spelling 'CPU': cpu cpus CPU CPUs per CPU per-CPU per cpu ... sometimes these were mixed even within the same comment block! Compress these variants down to a single consistent set of: CPU CPUs per-CPU Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org> -
由 Wei Jiangang 提交于
Signed-off-by: NWei Jiangang <weijg.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Link: http://lkml.kernel.org/r/1467175910-2966-2-git-send-email-weijg.fnst@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 6月, 2016 3 次提交
-
-
由 Daniel Borkmann 提交于
Commit dead9f29 ("perf: Fix race in BPF program unregister") moved destruction of BPF program from free_event_rcu() callback to __free_event(), which is problematic if used with tail calls: if prog A is attached as trace event directly, but at the same time present in a tail call map used by another trace event program elsewhere, then we need to delay destruction via RCU grace period since it can still be in use by the program doing the tail call (the prog first needs to be dropped from the tail call map, then trace event with prog A attached destroyed, so we get immediate destruction). Fixes: dead9f29 ("perf: Fix race in BPF program unregister") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Cc: Jann Horn <jann@thejh.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Guy Briggs 提交于
The only users of audit_get_tty and audit_put_tty are internal to audit, so move it out of include/linux/audit.h to kernel.h and create a proper function rather than inlining it. This also reduces kABI changes. Suggested-by: NPaul Moore <pmoore@redhat.com> Signed-off-by: NRichard Guy Briggs <rgb@redhat.com> [PM: line wrapped description] Signed-off-by: NPaul Moore <paul@paul-moore.com>
-
由 Richard Guy Briggs 提交于
Move the calculations of values after the allocation in case the allocation fails. This avoids wasting effort in the rare case that it fails, but more importantly saves us extra logic to release the tty ref. Signed-off-by: NRichard Guy Briggs <rgb@redhat.com> Signed-off-by: NPaul Moore <paul@paul-moore.com>
-
- 27 6月, 2016 2 次提交
-
-
由 Peter Zijlstra 提交于
Commit: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities") did something non-obvious but also did it buggy yet latent. The problem was exposed for real by a later commit in the v4.7 merge window: 2159197d ("sched/core: Enable increased load resolution on 64-bit kernels") ... after which tg->load_avg and cfs_rq->load.weight had different units (10 bit fixed point and 20 bit fixed point resp.). Add a comment to explain the use of cfs_rq->load.weight over the 'natural' cfs_rq->avg.load_avg and add scale_load_down() to correct for the difference in unit. Since this is (now, as per a previous commit) the only user of calc_tg_weight(), collapse it. The effects of this bug should be randomly inconsistent SMP-balancing of cgroups workloads. Reported-by: NJirka Hladky <jhladky@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 2159197d ("sched/core: Enable increased load resolution on 64-bit kernels") Fixes: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Starting with the following commit: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities") calc_tg_weight() doesn't compute the right value as expected by effective_load(). The difference is in the 'correction' term. In order to ensure \Sum rw_j >= rw_i we cannot use tg->load_avg directly, since that might be lagging a correction on the current cfs_rq->avg.load_avg value. Therefore we use tg->load_avg - cfs_rq->tg_load_avg_contrib + cfs_rq->avg.load_avg. Now, per the referenced commit, calc_tg_weight() doesn't use cfs_rq->avg.load_avg, as is later used in @w, but uses cfs_rq->load.weight instead. So stop using calc_tg_weight() and do it explicitly. The effects of this bug are wake_affine() making randomly poor choices in cgroup-intense workloads. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # v4.3+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 6月, 2016 3 次提交
-
-
由 Michael Ellerman 提交于
Commit b235beea ("Clarify naming of thread info/stack allocators") breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE: kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack' kernel/fork.c:355:8: error: assignment from incompatible pointer type stack = alloc_thread_stack_node(tsk, node); ^ Fix it by renaming free_stack() to free_thread_stack(), and updating the return type of alloc_thread_stack_node(). Fixes: b235beea ("Clarify naming of thread info/stack allocators") Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Tetsuo has reported the following potential oom_killer_disable vs. oom_reaper race: (1) freeze_processes() starts freezing user space threads. (2) Somebody (maybe a kenrel thread) calls out_of_memory(). (3) The OOM killer calls mark_oom_victim() on a user space thread P1 which is already in __refrigerator(). (4) oom_killer_disable() sets oom_killer_disabled = true. (5) P1 leaves __refrigerator() and enters do_exit(). (6) The OOM reaper calls exit_oom_victim(P1) before P1 can call exit_oom_victim(P1). (7) oom_killer_disable() returns while P1 not yet finished (8) P1 perform IO/interfere with the freezer. This situation is unfortunate. We cannot move oom_killer_disable after all the freezable kernel threads are frozen because the oom victim might depend on some of those kthreads to make a forward progress to exit so we could deadlock. It is also far from trivial to teach the oom_reaper to not call exit_oom_victim() because then we would lose a guarantee of the OOM killer and oom_killer_disable forward progress because exit_mm->mmput might block and never call exit_oom_victim. It seems the easiest way forward is to workaround this race by calling try_to_freeze_tasks again after oom_killer_disable. This will make sure that all the tasks are frozen or it bails out. Fixes: 449d777d ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper") Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Linus Torvalds 提交于
We've had the thread info allocated together with the thread stack for most architectures for a long time (since the thread_info was split off from the task struct), but that is about to change. But the patches that move the thread info to be off-stack (and a part of the task struct instead) made it clear how confused the allocator and freeing functions are. Because the common case was that we share an allocation with the thread stack and the thread_info, the two pointers were identical. That identity then meant that we would have things like ti = alloc_thread_info_node(tsk, node); ... tsk->stack = ti; which certainly _worked_ (since stack and thread_info have the same value), but is rather confusing: why are we assigning a thread_info to the stack? And if we move the thread_info away, the "confusing" code just gets to be entirely bogus. So remove all this confusion, and make it clear that we are doing the stack allocation by renaming and clarifying the function names to be about the stack. The fact that the thread_info then shares the allocation is an implementation detail, and not really about the allocation itself. This is a pure renaming and type fix: we pass in the same pointer, it's just that we clarify what the pointer means. The ia64 code that actually only has one single allocation (for all of task_struct, thread_info and kernel thread stack) now looks a bit odd, but since "tsk->stack" is actually not even used there, that oddity doesn't matter. It would be a separate thing to clean that up, I intentionally left the ia64 changes as a pure brute-force renaming and type change. Acked-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 6月, 2016 6 次提交
-
-
由 Tejun Heo 提交于
During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is online but not active. A CPU_ONLINE callback may create or bind a kthread so that its cpus_allowed mask only allows the CPU which is being brought online. The kthread may start executing before the CPU is made active and can end up in select_fallback_rq(). In such cases, the expected behavior is selecting the CPU which is coming online; however, because select_fallback_rq() only chooses from active CPUs, it determines that the task doesn't have any viable CPU in its allowed mask and ends up overriding it to cpu_possible_mask. CPU_ONLINE callbacks should be able to put kthreads on the CPU which is coming online. Update select_fallback_rq() so that it follows cpu_online() rather than cpu_active() for kthreads. Reported-by: NGautham R Shenoy <ego@linux.vnet.ibm.com> Tested-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160616193504.GB3262@mtj.duckdns.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Konstantin Khlebnikov 提交于
Hierarchy could be already throttled at this point. Throttled next buddy could trigger a NULL pointer dereference in pick_next_task_fair(). Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NBen Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzzSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Konstantin Khlebnikov 提交于
Cgroup created inside throttled group must inherit current throttle_count. Broken throttle_count allows to nominate throttled entries as a next buddy, later this leads to null pointer dereference in pick_next_task_fair(). This patch initialize cfs_rq->throttle_count at first enqueue: laziness allows to skip locking all rq at group creation. Lazy approach also allows to skip full sub-tree scan at throttling hierarchy (not in this patch). Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzzSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Paolo Bonzini 提交于
The following scenario is possible: CPU 1 CPU 2 static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 0, no increment jump_label_lock() atomic_inc_return() -> key.enabled == 1 now static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 1, inc to 2 return ** static key is wrong! jump_label_update() jump_label_unlock() Testing the static key at the point marked by (**) will follow the wrong path for jumps that have not been patched yet. This can actually happen when creating many KVM virtual machines with userspace LAPIC emulation; just run several copies of the following program: #include <fcntl.h> #include <unistd.h> #include <sys/ioctl.h> #include <linux/kvm.h> int main(void) { for (;;) { int kvmfd = open("/dev/kvm", O_RDONLY); int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); close(ioctl(vmfd, KVM_CREATE_VCPU, 1)); close(vmfd); close(kvmfd); } return 0; } Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call. The static key's purpose is to skip NULL pointer checks and indeed one of the processes eventually dereferences NULL. As explained in the commit that introduced the bug: 706249c2 ("locking/static_keys: Rework update logic") jump_label_update() needs key.enabled to be true. The solution adopted here is to temporarily make key.enabled == -1, and use go down the slow path when key.enabled <= 0. Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # v4.3+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 706249c2 ("locking/static_keys: Rework update logic") Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com [ Small stylistic edits to the changelog and the code. ] Signed-off-by: NIngo Molnar <mingo@kernel.org> -
While testing the deadline scheduler + cgroup setup I hit this warning. [ 132.612935] ------------[ cut here ]------------ [ 132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80 [ 132.612952] Modules linked in: (a ton of modules...) [ 132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2 [ 132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 [ 132.612982] 0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e [ 132.612984] 0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b [ 132.612985] 00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00 [ 132.612986] Call Trace: [ 132.612987] <IRQ> [<ffffffff813d229e>] dump_stack+0x63/0x85 [ 132.612994] [<ffffffff810a652b>] __warn+0xcb/0xf0 [ 132.612997] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.612999] [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20 [ 132.613000] [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80 [ 132.613008] [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20 [ 132.613010] [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10 [ 132.613015] [<ffffffff811388ac>] put_css_set+0x5c/0x60 [ 132.613016] [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0 [ 132.613017] [<ffffffff810a3912>] __put_task_struct+0x42/0x140 [ 132.613018] [<ffffffff810e776a>] dl_task_timer+0xca/0x250 [ 132.613027] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.613030] [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270 [ 132.613031] [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190 [ 132.613034] [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60 [ 132.613035] [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50 [ 132.613037] [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0 [ 132.613038] <EOI> [<ffffffff81063466>] ? native_safe_halt+0x6/0x10 [ 132.613043] [<ffffffff81037a4e>] default_idle+0x1e/0xd0 [ 132.613044] [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20 [ 132.613046] [<ffffffff810e8fda>] default_idle_call+0x2a/0x40 [ 132.613047] [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340 [ 132.613048] [<ffffffff81050235>] start_secondary+0x155/0x190 [ 132.613049] ---[ end trace f91934d162ce9977 ]--- The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid this problem - and other problems of sharing a spinlock with an interrupt. Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: cgroups@vger.kernel.org Cc: stable@vger.kernel.org # 4.5+ Cc: linux-kernel@vger.kernel.org Reviewed-by: NRik van Riel <riel@redhat.com> Reviewed-by: N"Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Acked-by: NZefan Li <lizefan@huawei.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Linus Torvalds 提交于
None of the code actually wants a thread_info, it all wants a task_struct, and it's just converting back and forth between the two ("ti->task" to get the task_struct from the thread_info, and "task_thread_info(task)" to go the other way). No semantic change. Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 6月, 2016 5 次提交
-
-
由 Arnd Bergmann 提交于
The tstats_show() function prints a ktime_t variable by converting it to struct timespec first. The algorithm is ok, but we want to stop using timespec in general because of the 32-bit time_t overflow problem. This changes the code to use struct timespec64, without any functional change. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Arnd Bergmann 提交于
udelay_test_single() uses ktime_get_ts() to get two timespec values and calculate the difference between them, while udelay_test_show() uses the same to printk() the current monotonic time. Both of these are y2038 safe on all machines, but we want to get rid of struct timespec anyway, so this converts the code to use ktime_get_ns() and ktime_get_ts64() respectively. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Deepa Dinamani 提交于
time_to_tm() takes time_t as an argument. time_t is not y2038 safe. Add time64_to_tm() that takes time64_t as an argument which is y2038 safe. The plan is to eventually replace all calls to time_to_tm() by time64_to_tm(). Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Pratyush Patel 提交于
Updated struct alarm and struct alarm_timer descriptions. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NPratyush Patel <pratyushpatel.1995@gmail.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Graziadei 提交于
The user notices the problem in a raw and real time drift, calling clock_gettime with CLOCK_REALTIME / CLOCK_MONOTONIC_RAW on a system with no ntp correction taking place (no ntpd or ptp stuff running). The problem is, that old_vsyscall_fixup adds an extra 1ns even though xtime_nsec is already held in full nsecs and the remainder in this case is 0. Do the rounding up buisness only if needed. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NThomas Graziadei <thomas.graziadei@omicronenergy.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-