1. 14 7月, 2016 3 次提交
  2. 13 7月, 2016 1 次提交
    • T
      cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble · a7c73414
      Thomas Gleixner 提交于
      Xiaolong Ye reported lock debug warnings triggered by the following commit:
      
        8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")
      
      The bug is the following: the cpuhp_bp_states[] array is cut short when
      CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
      and happily scribble outside of the array bounds...
      
      We need to store them in case that the state is unregistered so we can invoke
      the teardown function. That's independent of CONFIG_SMP. Make sure the array
      is large enough.
      Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Adam Borowski <kilobyte@angband.pl>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: lkp@01.org
      Cc: stable@vger.kernel.org
      Cc: tipbuild@zytor.com
      Fixes: cff7d378 "cpu/hotplug: Convert to a state machine for the control processor"
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanosSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a7c73414
  3. 07 7月, 2016 14 次提交
    • A
      timers: Implement optimization for same expiry time in mod_timer() · f00c0afd
      Anna-Maria Gleixner 提交于
      The existing optimization for same expiry time in mod_timer() checks whether
      the timer expiry time is the same as the new requested expiry time. In the old
      timer wheel implementation this does not take the slack batching into account,
      neither does the new implementation evaluate whether the new expiry time will
      requeue the timer to the same bucket.
      
      To optimize that, we can calculate the resulting bucket and check if the new
      expiry time is different from the current expiry time. This calculation
      happens outside the base lock held region. If the resulting bucket is the same
      we can avoid taking the base lock and requeueing the timer.
      
      If the timer needs to be requeued then we have to check under the base lock
      whether the base time has changed between the lockless calculation and taking
      the lock. If it has changed we need to recalculate under the lock.
      
      This optimization takes effect for timers which are enqueued into the less
      granular wheel levels (1 and above). With a simple test case the functionality
      has been verified:
      
                  Before        After
       Match:       5.5%        86.6%
       Requeue:    94.5%        13.4%
       Recalc:                  <0.01%
      
      In the non optimized case the timer is requeued in 94.5% of the cases. With
      the index optimization in place the requeue rate drops to 13.4%. The case
      where the lockless index calculation has to be redone is less than 0.01%.
      
      With a real world test case (networking) we observed the following changes:
      
                  Before        After
       Match:      97.8%        99.7%
       Requeue:     2.2%         0.3%
       Recalc:                  <0.001%
      
      That means two percent fewer lock/requeue/unlock operations done in one of
      the hot path use cases of timers.
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.778527749@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f00c0afd
    • A
      timers: Split out index calculation · ffdf0477
      Anna-Maria Gleixner 提交于
      For further optimizations we need to seperate index calculation
      from queueing. No functional change.
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.691159619@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ffdf0477
    • T
      timers: Only wake softirq if necessary · 4e85876a
      Thomas Gleixner 提交于
      With the wheel forwading in place and with the HZ=1000 4ms folding we can
      avoid running the softirq at all.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.607650550@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4e85876a
    • T
      timers: Forward the wheel clock whenever possible · a683f390
      Thomas Gleixner 提交于
      The wheel clock is stale when a CPU goes into a long idle sleep. This has the
      side effect that timers which are queued end up in the outer wheel levels.
      That results in coarser granularity.
      
      To solve this, we keep track of the idle state and forward the wheel clock
      whenever possible.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.512039360@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a683f390
    • T
      timers/nohz: Remove pointless tick_nohz_kick_tick() function · ff006732
      Thomas Gleixner 提交于
      This was a failed attempt to optimize the timer expiry in idle, which was
      disabled and never revisited. Remove the cruft.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.431073782@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ff006732
    • A
      timers: Optimize collect_expired_timers() for NOHZ · 23696838
      Anna-Maria Gleixner 提交于
      After a NOHZ idle sleep the timer wheel must be forwarded to current jiffies.
      There might be expired timers so the current code loops and checks the expired
      buckets for timers. This can take quite some time for long NOHZ idle periods.
      
      The pending bitmask in the timer base allows us to do a quick search for the
      next expiring timer and therefore a fast forward of the base time which
      prevents pointless long lasting loops.
      
      For a 3 seconds idle sleep this reduces the catchup time from ~1ms to 5us.
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.351296290@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      23696838
    • A
      timers: Move __run_timers() function · 73420fea
      Anna-Maria Gleixner 提交于
      Move __run_timers() below __next_timer_interrupt() and next_pending_bucket()
      in preparation for __run_timers() NOHZ optimization.
      
      No functional change.
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.271872665@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      73420fea
    • T
      timers: Remove set_timer_slack() leftovers · 53bf837b
      Thomas Gleixner 提交于
      We now have implicit batching in the timer wheel. The slack API is no longer
      used, so remove it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrew F. Davis <afd@ti.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jaehoon Chung <jh80.chung@samsung.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Nyman <mathias.nyman@intel.com>
      Cc: Pali Rohár <pali.rohar@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: linux-usb@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      53bf837b
    • T
      timers: Switch to a non-cascading wheel · 500462a9
      Thomas Gleixner 提交于
      The current timer wheel has some drawbacks:
      
      1) Cascading:
      
         Cascading can be an unbound operation and is completely pointless in most
         cases because the vast majority of the timer wheel timers are canceled or
         rearmed before expiration. (They are used as timeout safeguards, not as
         real timers to measure time.)
      
      2) No fast lookup of the next expiring timer:
      
         In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
         must fast forward the base time to the current value of jiffies. As we
         have no way to find the next expiring timer fast, the code loops linearly
         and increments the base time one by one and checks for expired timers
         in each step. This causes unbound overhead spikes exactly in the moment
         when we should wake up as fast as possible.
      
      After a thorough analysis of real world data gathered on laptops,
      workstations, webservers and other machines (thanks Chris!) I came to the
      conclusion that the current 'classic' timer wheel implementation can be
      modified to address the above issues.
      
      The vast majority of timer wheel timers is canceled or rearmed before
      expiry. Most of them are timeouts for networking and other I/O tasks. The
      nature of timeouts is to catch the exception from normal operation (TCP ack
      timed out, disk does not respond, etc.). For these kinds of timeouts the
      accuracy of the timeout is not really a concern. Timeouts are very often
      approximate worst-case values and in case the timeout fires, we already
      waited for a long time and performance is down the drain already.
      
      The few timers which actually expire can be split into two categories:
      
       1) Short expiry times which expect halfways accurate expiry
      
       2) Long term expiry times are inaccurate today already due to the
          batching which is done for NOHZ automatically and also via the
          set_timer_slack() API.
      
      So for long term expiry timers we can avoid the cascading property and just
      leave them in the less granular outer wheels until expiry or
      cancelation. Timers which are armed with a timeout larger than the wheel
      capacity are no longer cascaded. We expire them with the longest possible
      timeout (6+ days). We have not observed such timeouts in our data collection,
      but at least we handle them, applying the rule of the least surprise.
      
      To avoid extending the wheel levels for HZ=1000 so we can accomodate the
      longest observed timeouts (5 days in the network conntrack code) we reduce the
      first level granularity on HZ=1000 to 4ms, which effectively is the same as
      the HZ=250 behaviour. From our data analysis there is nothing which relies on
      that 1ms granularity and as a side effect we get better batching and timer
      locality for the networking code as well.
      
      Contrary to the classic wheel the granularity of the next wheel is not the
      capacity of the first wheel. The granularities of the wheels are in the
      currently chosen setting 8 times the granularity of the previous wheel.
      
      So for HZ=250 we end up with the following granularity levels:
      
       Level Offset   Granularity                  Range
           0      0          4 ms                 0 ms -        252 ms
           1     64         32 ms               256 ms -       2044 ms (256ms - ~2s)
           2    128        256 ms              2048 ms -      16380 ms (~2s   - ~16s)
           3    192       2048 ms (~2s)       16384 ms -     131068 ms (~16s  - ~2m)
           4    256      16384 ms (~16s)     131072 ms -    1048572 ms (~2m   - ~17m)
           5    320     131072 ms (~2m)     1048576 ms -    8388604 ms (~17m  - ~2h)
           6    384    1048576 ms (~17m)    8388608 ms -   67108863 ms (~2h   - ~18h)
           7    448    8388608 ms (~2h)    67108864 ms -  536870911 ms (~18h  - ~6d)
      
      That's a worst case inaccuracy of 12.5% for the timers which are queued at the
      beginning of a level.
      
      So the new wheel concept addresses the old issues:
      
      1) Cascading is avoided completely
      
      2) By keeping the timers in the bucket until expiry/cancelation we can track
         the buckets which have timers enqueued in a bucket bitmap and therefore can
         look up the next expiring timer very fast and O(1).
      
      A further benefit of the concept is that the slack calculation which is done
      on every timer start is no longer necessary because the granularity levels
      provide natural batching already.
      
      Our extensive testing with various loads did not show any performance
      degradation vs. the current wheel implementation.
      
      This patch does not address the 'fast lookup' issue as we wanted to make sure
      that there is no regression introduced by the wheel redesign. The
      optimizations are in follow up patches.
      
      This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      500462a9
    • T
      timers: Give a few structs and members proper names · 494af3ed
      Thomas Gleixner 提交于
      Some of the names in the internal implementation of the timer code
      are not longer correct and others are simply too long to type.
      
      Clean it up before we switch the wheel implementation over to
      the new scheme.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.948752516@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      494af3ed
    • T
      signals: Use hrtimer for sigtimedwait() · 2b1ecc3d
      Thomas Gleixner 提交于
      We've converted most timeout related syscalls to hrtimers, but
      sigtimedwait() did not get this treatment.
      
      Convert it so we get a reasonable accuracy and remove the
      user space exposure to the timer wheel properties.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Cyril Hrubis <chrubis@suse.cz>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.787164909@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2b1ecc3d
    • T
      timers: Remove the deprecated mod_timer_pinned() API · 177ec0a0
      Thomas Gleixner 提交于
      We switched all users to initialize the timers as pinned and call
      mod_timer(). Remove the now unused timer API function.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      177ec0a0
    • T
      timers: Make 'pinned' a timer property · e675447b
      Thomas Gleixner 提交于
      We want to move the timer migration logic from a 'push' to a 'pull' model.
      
      Under the current 'push' model pinned timers are handled via
      a runtime API variant: mod_timer_pinned().
      
      The 'pull' model requires us to store the pinned attribute of a timer
      in the timer_list structure itself, as a new TIMER_PINNED bit in
      timer->flags.
      
      This flag must be set at initialization time and the timer APIs
      recognize the flag.
      
      This patch:
      
       - Implements the new flag and associated new-style initialization
         methods
      
       - makes mod_timer() recognize new-style pinned timers,
      
       - and adds some migration helper facility to allow
         step by step conversion of old-style to new-style
         pinned timers.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e675447b
    • M
      perf/core: Fix pmu::filter_match for SW-led groups · 2c81a647
      Mark Rutland 提交于
      The following commit:
      
        66eb579e ("perf: allow for PMU-specific event filtering")
      
      added the pmu::filter_match() callback. This was intended to
      avoid HW constraints on events from resulting in extremely
      pessimistic scheduling.
      
      However, pmu::filter_match() is only called for the leader of each event
      group. When the leader is a SW event, we do not filter the groups, and
      may fail at pmu::add() time, and when this happens we'll give up on
      scheduling any event groups later in the list until they are rotated
      ahead of the failing group.
      
      This can result in extremely sub-optimal event scheduling behaviour,
      e.g. if running the following on a big.LITTLE platform:
      
      $ taskset -c 0 ./perf stat \
       -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
       -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
       ls
      
           <not counted>      context-switches                                              (0.00%)
           <not counted>      armv8_cortex_a57/config=0x11/                                 (0.00%)
                      24      context-switches                                              (37.36%)
                57589154      armv8_cortex_a53/config=0x11/                                 (37.36%)
      
      Here the 'a53' event group was always eligible to be scheduled, but
      the 'a57' group never eligible to be scheduled, as the task was always
      affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57'
      group was eligible, but the HW event failed at pmu::add() time,
      resulting in ctx_flexible_sched_in giving up on scheduling further
      groups with HW events.
      
      One way of avoiding this is to check pmu::filter_match() on siblings
      as well as the group leader. If any of these fail their
      pmu::filter_match() call, we must skip the entire group before
      attempting to add any events.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 66eb579e ("perf: allow for PMU-specific event filtering")
      Link: http://lkml.kernel.org/r/1465917041-15339-1-git-send-email-mark.rutland@arm.com
      [ Small readability edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2c81a647
  4. 05 7月, 2016 1 次提交
  5. 01 7月, 2016 2 次提交
  6. 29 6月, 2016 3 次提交
  7. 27 6月, 2016 2 次提交
    • P
      sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion · ea1dc6fc
      Peter Zijlstra 提交于
      Commit:
      
        fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      
      did something non-obvious but also did it buggy yet latent.
      
      The problem was exposed for real by a later commit in the v4.7 merge window:
      
        2159197d ("sched/core: Enable increased load resolution on 64-bit kernels")
      
      ... after which tg->load_avg and cfs_rq->load.weight had different
      units (10 bit fixed point and 20 bit fixed point resp.).
      
      Add a comment to explain the use of cfs_rq->load.weight over the
      'natural' cfs_rq->avg.load_avg and add scale_load_down() to correct
      for the difference in unit.
      
      Since this is (now, as per a previous commit) the only user of
      calc_tg_weight(), collapse it.
      
      The effects of this bug should be randomly inconsistent SMP-balancing
      of cgroups workloads.
      Reported-by: NJirka Hladky <jhladky@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 2159197d ("sched/core: Enable increased load resolution on 64-bit kernels")
      Fixes: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ea1dc6fc
    • P
      sched/fair: Fix effective_load() to consistently use smoothed load · 7dd49125
      Peter Zijlstra 提交于
      Starting with the following commit:
      
        fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      
      calc_tg_weight() doesn't compute the right value as expected by effective_load().
      
      The difference is in the 'correction' term. In order to ensure \Sum
      rw_j >= rw_i we cannot use tg->load_avg directly, since that might be
      lagging a correction on the current cfs_rq->avg.load_avg value.
      Therefore we use tg->load_avg - cfs_rq->tg_load_avg_contrib +
      cfs_rq->avg.load_avg.
      
      Now, per the referenced commit, calc_tg_weight() doesn't use
      cfs_rq->avg.load_avg, as is later used in @w, but uses
      cfs_rq->load.weight instead.
      
      So stop using calc_tg_weight() and do it explicitly.
      
      The effects of this bug are wake_affine() making randomly
      poor choices in cgroup-intense workloads.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # v4.3+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7dd49125
  8. 25 6月, 2016 3 次提交
    • M
      Fix build break in fork.c when THREAD_SIZE < PAGE_SIZE · 9521d399
      Michael Ellerman 提交于
      Commit b235beea ("Clarify naming of thread info/stack allocators")
      breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:
      
        kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
        kernel/fork.c:355:8: error: assignment from incompatible pointer type
          stack = alloc_thread_stack_node(tsk, node);
          ^
      
      Fix it by renaming free_stack() to free_thread_stack(), and updating the
      return type of alloc_thread_stack_node().
      
      Fixes: b235beea ("Clarify naming of thread info/stack allocators")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9521d399
    • M
      oom, suspend: fix oom_reaper vs. oom_killer_disable race · 74070542
      Michal Hocko 提交于
      Tetsuo has reported the following potential oom_killer_disable vs.
      oom_reaper race:
      
       (1) freeze_processes() starts freezing user space threads.
       (2) Somebody (maybe a kenrel thread) calls out_of_memory().
       (3) The OOM killer calls mark_oom_victim() on a user space thread
           P1 which is already in __refrigerator().
       (4) oom_killer_disable() sets oom_killer_disabled = true.
       (5) P1 leaves __refrigerator() and enters do_exit().
       (6) The OOM reaper calls exit_oom_victim(P1) before P1 can call
           exit_oom_victim(P1).
       (7) oom_killer_disable() returns while P1 not yet finished
       (8) P1 perform IO/interfere with the freezer.
      
      This situation is unfortunate.  We cannot move oom_killer_disable after
      all the freezable kernel threads are frozen because the oom victim might
      depend on some of those kthreads to make a forward progress to exit so
      we could deadlock.  It is also far from trivial to teach the oom_reaper
      to not call exit_oom_victim() because then we would lose a guarantee of
      the OOM killer and oom_killer_disable forward progress because
      exit_mm->mmput might block and never call exit_oom_victim.
      
      It seems the easiest way forward is to workaround this race by calling
      try_to_freeze_tasks again after oom_killer_disable.  This will make sure
      that all the tasks are frozen or it bails out.
      
      Fixes: 449d777d ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper")
      Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      74070542
    • L
      Clarify naming of thread info/stack allocators · b235beea
      Linus Torvalds 提交于
      We've had the thread info allocated together with the thread stack for
      most architectures for a long time (since the thread_info was split off
      from the task struct), but that is about to change.
      
      But the patches that move the thread info to be off-stack (and a part of
      the task struct instead) made it clear how confused the allocator and
      freeing functions are.
      
      Because the common case was that we share an allocation with the thread
      stack and the thread_info, the two pointers were identical.  That
      identity then meant that we would have things like
      
      	ti = alloc_thread_info_node(tsk, node);
      	...
      	tsk->stack = ti;
      
      which certainly _worked_ (since stack and thread_info have the same
      value), but is rather confusing: why are we assigning a thread_info to
      the stack? And if we move the thread_info away, the "confusing" code
      just gets to be entirely bogus.
      
      So remove all this confusion, and make it clear that we are doing the
      stack allocation by renaming and clarifying the function names to be
      about the stack.  The fact that the thread_info then shares the
      allocation is an implementation detail, and not really about the
      allocation itself.
      
      This is a pure renaming and type fix: we pass in the same pointer, it's
      just that we clarify what the pointer means.
      
      The ia64 code that actually only has one single allocation (for all of
      task_struct, thread_info and kernel thread stack) now looks a bit odd,
      but since "tsk->stack" is actually not even used there, that oddity
      doesn't matter.  It would be a separate thing to clean that up, I
      intentionally left the ia64 changes as a pure brute-force renaming and
      type change.
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b235beea
  9. 24 6月, 2016 6 次提交
    • T
      sched/core: Allow kthreads to fall back to online && !active cpus · feb245e3
      Tejun Heo 提交于
      During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
      online but not active.  A CPU_ONLINE callback may create or bind a
      kthread so that its cpus_allowed mask only allows the CPU which is
      being brought online.  The kthread may start executing before the CPU
      is made active and can end up in select_fallback_rq().
      
      In such cases, the expected behavior is selecting the CPU which is
      coming online; however, because select_fallback_rq() only chooses from
      active CPUs, it determines that the task doesn't have any viable CPU
      in its allowed mask and ends up overriding it to cpu_possible_mask.
      
      CPU_ONLINE callbacks should be able to put kthreads on the CPU which
      is coming online.  Update select_fallback_rq() so that it follows
      cpu_online() rather than cpu_active() for kthreads.
      Reported-by: NGautham R Shenoy <ego@linux.vnet.ibm.com>
      Tested-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-team@fb.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20160616193504.GB3262@mtj.duckdns.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      feb245e3
    • K
      sched/fair: Do not announce throttled next buddy in dequeue_task_fair() · 754bd598
      Konstantin Khlebnikov 提交于
      Hierarchy could be already throttled at this point. Throttled next
      buddy could trigger a NULL pointer dereference in pick_next_task_fair().
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NBen Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzzSigned-off-by: NIngo Molnar <mingo@kernel.org>
      754bd598
    • K
      sched/fair: Initialize throttle_count for new task-groups lazily · 094f4691
      Konstantin Khlebnikov 提交于
      Cgroup created inside throttled group must inherit current throttle_count.
      Broken throttle_count allows to nominate throttled entries as a next buddy,
      later this leads to null pointer dereference in pick_next_task_fair().
      
      This patch initialize cfs_rq->throttle_count at first enqueue: laziness
      allows to skip locking all rq at group creation. Lazy approach also allows
      to skip full sub-tree scan at throttling hierarchy (not in this patch).
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzzSigned-off-by: NIngo Molnar <mingo@kernel.org>
      094f4691
    • P
      locking/static_key: Fix concurrent static_key_slow_inc() · 4c5ea0a9
      Paolo Bonzini 提交于
      The following scenario is possible:
      
          CPU 1                                   CPU 2
          static_key_slow_inc()
           atomic_inc_not_zero()
            -> key.enabled == 0, no increment
           jump_label_lock()
           atomic_inc_return()
            -> key.enabled == 1 now
                                                  static_key_slow_inc()
                                                   atomic_inc_not_zero()
                                                    -> key.enabled == 1, inc to 2
                                                   return
                                                  ** static key is wrong!
           jump_label_update()
           jump_label_unlock()
      
      Testing the static key at the point marked by (**) will follow the
      wrong path for jumps that have not been patched yet.  This can
      actually happen when creating many KVM virtual machines with userspace
      LAPIC emulation; just run several copies of the following program:
      
          #include <fcntl.h>
          #include <unistd.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          int main(void)
          {
              for (;;) {
                  int kvmfd = open("/dev/kvm", O_RDONLY);
                  int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
                  close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
                  close(vmfd);
                  close(kvmfd);
              }
              return 0;
          }
      
      Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
      The static key's purpose is to skip NULL pointer checks and indeed one
      of the processes eventually dereferences NULL.
      
      As explained in the commit that introduced the bug:
      
        706249c2 ("locking/static_keys: Rework update logic")
      
      jump_label_update() needs key.enabled to be true.  The solution adopted
      here is to temporarily make key.enabled == -1, and use go down the
      slow path when key.enabled <= 0.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # v4.3+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 706249c2 ("locking/static_keys: Rework update logic")
      Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
      [ Small stylistic edits to the changelog and the code. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4c5ea0a9
    • D
      cgroup: Disable IRQs while holding css_set_lock · 82d6489d
      Daniel Bristot de Oliveira 提交于
      While testing the deadline scheduler + cgroup setup I hit this
      warning.
      
      [  132.612935] ------------[ cut here ]------------
      [  132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
      [  132.612952] Modules linked in: (a ton of modules...)
      [  132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2
      [  132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
      [  132.612982]  0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
      [  132.612984]  0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
      [  132.612985]  00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
      [  132.612986] Call Trace:
      [  132.612987]  <IRQ>  [<ffffffff813d229e>] dump_stack+0x63/0x85
      [  132.612994]  [<ffffffff810a652b>] __warn+0xcb/0xf0
      [  132.612997]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
      [  132.612999]  [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20
      [  132.613000]  [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80
      [  132.613008]  [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20
      [  132.613010]  [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10
      [  132.613015]  [<ffffffff811388ac>] put_css_set+0x5c/0x60
      [  132.613016]  [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0
      [  132.613017]  [<ffffffff810a3912>] __put_task_struct+0x42/0x140
      [  132.613018]  [<ffffffff810e776a>] dl_task_timer+0xca/0x250
      [  132.613027]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
      [  132.613030]  [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270
      [  132.613031]  [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190
      [  132.613034]  [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60
      [  132.613035]  [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50
      [  132.613037]  [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0
      [  132.613038]  <EOI>  [<ffffffff81063466>] ? native_safe_halt+0x6/0x10
      [  132.613043]  [<ffffffff81037a4e>] default_idle+0x1e/0xd0
      [  132.613044]  [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20
      [  132.613046]  [<ffffffff810e8fda>] default_idle_call+0x2a/0x40
      [  132.613047]  [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340
      [  132.613048]  [<ffffffff81050235>] start_secondary+0x155/0x190
      [  132.613049] ---[ end trace f91934d162ce9977 ]---
      
      The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt
      context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid
      this problem - and other problems of sharing a spinlock with an
      interrupt.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: cgroups@vger.kernel.org
      Cc: stable@vger.kernel.org # 4.5+
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: N"Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
      Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      82d6489d
    • L
      locking: avoid passing around 'thread_info' in mutex debugging code · 6720a305
      Linus Torvalds 提交于
      None of the code actually wants a thread_info, it all wants a
      task_struct, and it's just converting back and forth between the two
      ("ti->task" to get the task_struct from the thread_info, and
      "task_thread_info(task)" to go the other way).
      
      No semantic change.
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6720a305
  10. 21 6月, 2016 5 次提交
    • A
      timer: Avoid using timespec · 7c71feb0
      Arnd Bergmann 提交于
      The tstats_show() function prints a ktime_t variable by converting
      it to struct timespec first. The algorithm is ok, but we want to
      stop using timespec in general because of the 32-bit time_t
      overflow problem.
      
      This changes the code to use struct timespec64, without any
      functional change.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      7c71feb0
    • A
      time: Avoid timespec in udelay_test · 4a19bd3d
      Arnd Bergmann 提交于
      udelay_test_single() uses ktime_get_ts() to get two timespec values
      and calculate the difference between them, while udelay_test_show()
      uses the same to printk() the current monotonic time.
      
      Both of these are y2038 safe on all machines, but we want to
      get rid of struct timespec anyway, so this converts the code to
      use ktime_get_ns() and ktime_get_ts64() respectively.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      4a19bd3d
    • D
      time: Add time64_to_tm() · e6c2682a
      Deepa Dinamani 提交于
      time_to_tm() takes time_t as an argument.
      time_t is not y2038 safe.
      Add time64_to_tm() that takes time64_t as an argument
      which is y2038 safe.
      The plan is to eventually replace all calls to time_to_tm()
      by time64_to_tm().
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      e6c2682a
    • P
      alarmtimer: Fix comments describing structure fields · af4afb40
      Pratyush Patel 提交于
      Updated struct alarm and struct alarm_timer descriptions.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NPratyush Patel <pratyushpatel.1995@gmail.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      af4afb40
    • T
      timekeeping: Fix 1ns/tick drift with GENERIC_TIME_VSYSCALL_OLD · 0209b937
      Thomas Graziadei 提交于
      The user notices the problem in a raw and real time drift, calling
      clock_gettime with CLOCK_REALTIME / CLOCK_MONOTONIC_RAW on a system
      with no ntp correction taking place (no ntpd or ptp stuff running).
      
      The problem is, that old_vsyscall_fixup adds an extra 1ns even though
      xtime_nsec is already held in full nsecs and the remainder in this
      case is 0. Do the rounding up buisness only if needed.
      
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NThomas Graziadei <thomas.graziadei@omicronenergy.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      0209b937