1. 09 Dec, 2016: 1 commit
    • timekeeping: Force unsigned clocksource to nanoseconds conversion · 9c164572
      Thomas Gleixner committed
      The clocksource delta to nanoseconds conversion is using signed math, but
      the delta is unsigned. This makes the conversion space smaller than
      necessary and in case of a multiplication overflow the conversion can
      become negative. The conversion is done with scaled math:
      
          s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
      
      Shifting a signed integer right obviously preserves the sign, which has
      interesting consequences:
       
       - Time jumps backwards
       
       - __iter_div_u64_rem(), which is used in one of the calling code paths,
         will take forever to piecewise calculate the seconds/nanoseconds part.
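
      A minimal user-space sketch of the difference (the function names are
      hypothetical; mult and shift follow the formula above). To stay clear of
      C signed-overflow undefined behaviour, the product is formed in u64 and
      then cast, so the signed path relies on the two's-complement conversion
      and arithmetic right shift that GCC and Clang implement:

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: the scaled product is treated as signed, so once bit 63
 * is set the value is negative and the right shift sign-extends it.
 * The u64 -> s64 cast and the shift of a negative value are
 * implementation-defined; GCC and Clang use two's complement and an
 * arithmetic shift. */
static int64_t delta_to_ns_signed(uint64_t delta, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	int64_t tmp = (int64_t)(delta * mult);	/* bit 63 becomes the sign */

	return tmp >> shift;			/* sign-extending shift */
}

/* Fixed behaviour: everything stays unsigned, the shift is logical and
 * the result can never become negative. */
static uint64_t delta_to_ns_unsigned(uint64_t delta, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	return (delta * mult) >> shift;
}
```

      With mult = 1 << 24 and shift = 24, a delta of 1 << 39 cycles sets bit 63
      of the product: the unsigned variant yields 549755813888 ns, while the
      signed variant yields -549755813888 ns, i.e. time jumps backwards.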
      
      This has been reported by several people with different scenarios:
      
      David observed that when stopping a VM with a debugger:
      
       "It was essentially the stopped by debugger case.  I forget exactly why,
        but the guest was being explicitly stopped from outside, it wasn't just
        scheduling lag.  I think it was something in the vicinity of 10 minutes
        stopped."
      
       When the stop was lifted, the machine went dead.
      
      The stopped by debugger case is not really interesting, but nevertheless it
      would be a good thing not to die completely.
      
      But this was also observed on a live system by Liav:
      
       "When the OS is too overloaded, delta will get a high enough value for the
        msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
        after the shift the nsec variable will gain a value similar to
        0xffffffffff000000."
      
      Unfortunately this has been reintroduced recently with commit 6bd58f09
      ("time: Add cycles to nanoseconds translation"). It had been fixed a year
      ago already in commit 35a4933a ("time: Avoid signed overflow in
      timekeeping_get_ns()").
      
      It is not surprising, though, that the issue has been reintroduced, because
      the function itself and the whole call chain use s64 for the result and
      its propagation. The change in this recent commit is subtle:
      
         s64 nsec;
      
      -  nsec = (d * m + n) >> s;
      +  nsec = d * m + n;
      +  nsec >>= s;
      
      d being of type cycle_t adds another level of obfuscation.
      
      This would not have happened if the previous change to unsigned
      computation had made the 'nsec' variable u64 right away and a follow-up
      patch had cleaned up the whole call chain.
      
      There have been patches submitted which basically reverted the above
      patch, leaving everything else unchanged as signed. Back to square one.
      This spawned an admittedly pointless discussion about potential users
      which rely on the unsigned behaviour, until someone pointed out that it
      had been fixed before. The changelogs of said patches added further
      confusion, as they ultimately made false claims about the consequences
      for potential users which expect signed results.
      
      Despite delta being cycle_t, aka u64, it is very well possible to hand in
      a signed negative value, and the signed computation will happily return
      the correct result. But nobody actually sat down and analyzed the code
      which was added as a user after the probably unintended signed
      conversion.
      
      In sensitive code like this, however, it is better to analyze it properly
      and make sure that nothing relies on this than to hunt the subtle
      wreckage half a year later. After analyzing all call chains, it turns out
      that no caller hands in a negative value (which would actually work due
      to the s64 cast) and relies on the signed math to do the right thing.
      
      Change the conversion function to unsigned math. The conversion of all
      call chains is done in a follow-up patch.
      
      This solves the starvation issue, which was caused by the negative
      result, but it does not solve the underlying problem. It merely postpones
      it. When the timekeeper update is deferred long enough that the unsigned
      multiplication overflows, time going backwards is observable again.
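
      To get a feel for how far the overflow is pushed out, the sketch below
      computes the largest cycle delta whose product with mult still fits,
      ignoring the xtime_nsec addend. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Largest cycle delta for which delta * mult fits into 64 bits
 * (the new unsigned math). */
static uint64_t max_delta_unsigned(uint32_t mult)
{
	assert(mult != 0);
	return UINT64_MAX / mult;
}

/* Largest cycle delta for which delta * mult stays below bit 63
 * (the old signed math, where bit 63 turned into the sign). */
static uint64_t max_delta_signed(uint32_t mult)
{
	assert(mult != 0);
	return (uint64_t)INT64_MAX / mult;
}
```

      For a hypothetical 1 GHz clocksource with mult = 1 << 24 and shift = 24
      (1 ns per cycle), the limit is about 2^40 ns, roughly 18 minutes, with
      unsigned math versus about 2^39 ns, roughly 9 minutes, with the old
      signed math.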
      
      Nor does it solve the issue of clocksources with a small counter width,
      which will wrap around possibly several times and cause random time
      stamps to be generated. But those are usually not found on systems used
      for virtualization, so this is likely a non-issue.
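
      The wrap-around problem can be illustrated with a masked counter delta
      in the style of (now - last) & mask, which is an assumption of this
      sketch rather than a quote of the kernel code: any number of complete
      wraps simply disappears.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks of a free-running counter whose valid bits are given by
 * mask (2^width - 1). A single wrap is resolved correctly; additional
 * complete wraps are invisible in the result. */
static uint64_t counter_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}
```

      A 24-bit counter read at 0xFFFFF0 and again at 0x10 yields 0x20 ticks,
      whether 0x20 or 0x20 + 2 * 2^24 ticks actually elapsed.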
      
      I took the liberty of claiming authorship for this simply because
      analyzing all call sites and writing the changelog took substantially
      more time than just making the simple s/s64/u64/ change and ignoring the
      rest.
      
      Fixes: 6bd58f09 ("time: Add cycles to nanoseconds translation")
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Reported-by: Liav Rehana <liavr@mellanox.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 01 Dec, 2016: 1 commit
    • alarmtimer: Add tracepoints for alarm timers · 4a057549
      Baolin Wang committed
      Alarm timers are one of the mechanisms to wake up a system from suspend,
      but there exist no tracepoints to analyse which process/thread armed an
      alarmtimer.
      
      Add tracepoints for start/cancel/expire of individual alarm timers and one
      for tracing the suspend time decision when to resume the system.
      
      The following trace excerpt illustrates the new mechanism:
      
      Binder:3292_2-3304  [000] d..2   149.981123: alarmtimer_cancel:
      alarmtimer:ffffffc1319a7800 type:REALTIME
      expires:1325463120000000000 now:1325376810370370245
      
      Binder:3292_2-3304  [000] d..2   149.981136: alarmtimer_start:
      alarmtimer:ffffffc1319a7800 type:REALTIME
      expires:1325376840000000000 now:1325376810370384591
      
      Binder:3292_9-3953  [000] d..2   150.212991: alarmtimer_cancel:
      alarmtimer:ffffffc1319a5a00 type:BOOTTIME
      expires:179552000000 now:150154008122
      
      Binder:3292_9-3953  [000] d..2   150.213006: alarmtimer_start:
      alarmtimer:ffffffc1319a5a00 type:BOOTTIME
      expires:179551000000 now:150154025622
      
      system_server-3000  [002] ...1  162.701940: alarmtimer_suspend:
      alarmtimer type:REALTIME expires:1325376840000000000
      
      The wakeup time which is selected at suspend time allows mapping it back
      to the task arming the timer: Binder:3292_2.
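
      As a sketch of that mapping, a post-processing tool could match the
      expiry reported by alarmtimer_suspend against earlier alarmtimer_start
      events. The struct and function below are hypothetical and assume the
      trace lines have already been parsed into task/expires pairs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One alarmtimer_start event, reduced to the fields needed here. */
struct alarm_start_event {
	const char *task;	/* e.g. "Binder:3292_2" */
	uint64_t expires;	/* absolute expiry in ns, as printed in the trace */
};

/* Return the task of the most recent start event whose expiry equals the
 * expiry reported by alarmtimer_suspend, or NULL if nothing matches. */
static const char *find_arming_task(const struct alarm_start_event *ev,
				    size_t n, uint64_t suspend_expires)
{
	for (size_t i = n; i-- > 0;)
		if (ev[i].expires == suspend_expires)
			return ev[i].task;
	return NULL;
}
```

      Fed with the two alarmtimer_start events above, the suspend expiry
      1325376840000000000 resolves to Binder:3292_2. A real tool would also
      compare the timer type.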
      
      [ tglx: Store alarm timer expiry time instead of some useless RTC relative
        	information, add proper type information for wakeups which are
        	handled via the clock_nanosleep/freezer and massage the changelog. ]
      Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 30 Nov, 2016: 2 commits
  4. 16 Nov, 2016: 3 commits
  5. 26 Oct, 2016: 2 commits
    • timers: Fix documentation for schedule_timeout() and similar · 4b7e9cf9
      Douglas Anderson committed
      The documentation for schedule_timeout(), schedule_hrtimeout(), and
      schedule_hrtimeout_range() all claim that the routines couldn't possibly
      return early if the task state was TASK_UNINTERRUPTIBLE. This is simply
      not true since wake_up_process() will cause those routines to exit early.
      
      We cannot make schedule_[hr]timeout() loop until the timeout expires if the
      task state is uninterruptible because we have users which rely on the
      existing and designed behaviour.
      
      Make the documentation match the (correct) implementation.
      
      schedule_hrtimeout() returns -EINTR even when an uninterruptible task was
      woken up. This might look strange, but making the return code depend on the
      state is too much of an effort as it would affect all the call sites. There
      is no value in doing so, but we spell it out clearly in the documentation.
      Suggested-by: Daniel Kurtz <djkurtz@chromium.org>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Cc: huangtao@rock-chips.com
      Cc: heiko@sntech.de
      Cc: broonie@kernel.org
      Cc: briannorris@chromium.org
      Cc: Andreas Mohr <andi@lisas.de>
      Cc: linux-rockchip@lists.infradead.org
      Cc: tony.xie@rock-chips.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: linux@roeck-us.net
      Cc: tskd08@gmail.com
      Link: http://lkml.kernel.org/r/1477065531-30342-2-git-send-email-dianders@chromium.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timers: Fix usleep_range() in the context of wake_up_process() · 6c5e9059
      Douglas Anderson committed
      Users of usleep_range() expect that it will _never_ return in less time
      than the minimum passed parameter. However, nothing in the code ensures
      this when the sleeping task is woken by wake_up_process() or any other
      mechanism which can wake a task from uninterruptible state.
      
      Neither usleep_range() nor schedule_hrtimeout_range*() have any protection
      against wakeups. schedule_hrtimeout_range*() is designed this way despite
      the fact that the API documentation does not mention it.
      
      msleep() already handles this case, since it loops as long as there is
      still time left. usleep_range() has no such loop; add it.
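
      The kernel change itself is not reproduced here. As a user-space
      analogue of the same idea, the sketch below sleeps until an absolute
      CLOCK_MONOTONIC deadline and simply resumes the sleep when woken early
      (a signal, reported as EINTR), so it never returns before min_us has
      elapsed. sleep_at_least_us() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for at least min_us microseconds, even across early wakeups. */
static int sleep_at_least_us(unsigned long min_us)
{
	struct timespec deadline;
	int ret;

	if (clock_gettime(CLOCK_MONOTONIC, &deadline))
		return errno;
	deadline.tv_sec += min_us / 1000000;
	deadline.tv_nsec += (long)(min_us % 1000000) * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	do {
		/* clock_nanosleep() returns the error number directly */
		ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
	} while (ret == EINTR);	/* woken early: go back to sleep */

	return ret;		/* 0 once the deadline has passed */
}
```

      Using an absolute deadline rather than re-sleeping for the original
      relative interval avoids accumulating extra delay on every early wakeup.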
      
      Presumably this problem was not detected before because usleep_range() is
      only used in a few places and the function is mostly used in contexts which
      are not exposed to wakeups of any form.
      
      An effort was made to look for users relying on the old behavior by
      looking for usleep_range() in the same file as wake_up_process().
      No problems were found by this search, though it is conceivable that
      someone could have put the sleep and wakeup in two different files.
      
      An effort was made to ask several upstream maintainers if they were aware
      of people relying on wake_up_process() to wake up usleep_range(). None of
      the maintainers were aware of that, but they were aware of many people
      relying on usleep_range() never returning before the minimum.
      Reported-by: Tao Huang <huangtao@rock-chips.com>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Cc: heiko@sntech.de
      Cc: broonie@kernel.org
      Cc: briannorris@chromium.org
      Cc: Andreas Mohr <andi@lisas.de>
      Cc: linux-rockchip@lists.infradead.org
      Cc: tony.xie@rock-chips.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: djkurtz@chromium.org
      Cc: linux@roeck-us.net
      Cc: tskd08@gmail.com
      Link: http://lkml.kernel.org/r/1477065531-30342-1-git-send-email-dianders@chromium.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 21 Oct, 2016: 1 commit
  7. 20 Oct, 2016: 1 commit
    • printk: suppress empty continuation lines · 8835ca59
      Linus Torvalds committed
      We have a fairly common pattern where you print several things as
      continuations on one single line in a loop, and then at the end you do
      
      	printk(KERN_CONT "\n");
      
      to flush the buffered output.
      
      But if the output was flushed by something else (concurrent printk
      activity, or just system logging), we don't want that final flushing to
      just print an empty line.
      
      So just suppress empty continuation lines when they couldn't be merged
      into the line they are a continuation of.
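
      A toy model of that rule, not printk's implementation (all names are
      hypothetical): a newline-only continuation produces a line only when
      there is still buffered text for it to terminate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cont_buf {
	size_t len;	/* bytes of the partial line still buffered */
	int lines;	/* lines written to the log so far */
};

/* Flush the partial line; an empty buffer produces no line at all. */
static void cont_flush(struct cont_buf *c)
{
	if (c->len) {
		c->lines++;
		c->len = 0;
	}
}

/* Append a KERN_CONT-style fragment; a trailing '\n' ends the line. */
static void cont_add(struct cont_buf *c, const char *frag)
{
	size_t n = strlen(frag);

	if (n && frag[n - 1] == '\n') {
		c->len += n - 1;
		cont_flush(c);	/* nothing left if someone else flushed */
		return;
	}
	c->len += n;
}
```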
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 19 Oct, 2016: 3 commits
  9. 17 Oct, 2016: 1 commit
  10. 16 Oct, 2016: 1 commit
  11. 13 Oct, 2016: 1 commit
    • Disable the __builtin_return_address() warning globally after all · ef6000b4
      Linus Torvalds committed
      This effectively reverts commit 377ccbb4 ("Makefile: Mute warning
      for __builtin_return_address(>0) for tracing only") because it turns out
      that it really isn't tracing only - it's all over the tree.
      
      We also already had the warning disabled separately for mm/usercopy.c
      (which this commit also removes), and it turns out that we will also
      want to disable it for get_lock_parent_ip(), which is used by at least
      TRACE_IRQFLAGS, which (when enabled) ends up being all over the tree.
      
      Steven Rostedt had a patch that tried to limit it to just the config
      options that actually triggered this, but quite frankly, the extra
      complexity and abstraction just isn't worth it.  We have never actually
      had a case where the warning is actually useful, so let's just disable
      it globally and not worry about it.
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 12 Oct, 2016: 20 commits
    • hung_task: allow hung_task_panic when hung_task_warnings is 0 · 48a6d64e
      John Siddle committed
      Previously hung_task_panic would not be respected if enabled after
      hung_task_warnings had already been decremented to 0.
      
      Permit the kernel to panic if hung_task_panic is enabled after
      hung_task_warnings has already been decremented to 0 and another task
      hangs for hung_task_timeout_secs seconds.
      
      Check if hung_task_panic is enabled so we don't return prematurely, and
      check if hung_task_warnings is non-zero so we don't print the warning
      unnecessarily.
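
      The resulting control flow can be sketched as a pure function. The
      names are hypothetical stand-ins for the hung_task_warnings and
      hung_task_panic sysctls, and treating a negative warnings value as
      "unlimited" is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>

struct hung_action {
	bool warn;
	bool panic;
};

static struct hung_action hung_task_decide(int *warnings, bool panic_enabled)
{
	struct hung_action a = { false, false };

	/* Return early only if there is neither a warning nor a panic left. */
	if (!*warnings && !panic_enabled)
		return a;

	/* Print (and consume) a warning only while warnings remain. */
	if (*warnings) {
		if (*warnings > 0)
			(*warnings)--;
		a.warn = true;
	}

	/* Panic regardless of how many warnings were already used up. */
	a.panic = panic_enabled;
	return a;
}
```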
      
      [akpm@linux-foundation.org: fix off-by-one]
      Link: http://lkml.kernel.org/r/1473450214-4049-1-git-send-email-jsiddle@redhat.com
      Signed-off-by: John Siddle <jsiddle@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: better support freezable kthread workers · dbf52682
      Petr Mladek committed
      This patch allows making a kthread worker freezable via a new @flags
      parameter. It will allow avoiding an init work in some kthreads.
      
      It currently does not affect the function of kthread_worker_fn()
      but it might help to do some optimization or fixes eventually.
      
      I currently do not know about any other use for the @flags
      parameter but I believe that we will want more flags
      in the future.
      
      Finally, I hope that it will not cause confusion with the @flags member
      in struct kthread. Well, I guess that we will want to rework the
      basic kthreads implementation once all kthreads are converted into
      kthread workers or workqueues. It is possible that we will merge
      the two structures.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: allow to modify delayed kthread work · 9a6b06c8
      Petr Mladek committed
      There are situations when we need to modify the delay of a delayed kthread
      work. For example, when the work depends on an event and the initial delay
      means a timeout. Then we want to queue the work immediately when the event
      happens.
      
      This patch implements kthread_mod_delayed_work() as inspired by
      workqueues. It cancels the timer, removes the work from any worker list
      and queues it again with the given timeout.
      
      A very special case is when the work is being canceled at the same time.
      It might happen because of the regular kthread_cancel_delayed_work_sync()
      or by another kthread_mod_delayed_work(). In this case, we do nothing and
      let the other operation win. This should not normally happen as the caller
      is supposed to synchronize these operations in a reasonable way.
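
      A single-threaded toy model of these semantics (hypothetical names; the
      enum stands in for the timer and the worker lists, and locking is left
      out):

```c
#include <assert.h>
#include <stdbool.h>

enum dwork_state { DW_IDLE, DW_TIMER, DW_QUEUED };

struct toy_dwork {
	enum dwork_state state;
	unsigned long expires;	/* timer expiry when state == DW_TIMER */
	int canceling;		/* >0 while a cancel operation is in flight */
};

/* Returns true if a pending work was modified. */
static bool toy_mod_delayed_work(struct toy_dwork *w, unsigned long now,
				 unsigned long delay)
{
	bool modified = false;

	/* Do not fight with a concurrent cancel: let it win. */
	if (w->canceling)
		return false;

	/* "Cancel the timer and remove the work from any worker list". */
	if (w->state != DW_IDLE) {
		w->state = DW_IDLE;
		modified = true;
	}

	/* Queue again with the new delay; zero means queue immediately. */
	if (delay == 0) {
		w->state = DW_QUEUED;
	} else {
		w->state = DW_TIMER;
		w->expires = now + delay;
	}
	return modified;
}
```

      The return value (true if a pending work was modified) is a choice of
      this sketch; the point is that the canceling case leaves the work
      untouched.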
      
      Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: allow to cancel kthread work · 37be45d4
      Petr Mladek committed
      We are going to use kthread workers more widely and sometimes we will need
      to make sure that the work is neither pending nor running.
      
      This patch implements cancel_*_sync() operations as inspired by
      workqueues.  Since we are synchronized against the other operations via
      the worker lock, and we use del_timer_sync() and a counter to count
      parallel cancel operations, the implementation can be simpler.
      
      First, we check if a worker is assigned.  If not, the work has never been
      queued after it was initialized.
      
      Second, we take the worker lock.  It must be the right one.  The work must
      not be assigned to another worker unless it is initialized in between.
      
      Third, we try to cancel the timer when it exists.  The timer is deleted
      synchronously to make sure that the timer callback is not running.  We
      need to temporarily release the worker->lock to avoid a possible deadlock
      with the callback.  In the meantime, we set the work->canceling counter to
      block any queuing.
      
      Fourth, we try to remove the work from a worker list. It might be
      the list of either normal or delayed works.
      
      Fifth, if the work is running, we call kthread_flush_work().  It might
      take an arbitrary time.  We need to release the worker->lock again.  In the
      meantime, we again block any queuing by the canceling counter.
      
      As already mentioned, the check for a pending kthread work is done under a
      lock.  Compared with workqueues, we do not need to fight for a single
      PENDING bit to block other operations.  Therefore we do not suffer from
      the thundering herd problem, and all parallel canceling jobs can use
      kthread_flush_work().  Any queuing is blocked until the counter drops to zero.
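
      The counter-based blocking can be modelled in a few lines. This is a
      hypothetical single-threaded sketch; the lock dropping around
      del_timer_sync() and kthread_flush_work() is only noted in comments:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_work {
	bool queued;
	int canceling;	/* number of cancel operations in flight */
};

/* Queueing is refused while the work is pending or any cancel runs. */
static bool toy_queue_work(struct toy_work *w)
{
	if (w->queued || w->canceling)
		return false;
	w->queued = true;
	return true;
}

/* Start of a cancel: block queuing and drop the pending work. In the
 * changelog's scheme the lock is then released for the timer deletion
 * and the flush, which is why the counter is needed. */
static bool toy_cancel_begin(struct toy_work *w)
{
	bool was_pending = w->queued;

	w->canceling++;
	w->queued = false;
	return was_pending;
}

/* End of a cancel: queuing is possible again once the last one ends. */
static void toy_cancel_end(struct toy_work *w)
{
	w->canceling--;
}
```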
      
      Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: initial support for delayed kthread work · 22597dc3
      Petr Mladek committed
      We are going to use kthread_worker more widely and delayed works
      will be pretty useful.
      
      The implementation is inspired by workqueues.  It uses a timer to queue
      the work after the requested delay.  If the delay is zero, the work is
      queued immediately.
      
      Compared with workqueues, each work is associated with a single worker
      (kthread).  Therefore the implementation can be much simpler.  In
      particular, we use the worker->lock to synchronize all the operations with
      the work.  We do not need any atomic operation with a flags variable.
      
      In fact, we do not need any state variable at all.  Instead, we add a list
      of delayed works into the worker.  Then the pending work is listed either
      in the list of queued or delayed works.  And the existing check of pending
      works is the same even for the delayed ones.
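
      The "no state variable" idea can be sketched with a minimal intrusive
      list in the spirit of <linux/list.h>: pending means "linked on either
      the queued or the delayed list". The types and helpers are hypothetical
      user-space stand-ins, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list; an unlinked node points to itself. */
struct node { struct node *prev, *next; };

static void node_init(struct node *n) { n->prev = n->next = n; }
static bool node_empty(const struct node *n) { return n->next == n; }

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

struct toy_worker { struct node work_list, delayed_work_list; };
struct toy_work { struct node node; };

/* No flag: a work is pending iff it is linked on one of the lists. */
static bool toy_work_pending(const struct toy_work *w)
{
	return !node_empty(&w->node);
}

/* Queue now (delay == 0) or park the work on the delayed list. */
static void toy_queue_delayed(struct toy_worker *wk, struct toy_work *w,
			      unsigned long delay)
{
	node_add_tail(&w->node, delay ? &wk->delayed_work_list : &wk->work_list);
}

/* What the timer handler would do: move the work to the normal list. */
static void toy_timer_fired(struct toy_worker *wk, struct toy_work *w)
{
	node_del_init(&w->node);
	node_add_tail(&w->node, &wk->work_list);
}
```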
      
      A work must not be assigned to another worker unless reinitialized.
      Therefore the timer handler might expect that dwork->work->worker is valid
      and it could simply take the lock.  We just add some sanity checks to help
      with debugging a potential misuse.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: detect when a kthread work is used by more workers · 8197b3d4
      Petr Mladek committed
      Nothing currently prevents a work from being queued for a kthread worker
      when it is already running on another one.  This means that the work might
      run in parallel on more than one worker.  Also some operations are not
      reliable, e.g. flush.
      
      This problem will be even more visible after we add kthread_cancel_work()
      function.  It will only have "work" as the parameter and will use
      worker->lock to synchronize with others.
      
      Well, normally this is not a problem because the API users are sane.
      But bugs might happen and users also might be crazy.
      
      This patch adds a warning when we try to insert the work for another
      worker.  It does not fully prevent the misuse because it would make the
      code much more complicated without a big benefit.
      
      It also adds the same warning to kthread_flush_work() instead of
      repeatedly attempting to get the right lock.
      
      A side effect is that one needs to explicitly reinitialize the work if it
      must be queued into another worker.  This is needed, for example, when the
      worker is stopped and started again.  It is a bit inconvenient.  But it
      looks like a good compromise between stability and complexity.
      
      I have double-checked all existing users of the kthread worker API and
      they all seem to initialize the work after the worker gets started.
      
      Just for completeness, the patch adds a check that the work is not already
      in a queue.
      
      The patch also puts all the checks into a separate function.  It will be
      reused when implementing delayed works.
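
      A hypothetical sketch of such sanity checks, which only warn and do not
      prevent the misuse, as the changelog describes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_worker { int id; };
struct toy_work {
	struct toy_worker *worker;	/* set on first queue, NULL after init */
	bool on_list;			/* already queued? */
};

/* Warn when the work is queued for a different worker than before, or
 * when it is already on a list. Returns the number of warnings. */
static int toy_insert_sanity_check(const struct toy_worker *worker,
				   const struct toy_work *work)
{
	int problems = 0;

	if (work->worker && work->worker != worker) {
		fprintf(stderr, "WARN: work used by more workers\n");
		problems++;
	}
	if (work->on_list) {
		fprintf(stderr, "WARN: work already queued\n");
		problems++;
	}
	return problems;
}
```

      Reinitializing the work (resetting worker to NULL) is what makes it
      legal to queue it to another worker again.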
      
      Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: add kthread_destroy_worker() · 35033fe9
      Petr Mladek committed
      The current kthread worker users call flush() and stop() explicitly.
      This function does the same plus it frees the kthread_worker struct
      in one call.
      
      It is supposed to be used together with kthread_create_worker*() that
      allocates struct kthread_worker.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: add kthread_create_worker*() · fbae2d44
      Petr Mladek committed
      Kthread workers are currently created using the classic kthread API,
      namely kthread_run().  kthread_worker_fn() is passed as the @threadfn
      parameter.
      
      This patch defines kthread_create_worker() and
      kthread_create_worker_on_cpu() functions that hide implementation details.
      
      They enforce using kthread_worker_fn() for the main thread.  But I doubt
      that there are any plans to create any alternative.  In fact, I think that
      we do not want any alternative main thread because it would be hard to
      support consistency with the rest of the kthread worker API.
      
      The naming and function of kthread_create_worker() is inspired by the
      workqueues API like the rest of the kthread worker API.
      
      The kthread_create_worker_on_cpu() variant is motivated by the original
      kthread_create_on_cpu().  Note that we need to bind per-CPU kthread
      workers already when they are created.  It makes life easier.
      kthread_bind() could not be used later for an already running worker.
      
      This patch does _not_ convert existing kthread workers.  The kthread
      worker API needs more improvements first, e.g. a function to destroy the
      worker.
      
      IMPORTANT:
      
      kthread_create_worker_on_cpu() allows any format for the worker name,
      unlike kthread_create_on_cpu().  The good thing is that it is more
      generic.  The bad thing is that most users will need to pass the cpu
      number in two parameters, e.g. kthread_create_worker_on_cpu(cpu,
      "helper/%d", cpu).
      
      To be honest, the main motivation was to avoid the need for an empty
      va_list.  The only legal way was to create a helper function that would be
      called with an empty list.  Other attempts caused compilation warnings or
      even errors on different architectures.
      
      There were also other alternatives, for example, using #define or
      splitting __kthread_create_worker().  The used solution looked like the
      least ugly.
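
      The resulting calling convention, a variadic front end that forwards a
      va_list to a core function, can be sketched in user space. Both function
      names are hypothetical, and the core only formats the name:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Core that takes a va_list; here it only formats the thread name. */
static int toy_create_named_v(char *name, size_t len, const char *namefmt,
			      va_list args)
{
	return vsnprintf(name, len, namefmt, args);
}

/* Per-CPU front end: cpu is used for binding, independently of the name,
 * so callers pass it twice, e.g. (cpu, "helper/%d", cpu). */
static int toy_create_worker_on_cpu(int *bound_cpu, char *name, size_t len,
				    int cpu, const char *namefmt, ...)
{
	va_list args;
	int ret;

	*bound_cpu = cpu;	/* stand-in for binding the new thread */
	va_start(args, namefmt);
	ret = toy_create_named_v(name, len, namefmt, args);
	va_end(args);
	return ret;
}
```

      As the changelog notes, the cpu number is passed twice: once for binding
      and once for the name.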
      
      Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: allow to call __kthread_create_on_node() with va_list args · 255451e4
      Petr Mladek committed
      kthread_create_on_node() implements a bunch of logic to create the
      kthread.  It is already called by kthread_create_on_cpu().
      
      We are going to extend the kthread worker API and will need to call
      kthread_create_on_node() with va_list args there.
      
      This patch only refactors the code and does not modify the existing
      behavior.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread/smpboot: do not park in kthread_create_on_cpu() · a65d4096
      Petr Mladek committed
      kthread_create_on_cpu() was added by the commit 2a1d4460
      ("kthread: Implement park/unpark facility").  It is currently used only
      when enabling a new CPU.  For this purpose, the newly created kthread has
      to be parked.
      
      The CPU binding is a bit tricky.  The kthread is parked when the CPU has
      not been allowed yet.  And the CPU is bound when the kthread is unparked.
      
      The function would be useful for more per-CPU kthreads, e.g.
      bnx2fc_thread, fcoethread.  For this purpose, the newly created kthread
      should stay in the uninterruptible state.
      
      This patch moves the parking into smpboot.  It binds the thread to the CPU
      already at creation time, so the function can be used universally.  The
      behavior is also consistent with kthread_create() and
      kthread_create_on_node().
      
      Link: http://lkml.kernel.org/r/1470754545-17632-4-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      kthread: kthread worker API cleanup · 3989144f
      Petr Mladek committed
      A good practice is to prefix the names of functions by the name
      of the subsystem.
      
      The kthread worker API is a mix of classic kthreads and workqueues.  Each
      worker has a dedicated kthread.  It runs a generic function that processes
      queued work items.  It is implemented as part of the kthread subsystem.
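      The worker model described above can be sketched in userspace with POSIX
      threads. This is only an analogue under stated assumptions, not the kernel
      implementation. struct worker, struct work, worker_queue() and
      worker_flush() are hypothetical names. They stand in for the roles played
      by kthread_queue_work() and kthread_flush_worker():

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item: a callback plus a link in the worker's queue. */
struct work {
	void (*func)(struct work *work);
	struct work *next;
};

/* Hypothetical worker: one dedicated thread processing a FIFO of work. */
struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;	/* signalled when work is queued or stop is set */
	pthread_cond_t idle;	/* broadcast when the queue drains */
	struct work *head, *tail;
	bool busy;		/* a work item is currently executing */
	bool stop;
	pthread_t thread;
};

/* The generic main loop shared by every worker. */
static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (!w->head && !w->stop)
			pthread_cond_wait(&w->cond, &w->lock);
		if (!w->head)		/* stop requested and nothing left */
			break;
		struct work *work = w->head;
		w->head = work->next;
		if (!w->head)
			w->tail = NULL;
		w->busy = true;
		pthread_mutex_unlock(&w->lock);
		work->func(work);	/* run without holding the lock */
		pthread_mutex_lock(&w->lock);
		w->busy = false;
		if (!w->head)
			pthread_cond_broadcast(&w->idle);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

int worker_start(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	pthread_cond_init(&w->idle, NULL);
	w->head = w->tail = NULL;
	w->busy = w->stop = false;
	return pthread_create(&w->thread, NULL, worker_fn, w);
}

void worker_queue(struct worker *w, struct work *work)
{
	pthread_mutex_lock(&w->lock);
	work->next = NULL;
	if (w->tail)
		w->tail->next = work;
	else
		w->head = work;
	w->tail = work;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Wait until every work item queued so far has finished. */
void worker_flush(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	while (w->head || w->busy)
		pthread_cond_wait(&w->idle, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

void worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->thread, NULL);
}
```

      Build with -pthread. The work function runs without the lock held, so a
      work item may itself call worker_queue() without deadlocking.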
      
      This patch renames the existing kthread worker API to use
      the corresponding name from the workqueues API prefixed by
      kthread_:
      
      __init_kthread_worker()		-> __kthread_init_worker()
      init_kthread_worker()		-> kthread_init_worker()
      init_kthread_work()		-> kthread_init_work()
      insert_kthread_work()		-> kthread_insert_work()
      queue_kthread_work()		-> kthread_queue_work()
      flush_kthread_work()		-> kthread_flush_work()
      flush_kthread_worker()		-> kthread_flush_worker()
      
      Note that the names of DEFINE_KTHREAD_WORK*() macros stay
      as they are. It is common that the "DEFINE_" prefix has
      precedence over the subsystem names.
      
      Note that INIT() macros and init() functions use different
      naming schemes. There is no perfect solution, but there are several
      reasons for this choice:
      
        + "init" in the function names stands for the verb "initialize",
          as in "initialize worker", while "INIT" in the macro names
          stands for the noun "INITIALIZER", as in "worker initializer".
      
        + INIT() macros are used only in DEFINE() macros
      
        + init() functions are used close to the other kthread()
          functions. It looks much better if all the functions
          use the same scheme.
      
        + There will also be kthread_destroy_worker() that will
          be used close to kthread_cancel_work(). It is related
          to the init() function. Again it looks better if all
          functions use the same naming scheme.
      
        + There are several precedents for such init() function
          names, e.g. amd_iommu_init_device(), free_area_init_node(),
          jump_label_init_type(), regmap_init_mmio_clk().
      
        + It is not an argument but it was inconsistent even before.
      
      [arnd@arndb.de: fix linux-next merge conflict]
       Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      kthread: rename probe_kthread_data() to kthread_probe_data() · e700591a
      Petr Mladek committed
      Patch series "kthread: Kthread worker API improvements"
      
      The intention of this patchset is to make it easier to manipulate and
      maintain kthreads.  In particular, I want to replace all the custom main
      loops with a generic one.  I also want to make the kthreads sleep in a
      consistent state in a common place when there is no work.
      
      This patch (of 11):
      
      A good practice is to prefix the names of functions by the name of the
      subsystem.
      
      This patch fixes the name of probe_kthread_data().  The other wrongly
      named functions are part of the kthread worker API and will be fixed
      separately.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      config: android: enable CONFIG_SECCOMP · 2489a177
      Rob Herring committed
      As of Android N, SECCOMP is required. Without it, mediaextractor fails
      with the following error:
      
      E /system/bin/mediaextractor: libminijail: prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER): Invalid argument
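      The EINVAL above matches what prctl(2) documents for PR_SET_SECCOMP and
      PR_GET_SECCOMP on a kernel built without CONFIG_SECCOMP. The small probe
      below can tell the two cases apart. It is a sketch that assumes a Linux
      userspace build, and the function name kernel_has_seccomp() is
      hypothetical:

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Hypothetical probe. Returns 1 if the running kernel supports seccomp,
 * 0 if it was built without CONFIG_SECCOMP (prctl fails with EINVAL),
 * and -1 on any other error.
 * Note: a process in strict mode (SECCOMP_MODE_STRICT) is killed with
 * SIGKILL by this call, so do not run it from such a process.
 */
int kernel_has_seccomp(void)
{
	int mode = prctl(PR_GET_SECCOMP, 0, 0, 0, 0);

	if (mode >= 0)
		return 1;	/* 0 = not in seccomp mode, 2 = filter mode */
	return errno == EINVAL ? 0 : -1;
}
```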
      
      Link: http://lkml.kernel.org/r/20160908185934.18098-3-robh@kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Amit Pundir <amit.pundir@linaro.org>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      config: android: set SELinux as default security mode · d90ae51a
      Rob Herring committed
      Android won't boot without SELinux enabled, so make it the default.
      
      Link: http://lkml.kernel.org/r/20160908185934.18098-2-robh@kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      config: android: move device mapper options to recommended · f023a395
      Rob Herring committed
      CONFIG_MD is in recommended, but other dependent options like DM_CRYPT and
      DM_VERITY are in base.  As a result, the options in base don't get
      enabled when both the base and recommended fragments are applied.  Move
      all the options to recommended.
      
      Link: http://lkml.kernel.org/r/20160908185934.18098-1-robh@kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Amit Pundir <amit.pundir@linaro.org>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • B
      config/android: Remove CONFIG_IPV6_PRIVACY · a2c6a235
      Borislav Petkov committed
      Option is long gone, see commit 5d9efa7e ("ipv6: Remove privacy
      config option.")
      
      Link: http://lkml.kernel.org/r/20160811170340.9859-1-bp@alien8.de
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Rob Herring <robh@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      relay: Use irq_work instead of plain timer for deferred wakeup · 26b5679e
      Peter Zijlstra committed
      Relay avoids calling wake_up_interruptible() to wake up readers/consumers
      waiting for new data from the context of the process that produced the
      data.  This is apparently done to prevent a possible deadlock in case
      the scheduler itself is generating data for the relay after acquiring
      rq->lock.
      
      The following commit used a timer (scheduled for the next jiffy) to
      delegate the wakeup to another context:
      	commit 7c9cb383
      	Author: Tom Zanussi <zanussi@comcast.net>
      	Date:   Wed May 9 02:34:01 2007 -0700
      
      	relay: use plain timer instead of delayed work
      
      	relay doesn't need to use schedule_delayed_work() for waking readers
      	when a simple timer will do.
      
      Scheduling a plain timer at the next jiffy boundary to do the wakeup
      causes significant wakeup latency for the userspace client, which makes
      relay less suitable for high-frequency, low-payload use cases where data
      is generated at a very high rate, e.g. multiple sub-buffers being filled
      within a millisecond.  Moreover, the timer is re-scheduled for every
      newly produced sub-buffer, so it keeps getting pushed out if sub-buffers
      are filled in quick succession (less than a jiffy apart).  As a result,
      relay runs out of sub-buffers to store the new data.
      
      Using irq_work ensures that the userspace client blocked in poll() is
      woken at the earliest opportunity (through a self-IPI or the next timer
      tick), so it can always consume the data in time.  This also makes relay
      consistent with printk and the trace ring buffer, which likewise use
      irq_work for deferred wakeup of readers.
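      A toy model makes the effect of re-arming the timer on every sub-buffer
      visible. Following the description above, it assumes that each fill pushes
      the expiry out to one jiffy after that fill. The function names and the
      4 ms jiffy (HZ=250) are also assumptions for illustration, not relay code:

```c
#include <stddef.h>

/* Count a wakeup and remember when the first one happened. */
static void record_wakeup(long t_us, size_t *wakeups, long *first_us)
{
	if (*wakeups == 0)
		*first_us = t_us;
	(*wakeups)++;
}

/*
 * Hypothetical model of the old scheme: every sub-buffer fill re-arms a
 * one-shot timer to expire jiffy_us after the fill.  fill_us[] holds fill
 * times in microseconds, in increasing order.  Returns the number of reader
 * wakeups; *first_us receives the time of the first one (-1 if none).
 */
size_t timer_wakeups(const long *fill_us, size_t n, long jiffy_us,
		     long *first_us)
{
	size_t wakeups = 0;
	long deadline = -1;		/* -1: timer not armed */

	*first_us = -1;
	for (size_t i = 0; i < n; i++) {
		/* The timer fires only if it expired before this fill. */
		if (deadline >= 0 && fill_us[i] >= deadline)
			record_wakeup(deadline, &wakeups, first_us);
		deadline = fill_us[i] + jiffy_us;	/* re-arm: push out */
	}
	if (deadline >= 0)
		record_wakeup(deadline, &wakeups, first_us);	/* final expiry */
	return wakeups;
}
```

      With ten fills 500 us apart, timer_wakeups() reports a single wakeup at
      8500 us, which is 8.5 ms after the first data arrived. With fills 5 ms
      apart, it reports three wakeups, the first at 4000 us. An irq_work-based
      scheme would instead raise a wakeup for each fill at the next self-IPI or
      tick.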
      
      [arnd@arndb.de: select CONFIG_IRQ_WORK]
       Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • H
      x86/panic: replace smp_send_stop() with kdump friendly version in panic path · 0ee59413
      Hidehiro Kawai committed
      Daniel Walker reported problems that happen when the
      crash_kexec_post_notifiers kernel option is enabled
      (https://lkml.org/lkml/2015/6/24/44).
      
      In that case, smp_send_stop() is called before entering the kdump
      routines, which assume that the other CPUs are still online.  As a
      result, on x86 the kdump routines fail to save the other CPUs' registers
      and to disable virtualization extensions.
      
      To fix this problem, call a new kdump-friendly function,
      crash_smp_send_stop(), instead of smp_send_stop() when
      crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a weak
      function that just calls smp_send_stop().  Architecture code should
      override it so that kdump works properly.  This patch only provides an
      x86-specific version.
      
      For Xen's PV kernel, just keep the current behavior.
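      The weak-default pattern described above can be sketched in userspace. In
      this sketch, smp_send_stop() and crash_smp_send_stop() are stubs that only
      count calls. panic_stop_other_cpus() is a hypothetical, simplified stand-in
      for the panic path, not kernel/panic.c. An architecture override would be
      a non-weak definition of crash_smp_send_stop() in another object file,
      which the linker then prefers over the weak one:

```c
/* Userspace stubs named after the kernel functions; they only count calls. */
static int stop_calls;
static int weak_default_calls;

void smp_send_stop(void)
{
	stop_calls++;
}

/* Generic default: a weak symbol that falls back to smp_send_stop().
 * A strong definition elsewhere (e.g. arch code) replaces it at link time. */
__attribute__((weak)) void crash_smp_send_stop(void)
{
	weak_default_calls++;
	smp_send_stop();
}

/* Hypothetical, simplified decision made on the panic path. */
void panic_stop_other_cpus(int crash_kexec_post_notifiers)
{
	if (crash_kexec_post_notifiers)
		crash_smp_send_stop();	/* kdump-friendly where overridden */
	else
		smp_send_stop();
}
```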
      
      NOTES:
      
      - The right solution would be to place crash_smp_send_stop() before
        the __crash_kexec() invocation in all cases and remove smp_send_stop(),
        but we can't do that until all architectures implement their own
        crash_smp_send_stop().
      
      - crash_smp_send_stop()-like work is still needed by
        machine_crash_shutdown() because crash_kexec() can be called without
        entering panic()
      
      Fixes: f06e5153 (kernel/panic.c: add "crash_kexec_post_notifiers" option)
      Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Reported-by: Daniel Walker <dwalker@fifo99.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Daniel Walker <dwalker@fifo99.com>
      Cc: Xunlei Pang <xpang@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Cc: "Steven J. Hill" <steven.hill@cavium.com>
      Cc: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      ptrace: clear TIF_SYSCALL_TRACE on ptrace detach · 0a5bf409
      Ales Novak committed
      On __ptrace_detach(), called from do_exit()->exit_notify()->
      forget_original_parent()->exit_ptrace(), TIF_SYSCALL_TRACE in the
      tracee's thread->flags is not cleared.  As a result,
      tracehook_report_syscall_* is still called on the tracee's subsequent
      syscalls, even though no tracer is listening any more.
      
      Example scenario - attach "strace" to a running process and kill it (the
      strace) with SIGKILL.  You'll see that the syscall trace hooks are still
      being called.
      
      The clearing of this flag should be moved from ptrace_detach() to
      __ptrace_detach().
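      A small model shows why moving the clearing helps. It assumes, as the fix
      implies, that the explicit detach and the tracer-exit path both end in the
      same helper, so clearing the flag there covers both paths. The structure,
      the bit value and the function names below are all hypothetical:

```c
#include <stdbool.h>

#define TIF_SYSCALL_TRACE_BIT (1UL << 0)	/* hypothetical bit position */

/* Hypothetical, minimal stand-in for the tracee's state. */
struct task_model {
	unsigned long flags;
	bool traced;
};

/* Common helper reached from both detach paths (plays the role of
 * __ptrace_detach()): clearing the flag here covers both callers. */
static void ptrace_detach_common(struct task_model *tracee)
{
	tracee->traced = false;
	tracee->flags &= ~TIF_SYSCALL_TRACE_BIT;
}

/* PTRACE_DETACH requested by a live tracer (role of ptrace_detach()). */
void detach_explicit(struct task_model *tracee)
{
	ptrace_detach_common(tracee);
}

/* Tracer exits, e.g. strace killed by SIGKILL (role of exit_ptrace()). */
void detach_on_tracer_exit(struct task_model *tracee)
{
	ptrace_detach_common(tracee);
}
```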
      
      Link: http://lkml.kernel.org/r/1472759493-20554-1-git-send-email-alnovak@suse.cz
      Signed-off-by: Ales Novak <alnovak@suse.cz>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • C
      kprobes: include <asm/sections.h> instead of <asm-generic/sections.h> · bfd45be0
      Christoph Hellwig committed
      asm-generic headers are generic implementations for architecture specific
      code and should not be included by common code.  Thus use the asm/ version
      of sections.h to get at the linker sections.
      
      Link: http://lkml.kernel.org/r/1473602302-6208-1-git-send-email-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 11 Oct 2016, 3 commits
    • W
      sched/fair: Fix sched domains NULL dereference in select_idle_sibling() · 9cfb38a7
      Wanpeng Li committed
      Commit:
      
        10e2f1ac ("sched/core: Rewrite and improve select_idle_siblings()")
      
      ... improved select_idle_sibling(), but also triggered a regression (crash)
      during CPU-hotplug:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
        IP: [<ffffffffb10cd332>] select_idle_sibling+0x1c2/0x4f0
        Call Trace:
         <IRQ>
          select_task_rq_fair+0x749/0x930
          ? select_task_rq_fair+0xb4/0x930
          ? __lock_is_held+0x54/0x70
          try_to_wake_up+0x19a/0x5b0
          default_wake_function+0x12/0x20
          autoremove_wake_function+0x12/0x40
          __wake_up_common+0x55/0x90
          __wake_up+0x39/0x50
          wake_up_klogd_work_func+0x40/0x60
          irq_work_run_list+0x57/0x80
          irq_work_run+0x2c/0x30
          smp_irq_work_interrupt+0x2e/0x40
          irq_work_interrupt+0x96/0xa0
         <EOI>
          ? _raw_spin_unlock_irqrestore+0x45/0x80
          try_to_wake_up+0x4a/0x5b0
          wake_up_state+0x10/0x20
          __kthread_unpark+0x67/0x70
          kthread_unpark+0x22/0x30
          cpuhp_online_idle+0x3e/0x70
          cpu_startup_entry+0x6a/0x450
          start_secondary+0x154/0x180
      
      This can be reproduced by running the ftrace test case of kselftest: the
      test case hot-unplugs the CPU, and the CPU is attached to the NULL
      sched domain during scheduler teardown.
      
      Step 2 of the select_idle_siblings() rewrite:
      
        | Step 2) tracks the average cost of the scan and compares this to the
        | average idle time guestimate for the CPU doing the wakeup.
      
      If the CPU doing the wakeup is the CPU being hot-unplugged, the NULL
      sched domain is dereferenced to read the average cost of the scan.
      
      This patch fixes it by failing the search for an idle CPU in the LLC
      if this sched domain is NULL.
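      The simplified sketch below shows where the NULL check has to sit: before
      the domain's average scan cost is read. It assumes a hypothetical
      sched_domain_model structure. It also uses a plain cost-versus-idle
      comparison in place of the scheduler's real heuristics:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an LLC sched domain. */
struct sched_domain_model {
	unsigned long long avg_scan_cost;
	int first_cpu;
	int nr_cpus;
};

/*
 * Sketch of the LLC idle-CPU search.  During hot-unplug the waking CPU may
 * already be attached to a NULL domain, so fail the search (-1) before
 * touching sd instead of dereferencing it.
 */
int select_idle_cpu_model(const struct sched_domain_model *sd,
			  unsigned long long avg_idle,
			  const int *cpu_is_idle)
{
	if (!sd)
		return -1;		/* no LLC domain: give up */

	/* Step 2 (simplified): skip the scan if it is expected to cost more
	 * than the waking CPU's average idle time. */
	if (sd->avg_scan_cost > avg_idle)
		return -1;

	for (int cpu = sd->first_cpu; cpu < sd->first_cpu + sd->nr_cpus; cpu++)
		if (cpu_is_idle[cpu])
			return cpu;
	return -1;
}
```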
      Tested-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1475971443-3187-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • E
      latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy committed
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or have variable loops, each of which provides some level of
      latent entropy.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • E
      gcc-plugins: Add latent_entropy plugin · 38addce8
      Emese Revfy committed
      This adds a new gcc plugin named "latent_entropy". It is designed to
      extract as much uncertainty as possible from a running system at boot
      time, hoping to capitalize on any possible variation in CPU operation
      (due to runtime data differences, hardware differences, SMP ordering,
      thermal timing variation, cache behavior, etc).
      
      At the very least, this plugin is a much more comprehensive example for
      how to manipulate kernel code using the gcc plugin internals.
      
      The need for very-early boot entropy tends to be very architecture or
      system design specific, so this plugin is more suited for those sorts
      of special cases. The existing kernel RNG already attempts to extract
      entropy from reliable runtime variation, but this plugin takes the idea to
      a logical extreme by permuting a global variable based on any variation
      in code execution (e.g. a different value (and permutation function)
      is used to permute the global based on loop count, case statement,
      if/then/else branching, etc).
      
      To do this, the plugin starts by inserting a local variable in every
      marked function. The plugin then adds logic so that the value of this
      variable is modified by randomly chosen operations (add, xor and rol) and
      random values (gcc generates separate static values for each location at
      compile time and also injects the stack pointer at runtime). The resulting
      value depends on the control flow path (e.g., loops and branches taken).
      
      Before the function returns, the plugin mixes this local variable into
      the latent_entropy global variable. The value of this global variable
      is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
      though it does not credit any bytes of entropy to the pool; the contents
      of the global are just used to mix the pool.
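      Written out by hand, an instrumented function might look roughly like the
      sketch below. The constants stand in for the per-location values gcc would
      pick at build time. The runtime stack-pointer injection is omitted to keep
      the sketch deterministic. The XOR mixing into the global and the names
      latent_entropy_model and instrumented_sum() are assumptions, not the
      plugin's actual output:

```c
#include <stdint.h>

/* Stand-in for the plugin's global (hypothetical name). */
uint64_t latent_entropy_model;

static uint64_t rol64(uint64_t v, unsigned int s)
{
	return (v << s) | (v >> (64 - s));	/* valid for s in 1..63 */
}

/* A function as it might look after instrumentation. */
int instrumented_sum(int n)
{
	uint64_t local = 0x5851f42d4c957f2dULL;	/* entry value (build time) */
	int sum = 0;

	for (int i = 0; i < n; i++) {
		local += 0x14057b7ef767814fULL;	/* loop body: add */
		if (i % 2) {
			local ^= 0x2545f4914f6cdd1dULL;	/* then-branch: xor */
			sum += i;
		} else {
			local = rol64(local, 13);	/* else-branch: rol */
			sum -= i;
		}
	}
	latent_entropy_model ^= local;	/* mix into the global on return */
	return sum;
}
```

      The instrumentation does not change the function's return value. It only
      updates the global, and the contribution depends on the path taken through
      the loop and branches.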
      
      Additionally, the plugin can pre-initialize arrays with build-time
      random contents, so that two different kernel builds running on identical
      hardware will not have the same starting values.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message and code comments]
      Signed-off-by: Kees Cook <keescook@chromium.org>