1. 26 Oct 2016, 1 commit
    • timers: Fix usleep_range() in the context of wake_up_process() · 6c5e9059
      Douglas Anderson authored
      Users of usleep_range() expect that it will _never_ return in less time
      than the minimum passed parameter. However, nothing in the code ensures
      this, when the sleeping task is woken by wake_up_process() or any other
      mechanism which can wake a task from uninterruptible state.
      
      Neither usleep_range() nor schedule_hrtimeout_range*() have any protection
      against wakeups. schedule_hrtimeout_range*() is designed this way despite
      the fact that the API documentation does not mention it.
      
      msleep() already has code to handle this case since it loops as long
      as there is still time left.  usleep_range() has no such loop; add it.
      
      Presumably this problem was not detected before because usleep_range() is
      only used in a few places and the function is mostly used in contexts which
      are not exposed to wakeups of any form.
      
      An effort was made to look for users relying on the old behavior by
      looking for usleep_range() in the same file as wake_up_process().
      No problems were found by this search, though it is conceivable that
      someone could have put the sleep and wakeup in two different files.
      
      An effort was made to ask several upstream maintainers if they were aware
      of people relying on wake_up_process() to wake up usleep_range(). No
      maintainers were aware of that but they were aware of many people relying
      on usleep_range() never returning before the minimum.
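
      The resulting loop ends up roughly like this (a sketch per this patch;
      the expiry is absolute, so re-issuing the sleep after a spurious wakeup
      does not push the deadline out):

      	static void __sched do_usleep_range(unsigned long min, unsigned long max)
      	{
      		ktime_t exp = ktime_add_us(ktime_get(), min);
      		u64 delta = (u64)(max - min) * NSEC_PER_USEC;

      		for (;;) {
      			__set_current_state(TASK_UNINTERRUPTIBLE);
      			/* Do not return before the requested sleep time has elapsed */
      			if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
      				break;
      		}
      	}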
      Reported-by: Tao Huang <huangtao@rock-chips.com>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Cc: heiko@sntech.de
      Cc: broonie@kernel.org
      Cc: briannorris@chromium.org
      Cc: Andreas Mohr <andi@lisas.de>
      Cc: linux-rockchip@lists.infradead.org
      Cc: tony.xie@rock-chips.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: djkurtz@chromium.org
      Cc: linux@roeck-us.net
      Cc: tskd08@gmail.com
      Link: http://lkml.kernel.org/r/1477065531-30342-1-git-send-email-dianders@chromium.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6c5e9059
  2. 21 Oct 2016, 1 commit
  3. 20 Oct 2016, 1 commit
    • printk: suppress empty continuation lines · 8835ca59
      Linus Torvalds authored
      We have a fairly common pattern where you print several things as
      continuations on one single line in a loop, and then at the end you do
      
      	printk(KERN_CONT "\n");
      
      to flush the buffered output.
      
      But if the output was flushed by something else (concurrent printk
      activity, or just system logging), we don't want that final flushing to
      just print an empty line.
      
      So just suppress empty continuation lines when they couldn't be merged
      into the line they are a continuation of.
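
      For illustration, a hypothetical instance of the pattern (names invented):

      	static void dump_bytes(const u8 *buf, int nr)
      	{
      		int i;

      		pr_info("bytes:");
      		for (i = 0; i < nr; i++)
      			printk(KERN_CONT " %02x", buf[i]);
      		/* flush; now a no-op if the line was already flushed elsewhere */
      		printk(KERN_CONT "\n");
      	}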
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8835ca59
  4. 19 Oct 2016, 3 commits
  5. 17 Oct 2016, 1 commit
  6. 16 Oct 2016, 1 commit
  7. 13 Oct 2016, 1 commit
    • Disable the __builtin_return_address() warning globally after all · ef6000b4
      Linus Torvalds authored
      This effectively reverts commit 377ccbb4 ("Makefile: Mute warning
      for __builtin_return_address(>0) for tracing only") because it turns out
      that it really isn't tracing only - it's all over the tree.
      
      We already also had the warning disabled separately for mm/usercopy.c
      (which this commit also removes), and it turns out that we will also
      want to disable it for get_lock_parent_ip(), that is used for at least
      TRACE_IRQFLAGS.  Which (when enabled) ends up being all over the tree.
      
      Steven Rostedt had a patch that tried to limit it to just the config
      options that actually triggered this, but quite frankly, the extra
      complexity and abstraction just isn't worth it.  We have never actually
      had a case where the warning is actually useful, so let's just disable
      it globally and not worry about it.
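
      For illustration, a hypothetical snippet of the kind that trips the
      warning (gcc's -Wframe-address fires for any level above 0):

      	static void report_caller_of_caller(void)
      	{
      		/* level 1 asks for the caller's caller, which gcc warns about */
      		pr_info("called from %pS\n", __builtin_return_address(1));
      	}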
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef6000b4
  8. 12 Oct 2016, 20 commits
    • hung_task: allow hung_task_panic when hung_task_warnings is 0 · 48a6d64e
      John Siddle authored
      Previously hung_task_panic would not be respected if enabled after
      hung_task_warnings had already been decremented to 0.
      
      Permit the kernel to panic if hung_task_panic is enabled after
      hung_task_warnings has already been decremented to 0 and another task
      hangs for hung_task_timeout_secs seconds.
      
      Check if hung_task_panic is enabled so we don't return prematurely, and
      check if hung_task_warnings is non-zero so we don't print the warning
      unnecessarily.
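
      A simplified sketch of the reordered logic (sysctl names as in
      kernel/hung_task.c; the exact structure here is an assumption):

      	if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)
      		return;				/* nothing to do at all */

      	if (sysctl_hung_task_warnings) {	/* warn only while warnings remain */
      		if (sysctl_hung_task_warnings > 0)
      			sysctl_hung_task_warnings--;
      		/* ... print the hung-task report ... */
      	}

      	if (sysctl_hung_task_panic)		/* honored even when warnings hit 0 */
      		panic("hung_task: blocked tasks");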
      
      [akpm@linux-foundation.org: fix off-by-one]
      Link: http://lkml.kernel.org/r/1473450214-4049-1-git-send-email-jsiddle@redhat.com
      Signed-off-by: John Siddle <jsiddle@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      48a6d64e
    • kthread: better support freezable kthread workers · dbf52682
      Petr Mladek authored
      This patch allows making a kthread worker freezable via a new @flags
      parameter.  It will make it possible to avoid an init work in some kthreads.
      
      It currently does not affect the function of kthread_worker_fn()
      but it might help to do some optimization or fixes eventually.
      
      I currently do not know about any other use for the @flags
      parameter but I believe that we will want more flags
      in the future.
      
      Finally, I hope that it will not cause confusion with @flags member
      in struct kthread. Well, I guess that we will want to rework the
      basic kthreads implementation once all kthreads are converted into
      kthread workers or workqueues. It is possible that we will merge
      the two structures.
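
      A usage sketch with the new flag (KTW_FREEZABLE per this patch):

      	static struct kthread_worker *start_demo_worker(void)
      	{
      		/* the worker's kthread will participate in system freezing */
      		return kthread_create_worker(KTW_FREEZABLE, "demo_worker");
      	}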
      
      Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dbf52682
    • kthread: allow to modify delayed kthread work · 9a6b06c8
      Petr Mladek authored
      There are situations when we need to modify the delay of a delayed kthread
      work. For example, when the work depends on an event and the initial delay
      means a timeout. Then we want to queue the work immediately when the event
      happens.
      
      This patch implements kthread_mod_delayed_work(), inspired by workqueues.
      It cancels the timer, removes the work from any worker list and queues it
      again with the given timeout.
      
      A very special case is when the work is being canceled at the same time.
      It might happen because of the regular kthread_cancel_delayed_work_sync()
      or by another kthread_mod_delayed_work(). In this case, we do nothing and
      let the other operation win. This should not normally happen as the caller
      is supposed to synchronize these operations in a reasonable way.
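
      A caller-side usage sketch (worker and dwork are assumed to be set up):

      	/* arm a 5 s timeout */
      	kthread_queue_delayed_work(worker, &dwork, msecs_to_jiffies(5000));

      	/* the event arrived first: pull the work in and run it now */
      	kthread_mod_delayed_work(worker, &dwork, 0);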
      
      Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a6b06c8
    • kthread: allow to cancel kthread work · 37be45d4
      Petr Mladek authored
      We are going to use kthread workers more widely and sometimes we will need
      to make sure that the work is neither pending nor running.
      
      This patch implements cancel_*_sync() operations as inspired by
      workqueues.  Well, we are synchronized against the other operations via
      the worker lock, and we use del_timer_sync() and a counter to count parallel
      cancel operations; therefore the implementation can be simpler.
      
      First, we check if a worker is assigned.  If not, the work has never been
      queued after it was initialized.
      
      Second, we take the worker lock.  It must be the right one.  The work must
      not be assigned to another worker unless it is initialized in between.
      
      Third, we try to cancel the timer if it exists.  The timer is deleted
      synchronously to make sure that the timer callback is not running.  We
      need to temporarily release the worker->lock to avoid a possible deadlock
      with the callback.  In the meantime, we set work->canceling counter to
      avoid any queuing.
      
      Fourth, we try to remove the work from a worker list. It might be
      the list of either normal or delayed works.
      
      Fifth, if the work is running, we call kthread_flush_work().  It might
      take an arbitrary time.  We need to release the worker->lock again.  In the
      meantime, we again block any queuing by the canceling counter.
      
      As already mentioned, the check for a pending kthread work is done under a
      lock.  Compared with workqueues, we do not need to fight for a single
      PENDING bit to block other operations.  Therefore we do not suffer from
      the thundering herd problem, and all parallel canceling jobs can use
      kthread_flush_work().  Any queuing is blocked until the counter gets zero.
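
      A usage sketch of the new operations (names per this patch):

      	static void demo_shutdown(struct kthread_work *work,
      				  struct kthread_delayed_work *dwork)
      	{
      		kthread_cancel_work_sync(work);		 /* neither pending nor running after this */
      		kthread_cancel_delayed_work_sync(dwork); /* also deletes a pending timer */
      	}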
      
      Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      37be45d4
    • kthread: initial support for delayed kthread work · 22597dc3
      Petr Mladek authored
      We are going to use kthread_worker more widely and delayed works
      will be pretty useful.
      
      The implementation is inspired by workqueues.  It uses a timer to queue
      the work after the requested delay.  If the delay is zero, the work is
      queued immediately.
      
      Compared with workqueues, each work is associated with a single worker
      (kthread).  Therefore the implementation can be much simpler.  In
      particular, we use the worker->lock to synchronize all the operations with
      the work.  We do not need any atomic operation with a flags variable.
      
      In fact, we do not need any state variable at all.  Instead, we add a list
      of delayed works into the worker.  Then the pending work is listed either
      in the list of queued or delayed works.  And the existing check of pending
      works is the same even for the delayed ones.
      
      A work must not be assigned to another worker unless reinitialized.
      Therefore the timer handler might expect that dwork->work->worker is valid
      and it could simply take the lock.  We just add some sanity checks to help
      with debugging a potential misuse.
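
      A usage sketch (API names per this series; the worker is assumed to be
      running already):

      	static void demo_fn(struct kthread_work *work)
      	{
      		pr_info("delayed work ran\n");
      	}

      	static struct kthread_delayed_work demo_dwork;

      	static void demo_arm(struct kthread_worker *worker)
      	{
      		kthread_init_delayed_work(&demo_dwork, demo_fn);
      		kthread_queue_delayed_work(worker, &demo_dwork, msecs_to_jiffies(100));
      	}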
      
      Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      22597dc3
    • kthread: detect when a kthread work is used by more workers · 8197b3d4
      Petr Mladek authored
      Nothing currently prevents a work from queuing for a kthread worker when
      it is already running on another one.  This means that the work might run
      in parallel on more than one worker.  Also some operations are not
      reliable, e.g.  flush.
      
      This problem will be even more visible after we add kthread_cancel_work()
      function.  It will only have "work" as the parameter and will use
      worker->lock to synchronize with others.
      
      Well, normally this is not a problem because the API users are sane.
      But bugs might happen and users also might be crazy.
      
      This patch adds a warning when we try to insert the work for another
      worker.  It does not fully prevent the misuse because it would make the
      code much more complicated without a big benefit.
      
      It adds the same warning also into kthread_flush_work() instead of the
      repeated attempts to get the right lock.
      
      A side effect is that one needs to explicitly reinitialize the work if it
      must be queued into another worker.  This is needed, for example, when the
      worker is stopped and started again.  It is a bit inconvenient.  But it
      looks like a good compromise between the stability and complexity.
      
      I have double checked all existing users of the kthread worker API and
      they all seem to initialize the work after the worker gets started.
      
      Just for completeness, the patch adds a check that the work is not already
      in a queue.
      
      The patch also puts all the checks into a separate function.  It will be
      reused when implementing delayed works.
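
      The separate function ends up roughly as follows (a sketch of the checks
      described above):

      	static void kthread_insert_work_sanity_check(struct kthread_worker *worker,
      						     struct kthread_work *work)
      	{
      		lockdep_assert_held(&worker->lock);
      		WARN_ON_ONCE(!list_empty(&work->node));	/* not already in a queue */
      		/* a work must not be used by two workers, see kthread_queue_work() */
      		WARN_ON_ONCE(work->worker && work->worker != worker);
      	}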
      
      Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8197b3d4
    • kthread: add kthread_destroy_worker() · 35033fe9
      Petr Mladek authored
      The current kthread worker users call flush() and stop() explicitly.
      This function does the same plus it frees the kthread_worker struct
      in one call.
      
      It is supposed to be used together with kthread_create_worker*() that
      allocates struct kthread_worker.
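
      A usage sketch (shown with the constructor's final form; its @flags
      argument is only added later in this series):

      	static int demo(void)
      	{
      		struct kthread_worker *worker;

      		worker = kthread_create_worker(0, "demo");
      		if (IS_ERR(worker))
      			return PTR_ERR(worker);

      		/* queue and flush work here */

      		kthread_destroy_worker(worker);	/* flush, stop, and free in one call */
      		return 0;
      	}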
      
      Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      35033fe9
    • kthread: add kthread_create_worker*() · fbae2d44
      Petr Mladek authored
      Kthread workers are currently created using the classic kthread API,
      namely kthread_run().  kthread_worker_fn() is passed as the @threadfn
      parameter.
      
      This patch defines kthread_create_worker() and
      kthread_create_worker_on_cpu() functions that hide implementation details.
      
      They enforce using kthread_worker_fn() for the main thread.  But I doubt
      that there are any plans to create any alternative.  In fact, I think that
      we do not want any alternative main thread because it would be hard to
      support consistency with the rest of the kthread worker API.
      
      The naming and function of kthread_create_worker() is inspired by the
      workqueues API like the rest of the kthread worker API.
      
      The kthread_create_worker_on_cpu() variant is motivated by the original
      kthread_create_on_cpu().  Note that we need to bind per-CPU kthread
      workers already when they are created.  It makes life easier;
      kthread_bind() cannot be used later on an already running worker.
      
      This patch does _not_ convert existing kthread workers.  The kthread
      worker API needs more improvements first, e.g. a function to destroy the
      worker.
      
      IMPORTANT:
      
      kthread_create_worker_on_cpu() allows any format of the worker
      name, unlike kthread_create_on_cpu().  The good thing is that it
      is more generic.  The bad thing is that most users will need to pass the
      cpu number in two parameters, e.g.  kthread_create_worker_on_cpu(cpu,
      "helper/%d", cpu).
      
      To be honest, the main motivation was to avoid the need for an empty
      va_list.  The only legal way was to create a helper function that would be
      called with an empty list.  Other attempts caused compilation warnings or
      even errors on different architectures.
      
      There were also other alternatives, for example, using #define or
      splitting __kthread_create_worker().  The used solution looked like the
      least ugly.
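
      Per the commit's own example, a sketch of the per-CPU variant (shown with
      the @flags argument that a later patch in this series adds):

      	static struct kthread_worker *start_helper(int cpu)
      	{
      		/* the CPU number appears both as the binding and in the name */
      		return kthread_create_worker_on_cpu(cpu, 0, "helper/%d", cpu);
      	}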
      
      Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fbae2d44
    • kthread: allow to call __kthread_create_on_node() with va_list args · 255451e4
      Petr Mladek authored
      kthread_create_on_node() implements a bunch of logic to create the
      kthread.  It is already called by kthread_create_on_cpu().
      
      We are going to extend the kthread worker API and will need to call
      kthread_create_on_node() with va_list args there.
      
      This patch is a pure refactoring and does not modify the existing
      behavior.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      255451e4
    • kthread/smpboot: do not park in kthread_create_on_cpu() · a65d4096
      Petr Mladek authored
      kthread_create_on_cpu() was added by the commit 2a1d4460
      ("kthread: Implement park/unpark facility").  It is currently used only
      when enabling a new CPU.  For this purpose, the newly created kthread has to
      be parked.
      
      The CPU binding is a bit tricky.  The kthread is parked while the CPU is
      not yet online, and the CPU is bound when the kthread is unparked.
      
      The function would be useful for more per-CPU kthreads, e.g.
      bnx2fc_thread, fcoethread.  For this purpose, the newly created kthread
      should stay in the uninterruptible state.
      
      This patch moves the parking into smpboot.  It binds the thread already
      at creation time.  Then the function can be used universally.  Also the
      behavior is consistent with kthread_create() and kthread_create_on_node().
      
      Link: http://lkml.kernel.org/r/1470754545-17632-4-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a65d4096
    • kthread: kthread worker API cleanup · 3989144f
      Petr Mladek authored
      A good practice is to prefix the names of functions by the name
      of the subsystem.
      
      The kthread worker API is a mix of classic kthreads and workqueues.  Each
      worker has a dedicated kthread.  It runs a generic function that process
      queued works.  It is implemented as part of the kthread subsystem.
      
      This patch renames the existing kthread worker API to use
      the corresponding name from the workqueues API prefixed by
      kthread_:
      
      __init_kthread_worker()		-> __kthread_init_worker()
      init_kthread_worker()		-> kthread_init_worker()
      init_kthread_work()		-> kthread_init_work()
      insert_kthread_work()		-> kthread_insert_work()
      queue_kthread_work()		-> kthread_queue_work()
      flush_kthread_work()		-> kthread_flush_work()
      flush_kthread_worker()		-> kthread_flush_worker()
      
      Note that the names of DEFINE_KTHREAD_WORK*() macros stay
      as they are. It is common that the "DEFINE_" prefix has
      precedence over the subsystem names.
      
      Note that INIT() macros and init() functions use different
      naming scheme. There is no good solution. There are several
      reasons for this solution:
      
        + "init" in the function names stands for the verb "initialize"
          aka "initialize worker". While "INIT" in the macro names
          stands for the noun "INITIALIZER" aka "worker initializer".
      
        + INIT() macros are used only in DEFINE() macros
      
        + init() functions are used close to the other kthread()
          functions. It looks much better if all the functions
          use the same scheme.
      
        + There will be also kthread_destroy_worker() that will
          be used close to kthread_cancel_work(). It is related
          to the init() function. Again it looks better if all
          functions use the same naming scheme.
      
        + there are several precedents for such init() function
          names, e.g. amd_iommu_init_device(), free_area_init_node(),
          jump_label_init_type(),  regmap_init_mmio_clk(),
      
        + It is not an argument but it was inconsistent even before.
      
      [arnd@arndb.de: fix linux-next merge conflict]
       Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3989144f
    • kthread: rename probe_kthread_data() to kthread_probe_data() · e700591a
      Petr Mladek authored
      Patch series "kthread: Kthread worker API improvements"
      
      The intention of this patchset is to make it easier to manipulate and
      maintain kthreads.  Especially, I want to replace all the custom main
      cycles with a generic one.  Also I want to make the kthreads sleep in a
      consistent state in a common place when there is no work.
      
      This patch (of 11):
      
      A good practice is to prefix the names of functions by the name of the
      subsystem.
      
      This patch fixes the name of probe_kthread_data().  The other wrong
      function names are part of the kthread worker API and will be fixed
      separately.
      
      Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e700591a
    • config: android: enable CONFIG_SECCOMP · 2489a177
      Rob Herring authored
      As of Android N, SECCOMP is required.  Without it, we get a
      mediaextractor error:
      
      E /system/bin/mediaextractor: libminijail: prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER): Invalid argument
      
      Link: http://lkml.kernel.org/r/20160908185934.18098-3-robh@kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Amit Pundir <amit.pundir@linaro.org>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2489a177
    • config: android: set SELinux as default security mode · d90ae51a
      Rob Herring authored
      Android won't boot without SELinux enabled, so make it the default.
      
      Link: http://lkml.kernel.org/r/20160908185934.18098-2-robh@kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d90ae51a
    • config: android: move device mapper options to recommended · f023a395
      Rob Herring authored
      CONFIG_MD is in recommended, but other dependent options like DM_CRYPT and
      DM_VERITY options are in base.  The result is the options in base don't
      get enabled when applying both base and recommended fragments.  Move all
      the options to recommended.
      
      Link: http://lkml.kernel.org/r/20160908185934.18098-1-robh@kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Amit Pundir <amit.pundir@linaro.org>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f023a395
    • config/android: Remove CONFIG_IPV6_PRIVACY · a2c6a235
      Borislav Petkov authored
      The option is long gone; see commit 5d9efa7e ("ipv6: Remove privacy
      config option.")
      
      Link: http://lkml.kernel.org/r/20160811170340.9859-1-bp@alien8.de
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Rob Herring <robh@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a2c6a235
    • relay: Use irq_work instead of plain timer for deferred wakeup · 26b5679e
      Peter Zijlstra authored
      Relay avoids calling wake_up_interruptible() for the wakeup of
      readers/consumers waiting for the generation of new data, from the
      context of the process which produced the data.  This is apparently done to
      prevent the possibility of a deadlock in case the scheduler itself is
      generating data for the relay, after acquiring rq->lock.
      
      The following patch used a timer (scheduled at the next jiffy) to
      delegate the wakeup to another context.
      	commit 7c9cb383
      	Author: Tom Zanussi <zanussi@comcast.net>
      	Date:   Wed May 9 02:34:01 2007 -0700
      
      	relay: use plain timer instead of delayed work
      
      	relay doesn't need to use schedule_delayed_work() for waking readers
      	when a simple timer will do.
      
      Scheduling a plain timer at the next jiffy boundary to do the wakeup
      causes a significant wakeup latency for the userspace client, which makes
      relay less suitable for the high-frequency low-payload use cases where the
      data gets generated at a very high rate, like multiple sub buffers getting
      filled within a millisecond.  Moreover the timer is re-scheduled on every
      newly produced sub buffer so the timer keeps getting pushed out if sub
      buffers are filled in a very quick succession (less than a jiffy gap
      between filling of 2 sub buffers).  As a result relay runs out of sub
      buffers to store the new data.
      
      Using irq_work ensures that the wakeup of the userspace client blocked
      in the poll call is done at the earliest (through a self IPI or the next timer
      tick), enabling it to always consume the data in time.  Also this makes
      relay consistent with printk & ring buffers (trace), as they too use
      irq_work for deferred wake up of readers.
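
      A sketch of the deferral (field and helper names as in this patch; treat
      the details as assumptions):

      	static void wakeup_readers(struct irq_work *work)
      	{
      		struct rchan_buf *buf = container_of(work, struct rchan_buf, wakeup_work);

      		wake_up_interruptible(&buf->read_wait);
      	}

      	/* setup: init_irq_work(&buf->wakeup_work, wakeup_readers);
      	 * on a sub-buffer switch, irq_work_queue(&buf->wakeup_work)
      	 * replaces the old mod_timer() call.
      	 */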
      
      [arnd@arndb.de: select CONFIG_IRQ_WORK]
       Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26b5679e
    • x86/panic: replace smp_send_stop() with kdump friendly version in panic path · 0ee59413
      Hidehiro Kawai authored
      Daniel Walker reported problems which happen when the
      crash_kexec_post_notifiers kernel option is enabled
      (https://lkml.org/lkml/2015/6/24/44).
      
      In that case, smp_send_stop() is called before entering kdump routines
      which assume other CPUs are still online.  As a result, for x86, kdump
      routines fail to save other CPUs' registers and disable virtualization
      extensions.
      
      To fix this problem, call a new kdump friendly function,
      crash_smp_send_stop(), instead of smp_send_stop(), when
      crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a weak
      function, and it just calls smp_send_stop().  Architecture code should
      override it so that kdump can work appropriately.  This patch only
      provides x86-specific version.
      
      For Xen's PV kernel, just keep the current behavior.
      
      NOTES:
      
      - The right solution would be to place crash_smp_send_stop() before
        __crash_kexec() invocation in all cases and remove smp_send_stop(), but
        we can't do that until all architectures implement their own
        crash_smp_send_stop()
      
      - crash_smp_send_stop()-like work is still needed by
        machine_crash_shutdown() because crash_kexec() can be called without
        entering panic()
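
      A minimal sketch of the weak default described above:

      	void __weak crash_smp_send_stop(void)
      	{
      		/* architectures override this with a kdump-aware version */
      		smp_send_stop();
      	}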
      
      Fixes: f06e5153 (kernel/panic.c: add "crash_kexec_post_notifiers" option)
      Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Reported-by: Daniel Walker <dwalker@fifo99.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Daniel Walker <dwalker@fifo99.com>
      Cc: Xunlei Pang <xpang@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Cc: "Steven J. Hill" <steven.hill@cavium.com>
      Cc: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ee59413
    • ptrace: clear TIF_SYSCALL_TRACE on ptrace detach · 0a5bf409
      Ales Novak authored
      On __ptrace_detach(), called from do_exit()->exit_notify()->
      forget_original_parent()->exit_ptrace(), the TIF_SYSCALL_TRACE in
      thread->flags of the tracee is not cleared.  This results in the
      tracehook_report_syscall_* hooks being called (though there's no longer a
      tracer listening) on its subsequent syscalls.
      
      Example scenario - attach "strace" to a running process and kill it (the
      strace) with SIGKILL.  You'll see that the syscall trace hooks are still
      being called.
      
      The clearing of this flag should be moved from ptrace_detach() to
      __ptrace_detach().
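
      A sketch of the result (one line moves; the surrounding detach logic is
      elided):

      	/* in __ptrace_detach(), so the exit path clears it as well: */
      	clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);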
      
      Link: http://lkml.kernel.org/r/1472759493-20554-1-git-send-email-alnovak@suse.cz
      Signed-off-by: Ales Novak <alnovak@suse.cz>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a5bf409
    • kprobes: include <asm/sections.h> instead of <asm-generic/sections.h> · bfd45be0
      Christoph Hellwig authored
      asm-generic headers are generic implementations for architecture specific
      code and should not be included by common code.  Thus use the asm/ version
      of sections.h to get at the linker sections.
      
      Link: http://lkml.kernel.org/r/1473602302-6208-1-git-send-email-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bfd45be0
  9. 11 Oct 2016, 3 commits
    • sched/fair: Fix sched domains NULL dereference in select_idle_sibling() · 9cfb38a7
      Wanpeng Li authored
      Commit:
      
        10e2f1ac ("sched/core: Rewrite and improve select_idle_siblings()")
      
      ... improved select_idle_sibling(), but also triggered a regression (crash)
      during CPU-hotplug:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
        IP: [<ffffffffb10cd332>] select_idle_sibling+0x1c2/0x4f0
        Call Trace:
         <IRQ>
          select_task_rq_fair+0x749/0x930
          ? select_task_rq_fair+0xb4/0x930
          ? __lock_is_held+0x54/0x70
          try_to_wake_up+0x19a/0x5b0
          default_wake_function+0x12/0x20
          autoremove_wake_function+0x12/0x40
          __wake_up_common+0x55/0x90
          __wake_up+0x39/0x50
          wake_up_klogd_work_func+0x40/0x60
          irq_work_run_list+0x57/0x80
          irq_work_run+0x2c/0x30
          smp_irq_work_interrupt+0x2e/0x40
          irq_work_interrupt+0x96/0xa0
         <EOI>
          ? _raw_spin_unlock_irqrestore+0x45/0x80
          try_to_wake_up+0x4a/0x5b0
          wake_up_state+0x10/0x20
          __kthread_unpark+0x67/0x70
          kthread_unpark+0x22/0x30
          cpuhp_online_idle+0x3e/0x70
          cpu_startup_entry+0x6a/0x450
          start_secondary+0x154/0x180
      
      This can be reproduced by running the ftrace test case of kselftest; the
      test case hot-unplugs the CPU, and the CPU then attaches to the NULL
      sched-domain during scheduler teardown.
      
      Step 2 of the select_idle_siblings() rewrite:
      
        | Step 2) tracks the average cost of the scan and compares this to the
        | average idle time guestimate for the CPU doing the wakeup.
      
      If the CPU doing the wakeup is the CPU being hot-unplugged, then the NULL
      sched domain will be dereferenced to acquire the average cost of the scan.
      
      This patch fixes it by failing the idle-CPU search in the LLC
      if the sched domain is NULL.
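
      The guard, per the fix, sits at the top of the LLC scan (a sketch):

      	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
      	if (!this_sd)		/* CPU lost its LLC sched domain during hotplug */
      		return -1;	/* give up the idle-CPU search */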
      Tested-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1475971443-3187-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9cfb38a7
    • latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy authored
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or they have variable loops, each of which provide some level of
      latent entropy.
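
      A hypothetical example of the function form of the annotation:

      	/* instrument this init function for control-flow entropy gathering */
      	static int __init __latent_entropy demo_early_setup(void)
      	{
      		return 0;
      	}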
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      0766f788
    • gcc-plugins: Add latent_entropy plugin · 38addce8
      Emese Revfy authored
      This adds a new gcc plugin named "latent_entropy". It is designed to
      extract as much uncertainty from a running system at boot time as
      possible, hoping to capitalize on any possible variation in CPU operation
      (due to runtime data differences, hardware differences, SMP ordering,
      thermal timing variation, cache behavior, etc).
      
      At the very least, this plugin is a much more comprehensive example for
      how to manipulate kernel code using the gcc plugin internals.
      
      The need for very-early boot entropy tends to be very architecture or
      system design specific, so this plugin is more suited for those sorts
      of special cases. The existing kernel RNG already attempts to extract
      entropy from reliable runtime variation, but this plugin takes the idea to
      a logical extreme by permuting a global variable based on any variation
      in code execution (e.g. a different value (and permutation function)
      is used to permute the global based on loop count, case statement,
      if/then/else branching, etc).
      
      To do this, the plugin starts by inserting a local variable in every
      marked function. The plugin then adds logic so that the value of this
      variable is modified by randomly chosen operations (add, xor and rol) and
      random values (gcc generates separate static values for each location at
      compile time and also injects the stack pointer at runtime). The resulting
      value depends on the control flow path (e.g., loops and branches taken).
      
      Before the function returns, the plugin mixes this local variable into
      the latent_entropy global variable. The value of this global variable
      is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
      though it does not credit any bytes of entropy to the pool; the contents
      of the global are just used to mix the pool.
      
      Additionally, the plugin can pre-initialize arrays with build-time
      random contents, so that two different kernel builds running on identical
      hardware will not have the same starting values.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message and code comments]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      38addce8
  10. 10 Oct 2016, 4 commits
    • printk: make reading the kernel log flush pending lines · bfd8d3f2
      Linus Torvalds authored
      That will mean that any possible subsequent continuation will now be
      broken up onto a line of its own (since reading the log has finalized
      the beginning of the line), but if user space has activated system
      logging (or if there's a kernel message dump going on) that is the right
      thing to do.
      
      And now that we actually get the continuation flags _right_ for this
      all, the user space logger that is reading the kernel messages can
      actually see the continuation marker.  Not that anybody seems to really
      bother with it (or care), but in theory user space can do its own
      message stitching.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bfd8d3f2
    • printk: re-organize log_output() to be more legible · 5e467652
      Linus Torvalds authored
      Avoid some duplicate logic now that we can return early, and update the
      comments for the new LOG_CONT world order.
      
      This also stops the continuation flushing from just using random record
      flags for the flushing action, instead taking the flags from the proper
      original line and updating them as we add continuations to it.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e467652
    • printk: split out core logging code into helper function · c362c7ff
      Linus Torvalds authored
      The code that actually decides how to log the message (whether to put it
      directly into the record log, whether to append it to an existing
      buffered log, or whether to start a new buffered log) is fairly
      non-obvious code in the middle of the vprintk_emit() function.
      
      Splitting that code up into a helper function makes it easier to
      understand, but perhaps more importantly also allows for the code to
      just return early out of the helper function once it has made the
      decision about where the new log content goes.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c362c7ff
    • printk: reinstate KERN_CONT for printing continuation lines · 4bcc595c
      Linus Torvalds authored
      Long long ago the kernel log buffer was a buffered stream of bytes, very
      much like stdio in user space.  It supported log levels by scanning the
      stream and noticing the log level markers at the beginning of each line,
      but if you wanted to print a partial line in multiple chunks, you just
      did multiple printk() calls, and it just automatically worked.
      
      Except when it didn't, and you had very confusing output when different
      lines got all mixed up with each other.  Then you got fragment lines
      mixing with each other, or with non-fragment lines, because it was
      traditionally impossible to tell whether a printk() call was a
      continuation or not.
      
      To at least help clarify the issue of continuation lines, we added a
      KERN_CONT marker back in 2007 to mark continuation lines:
      
        47492527 ("printk: add KERN_CONT annotation").
      
      That continuation marker was initially an empty string, and didn't
      actually make any semantic difference.  But it at least made it possible
      to annotate the source code, and have check-patch notice that a printk()
      didn't need or want a log level marker, because it was a continuation of
      a previous line.
      
      To avoid the ambiguity between a continuation line that had that
      KERN_CONT marker, and a printk with no level information at all, we then
      in 2009 made KERN_CONT be a real log level marker which meant that we
      could now reliably tell the difference between the two cases.
      
        5fd29d6c ("printk: clean up handling of log-levels and newlines")
      
      and we could take advantage of that to make sure we didn't mix up
      continuation lines with lines that just didn't have any loglevel at all.
      
      Then, in 2012, the kernel log buffer was changed to be a "record" based
      log, where each line was a record that has a loglevel and a timestamp.
      
      You can see the beginning of that conversion in commits
      
        e11fea92 ("kmsg: export printk records to the /dev/kmsg interface")
        7ff9554b ("printk: convert byte-buffer to variable-length record buffer")
      
      with a number of follow-up commits to fix some painful fallout from that
      conversion.  Over all, it took a couple of months to sort out most of
      it.  But the upside was that you could have concurrent readers (and
      writers) of the kernel log and not have lines with mixed output in them.
      
      And one particular pain-point for the record-based kernel logging was
      exactly the fragmentary lines that are generated in smaller chunks.  In
      order to still log them as one record, the continuation lines need to be
      attached to the previous record properly.
      
      However the explicit continuation record marker that is actually useful
      for this exact case was removed at around the same time by commit
      
        61e99ab8 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
      
      due to the incorrect belief that KERN_CONT wasn't meaningful.  The
      ambiguity between "is this a continuation line" or "is this a plain
      printk with no log level information" was reintroduced, and in fact
      became an even bigger pain point because there was now the whole
      record-level merging of kernel messages going on.
      
      This patch reinstates the KERN_CONT as a real non-empty string marker,
      so that the ambiguity is fixed once again.
      
      But it's not a plain revert of that original removal: in the four years
      since we made KERN_CONT an empty string again, not only has the format
      of the log level markers changed, we've also had some usage changes in
      this area.
      
      For example, some ACPI code seems to use KERN_CONT _together_ with a log
      level, and now uses both the KERN_CONT marker and (for example) a
      KERN_INFO marker to show that it's an informational continuation of a
      line.
      
      Which is actually not a bad idea - if the continuation line cannot be
      attached to its predecessor, without the log level information we don't
      know what log level to assign to it (and we traditionally just assigned
      it the default loglevel).  So having both a log level and the KERN_CONT
      marker is not necessarily a bad idea, but it does mean that we need to
      actually iterate over potentially multiple markers, rather than just a
      single one.
      
      Also, since KERN_CONT was still conceptually needed, and encouraged, but
      didn't actually _do_ anything, we've also had the reverse problem:
      rather than having too many annotations it has too few, and there is bit
      rot with code that no longer marks the continuation lines with the
      KERN_CONT marker.
      
      So this patch not only re-instates the non-empty KERN_CONT marker, it
      also fixes up the cases of bit-rot I noticed in my own logs.
      
      There are probably other cases where KERN_CONT will be needed to be
      added, either because it is new code that never dealt with the need for
      KERN_CONT, or old code that has bitrotted without anybody noticing.
      
      That said, we should strive to avoid the need for KERN_CONT.  It does
      result in real problems for logging, and should generally not be seen as
      a good feature.  If we some day can get rid of the feature entirely,
      because nobody does any fragmented printk calls, that would be lovely.
      
      But until that point, let's at least mark the code that relies on the hacky
      multi-fragment kernel printk's.  Not only does it avoid the ambiguity,
      it also annotates code as "maybe this would be good to fix some day".
      
      (That said, particularly during single-threaded bootup, the downsides of
      KERN_CONT are very limited.  Things get much hairier when you have
      multiple threads going on and user level reading and writing logs too).
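
      For illustration, a hypothetical fragmented line annotated correctly:

      	static void report_feature(bool ok, int err)
      	{
      		pr_info("testing feature:");
      		if (ok)
      			printk(KERN_CONT " ok\n");
      		else
      			printk(KERN_CONT " failed (%d)\n", err);
      	}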
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4bcc595c
  11. 08 Oct 2016, 4 commits
    • console: don't prefer first registered if DT specifies stdout-path · 05fd007e
      Paul Burton authored
      If a device tree specifies a preferred device for kernel console output
      via the stdout-path or linux,stdout-path chosen node properties or the
      stdout alias then the kernel ought to honor it & output the kernel
      console to that device.  As it stands, this isn't the case.  Whilst we
      parse the stdout-path properties & set an of_stdout variable from
      of_alias_scan(), and use that from of_console_check() to determine
      whether to add a console device as a preferred console whilst
      registering it, we also prefer the first registered console if no other
      has been selected at the time of its registration.
      
      This means that if a console other than the one the device tree selects
      via stdout-path is registered first, we will switch to using it & when
      the stdout-path console is later registered the call to
      add_preferred_console() via of_console_check() is too late to do
      anything useful.  In practice this seems to mean that we switch to the
      dummy console device fairly early & see no further console output:
      
          Console: colour dummy device 80x25
          console [tty0] enabled
          bootconsole [ns16550a0] disabled
      
      Fix this by not automatically preferring the first registered console if
      one is specified by the device tree.  This allows consoles to be
      registered but not enabled, and once the driver for the console selected
      by stdout-path calls of_console_check() the driver will be added to the
      list of preferred consoles before any other console has been enabled.
      When that console is then registered via register_console() it will be
      enabled as expected.
      
      Link: http://lkml.kernel.org/r/20160809151937.26118-1-paul.burton@imgtec.com
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Ivan Delalande <colona@arista.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jan Kara <jack@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05fd007e
    • cred: simpler, 1D supplementary groups · 81243eac
      Alexey Dobriyan authored
      Current supplementary groups code can massively overallocate memory and
      is implemented in a way such that access to an individual gid is done via a
      2D array.
      
      If number of gids is <= 32, memory allocation is more or less tolerable
      (140/148 bytes).  But if it is not, the code allocates a full page (!)
      regardless and, what's even more fun, doesn't reuse the small 32-entry
      array.
      
      2D array means dependent shifts, loads and LEAs without possibility to
      optimize them (gid is never known at compile time).
      
      All of the above is unnecessary.  Switch to the usual
      trailing-zero-len-array scheme.  Memory is allocated with
      kmalloc/vmalloc() and only as much as needed.  Accesses become simpler
      (LEA 8(gi,idx,4) or even without displacement).
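
      A sketch of the resulting layout (flexible trailing array; exact member
      spelling per the patch):

      	struct group_info {
      		atomic_t	usage;
      		int		ngroups;
      		kgid_t		gid[];	/* allocated to exactly ngroups entries */
      	};

      	/* access becomes gi->gid[i] rather than GROUP_AT(gi, i) */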
      
      Maximum number of gids is 65536 which translates to 256KB+8 bytes.  I
      think kernel can handle such allocation.
      
      On my usual desktop system with whole 9 (nine) aux groups, struct
      group_info shrinks from 148 bytes to 44 bytes, yay!
      
      Nice side effects:
      
       - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,
      
       - fix little mess in net/ipv4/ping.c
         should have been using GROUP_AT macro but this point becomes moot,
      
       - aux group allocation is persistent and should be accounted as such.
      
      Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      81243eac
    • nmi_backtrace: generate one-line reports for idle cpus · 6727ad9e
      Chris Metcalf authored
      When doing an nmi backtrace of many cores, most of which are idle, the
      output is a little overwhelming and very uninformative.  Suppress
      messages for cpus that are idling when they are interrupted and just
      emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
      
      We do this by grouping all the cpuidle code together into a new
      .cpuidle.text section, and then checking the address of the interrupted
      PC to see if it lies within that section.
      
      This commit suitably tags x86 and tile idle routines, and only adds in
      the minimal framework for other architectures.
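
      The marker boils down to a section attribute; a sketch of the macro this
      series introduces (its placement in <linux/cpu.h> is an assumption):

      	#define __cpuidle	__attribute__((__section__(".cpuidle.text")))

      	/* an idle routine opts in like this (hypothetical body): */
      	static void __cpuidle demo_idle_wait(void)
      	{
      		safe_halt();
      	}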
      
      Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Tested-by: Petr Mladek <pmladek@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6727ad9e
    • thp: reduce usage of huge zero page's atomic counter · 6fcb52a5
      Aaron Lu authored
      The global zero page is used to satisfy an anonymous read fault.  If
      THP(Transparent HugePage) is enabled then the global huge zero page is
      used.  The global huge zero page uses an atomic counter for reference
      counting and is allocated/freed dynamically according to its counter
      value.
      
      CPU time spent on that counter will greatly increase if there are a lot
      of processes doing anonymous read faults.  This patch proposes a way to
      reduce the access to the global counter so that the CPU load can be
      reduced accordingly.
      
      To do this, a new flag of the mm_struct is introduced:
      MMF_USED_HUGE_ZERO_PAGE.  With this flag, the process only needs to touch
      the global counter in two cases:
      
       1 The first time it uses the global huge zero page;
       2 The time when mm_user of its mm_struct reaches zero.
      
      Note that right now, the huge zero page is eligible to be freed as soon
      as its last use goes away.  With this patch, the page will not be
      eligible to be freed until the exit of the last process from which it
      was ever used.
      
      And with the use of mm_user, a kthread is not eligible to use the huge
      zero page either.  Since no kthread uses the huge zero page today, there
      is no difference after applying this patch.  But if that is not desired,
      I can change it to when mm_count reaches zero.
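
      A sketch of the fast path (the flag is named MMF_HUGE_ZERO_PAGE upstream;
      the details here are assumptions):

      	struct page *mm_get_huge_zero_page(struct mm_struct *mm)
      	{
      		if (test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
      			return READ_ONCE(huge_zero_page);	/* counter untouched */

      		if (!get_huge_zero_page())
      			return NULL;

      		/* lost a race with another thread: drop the extra reference */
      		if (test_and_set_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
      			put_huge_zero_page();

      		return READ_ONCE(huge_zero_page);
      	}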
      
      Case used for test on Haswell EP:
      
        usemem -n 72 --readonly -j 0x200000 100G
      
      Which spawns 72 processes and each will mmap 100G anonymous space and
      then do read only access to that space sequentially with a step of 2MB.
      
        CPU cycles from perf report for base commit:
            54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
        CPU cycles from perf report for this commit:
             0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page
      
      Performance (throughput) of the workload for base commit: 1784430792
      Performance (throughput) of the workload for this commit: 4726928591
      164% increase.
      
      Runtime of the workload for base commit: 707592 us
      Runtime of the workload for this commit: 303970 us
      50% drop.
      
      Link: http://lkml.kernel.org/r/fe51a88f-446a-4622-1363-ad1282d71385@intel.com
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6fcb52a5