1. 18 11月, 2016 1 次提交
  2. 10 11月, 2016 1 次提交
  3. 20 10月, 2016 1 次提交
    • L
      printk: suppress empty continuation lines · 8835ca59
      Linus Torvalds 提交于
      We have a fairly common pattern where you print several things as
      continuations on one single line in a loop, and then at the end you do
      
      	printk(KERN_CONT "\n");
      
      to flush the buffered output.
      
      But if the output was flushed by something else (concurrent printk
      activity, or just system logging), we don't want that final flushing to
      just print an empty line.
      
      So just suppress empty continuation lines when they couldn't be merged
      into the line they are a continuation of.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8835ca59
  4. 10 10月, 2016 4 次提交
    • L
      printk: make reading the kernel log flush pending lines · bfd8d3f2
      Linus Torvalds 提交于
      That will mean that any possible subsequent continuation will now be
      broken up onto a line of its own (since reading the log has finalized
      the beginning og the line), but if user space has activated system
      logging (or if there's a kernel message dump going on) that is the right
      thing to do.
      
      And now that we actually get the continuation flags _right_ for this
      all, the user space logger that is reading the kernel messages can
      actually see the continuation marker.  Not that anybody seems to really
      bother with it (or care), but in theory user space can do its own
      message stitching.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bfd8d3f2
    • L
      printk: re-organize log_output() to be more legible · 5e467652
      Linus Torvalds 提交于
      Avoid some duplicate logic now that we can return early, and update the
      comments for the new LOG_CONT world order.
      
      This also stops the continuation flushing from just using random record
      flags for the flushing action, instead taking the flags from the proper
      original line and updating them as we add continuations to it.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e467652
    • L
      printk: split out core logging code into helper function · c362c7ff
      Linus Torvalds 提交于
      The code that actually decides how to log the message (whether to put it
      directly into the record log, whether to append it to an existing
      buffered log, or whether to start a new buffered log) is fairly
      non-obvious code in the middle of the vprintk_emit() function.
      
      Splitting that code up into a helper function makes it easier to
      understand, but perhaps more importantly also allows for the code to
      just return early out of the helper function once it has made the
      decision about where the new log content goes.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c362c7ff
    • L
      printk: reinstate KERN_CONT for printing continuation lines · 4bcc595c
      Linus Torvalds 提交于
      Long long ago the kernel log buffer was a buffered stream of bytes, very
      much like stdio in user space.  It supported log levels by scanning the
      stream and noticing the log level markers at the beginning of each line,
      but if you wanted to print a partial line in multiple chunks, you just
      did multiple printk() calls, and it just automatically worked.
      
      Except when it didn't, and you had very confusing output when different
      lines got all mixed up with each other.  Then you got fragment lines
      mixing with each other, or with non-fragment lines, because it was
      traditionally impossible to tell whether a printk() call was a
      continuation or not.
      
      To at least help clarify the issue of continuation lines, we added a
      KERN_CONT marker back in 2007 to mark continuation lines:
      
        47492527 ("printk: add KERN_CONT annotation").
      
      That continuation marker was initially an empty string, and didn't
      actuall make any semantic difference.  But it at least made it possible
      to annotate the source code, and have check-patch notice that a printk()
      didn't need or want a log level marker, because it was a continuation of
      a previous line.
      
      To avoid the ambiguity between a continuation line that had that
      KERN_CONT marker, and a printk with no level information at all, we then
      in 2009 made KERN_CONT be a real log level marker which meant that we
      could now reliably tell the difference between the two cases.
      
        5fd29d6c ("printk: clean up handling of log-levels and newlines")
      
      and we could take advantage of that to make sure we didn't mix up
      continuation lines with lines that just didn't have any loglevel at all.
      
      Then, in 2012, the kernel log buffer was changed to be a "record" based
      log, where each line was a record that has a loglevel and a timestamp.
      
      You can see the beginning of that conversion in commits
      
        e11fea92 ("kmsg: export printk records to the /dev/kmsg interface")
        7ff9554b ("printk: convert byte-buffer to variable-length record buffer")
      
      with a number of follow-up commits to fix some painful fallout from that
      conversion.  Over all, it took a couple of months to sort out most of
      it.  But the upside was that you could have concurrent readers (and
      writers) of the kernel log and not have lines with mixed output in them.
      
      And one particular pain-point for the record-based kernel logging was
      exactly the fragmentary lines that are generated in smaller chunks.  In
      order to still log them as one recrod, the continuation lines need to be
      attached to the previous record properly.
      
      However the explicit continuation record marker that is actually useful
      for this exact case was actually removed in aroundm the same time by commit
      
        61e99ab8 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
      
      due to the incorrect belief that KERN_CONT wasn't meaningful.  The
      ambiguity between "is this a continuation line" or "is this a plain
      printk with no log level information" was reintroduced, and in fact
      became an even bigger pain point because there was now the whole
      record-level merging of kernel messages going on.
      
      This patch reinstates the KERN_CONT as a real non-empty string marker,
      so that the ambiguity is fixed once again.
      
      But it's not a plain revert of that original removal: in the four years
      since we made KERN_CONT an empty string again, not only has the format
      of the log level markers changed, we've also had some usage changes in
      this area.
      
      For example, some ACPI code seems to use KERN_CONT _together_ with a log
      level, and now uses both the KERN_CONT marker and (for example) a
      KERN_INFO marker to show that it's an informational continuation of a
      line.
      
      Which is actually not a bad idea - if the continuation line cannot be
      attached to its predecessor, without the log level information we don't
      know what log level to assign to it (and we traditionally just assigned
      it the default loglevel).  So having both a log level and the KERN_CONT
      marker is not necessarily a bad idea, but it does mean that we need to
      actually iterate over potentially multiple markers, rather than just a
      single one.
      
      Also, since KERN_CONT was still conceptually needed, and encouraged, but
      didn't actually _do_ anything, we've also had the reverse problem:
      rather than having too many annotations it has too few, and there is bit
      rot with code that no longer marks the continuation lines with the
      KERN_CONT marker.
      
      So this patch not only re-instates the non-empty KERN_CONT marker, it
      also fixes up the cases of bit-rot I noticed in my own logs.
      
      There are probably other cases where KERN_CONT will be needed to be
      added, either because it is new code that never dealt with the need for
      KERN_CONT, or old code that has bitrotted without anybody noticing.
      
      That said, we should strive to avoid the need for KERN_CONT.  It does
      result in real problems for logging, and should generally not be seen as
      a good feature.  If we some day can get rid of the feature entirely,
      because nobody does any fragmented printk calls, that would be lovely.
      
      But until that point, let's at mark the code that relies on the hacky
      multi-fragment kernel printk's.  Not only does it avoid the ambiguity,
      it also annotates code as "maybe this would be good to fix some day".
      
      (That said, particularly during single-threaded bootup, the downsides of
      KERN_CONT are very limited.  Things get much hairier when you have
      multiple threads going on and user level reading and writing logs too).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4bcc595c
  5. 08 10月, 2016 1 次提交
    • P
      console: don't prefer first registered if DT specifies stdout-path · 05fd007e
      Paul Burton 提交于
      If a device tree specifies a preferred device for kernel console output
      via the stdout-path or linux,stdout-path chosen node properties or the
      stdout alias then the kernel ought to honor it & output the kernel
      console to that device.  As it stands, this isn't the case.  Whilst we
      parse the stdout-path properties & set an of_stdout variable from
      of_alias_scan(), and use that from of_console_check() to determine
      whether to add a console device as a preferred console whilst
      registering it, we also prefer the first registered console if no other
      has been selected at the time of its registration.
      
      This means that if a console other than the one the device tree selects
      via stdout-path is registered first, we will switch to using it & when
      the stdout-path console is later registered the call to
      add_preferred_console() via of_console_check() is too late to do
      anything useful.  In practice this seems to mean that we switch to the
      dummy console device fairly early & see no further console output:
      
          Console: colour dummy device 80x25
          console [tty0] enabled
          bootconsole [ns16550a0] disabled
      
      Fix this by not automatically preferring the first registered console if
      one is specified by the device tree.  This allows consoles to be
      registered but not enabled, and once the driver for the console selected
      by stdout-path calls of_console_check() the driver will be added to the
      list of preferred consoles before any other console has been enabled.
      When that console is then registered via register_console() it will be
      enabled as expected.
      
      Link: http://lkml.kernel.org/r/20160809151937.26118-1-paul.burton@imgtec.comSigned-off-by: NPaul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Ivan Delalande <colona@arista.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jan Kara <jack@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05fd007e
  6. 10 8月, 2016 1 次提交
    • L
      Revert "printk: create pr_<level> functions" · a0cba217
      Linus Torvalds 提交于
      This reverts commit 874f9c7d.
      
      Geert Uytterhoeven reports:
       "This change seems to have an (unintendent?) side-effect.
      
        Before, pr_*() calls without a trailing newline characters would be
        printed with a newline character appended, both on the console and in
        the output of the dmesg command.
      
        After this commit, no new line character is appended, and the output
        of the next pr_*() call of the same type may be appended, like in:
      
          - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
          - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
          + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"
      
      Joe Perches says:
       "No, that is not intentional.
      
        The newline handling code inside vprintk_emit is a bit involved and
        for now I suggest a revert until this has all the same behavior as
        earlier"
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Requested-by: NJoe Perches <joe@perches.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0cba217
  7. 09 8月, 2016 1 次提交
  8. 03 8月, 2016 5 次提交
  9. 29 7月, 2016 1 次提交
  10. 21 5月, 2016 3 次提交
    • P
      printk/nmi: flush NMI messages on the system panic · cf9b1106
      Petr Mladek 提交于
      In NMI context, printk() messages are stored into per-CPU buffers to
      avoid a possible deadlock.  They are normally flushed to the main ring
      buffer via an IRQ work.  But the work is never called when the system
      calls panic() in the very same NMI handler.
      
      This patch tries to flush NMI buffers before the crash dump is
      generated.  In this case it does not risk a double release and bails out
      when the logbuf_lock is already taken.  The aim is to get the messages
      into the main ring buffer when possible.  It makes them better
      accessible in the vmcore.
      
      Then the patch tries to flush the buffers second time when other CPUs
      are down.  It might be more aggressive and reset logbuf_lock.  The aim
      is to get the messages available for the consequent kmsg_dump() and
      console_flush_on_panic() calls.
      
      The patch causes vprintk_emit() to be called even in NMI context again.
      But it is done via printk_deferred() so that the console handling is
      skipped.  Consoles use internal locks and we could not prevent a
      deadlock easily.  They are explicitly called later when the crash dump
      is not generated, see console_flush_on_panic().
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf9b1106
    • P
      printk/nmi: warn when some message has been lost in NMI context · b522deab
      Petr Mladek 提交于
      We could not resize the temporary buffer in NMI context.  Let's warn if
      a message is lost.
      
      This is rather theoretical.  printk() should not be used in NMI.  The
      only sensible use is when we want to print backtrace from all CPUs.  The
      current buffer should be enough for this purpose.
      
      [akpm@linux-foundation.org: whitespace fixlet]
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b522deab
    • P
      printk/nmi: generic solution for safe printk in NMI · 42a0bb3f
      Petr Mladek 提交于
      printk() takes some locks and could not be used a safe way in NMI
      context.
      
      The chance of a deadlock is real especially when printing stacks from
      all CPUs.  This particular problem has been addressed on x86 by the
      commit a9edc880 ("x86/nmi: Perform a safe NMI stack trace on all
      CPUs").
      
      The patchset brings two big advantages.  First, it makes the NMI
      backtraces safe on all architectures for free.  Second, it makes all NMI
      messages almost safe on all architectures (the temporary buffer is
      limited.  We still should keep the number of messages in NMI context at
      minimum).
      
      Note that there already are several messages printed in NMI context:
      WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
      handlers.  These are not easy to avoid.
      
      This patch reuses most of the code and makes it generic.  It is useful
      for all messages and architectures that support NMI.
      
      The alternative printk_func is set when entering and is reseted when
      leaving NMI context.  It queues IRQ work to copy the messages into the
      main ring buffer in a safe context.
      
      __printk_nmi_flush() copies all available messages and reset the buffer.
      Then we could use a simple cmpxchg operations to get synchronized with
      writers.  There is also used a spinlock to get synchronized with other
      flushers.
      
      We do not longer use seq_buf because it depends on external lock.  It
      would be hard to make all supported operations safe for a lockless use.
      It would be confusing and error prone to make only some operations safe.
      
      The code is put into separate printk/nmi.c as suggested by Steven
      Rostedt.  It needs a per-CPU buffer and is compiled only on
      architectures that call nmi_enter().  This is achieved by the new
      HAVE_NMI Kconfig flag.
      
      The are MN10300 and Xtensa architectures.  We need to clean up NMI
      handling there first.  Let's do it separately.
      
      The patch is heavily based on the draft from Peter Zijlstra, see
      
        https://lkml.org/lkml/2015/6/10/327
      
      [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
      [akpm@linux-foundation.org: min_t->min - all types are size_t here]
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>	[arm part]
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42a0bb3f
  11. 18 3月, 2016 4 次提交
    • I
      printk: add clear_idx symbol to vmcoreinfo · f468908b
      Ivan Delalande 提交于
      This allows us to extract from the vmcore only the messages emitted
      since the last time the ring buffer was cleared.  We just have to make
      sure its value is always up-to-date, when old messages are discarded to
      free space in log_make_free_space() for example.
      Signed-off-by: NZeyu Zhao <zzy8200@gmail.com>
      Signed-off-by: NIvan Delalande <colona@arista.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f468908b
    • S
      printk: check CON_ENABLED in have_callable_console() · adaf6590
      Sergey Senozhatsky 提交于
      have_callable_console() must also test CON_ENABLED bit, not just
      CON_ANYTIME.  We may have disabled CON_ANYTIME console so printk can
      wrongly assume that it's safe to call_console_drivers().
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Calvin Owens <calvinowens@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      adaf6590
    • S
      printk: set may_schedule for some of console_trylock() callers · 6b97a20d
      Sergey Senozhatsky 提交于
      console_unlock() allows to cond_resched() if its caller has set
      `console_may_schedule' to 1, since 8d91f8b1 ("printk: do
      cond_resched() between lines while outputting to consoles").
      
      The rules are:
      -- console_lock() always sets `console_may_schedule' to 1
      -- console_trylock() always sets `console_may_schedule' to 0
      
      However, console_trylock() callers (among them is printk()) do not
      always call printk() from atomic contexts, and some of them can
      cond_resched() in console_unlock(), so console_trylock() can set
      `console_may_schedule' to 1 for such processes.
      
      For !CONFIG_PREEMPT_COUNT kernels, however, console_trylock() always
      sets `console_may_schedule' to 0.
      
      It's possible to drop explicit preempt_disable()/preempt_enable() in
      vprintk_emit(), because console_unlock() and console_trylock() are now
      smart enough:
       a) console_unlock() does not cond_resched() when it's unsafe
          (console_trylock() takes care of that)
       b) console_unlock() does can_use_console() check.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Calvin Owens <calvinowens@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b97a20d
    • S
      printk: move can_use_console() out of console_trylock_for_printk() · a8199371
      Sergey Senozhatsky 提交于
      console_unlock() allows to cond_resched() if its caller has set
      `console_may_schedule' to 1 (this functionality is present since
      8d91f8b1 ("printk: do cond_resched() between lines while outputting
      to consoles").
      
      The rules are:
      -- console_lock() always sets `console_may_schedule' to 1
      -- console_trylock() always sets `console_may_schedule' to 0
      
      printk() calls console_unlock() with preemption desabled, which
      basically can lead to RCU stalls, watchdog soft lockups, etc.  if
      something is simultaneously calling printk() frequent enough (IOW,
      console_sem owner always has new data to send to console divers and
      can't leave console_unlock() for a long time).
      
      printk()->console_trylock() callers do not necessarily execute in atomic
      contexts, and some of them can cond_resched() in console_unlock().
      console_trylock() can set `console_may_schedule' to 1 (allow
      cond_resched() later in consoe_unlock()) when it's safe.
      
      This patch (of 3):
      
      vprintk_emit() disables preemption around console_trylock_for_printk()
      and console_unlock() calls for a strong reason -- can_use_console()
      check.  The thing is that vprintl_emit() can be called on a CPU that is
      not fully brought up yet (!cpu_online()), which potentially can cause
      problems if console driver wants to access per-cpu data.  A console
      driver can explicitly state that it's safe to call it from !online cpu
      by setting CON_ANYTIME bit in console ->flags.  That's why for
      !cpu_online() can_use_console() iterates all the console to find out if
      there is a CON_ANYTIME console, otherwise console_unlock() must be
      avoided.
      
      can_use_console() ensures that console_unlock() call is safe in
      vprintk_emit() only; console_lock() and console_trylock() are not
      covered by this check.  Even though call_console_drivers(), invoked from
      console_cont_flush() and console_unlock(), tests `!cpu_online() &&
      CON_ANYTIME' for_each_console(), it may be too late, which can result in
      messages loss.
      
      Assume that we have 2 cpus -- CPU0 is online, CPU1 is !online, and no
      CON_ANYTIME consoles available.
      
      CPU0 online                        CPU1 !online
                                       console_trylock()
                                       ...
                                       console_unlock()
                                         console_cont_flush
                                           spin_lock logbuf_lock
                                           if (!cont.len) {
                                              spin_unlock logbuf_lock
                                              return
                                           }
                                         for (;;) {
      vprintk_emit
        spin_lock logbuf_lock
        log_store
        spin_unlock logbuf_lock
                                           spin_lock logbuf_lock
        !console_trylock_for_printk        msg_print_text
       return                              console_idx = log_next()
                                           console_seq++
                                           console_prev = msg->flags
                                           spin_unlock logbuf_lock
      
                                           call_console_drivers()
                                             for_each_console(con) {
                                               if (!cpu_online() &&
                                                   !(con->flags & CON_ANYTIME))
                                                       continue;
                                               }
                                         /*
                                          * no message printed, we lost it
                                          */
      vprintk_emit
        spin_lock logbuf_lock
        log_store
        spin_unlock logbuf_lock
        !console_trylock_for_printk
       return
                                         /*
                                          * go to the beginning of the loop,
                                          * find out there are new messages,
                                          * lose it
                                          */
                                         }
      
      console_trylock()/console_lock() call on CPU1 may come from cpu
      notifiers registered on that CPU.  Since notifiers are not getting
      unregistered when CPU is going DOWN, all of the notifiers receive
      notifications during CPU UP.  For example, on my x86_64, I see around 50
      notification sent from offline CPU to itself
      
       [swapper/2] from cpu:2 to:2 action:CPU_STARTING hotplug_hrtick
       [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_main_cpu_notify
       [swapper/2] from cpu:2 to:2 action:CPU_STARTING blk_mq_queue_reinit_notify
       [swapper/2] from cpu:2 to:2 action:CPU_STARTING console_cpu_notify
      
      while doing
        echo 0 > /sys/devices/system/cpu/cpu2/online
        echo 1 > /sys/devices/system/cpu/cpu2/online
      
      So grabbing the console_sem lock while CPU is !online is possible,
      in theory.
      
      This patch moves can_use_console() check out of
      console_trylock_for_printk().  Instead it calls it in console_unlock(),
      so now console_lock()/console_unlock() are also 'protected' by
      can_use_console().  This also means that console_trylock_for_printk() is
      not really needed anymore and can be removed.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Calvin Owens <calvinowens@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a8199371
  12. 21 1月, 2016 1 次提交
    • A
      kernel: printk: specify alignment for struct printk_log · 5c9cf8af
      Andrey Ryabinin 提交于
      On architectures that have support for efficient unaligned access struct
      printk_log has 4-byte alignment.  Specify alignment attribute in type
      declaration.
      
      The whole point of this patch is to fix deadlock which happening when
      UBSAN detects unaligned access in printk() thus UBSAN recursively calls
      printk() with logbuf_lock held by top printk() call.
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yury Gribov <y.gribov@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c9cf8af
  13. 17 1月, 2016 3 次提交
    • S
      printk: change recursion_bug type to bool · 06b031de
      Sergey Senozhatsky 提交于
      `recursion_bug' is used as recursion_bug toggle, so make it `bool'.
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      06b031de
    • T
      printk: do cond_resched() between lines while outputting to consoles · 8d91f8b1
      Tejun Heo 提交于
      @console_may_schedule tracks whether console_sem was acquired through
      lock or trylock.  If the former, we're inside a sleepable context and
      console_conditional_schedule() performs cond_resched().  This allows
      console drivers which use console_lock for synchronization to yield
      while performing time-consuming operations such as scrolling.
      
      However, the actual console outputting is performed while holding
      irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
      before starting outputting lines.  Also, only a few drivers call
      console_conditional_schedule() to begin with.  This means that when a
      lot of lines need to be output by console_unlock(), for example on a
      console registration, the task doing console_unlock() may not yield for
      a long time on a non-preemptible kernel.
      
      If this happens with a slow console devices, for example a serial
      console, the outputting task may occupy the cpu for a very long time.
      Long enough to trigger softlockup and/or RCU stall warnings, which in
      turn pile more messages, sometimes enough to trigger the next cycle of
      warnings incapacitating the system.
      
      Fix it by making console_unlock() insert cond_resched() between lines if
      @console_may_schedule.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NCalvin Owens <calvinowens@fb.com>
      Acked-by: NJan Kara <jack@suse.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d91f8b1
    • T
      printk: only unregister boot consoles when necessary · 81cc26f2
      Thierry Reding 提交于
      Boot consoles are typically replaced by proper consoles during the boot
      process.  This can be problematic if the boot console data is part of
      the init section that is reclaimed late during boot.  If the proper
      console does not register before this point in time, the boot console
      will need to be removed (so that the freed memory is not accessed),
      leaving the system without output for some time.
      
      There are various reasons why the proper console may not register early
      enough, such as deferred probe or the driver being a loadable module.
      If that happens, there is some amount of time where no console messages
      are visible to the user, which in turn can mean that they won't see
      crashes or other potentially useful information.
      
      To avoid this situation, only remove the boot console when it resides in
      the init section.  Code exists to replace the boot console by the proper
      console when it is registered, keeping a seamless transition between the
      boot and proper consoles.
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81cc26f2
  14. 07 11月, 2015 1 次提交
    • M
      printk: prevent userland from spoofing kernel messages · 3824657c
      Mathias Krause 提交于
      The following statement of ABI/testing/dev-kmsg is not quite right:
      
         It is not possible to inject messages from userspace with the
         facility number LOG_KERN (0), to make sure that the origin of the
         messages can always be reliably determined.
      
      Userland actually can inject messages with a facility of 0 by abusing the
      fact that the facility is stored in a u8 data type.  By using a facility
      which is a multiple of 256 the assignment of msg->facility in log_store()
      implicitly truncates it to 0, i.e.  LOG_KERN, allowing users of /dev/kmsg
      to spoof kernel messages as shown below:
      
      The following call...
         # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
      ...leads to the following log entry (dmesg -x | tail -n 1):
         user  :emerg : [   66.137758] Kernel panic - not syncing: beer empty
      
      However, this call...
         # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
      ...leads to the slightly different log entry (note the kernel facility):
         kern  :emerg : [   74.177343] Kernel panic - not syncing: beer empty
      
      Fix that by limiting the user provided facility to 8 bit right from the
      beginning and catch the truncation early.
      
      Fixes: 7ff9554b ("printk: convert byte-buffer to variable-length...")
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Alex Elder <elder@linaro.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3824657c
  15. 22 10月, 2015 1 次提交
  16. 11 9月, 2015 1 次提交
    • D
      kexec: split kexec_load syscall from kexec core code · 2965faa5
      Dave Young 提交于
      There are two kexec load syscalls, kexec_load another and kexec_file_load.
       kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
      split kexec_load syscall code to kernel/kexec.c.
      
      And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
      use kexec_file_load only, or vice verse.
      
      The original requirement is from Ted Ts'o, he want kexec kernel signature
      being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
      kexec_load syscall can bypass the checking.
      
      Vivek Goyal proposed to create a common kconfig option so user can compile
      in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
      KEXEC_CORE so that old config files still work.
      
      Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
      architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
      KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
      kexec_load syscall.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2965faa5
  17. 01 7月, 2015 1 次提交
  18. 26 6月, 2015 5 次提交
    • V
      check_syslog_permissions() cleanup · 3ea4331c
      Vasily Averin 提交于
      Patch fixes drawbacks in heck_syslog_permissions() noticed by AKPM:
      "from_file handling makes me cry.
      
      That's not a boolean - it's an enumerated value with two values
      currently defined.
      
      But the code in check_syslog_permissions() treats it as a boolean and
      also hardwires the knowledge that SYSLOG_FROM_PROC == 1 (or == `true`).
      
      And the name is wrong: it should be called from_proc to match
      SYSLOG_FROM_PROC."
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ea4331c
    • V
      security_syslog() should be called once only · d194e5d6
      Vasily Averin 提交于
      The final version of commit 637241a9 ("kmsg: honor dmesg_restrict
      sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are
      processed incorrectly:
      
      - open of /dev/kmsg checks syslog access permissions by using
        check_syslog_permissions() where security_syslog() is not called if
        dmesg_restrict is set.
      
      - syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
        can be executed twice (inside check_syslog_permissions() and then
        directly in do_syslog())
      
      With this patch security_syslog() is called once only in all
      syslog-related operations regardless of dmesg_restrict value.
      
      Fixes: 637241a9 ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d194e5d6
    • T
      printk: implement support for extended console drivers · 6fe29354
      Tejun Heo 提交于
      printk log_buf keeps various metadata for each message including its
      sequence number and timestamp.  The metadata is currently available only
      through /dev/kmsg and stripped out before passed onto console drivers.  We
      want this metadata to be available to console drivers too so that console
      consumers can get full information including the metadata and dictionary,
      which among other things can be used to detect whether messages got lost
      in transit.
      
      This patch implements support for extended console drivers.  Consoles can
      indicate that they want extended messages by setting the new CON_EXTENDED
      flag and they'll be fed messages formatted the same way as /dev/kmsg.
      
       "<level>,<sequnum>,<timestamp>,<contflag>;<message text>\n"
      
      If extended consoles exist, in-kernel fragment assembly is disabled.  This
      ensures that all messages emitted to consoles have full metadata including
      sequence number.  The contflag carries enough information to reassemble
      the fragments from the reader side trivially.  Note that this only affects
      /dev/kmsg.  Regular console and /proc/kmsg outputs are not affected by
      this change.
      
      * Extended message formatting for console drivers is enabled iff there
        are registered extended consoles.
      
      * Comment describing /dev/kmsg message format updated to add missing
        contflag field and help distinguishing variable from verbatim terms.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Kay Sievers <kay@vrfy.org>
      Reviewed-by: NPetr Mladek <pmladek@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6fe29354
    • T
      printk: factor out message formatting from devkmsg_read() · 0a295e67
      Tejun Heo 提交于
      The extended message formatting used for /dev/kmsg will be used implement
      extended consoles.  Factor out msg_print_ext_header() and
      msg_print_ext_body() from devkmsg_read().
      
      This is pure restructuring.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Kay Sievers <kay@vrfy.org>
      Reviewed-by: NPetr Mladek <pmladek@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0a295e67
    • T
      printk: guard the amount written per line by devkmsg_read() · d43ff430
      Tejun Heo 提交于
      This patchset updates netconsole so that it can emit messages with the
      same header as used in /dev/kmsg which gives neconsole receiver full log
      information which enables things like structured logging and detection
      of lost messages.
      
      This patch (of 7):
      
      devkmsg_read() uses 8k buffer and assumes that the formatted output
      message won't overrun which seems safe given LOG_LINE_MAX, the current use
      of dict and the escaping method being used; however, we're planning to use
      devkmsg formatting wider and accounting for the buffer size properly isn't
      that complicated.
      
      This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates devkmsg_read()
      so that it limits output accordingly.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Kay Sievers <kay@vrfy.org>
      Reviewed-by: NPetr Mladek <pmladek@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d43ff430
  19. 26 3月, 2015 2 次提交
  20. 13 3月, 2015 1 次提交
  21. 07 3月, 2015 1 次提交