1. 17 1月, 2017 1 次提交
  2. 14 1月, 2017 1 次提交
  3. 12 1月, 2017 2 次提交
    • J
      x86/unwind: Disable KASAN checks for non-current tasks · 84936118
      Josh Poimboeuf 提交于
      There are a handful of callers to save_stack_trace_tsk() and
      show_stack() which try to unwind the stack of a task other than current.
      In such cases, it's remotely possible that the task is running on one
      CPU while the unwinder is reading its stack from another CPU, causing
      the unwinder to see stack corruption.
      
      These cases seem to be mostly harmless.  The unwinder has checks which
      prevent it from following bad pointers beyond the bounds of the stack.
      So it's not really a bug as long as the caller understands that
      unwinding another task will not always succeed.
      
      In such cases, it's possible that the unwinder may read a KASAN-poisoned
      region of the stack.  Account for that by using READ_ONCE_NOCHECK() when
      reading the stack of another task.
      
      Use READ_ONCE() when reading the stack of the current task, since KASAN
      warnings can still be useful for finding bugs in that case.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      84936118
    • J
      x86/unwind: Silence warnings for non-current tasks · 900742d8
      Josh Poimboeuf 提交于
      There are a handful of callers to save_stack_trace_tsk() and
      show_stack() which try to unwind the stack of a task other than current.
      In such cases, it's remotely possible that the task is running on one
      CPU while the unwinder is reading its stack from another CPU, causing
      the unwinder to see stack corruption.
      
      These cases seem to be mostly harmless.  The unwinder has checks which
      prevent it from following bad pointers beyond the bounds of the stack.
      So it's not really a bug as long as the caller understands that
      unwinding another task will not always succeed.
      
      Since stack "corruption" on another task's stack isn't necessarily a
      bug, silence the warnings when unwinding tasks other than current.
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/00d8c50eea3446c1524a2a755397a3966629354c.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      900742d8
  4. 10 1月, 2017 4 次提交
  5. 06 1月, 2017 1 次提交
  6. 05 1月, 2017 1 次提交
  7. 27 12月, 2016 1 次提交
  8. 25 12月, 2016 4 次提交
  9. 24 12月, 2016 1 次提交
    • J
      Revert "x86/unwind: Detect bad stack return address" · c280f773
      Josh Poimboeuf 提交于
      Revert the following commit:
      
        b6959a36 ("x86/unwind: Detect bad stack return address")
      
      ... because Andrey Konovalov reported an unwinder warning:
      
        WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467
      
      The unwind was initiated from an interrupt which occurred while running in the
      generated code for a kprobe.  The unwinder printed the warning because it
      expected regs->ip to point to a valid text address, but instead it pointed to
      the generated code.
      
      Eventually we may want come up with a way to identify generated kprobe
      code so the unwinder can know that it's a valid return address.  Until
      then, just remove the warning.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c280f773
  10. 23 12月, 2016 1 次提交
    • P
      x86/paravirt: Mark unused patch_default label · cef4402d
      Peter Zijlstra 提交于
      A bugfix commit:
      
        45dbea5f ("x86/paravirt: Fix native_patch()")
      
      ... introduced a harmless warning:
      
        arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch':
        arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label]
      
      Fix it by annotating the label as __maybe_unused.
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Reported-by: NPiotr Gregor <piotrgregor@rsyncme.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 45dbea5f ("x86/paravirt: Fix native_patch()")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      cef4402d
  11. 21 12月, 2016 1 次提交
  12. 20 12月, 2016 2 次提交
    • B
      x86/alternatives: Do not use sync_core() to serialize I$ · 34bfab0e
      Borislav Petkov 提交于
      We use sync_core() in the alternatives code to stop speculative
      execution of prefetched instructions because we are potentially changing
      them and don't want to execute stale bytes.
      
      What it does on most machines is call CPUID which is a serializing
      instruction. And that's expensive.
      
      However, the instruction cache is serialized when we're on the local CPU
      and are changing the data through the same virtual address. So then, we
      don't need the serializing CPUID but a simple control flow change. Last
      being accomplished with a CALL/RET which the noinline causes.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Matthew Whitehead <tedheadster@gmail.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161203150258.vwr5zzco7ctgc4pe@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      34bfab0e
    • V
      x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic · 59107e2f
      Vitaly Kuznetsov 提交于
      There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
      which injects NMI to the guest. We may want to crash the guest and do kdump
      on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
      allow the kdump kernel to re-establish VMBus connection so it will see
      VMBus devices (storage, network,..).
      
      To properly unload VMBus making it possible to start over during kdump we
      need to do the following:
      
       - Send an 'unload' message to the hypervisor. This can be done on any CPU
         so we do this the crashing CPU.
      
       - Receive the 'unload finished' reply message. WS2012R2 delivers this
         message to the CPU which was used to establish VMBus connection during
         module load and this CPU may differ from the CPU sending 'unload'.
      
      Receiving a VMBus message means the following:
      
       - There is a per-CPU slot in memory for one message. This slot can in
         theory be accessed by any CPU.
      
       - We get an interrupt on the CPU when a message was placed into the slot.
      
       - When we read the message we need to clear the slot and signal the fact
         to the hypervisor. In case there are more messages to this CPU pending
         the hypervisor will deliver the next message. The signaling is done by
         writing to an MSR so this can only be done on the appropriate CPU.
      
      To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload()
      function which checks message slots for all CPUs in a loop waiting for the
      'unload finished' messages. However, there is an issue which arises when
      these conditions are met:
      
       - We're crashing on a CPU which is different from the one which was used
         to initially contact the hypervisor.
      
       - The CPU which was used for the initial contact is blocked with interrupts
         disabled and there is a message pending in the message slot.
      
      In this case we won't be able to read the 'unload finished' message on the
      crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
      simultaneously: the first CPU entering panic() will proceed to crash and
      all other CPUs will stop themselves with interrupts disabled.
      
      The suggested solution is to handle unknown NMIs for Hyper-V guests on the
      first CPU which gets them only. This will allow us to rely on VMBus
      interrupt handler being able to receive the 'unload finish' message in
      case it is delivered to a different CPU.
      
      The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
      boot CPU only, WS2012R2 and earlier Hyper-V versions are affected.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: NK. Y. Srinivasan <kys@microsoft.com>
      Cc: devel@linuxdriverproject.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      59107e2f
  13. 19 12月, 2016 12 次提交
  14. 18 12月, 2016 2 次提交
    • T
      x86/tsc: Limit the adjust value further · 8c9b9d87
      Thomas Gleixner 提交于
      Adjust value 0x80000000 and other values larger than that render the TSC
      deadline timer disfunctional.
      
      We have not yet any information about this from Intel, but experimentation
      clearly proves that this is a 32/64 bit and sign extension issue.
      
      If adjust values larger than that are actually required, which might be the
      case for physical CPU hotplug, then we need to disable the deadline timer
      on the affected package/CPUs and use the local APIC timer instead.
      
      That requires some surgery in the APIC setup code, so we just limit the
      ADJUST register value into the known to work range for now and revisit this
      when Intel comes forth with proper information.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      8c9b9d87
    • T
      x86/tsc: Annotate printouts as firmware bug · 16588f65
      Thomas Gleixner 提交于
      Make it more obvious that the BIOS is screwed up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      16588f65
  15. 15 12月, 2016 6 次提交
    • T
      x86/tsc: Force TSC_ADJUST register to value >= zero · 5bae1562
      Thomas Gleixner 提交于
      Roland reported that his DELL T5810 sports a value add BIOS which
      completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with
      random negative TSC_ADJUST values, different on all CPUs. That renders the
      TSC useless because the sycnchronization check fails.
      
      Roland tested the new TSC_ADJUST mechanism. While it manages to readjust
      the TSCs he needs to disable the TSC deadline timer, otherwise the machine
      just stops booting.
      
      Deeper investigation unearthed that the TSC deadline timer is sensitive to
      the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an
      interrupt storm caused by the TSC deadline timer.
      
      This does not make any sense and it's hard to imagine what kind of hardware
      wreckage is behind that misfeature, but it's reliably reproducible on other
      systems which have TSC_ADJUST and TSC deadline timer.
      
      While it would be understandable that a big enough negative value which
      moves the resulting TSC readout into the negative space could have the
      described effect, this happens even with a adjust value of -1, which keeps
      the TSC readout definitely in the positive space. The compare register for
      the TSC deadline timer is set to a positive value larger than the TSC, but
      despite not having reached the deadline the interrupt is raised
      immediately. If this happens on the boot CPU, then the machine dies
      silently because this setup happens before the NMI watchdog is armed.
      
      Further experiments showed that any other adjustment of TSC_ADJUST works as
      expected as long as it stays in the positive range. The direction of the
      adjustment has no influence either. See the lkml link for further analysis.
      
      Yet another proof for the theory that timers are designed by janitors and
      the underlying (obviously undocumented) mechanisms which allow BIOSes to
      wreckage them are considered a feature. Well done Intel - NOT!
      
      To address this wreckage add the following sanity measures:
      
      - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0
      
      - If the TSC_ADJUST value on any cpu is negative, set it to 0
      
      - Prevent the cross package synchronization mechanism from setting negative
        TSC_ADJUST values.
      Reported-and-tested-by: NRoland Scheidegger <rscheidegger_lists@hispeed.ch>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Allen Hung <allen_hung@dell.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5bae1562
    • T
      x86/tsc: Validate TSC_ADJUST after resume · 6a369583
      Thomas Gleixner 提交于
      Some 'feature' BIOSes fiddle with the TSC_ADJUST register during
      suspend/resume which renders the TSC unusable.
      
      Add sanity checks into the resume path and restore the
      original value if it was adjusted.
      Reported-and-tested-by: NRoland Scheidegger <rscheidegger_lists@hispeed.ch>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Allen Hung <allen_hung@dell.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      6a369583
    • B
      x86/acpi: Use proper macro for invalid node · 4370a3ef
      Boris Ostrovsky 提交于
      Use NUMA_NO_NODE instead of -1.
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: len.brown@intel.com
      Cc: rjw@rjwysocki.net
      Cc: linux-acpi@vger.kernel.org
      Cc: pavel@ucw.cz
      Link: http://lkml.kernel.org/r/1481570993-13941-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      4370a3ef
    • T
      x86/smpboot: Prevent false positive out of bounds cpumask access warning · 427d77a3
      Thomas Gleixner 提交于
      prefill_possible_map() reinitializes the cpu_possible_map by setting the
      possible cpu bits and clearing all other bits up to NR_CPUS.
      
      This is technically always correct because cpu_possible_map is statically
      allocated and sized NR_CPUS. With CPUMASK_OFFSTACK and DEBUG_PER_CPU_MAPS
      enabled the bounds check of cpu masks happens on nr_cpu_ids. nr_cpu_ids is
      initialized to NR_CPUS and only limited after the set/clear bit loops have
      been executed. 
      
      But if the system was booted with "nr_cpus=N" on the command line, where N
      is < NR_CPUS then nr_cpu_ids is limited in the parameter parsing function
      before prefill_possible_map() is invoked. As a consequence the cpumask
      bounds check triggers when clearing the bits past nr_cpu_ids.
      
      Add a helper which allows to reset cpu_possible_map w/o the bounds check
      and then set only the possible bits which are well inside bounds.
      Reported-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: 0x7f454c46@gmail.com
      Cc: Jan Beulich <JBeulich@novell.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612131836050.3415@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      427d77a3
    • B
      kexec: export the value of phys_base instead of symbol address · 401721ec
      Baoquan He 提交于
      Currently in x86_64, the symbol address of phys_base is exported to
      vmcoreinfo.  Dave Anderson complained this is really useless for his
      Crash implementation.  Because in user-space utility Crash and
      Makedumpfile which exported vmcore information is mainly used for, value
      of phys_base is needed to covert virtual address of exported kernel
      symbol to physical address.  Especially init_level4_pgt, if we want to
      access and go over the page table to look up a PA corresponding to VA,
      firstly we need calculate
      
        page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;
      
      Now in Crash and Makedumpfile, we have to analyze the vmcore elf program
      header to get value of phys_base.  As Dave said, it would be preferable
      if it were readily availabl in vmcoreinfo rather than depending upon the
      PT_LOAD semantics.
      
      Hence in this patch change to export the value of phys_base instead of
      its virtual address.
      
      And people also complained that KERNEL_IMAGE_SIZE exporting is x86_64
      only, should be moved into arch dependent function
      arch_crash_save_vmcoreinfo.  Do the moving in this patch.
      
      Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Eugene Surovegin <surovegin@google.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      401721ec
    • B
      Revert "kdump, vmcoreinfo: report memory sections virtual addresses" · 69f58384
      Baoquan He 提交于
      This reverts commit 0549a3c0 ("kdump, vmcoreinfo: report memory
      sections virtual addresses").
      
      Commit 0549a3c0 tells the userspace utility makedumpfile the
      randomized base address of these memmory sections when mm kaslr is
      enabled.  However the following patch "kexec: export the value of
      phys_base instead of symbol address" makes makedumpfile not need these
      addresses any more.
      
      Besides we should use VMCOREINFO_NUMBER to export the value of the
      variable so that we can use the existing number_table mechanism of
      Makedumpfile to fetch it.  So revert it now.  If needed we can add it
      later.
      
      http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
      Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.comSigned-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Eugene Surovegin <surovegin@google.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69f58384