1. 25 11月, 2016 1 次提交
  2. 16 11月, 2016 1 次提交
  3. 07 11月, 2016 1 次提交
  4. 03 11月, 2016 1 次提交
    • P
      KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK · ea26e4ec
      Paolo Bonzini 提交于
      Since commit a545ab6a ("kvm: x86: add tsc_offset field to struct
      kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is
      cached and need not be fished out of the VMCS or VMCB.  This means
      that we can implement adjust_tsc_offset_guest and read_l1_tsc
      entirely in generic code.  The simplification is particularly
      significant for VMX code, where vmx->nested.vmcs01_tsc_offset
      was duplicating what is now in vcpu->arch.tsc_offset.  Therefore
      the vmcs01_tsc_offset can be dropped completely.
      
      More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK
      which, after commit 108b249c ("KVM: x86: introduce get_kvmclock_ns",
      2016-09-01) called read_l1_tsc while the VMCS was not loaded.
      It thus returned bogus values on Intel CPUs.
      
      Fixes: 108b249cReported-by: NRoman Kagan <rkagan@virtuozzo.com>
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ea26e4ec
  5. 26 10月, 2016 1 次提交
    • D
      x86/io: add interface to reserve io memtype for a resource range. (v1.1) · 8ef42276
      Dave Airlie 提交于
      A recent change to the mm code in:
      87744ab3 mm: fix cache mode tracking in vm_insert_mixed()
      
      started enforcing checking the memory type against the registered list for
      amixed pfn insertion mappings. It happens that the drm drivers for a number
      of gpus relied on this being broken. Currently the driver only inserted
      VRAM mappings into the tracking table when they came from the kernel,
      and userspace mappings never landed in the table. This led to a regression
      where all the mapping end up as UC instead of WC now.
      
      I've considered a number of solutions but since this needs to be fixed
      in fixes and not next, and some of the solutions were going to introduce
      overhead that hadn't been there before I didn't consider them viable at
      this stage. These mainly concerned hooking into the TTM io reserve APIs,
      but these API have a bunch of fast paths I didn't want to unwind to add
      this to.
      
      The solution I've decided on is to add a new API like the arch_phys_wc
      APIs (these would have worked but wc_del didn't take a range), and
      use them from the drivers to add a WC compatible mapping to the table
      for all VRAM on those GPUs. This means we can then create userspace
      mapping that won't get degraded to UC.
      
      v1.1: use CONFIG_X86_PAT + add some comments in io.h
      
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: x86@kernel.org
      Cc: mcgrof@suse.com
      Cc: Dan Williams <dan.j.williams@intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      8ef42276
  6. 20 10月, 2016 1 次提交
    • H
      sched/core, x86: Make struct thread_info arch specific again · c8061485
      Heiko Carstens 提交于
      The following commit:
      
        c65eacbe ("sched/core: Allow putting thread_info into task_struct")
      
      ... made 'struct thread_info' a generic struct with only a
      single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is
      selected.
      
      This change however seems to be quite x86 centric, since at least the
      generic preemption code (asm-generic/preempt.h) assumes that struct
      thread_info also has a preempt_count member, which apparently was not
      true for x86.
      
      We could add a bit more #ifdefs to solve this problem too, but it seems
      to be much simpler to make struct thread_info arch specific
      again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a
      bit easier for architectures that have a couple of arch specific stuff
      in their thread_info definition.
      
      The arch specific stuff _could_ be moved to thread_struct. However
      keeping them in thread_info makes it easier: accessing thread_info
      members is simple, since it is at the beginning of the task_struct,
      while the thread_struct is at the end. At least on s390 the offsets
      needed to access members of the thread_struct (with task_struct as
      base) are too large for various asm instructions.  This is not a
      problem when keeping these members within thread_info.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: keescook@chromium.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutland@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c8061485
  7. 19 10月, 2016 1 次提交
    • P
      x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features · 82148993
      Piotr Luc 提交于
      AVX512_4VNNIW  - Vector instructions for deep learning enhanced word
      variable precision.
      AVX512_4FMAPS - Vector instructions for deep learning floating-point
      single precision.
      
      These new instructions are to be used in future Intel Xeon & Xeon Phi
      processors. The bits 2&3 of CPUID[level:0x07, EDX] inform that new
      instructions are supported by a processor.
      
      The spec can be found in the Intel Software Developer Manual (SDM) or in
      the Instruction Set Extensions Programming Reference (ISE).
      
      Define new feature flags to enumerate the new instructions in /proc/cpuinfo
      accordingly to CPUID bits and add the required xsave extensions which are
      required for proper operation.
      Signed-off-by: NPiotr Luc <piotr.luc@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161018150111.29926-1-piotr.luc@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      82148993
  8. 18 10月, 2016 1 次提交
  9. 17 10月, 2016 1 次提交
  10. 14 10月, 2016 1 次提交
  11. 12 10月, 2016 1 次提交
    • H
      x86/panic: replace smp_send_stop() with kdump friendly version in panic path · 0ee59413
      Hidehiro Kawai 提交于
      Daniel Walker reported problems which happens when
      crash_kexec_post_notifiers kernel option is enabled
      (https://lkml.org/lkml/2015/6/24/44).
      
      In that case, smp_send_stop() is called before entering kdump routines
      which assume other CPUs are still online.  As the result, for x86, kdump
      routines fail to save other CPUs' registers and disable virtualization
      extensions.
      
      To fix this problem, call a new kdump friendly function,
      crash_smp_send_stop(), instead of the smp_send_stop() when
      crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a weak
      function, and it just call smp_send_stop().  Architecture codes should
      override it so that kdump can work appropriately.  This patch only
      provides x86-specific version.
      
      For Xen's PV kernel, just keep the current behavior.
      
      NOTES:
      
      - Right solution would be to place crash_smp_send_stop() before
        __crash_kexec() invocation in all cases and remove smp_send_stop(), but
        we can't do that until all architectures implement own
        crash_smp_send_stop()
      
      - crash_smp_send_stop()-like work is still needed by
        machine_crash_shutdown() because crash_kexec() can be called without
        entering panic()
      
      Fixes: f06e5153 (kernel/panic.c: add "crash_kexec_post_notifiers" option)
      Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jpSigned-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Reported-by: NDaniel Walker <dwalker@fifo99.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Daniel Walker <dwalker@fifo99.com>
      Cc: Xunlei Pang <xpang@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Cc: "Steven J. Hill" <steven.hill@cavium.com>
      Cc: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ee59413
  12. 08 10月, 2016 4 次提交
    • D
      x86/pkeys: Make protection keys an "eager" feature · d4b05923
      Dave Hansen 提交于
      Our XSAVE features are divided into two categories: those that
      generate FPU exceptions, and those that do not.  MPX and pkeys do
      not generate FPU exceptions and thus can not be used lazily.  We
      disable them when lazy mode is forced on.
      
      We have a pair of masks to collect these two sets of features, but
      XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY.
      Fix it by moving the feature to XFEATURE_MASK_EAGER.
      
      Note: this only causes problem if you boot with lazy FPU mode
      (eagerfpu=off) which is *not* the default.  It also only affects
      hardware which is not currently publicly available.  It looks like
      eager mode is going away, but we still need this patch applied
      to any kernel that has protection keys and lazy mode, which is 4.6
      through 4.8 at this point, and 4.9 if the lazy removal isn't sent
      to Linus for 4.9.
      
      Fixes: c8df4009 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures")
      Signed-off-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d4b05923
    • C
      nmi_backtrace: generate one-line reports for idle cpus · 6727ad9e
      Chris Metcalf 提交于
      When doing an nmi backtrace of many cores, most of which are idle, the
      output is a little overwhelming and very uninformative.  Suppress
      messages for cpus that are idling when they are interrupted and just
      emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
      
      We do this by grouping all the cpuidle code together into a new
      .cpuidle.text section, and then checking the address of the interrupted
      PC to see if it lies within that section.
      
      This commit suitably tags x86 and tile idle routines, and only adds in
      the minimal framework for other architectures.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Tested-by: NPetr Mladek <pmladek@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6727ad9e
    • C
      nmi_backtrace: add more trigger_*_cpu_backtrace() methods · 9a01c3ed
      Chris Metcalf 提交于
      Patch series "improvements to the nmi_backtrace code" v9.
      
      This patch series modifies the trigger_xxx_backtrace() NMI-based remote
      backtracing code to make it more flexible, and makes a few small
      improvements along the way.
      
      The motivation comes from the task isolation code, where there are
      scenarios where we want to be able to diagnose a case where some cpu is
      about to interrupt a task-isolated cpu.  It can be helpful to see both
      where the interrupting cpu is, and also an approximation of where the
      cpu that is being interrupted is.  The nmi_backtrace framework allows us
      to discover the stack of the interrupted cpu.
      
      I've tested that the change works as desired on tile, and build-tested
      x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
      cpuidle stuff as well as the architecture-specific routines are in the
      new cpuidle section.  For arm, mips, and sparc I just build-tested it
      and made sure the generic cpuidle routines were in the new cpuidle
      section, but I didn't attempt to figure out which the platform-specific
      idle routines might be.  That might be more usefully done by someone
      with platform experience in follow-up patches.
      
      This patch (of 4):
      
      Currently you can only request a backtrace of either all cpus, or all
      cpus but yourself.  It can also be helpful to request a remote backtrace
      of a single cpu, and since we want that, the logical extension is to
      support a cpumask as the underlying primitive.
      
      This change modifies the existing lib/nmi_backtrace.c code to take a
      cpumask as its basic primitive, and modifies the linux/nmi.h code to use
      the new "cpumask" method instead.
      
      The existing clients of nmi_backtrace (arm and x86) are converted to
      using the new cpumask approach in this change.
      
      The other users of the backtracing API (sparc64 and mips) are converted
      to use the cpumask approach rather than the all/allbutself approach.
      The mips code ignored the "include_self" boolean but with this change it
      will now also dump a local backtrace if requested.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a01c3ed
    • B
      mm: move phys_mem_access_prot_allowed() declaration to pgtable.h · 08ea8c07
      Baoyou Xie 提交于
      We get 1 warning when building kernel with W=1:
      
        drivers/char/mem.c:220:12: warning: no previous prototype for 'phys_mem_access_prot_allowed' [-Wmissing-prototypes]
         int __weak phys_mem_access_prot_allowed(struct file *file,
      
      In fact, its declaration is spreading to several header files in
      different architecture, but need to be declare in common header file.
      
      So this patch moves phys_mem_access_prot_allowed() to pgtable.h.
      
      Link: http://lkml.kernel.org/r/1473751597-12139-1-git-send-email-baoyou.xie@linaro.orgSigned-off-by: NBaoyou Xie <baoyou.xie@linaro.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08ea8c07
  13. 06 10月, 2016 1 次提交
    • J
      x86/unwind: Fix oprofile module link error · cfee9edd
      Josh Poimboeuf 提交于
      When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n,
      the oprofile module fails to link:
      
        ERROR: ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined!
      
      The problem was introduced when oprofile was converted to use the new
      x86 unwinder.  When frame pointers are disabled, the "guess" unwinder's
      unwind_get_return_address() is an inline function which calls
      ftrace_graph_ret_addr(), which is not exported.
      
      Fix it by converting the "guess" version of unwind_get_return_address()
      to an exported out-of-line function, just like its frame pointer
      counterpart.
      Reported-by: NKarl Beldan <karl.beldan@gmail.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ec2ad9cc ("oprofile/x86: Convert x86_backtrace() to use the new unwinder")
      Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cfee9edd
  14. 30 9月, 2016 5 次提交
  15. 28 9月, 2016 2 次提交
    • A
      x86: separate extable.h, switch sections.h to it · 45caf470
      Al Viro 提交于
      drivers/platform/x86/dell-smo8800.c is touched due to the following obscenity:
      drivers/platform/x86/dell-smo8800.c ->
      	linux/interrupt.h ->
      		linux/hardirq.h ->
      			asm/hardirq.h ->
      				linux/irq.h ->
      					asm/hw_irq.h ->
      						asm/sections.h ->
      							asm/uaccess.h
      is the only chain of includes pulling asm/uaccess.h there.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      45caf470
    • A
      remove stray include of asm/uaccess.h from cacheflush.h · b79d8d82
      Al Viro 提交于
      It was introduced in "arch, x86: pmem api for ensuring durability
      of persistent memory updates" in July 2015, along with the code
      that really used that stuff.  Three weeks later all that code
      got moved to asm/pmem.h by "pmem, x86: move x86 PMEM API to new
      pmem.h header"; include, however, was left behind.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b79d8d82
  16. 24 9月, 2016 1 次提交
  17. 23 9月, 2016 2 次提交
  18. 22 9月, 2016 5 次提交
  19. 21 9月, 2016 1 次提交
  20. 20 9月, 2016 8 次提交
    • P
      KVM: x86: Hyper-V tsc page setup · 095cf55d
      Paolo Bonzini 提交于
      Lately tsc page was implemented but filled with empty
      values. This patch setup tsc page scale and offset based
      on vcpu tsc, tsc_khz and  HV_X64_MSR_TIME_REF_COUNT value.
      
      The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr
      reads count to zero which potentially improves performance.
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NPeter Hornyack <peterhornyack@google.com>
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      [Computation of TSC page parameters rewritten to use the Linux timekeeper
       parameters. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      095cf55d
    • P
      KVM: x86: introduce get_kvmclock_ns · 108b249c
      Paolo Bonzini 提交于
      Introduce a function that reads the exact nanoseconds value that is
      provided to the guest in kvmclock.  This crystallizes the notion of
      kvmclock as a thin veneer over a stable TSC, that the guest will
      (hopefully) convert with NTP.  In other words, kvmclock is *not* a
      paravirtualized host-to-guest NTP.
      
      Drop the get_kernel_ns() function, that was used both to get the base
      value of the master clock and to get the current value of kvmclock.
      The former use is replaced by ktime_get_boot_ns(), the latter is
      the purpose of get_kernel_ns().
      
      This also allows KVM to provide a Hyper-V time reference counter that
      is synchronized with the time that is computed from the TSC page.
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      108b249c
    • J
      x86/dumpstack: Remove dump_trace() and related callbacks · c8fe4609
      Josh Poimboeuf 提交于
      All previous users of dump_trace() have been converted to use the new
      unwind interfaces, so we can remove it and the related
      print_context_stack() and print_context_stack_bp() callback functions.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5b97da3572b40b5a4d8e185cf2429308d0987a13.1474045023.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c8fe4609
    • J
      x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder · e18bcccd
      Josh Poimboeuf 提交于
      Convert show_trace_log_lvl() to use the new unwinder.  dump_trace() has
      been deprecated.
      
      show_trace_log_lvl() is special compared to other users of the unwinder.
      It's the only place where both reliable *and* unreliable addresses are
      needed.  With frame pointers enabled, most callers of the unwinder don't
      want to know about unreliable addresses.  But in this case, when we're
      dumping the stack to the console because something presumably went
      wrong, the unreliable addresses are useful:
      
      - They show stale data on the stack which can provide useful clues.
      
      - If something goes wrong with the unwinder, or if frame pointers are
        corrupt or missing, all the stack addresses still get shown.
      
      So in order to show all addresses on the stack, and at the same time
      figure out which addresses are reliable, we have to do the scanning and
      the unwinding in parallel.
      
      The scanning is done with the help of get_stack_info() to traverse the
      stacks.  The unwinding is done separately by the new unwinder.
      
      In theory we could simplify show_trace_log_lvl() by instead pushing some
      of this logic into the unwind code.  But then we would need some kind of
      "fake" frame logic in the unwinder which would add a lot of complexity
      and wouldn't be worth it in order to support only one user.
      
      Another benefit of this approach is that once we have a DWARF unwinder,
      we should be able to just plug it in with minimal impact to this code.
      
      Another change here is that callers of show_trace_log_lvl() don't need
      to provide the 'bp' argument.  The unwinder already finds the relevant
      frame pointer by unwinding until it reaches the first frame after the
      provided stack pointer.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/703b5998604c712a1f801874b43f35d6dac52ede.1474045023.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e18bcccd
    • J
      x86/unwind: Add new unwind interface and implementations · 7c7900f8
      Josh Poimboeuf 提交于
      The x86 stack dump code is a bit of a mess.  dump_trace() uses
      callbacks, and each user of it seems to have slightly different
      requirements, so there are several slightly different callbacks floating
      around.
      
      Also there are some upcoming features which will need more changes to
      the stack dump code, including the printing of stack pt_regs, reliable
      stack detection for live patching, and a DWARF unwinder.  Each of those
      features would at least need more callbacks and/or callback interfaces,
      resulting in a much bigger mess than what we have today.
      
      Before doing all that, we should try to clean things up and replace
      dump_trace() with something cleaner and more flexible.
      
      The new unwinder is a simple state machine which was heavily inspired by
      a suggestion from Andy Lutomirski:
      
        https://lkml.kernel.org/r/CALCETrUbNTqaM2LRyXGRx=kVLRPeY5A3Pc6k4TtQxF320rUT=w@mail.gmail.com
      
      It's also similar to the libunwind API:
      
        http://www.nongnu.org/libunwind/man/libunwind(3).html
      
      Some if its advantages:
      
      - Simplicity: no more callback sprawl and less code duplication.
      
      - Flexibility: it allows the caller to stop and inspect the stack state
        at each step in the unwinding process.
      
      - Modularity: the unwinder code, console stack dump code, and stack
        metadata analysis code are all better separated so that changing one
        of them shouldn't have much of an impact on any of the others.
      
      Two implementations are added which conform to the new unwind interface:
      
      - The frame pointer unwinder which is used for CONFIG_FRAME_POINTER=y.
      
      - The "guess" unwinder which is used for CONFIG_FRAME_POINTER=n.  This
        isn't an "unwinder" per se.  All it does is scan the stack for kernel
        text addresses.  But with no frame pointers, guesses are better than
        nothing in most cases.
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/6dc2f909c47533d213d0505f0a113e64585bec82.1474045023.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7c7900f8
    • J
      locking/rwsem, x86: Drop a bogus cc clobber · c907420f
      Jan Beulich 提交于
      With the addition of uses of GCC's condition code outputs in commit:
      
        35ccfb71 ("x86, asm: Use CC_SET()/CC_OUT() in <asm/rwsem.h>")
      
      ... there's now an overlap of outputs and clobbers in __down_write_trylock().
      
      Such overlaps are generally getting tagged with an error (occasionally
      even with an ICE). I can't really tell why plain GCC 6.2 doesn't detect
      this (judging by the code it is meant to), while the slightly modified
      one I use does. Since condition code clobbers are never necessary on x86
      (other than perhaps for documentation purposes, which doesn't really
      get done consistently), remove it altogether rather than inventing
      something like CC_CLOBBER (to accompany CC_SET/CC_OUT).
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/57E003CC0200007800110102@prv-mh.provo.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c907420f
    • D
      x86/apic: Get rid of apic_version[] array · cff9ab2b
      Denys Vlasenko 提交于
      The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it
      can consume up to 128k.
      
      The array has been there forever and was never used for anything useful
      other than a version mismatch check which was introduced in 2009.
      
      There is no reason to store the version in an array. The kernel is not
      prepared to handle different APIC versions anyway, so the real important
      part is to detect a version mismatch and warn about it, which can be done
      with a single variable as well.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: Andy Lutomirski <luto@amacapital.net>
      CC: Borislav Petkov <bp@alien8.de>
      CC: Brian Gerst <brgerst@gmail.com>
      CC: Mike Travis <travis@sgi.com>
      Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      cff9ab2b
    • W
      x86/apic: Order irq_enter/exit() calls correctly vs. ack_APIC_irq() · b0f48706
      Wanpeng Li 提交于
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.8.0-rc6+ #5 Not tainted
      -------------------------------
      ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      RCU used illegally from idle CPU!
      rcu_scheduler_active = 1, debug_locks = 0
      RCU used illegally from extended quiescent state!
      no locks held by swapper/2/0.
      
      stack backtrace:
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.8.0-rc6+ #5
      Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
       0000000000000000 ffff8d1bd6003f10 ffffffff94446949 ffff8d1bd4a68000
       0000000000000001 ffff8d1bd6003f40 ffffffff940e9247 ffff8d1bbdfcf3d0
       000000000000080b 0000000000000000 0000000000000000 ffff8d1bd6003f70
      Call Trace:
       <IRQ>  [<ffffffff94446949>] dump_stack+0x99/0xd0
       [<ffffffff940e9247>] lockdep_rcu_suspicious+0xe7/0x120
       [<ffffffff9448e0d5>] do_trace_write_msr+0x135/0x140
       [<ffffffff9406e750>] native_write_msr+0x20/0x30
       [<ffffffff9406503d>] native_apic_msr_eoi_write+0x1d/0x30
       [<ffffffff9405b17e>] smp_trace_call_function_interrupt+0x1e/0x270
       [<ffffffff948cb1d6>] trace_call_function_interrupt+0x96/0xa0
       <EOI>  [<ffffffff947200f4>] ? cpuidle_enter_state+0xe4/0x360
       [<ffffffff947200df>] ? cpuidle_enter_state+0xcf/0x360
       [<ffffffff947203a7>] cpuidle_enter+0x17/0x20
       [<ffffffff940df008>] cpu_startup_entry+0x338/0x4d0
       [<ffffffff9405bfc4>] start_secondary+0x154/0x180
      
      This can be reproduced readily by running ftrace test case of kselftest.
      
      Move the irq_enter() call before ack_APIC_irq(), because irq_enter() tells
      the RCU susbstems to end the extended quiescent state, so that the
      following trace call in ack_APIC_irq() works correctly. The same applies to
      exiting_ack_irq() which calls ack_APIC_irq() after irq_exit().
      
      [ tglx: Massaged changelog ]
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1474198491-3738-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      b0f48706