1. 10 12月, 2016 4 次提交
    • T
      x86: Remove empty idle.h header · 34bc3560
      Thomas Gleixner 提交于
      One include less is always a good thing(tm). Good riddance.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      34bc3560
    • B
      x86/amd: Simplify AMD E400 aware idle routine · 07c94a38
      Borislav Petkov 提交于
      Reorganize the E400 detection now that we have everything in place:
      switch the CPUs to broadcast mode after the LAPIC has been initialized
      and remove the facilities that were used previously on the idle path.
      
      Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine
      because alternatives have been applied when the actual detection happens,
      so the static switching does not take effect and the test will stay
      false. Use boot_cpu_has_bug() instead which is definitely an improvement
      over the RDMSR and the cpumask handling.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      07c94a38
    • T
      x86/amd: Check for the C1E bug post ACPI subsystem init · e7ff3a47
      Thomas Gleixner 提交于
      AMD CPUs affected by the E400 erratum suffer from the issue that the
      local APIC timer stops when the CPU goes into C1E. Unfortunately there
      is no way to detect the affected CPUs on early boot. It's only possible
      to determine the range of possibly affected CPUs from the family/model
      range.
      
      The actual decision whether to enter C1E and thus cause the bug is done
      by the firmware and we need to detect that case late, after ACPI has
      been initialized.
      
      The current solution is to check in the idle routine whether the CPU is
      affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the
      K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is
      affected and the system is switched into forced broadcast mode.
      
      This is ineffective and on non-affected CPUs every entry to idle does
      the extra RDMSR.
      
      After doing some research it turns out that the bits are visible on the
      boot CPU right after the ACPI subsystem is initialized in the early
      boot process. So instead of polling for the bits in the idle loop, add
      a detection function after acpi_subsystem_init() and check for the MSR
      bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and
      the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it
      will stop in C1E state as well.
      
      The switch to broadcast mode cannot be done at this point because the
      boot CPU still uses HPET as a clockevent device and the local APIC timer
      is not yet calibrated and installed. The switch to broadcast mode on the
      affected CPUs needs to be done when the local APIC timer is actually set
      up.
      
      This allows to cleanup the amd_e400_idle() function in the next step.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e7ff3a47
    • T
      x86/bugs: Separate AMD E400 erratum and C1E bug · 3344ed30
      Thomas Gleixner 提交于
      The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E
      state) is a two step process:
      
       - Selection of the E400 aware idle routine
      
       - Detection whether the platform is affected
      
      The idle routine selection happens for possibly affected CPUs depending on
      family/model/stepping information. These range of CPUs is not necessarily
      affected as the decision whether to enable the C1E feature is made by the
      firmware. Unfortunately there is no way to query this at early boot.
      
      The current implementation polls a MSR in the E400 aware idle routine to
      detect whether the CPU is affected. This is inefficient on non affected
      CPUs because every idle entry has to do the MSR read.
      
      There is a better way to detect this before going idle for the first time
      which requires to seperate the bug flags:
      
        X86_BUG_AMD_E400 	- Selects the E400 aware idle routine and
        			  enables the detection
      			  
        X86_BUG_AMD_APIC_C1E  - Set when the platform is affected by E400
      
      Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400
      bug bit to select the idle routine which currently does an unconditional
      detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches
      to remove the MSR polling and simplify the handling of this misfeature.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      3344ed30
  2. 18 11月, 2016 3 次提交
  3. 12 11月, 2016 1 次提交
    • A
      x86: apm: avoid uninitialized data · 3a6d8676
      Arnd Bergmann 提交于
      apm_bios_call() can fail, and return a status in its argument structure.
      If that status however is zero during a call from
      apm_get_power_status(), we end up using data that may have never been
      set, as reported by "gcc -Wmaybe-uninitialized":
      
        arch/x86/kernel/apm_32.c: In function ‘apm’:
        arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here
        arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here
      
      This changes the function to return "APM_NO_ERROR" here, which makes the
      code more robust to broken BIOS versions, and avoids the warning.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NJiri Kosina <jkosina@suse.cz>
      Reviewed-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3a6d8676
  4. 29 10月, 2016 1 次提交
    • T
      x86/smpboot: Init apic mapping before usage · 1e90a13d
      Thomas Gleixner 提交于
      The recent changes, which forced the registration of the boot cpu on UP
      systems, which do not have ACPI tables, have been fixed for systems w/o
      local APIC, but left a wreckage for systems which have neither ACPI nor
      mptables, but the CPU has an APIC, e.g. virtualbox.
      
      The boot process crashes in prefill_possible_map() as it wants to register
      the boot cpu, which needs to access the local apic, but the local APIC is
      not yet mapped.
      
      There is no reason why init_apic_mapping() can't be invoked before
      prefill_possible_map(). So instead of playing another silly early mapping
      game, as the ACPI/mptables code does, we just move init_apic_mapping()
      before the call to prefill_possible_map().
      
      In hindsight, I should have noticed that combination earlier.
      
      Sorry for the churn (also in stable)!
      
      Fixes: ff856051 ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
      Reported-and-debugged-by: NMichal Necasek <michal.necasek@oracle.com>
      Reported-and-tested-by: NWolfgang Bauer <wbauer@tmo.at>
      Cc: prarit@redhat.com
      Cc: ville.syrjala@linux.intel.com
      Cc: michael.thayer@oracle.com
      Cc: knut.osmundsen@oracle.com
      Cc: frank.mehnert@oracle.com
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1e90a13d
  5. 28 10月, 2016 1 次提交
    • B
      x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y · 1c27f646
      Borislav Petkov 提交于
      We needed the physical address of the container in order to compute the
      offset within the relocated ramdisk. And we did this by doing __pa() on
      the virtual address.
      
      However, __pa() does checks whether the physical address is within
      PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
      if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
      which *doesn't* have the randomization offset into a function which uses
      PAGE_OFFSET which *does* have that offset.
      
      This makes this check fire:
      
      	VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
      			^^^^^^
      
      due to the randomization offset.
      
      The fix is as simple as using __pa_nodebug() because we do that
      randomization offset accounting later in that function ourselves.
      Reported-by: NBob Peterson <rpeterso@redhat.com>
      Tested-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: stable@vger.kernel.org # 4.9
      Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1c27f646
  6. 26 10月, 2016 1 次提交
    • S
      x86: Fix export for mcount and __fentry__ · 5de0a8c0
      Steven Rostedt 提交于
      Commit 784d5699 ("x86: move exports to actual definitions") removed the
      EXPORT_SYMBOL(__fentry__) and EXPORT_SYMBOL(mcount) from x8664_ksyms_64.c,
      and added EXPORT_SYMBOL(function_hook) in mcount_64.S instead. The problem
      is that function_hook isn't a function at all, but a macro that is defined
      as either mcount or __fentry__ depending on the support from gcc.
      
      Originally, I thought this was a macro issue, like what __stringify()
      is used for. But the problem is a bit deeper. The Makefile.build has
      some magic that does post processing of files to create the CRC
      bindings. It does some searches for EXPORT_SYMBOL() and because it
      finds a macro name and not the actual functions, this causes
      function_hook not to be converted into mcount or __fentry__ and they
      are missed.
      
      Instead of adding more magic to Makefile.build, just add
      EXPORT_SYMBOL() for mcount and __fentry__ where the ifdef is used.
      Since this is assembly and not C, it doesn't require being set after
      the function is defined.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Tested-by: NBorislav Petkov <bp@alien8.de>
      Cc: Gabriel C <nix.or.die@gmail.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Link: http://lkml.kernel.org/r/20161024150148.4f9d90e4@gandalf.local.homeSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5de0a8c0
  7. 25 10月, 2016 2 次提交
    • A
      x86/quirks: Hide maybe-uninitialized warning · d320b9a5
      Arnd Bergmann 提交于
      gcc -Wmaybe-uninitialized detects that quirk_intel_brickland_xeon_ras_cap
      uses uninitialized data when CONFIG_PCI is not set:
      
        arch/x86/kernel/quirks.c: In function ‘quirk_intel_brickland_xeon_ras_cap’:
        arch/x86/kernel/quirks.c:641:13: error: ‘capid0’ is used uninitialized in this function [-Werror=uninitialized]
      
      However, the function is also not called in this configuration, so we
      can avoid the warning by moving the existing #ifdef to cover it as well.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-pci@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161024153325.2752428-1-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d320b9a5
    • J
      x86/unwind: Fix empty stack dereference in guess unwinder · 7fbe6ac0
      Josh Poimboeuf 提交于
      Vince Waver reported the following bug:
      
        WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
        CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
        Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
        Call Trace:
         <NMI>  ? dump_stack+0x46/0x59
         ? __warn+0xd5/0xee
         ? vmalloc_fault+0x58/0x1f0
         ? __do_page_fault+0x6d/0x48e
         ? perf_log_throttle+0xa4/0xf4
         ? trace_page_fault+0x22/0x30
         ? __unwind_start+0x28/0x42
         ? perf_callchain_kernel+0x75/0xac
         ? get_perf_callchain+0x13a/0x1f0
         ? perf_callchain+0x6a/0x6c
         ? perf_prepare_sample+0x71/0x2eb
         ? perf_event_output_forward+0x1a/0x54
         ? __default_send_IPI_shortcut+0x10/0x2d
         ? __perf_event_overflow+0xfb/0x167
         ? x86_pmu_handle_irq+0x113/0x150
         ? native_read_msr+0x6/0x34
         ? perf_event_nmi_handler+0x22/0x39
         ? perf_ibs_nmi_handler+0x4a/0x51
         ? perf_event_nmi_handler+0x22/0x39
         ? nmi_handle+0x4d/0xf0
         ? perf_ibs_handle_irq+0x3d1/0x3d1
         ? default_do_nmi+0x3c/0xd5
         ? do_nmi+0x92/0x102
         ? end_repeat_nmi+0x1a/0x1e
         ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
         ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
         ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
         <EOE> ^A4---[ end trace 632723104d47d31a ]---
        BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
        kernel stack overflow (page fault): 0000 [#1] SMP
        ...
      
      The NMI hit in the entry code right after setting up the stack pointer
      from 'cpu_current_top_of_stack', so the kernel stack was empty.  The
      'guess' version of __unwind_start() attempted to dereference the "top of
      stack" pointer, which is not actually *on* the stack.
      
      Add a check in the guess unwinder to deal with an empty stack.  (The
      frame pointer unwinder already has such a check.)
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 7c7900f8 ("x86/unwind: Add new unwind interface and implementations")
      Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7fbe6ac0
  8. 24 10月, 2016 1 次提交
  9. 22 10月, 2016 1 次提交
    • V
      x86/boot/smp: Don't try to poke disabled/non-existent APIC · ff856051
      Ville Syrjälä 提交于
      Apparently trying to poke a disabled or non-existent APIC
      leads to a box that doesn't even boot. Let's not do that.
      
      No real clue if this is the right fix, but at least my
      P3 machine boots again.
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: dyoung@redhat.com
      Cc: kexec@lists.infradead.org
      Cc: stable@vger.kernel.org
      Fixes: 2a51fe08 ("arch/x86: Handle non enumerated CPU after physical hotplug")
      Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ff856051
  10. 20 10月, 2016 1 次提交
  11. 19 10月, 2016 3 次提交
  12. 16 10月, 2016 3 次提交
    • D
      x86/e820: Don't merge consecutive E820_PRAM ranges · 23446cb6
      Dan Williams 提交于
      Commit:
      
        917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
      
      ... fixed up the broken manipulations of max_pfn in the presence of
      E820_PRAM ranges.
      
      However, it also broke the sanitize_e820_map() support for not merging
      E820_PRAM ranges.
      
      Re-introduce the enabling to keep resource boundaries between
      consecutive defined ranges. Otherwise, for example, an environment that
      boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0
      device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size.
      Reported-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhang Yi <yizhan@redhat.com>
      Cc: linux-nvdimm@lists.01.org
      Fixes: 917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
      Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      23446cb6
    • D
      kprobes: Unpoison stack in jprobe_return() for KASAN · 9f7d416c
      Dmitry Vyukov 提交于
      I observed false KSAN positives in the sctp code, when
      sctp uses jprobe_return() in jsctp_sf_eat_sack().
      
      The stray 0xf4 in shadow memory are stack redzones:
      
      [     ] ==================================================================
      [     ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
      [     ] Read of size 1 by task syz-executor/18535
      [     ] page:ffffea00017923c0 count:0 mapcount:0 mapping:          (null) index:0x0
      [     ] flags: 0x1fffc0000000000()
      [     ] page dumped because: kasan: bad access detected
      [     ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
      [     ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      [     ]  ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
      [     ]  ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
      [     ]  ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
      [     ] Call Trace:
      [     ]  [<ffffffff82d2b849>] dump_stack+0x12e/0x185
      [     ]  [<ffffffff817d3169>] kasan_report+0x489/0x4b0
      [     ]  [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20
      [     ]  [<ffffffff82d49529>] memcmp+0xe9/0x150
      [     ]  [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0
      [     ]  [<ffffffff817d2031>] save_stack+0xb1/0xd0
      [     ]  [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0
      [     ]  [<ffffffff817d05b8>] kfree+0xc8/0x2a0
      [     ]  [<ffffffff85b03f19>] skb_free_head+0x79/0xb0
      [     ]  [<ffffffff85b0900a>] skb_release_data+0x37a/0x420
      [     ]  [<ffffffff85b090ff>] skb_release_all+0x4f/0x60
      [     ]  [<ffffffff85b11348>] consume_skb+0x138/0x370
      [     ]  [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180
      [     ]  [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70
      [     ]  [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0
      [     ]  [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0
      [     ]  [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190
      [     ]  [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20
      [ ... ]
      [     ] Memory state around the buggy address:
      [     ]  ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]                    ^
      [     ]  ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] ==================================================================
      
      KASAN stack instrumentation poisons stack redzones on function entry
      and unpoisons them on function exit. If a function exits abnormally
      (e.g. with a longjmp like jprobe_return()), stack redzones are left
      poisoned. Later this leads to random KASAN false reports.
      
      Unpoison stack redzones in the frames we are going to jump over
      before doing actual longjmp in jprobe_return().
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: kasan-dev@googlegroups.com
      Cc: surovegin@google.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9f7d416c
    • D
      kprobes: Avoid false KASAN reports during stack copy · 9254139a
      Dmitry Vyukov 提交于
      Kprobes save and restore raw stack chunks with memcpy().
      With KASAN these chunks can contain poisoned stack redzones,
      as the result memcpy() interceptor produces false
      stack out-of-bounds reports.
      
      Use __memcpy() instead of memcpy() for stack copying.
      __memcpy() is not instrumented by KASAN and does not lead
      to the false reports.
      
      Currently there is a spew of KASAN reports during boot
      if CONFIG_KPROBES_SANITY_TEST is enabled:
      
      [   ] Kprobe smoke test: started
      [   ] ==================================================================
      [   ] BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0x17c/0x280 at addr ffff88085259fba8
      [   ] Read of size 64 by task swapper/0/1
      [   ] page:ffffea00214967c0 count:0 mapcount:0 mapping:          (null) index:0x0
      [   ] flags: 0x2fffff80000000()
      [   ] page dumped because: kasan: bad access detected
      [...]
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Tested-by: NCAI Qian <caiqian@redhat.com>
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kasan-dev@googlegroups.com
      [ Improved various details. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9254139a
  13. 14 10月, 2016 1 次提交
    • W
      x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt() · 1ec6ec14
      Wanpeng Li 提交于
       ===============================
       [ INFO: suspicious RCU usage. ]
       4.8.0+ #24 Not tainted
       -------------------------------
       ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
       
       other info that might help us debug this:
       
       RCU used illegally from idle CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       no locks held by swapper/1/0.
       
        [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140
        [<ffffffff9d06f860>] native_write_msr+0x20/0x30
        [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30
        [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30
        [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0
      
      Reschedule interrupt may be called in cpu idle state. This causes lockdep 
      check warning above. 
      
      Add irq_enter/exit() in smp_reschedule_interrupt(), irq_enter() tells the RCU 
      subsystems to end the extended quiescent state, so the following trace call in 
      ack_APIC_irq() works correctly.
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Link: http://lkml.kernel.org/r/1476409733-5133-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1ec6ec14
  14. 12 10月, 2016 3 次提交
  15. 08 10月, 2016 4 次提交
    • T
      x86/apic: Prevent pointless warning messages · df610d67
      Thomas Gleixner 提交于
      Markus reported that he sees new warnings:
      
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 4/0x84 ignored.
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 5/0x85 ignored.
      
      This comes from the recent persistant cpuid - nodeid changes. The code
      which emits the warning has been called prior to these changes only for
      enabled processors. Now it's called for disabled processors as well to get
      the possible cpu accounting correct. So if the kernel is compiled for the
      number of actual available/enabled CPUs and the BIOS reports disabled CPUs
      as well then the above warnings are printed.
      
      That's a pointless exercise as it only makes sense if there are more CPUs
      enabled than the kernel supports.
      
      Nake the warning conditional on enabled processors so we are back to the
      state before these changes.
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") 
      Reported-and-tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      df610d67
    • T
      x86/acpi: Prevent LAPIC id 0xff from being accounted · f3bf1dbe
      Thomas Gleixner 提交于
      Yinghai reported that the recent changes to make the cpuid - nodeid
      relationship permanent causes a cpuid ordering regression on a system which
      has 2apic enabled..
      
      The reason is that the ACPI local APIC parser has no sanity check for
      apicid 0xff, which is an invalid id. So a CPU id for this invalid local
      APIC id is allocated and therefor breaks the cpuid ordering.
      
      Add a sanity check to acpi_parse_lapic() which ignores the invalid id.
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping")
      Reported-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: douly.fnst@cn.fujitsu.com,
      Cc: zhugh.fnst@cn.fujitsu.com
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Lv Zheng <lv.zheng@intel.com>,
      Cc: robert.moore@intel.com
      Cc: linux-acpi@vger.kernel.org
      Link: https://lkml.kernel.org/r/CAE9FiQVQx6FRXT-RdR7Crz4dg5LeUWHcUSy1KacjR+JgU_vGJg@mail.gmail.com
      f3bf1dbe
    • C
      nmi_backtrace: generate one-line reports for idle cpus · 6727ad9e
      Chris Metcalf 提交于
      When doing an nmi backtrace of many cores, most of which are idle, the
      output is a little overwhelming and very uninformative.  Suppress
      messages for cpus that are idling when they are interrupted and just
      emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
      
      We do this by grouping all the cpuidle code together into a new
      .cpuidle.text section, and then checking the address of the interrupted
      PC to see if it lies within that section.
      
      This commit suitably tags x86 and tile idle routines, and only adds in
      the minimal framework for other architectures.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Tested-by: NPetr Mladek <pmladek@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6727ad9e
    • C
      nmi_backtrace: add more trigger_*_cpu_backtrace() methods · 9a01c3ed
      Chris Metcalf 提交于
      Patch series "improvements to the nmi_backtrace code" v9.
      
      This patch series modifies the trigger_xxx_backtrace() NMI-based remote
      backtracing code to make it more flexible, and makes a few small
      improvements along the way.
      
      The motivation comes from the task isolation code, where there are
      scenarios where we want to be able to diagnose a case where some cpu is
      about to interrupt a task-isolated cpu.  It can be helpful to see both
      where the interrupting cpu is, and also an approximation of where the
      cpu that is being interrupted is.  The nmi_backtrace framework allows us
      to discover the stack of the interrupted cpu.
      
      I've tested that the change works as desired on tile, and build-tested
      x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
      cpuidle stuff as well as the architecture-specific routines are in the
      new cpuidle section.  For arm, mips, and sparc I just build-tested it
      and made sure the generic cpuidle routines were in the new cpuidle
      section, but I didn't attempt to figure out which the platform-specific
      idle routines might be.  That might be more usefully done by someone
      with platform experience in follow-up patches.
      
      This patch (of 4):
      
      Currently you can only request a backtrace of either all cpus, or all
      cpus but yourself.  It can also be helpful to request a remote backtrace
      of a single cpu, and since we want that, the logical extension is to
      support a cpumask as the underlying primitive.
      
      This change modifies the existing lib/nmi_backtrace.c code to take a
      cpumask as its basic primitive, and modifies the linux/nmi.h code to use
      the new "cpumask" method instead.
      
      The existing clients of nmi_backtrace (arm and x86) are converted to
      using the new cpumask approach in this change.
      
      The other users of the backtracing API (sparc64 and mips) are converted
      to use the cpumask approach rather than the all/allbutself approach.
      The mips code ignored the "include_self" boolean but with this change it
      will now also dump a local backtrace if requested.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a01c3ed
  16. 07 10月, 2016 1 次提交
    • P
      arch/x86: Handle non enumerated CPU after physical hotplug · 2a51fe08
      Prarit Bhargava 提交于
      When a CPU is physically added to a system then the MADT table is not
      updated.
      
      If subsequently a kdump kernel is started on that physically added CPU then
      the ACPI enumeration fails to provide the information for this CPU which is
      now the boot CPU of the kdump kernel.
      
      As a consequence, generic_processor_info() is not invoked for that CPU so
      the number of enumerated processors is 0 and none of the initializations,
      including the logical package id management, are performed.
      
      We have code which relies on the correctness of the logical package map and
      other information which is initialized via generic_processor_info().
      Executing such code will result in undefined behaviour or kernel crashes.
      
      This problem applies only to the kdump kernel because a normal kexec will
      switch to the original boot CPU, which is enumerated in MADT, before
      jumping into the kexec kernel.
      
      The boot code already has a check for num_processors equal 0 in
      prefill_possible_map(). We can use that check as an indicator that the
      enumeration of the boot CPU did not happen and invoke generic_processor_info()
      for it. That initializes the relevant data for the boot CPU and therefore
      prevents subsequent failure.
      
      [ tglx: Refined the code and rewrote the changelog ]
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Fixes: 1f12e32f ("x86/topology: Create logical package id")
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: dyoung@redhat.com
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2a51fe08
  17. 06 10月, 2016 1 次提交
    • J
      x86/unwind: Fix oprofile module link error · cfee9edd
      Josh Poimboeuf 提交于
      When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n,
      the oprofile module fails to link:
      
        ERROR: ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined!
      
      The problem was introduced when oprofile was converted to use the new
      x86 unwinder.  When frame pointers are disabled, the "guess" unwinder's
      unwind_get_return_address() is an inline function which calls
      ftrace_graph_ret_addr(), which is not exported.
      
      Fix it by converting the "guess" version of unwind_get_return_address()
      to an exported out-of-line function, just like its frame pointer
      counterpart.
      Reported-by: NKarl Beldan <karl.beldan@gmail.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ec2ad9cc ("oprofile/x86: Convert x86_backtrace() to use the new unwinder")
      Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cfee9edd
  18. 05 10月, 2016 1 次提交
  19. 04 10月, 2016 1 次提交
    • M
      x86/irq: Prevent force migration of irqs which are not in the vector domain · db91aa79
      Mika Westerberg 提交于
      When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
      affinities related to the CPU in question. The same thing is also done when
      the system is suspended to S-states like S3 (mem).
      
      For each IRQ we try to complete any on-going move regardless whether the
      IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
      its chip_data, assume it is of type struct apic_chip_data and manipulate it
      by clearing old_domain mask etc. For irq_chips that are not part of the
      x86_vector_domain, like those created by various GPIO drivers, will find
      their chip_data being changed unexpectly.
      
      Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
      corrupted after resume:
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  hi
      
        # rtcwake -s10 -mmem
        <10 seconds passes>
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  ?
      
      Note '?' in the output. It means the struct gpio_chip ->get function is
      NULL whereas before suspend it was there.
      
      Fix this by first checking that the IRQ belongs to x86_vector_domain before
      we try to use the chip_data as struct apic_chip_data.
      Reported-and-tested-by: NSakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: stable@vger.kernel.org # 4.4+
      Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      db91aa79
  20. 30 9月, 2016 5 次提交
  21. 27 9月, 2016 1 次提交