1. 21 11月, 2016 1 次提交
  2. 17 11月, 2016 2 次提交
  3. 16 11月, 2016 2 次提交
  4. 12 11月, 2016 1 次提交
    • A
      x86: apm: avoid uninitialized data · 3a6d8676
      Arnd Bergmann 提交于
      apm_bios_call() can fail, and return a status in its argument structure.
      If that status however is zero during a call from
      apm_get_power_status(), we end up using data that may have never been
      set, as reported by "gcc -Wmaybe-uninitialized":
      
        arch/x86/kernel/apm_32.c: In function ‘apm’:
        arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here
        arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here
      
      This changes the function to return "APM_NO_ERROR" here, which makes the
      code more robust to broken BIOS versions, and avoids the warning.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NJiri Kosina <jkosina@suse.cz>
      Reviewed-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3a6d8676
  5. 29 10月, 2016 1 次提交
    • T
      x86/smpboot: Init apic mapping before usage · 1e90a13d
      Thomas Gleixner 提交于
      The recent changes, which forced the registration of the boot cpu on UP
      systems, which do not have ACPI tables, have been fixed for systems w/o
      local APIC, but left a wreckage for systems which have neither ACPI nor
      mptables, but the CPU has an APIC, e.g. virtualbox.
      
      The boot process crashes in prefill_possible_map() as it wants to register
      the boot cpu, which needs to access the local apic, but the local APIC is
      not yet mapped.
      
      There is no reason why init_apic_mapping() can't be invoked before
      prefill_possible_map(). So instead of playing another silly early mapping
      game, as the ACPI/mptables code does, we just move init_apic_mapping()
      before the call to prefill_possible_map().
      
      In hindsight, I should have noticed that combination earlier.
      
      Sorry for the churn (also in stable)!
      
      Fixes: ff856051 ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
      Reported-and-debugged-by: NMichal Necasek <michal.necasek@oracle.com>
      Reported-and-tested-by: NWolfgang Bauer <wbauer@tmo.at>
      Cc: prarit@redhat.com
      Cc: ville.syrjala@linux.intel.com
      Cc: michael.thayer@oracle.com
      Cc: knut.osmundsen@oracle.com
      Cc: frank.mehnert@oracle.com
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1e90a13d
  6. 28 10月, 2016 2 次提交
    • B
      x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y · 1c27f646
      Borislav Petkov 提交于
      We needed the physical address of the container in order to compute the
      offset within the relocated ramdisk. And we did this by doing __pa() on
      the virtual address.
      
      However, __pa() does checks whether the physical address is within
      PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
      if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
      which *doesn't* have the randomization offset into a function which uses
      PAGE_OFFSET which *does* have that offset.
      
      This makes this check fire:
      
      	VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
      			^^^^^^
      
      due to the randomization offset.
      
      The fix is as simple as using __pa_nodebug() because we do that
      randomization offset accounting later in that function ourselves.
      Reported-by: NBob Peterson <rpeterso@redhat.com>
      Tested-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: stable@vger.kernel.org # 4.9
      Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1c27f646
    • J
      x86/unwind: Ensure stack grows down · 24d86f59
      Josh Poimboeuf 提交于
      Add a sanity check to ensure the stack only grows down, and print a
      warning if the check fails.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161027131058.tpdffwlqipv7pcd6@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      24d86f59
  7. 27 10月, 2016 3 次提交
  8. 26 10月, 2016 3 次提交
    • S
      x86: Fix export for mcount and __fentry__ · 5de0a8c0
      Steven Rostedt 提交于
      Commit 784d5699 ("x86: move exports to actual definitions") removed the
      EXPORT_SYMBOL(__fentry__) and EXPORT_SYMBOL(mcount) from x8664_ksyms_64.c,
      and added EXPORT_SYMBOL(function_hook) in mcount_64.S instead. The problem
      is that function_hook isn't a function at all, but a macro that is defined
      as either mcount or __fentry__ depending on the support from gcc.
      
      Originally, I thought this was a macro issue, like what __stringify()
      is used for. But the problem is a bit deeper. The Makefile.build has
      some magic that does post processing of files to create the CRC
      bindings. It does some searches for EXPORT_SYMBOL() and because it
      finds a macro name and not the actual functions, this causes
      function_hook not to be converted into mcount or __fentry__ and they
      are missed.
      
      Instead of adding more magic to Makefile.build, just add
      EXPORT_SYMBOL() for mcount and __fentry__ where the ifdef is used.
      Since this is assembly and not C, it doesn't require being set after
      the function is defined.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Tested-by: NBorislav Petkov <bp@alien8.de>
      Cc: Gabriel C <nix.or.die@gmail.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Link: http://lkml.kernel.org/r/20161024150148.4f9d90e4@gandalf.local.homeSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5de0a8c0
    • J
      x86/dumpstack: Remove raw stack dump · 0ee1dd9f
      Josh Poimboeuf 提交于
      For mostly historical reasons, the x86 oops dump shows the raw stack
      values:
      
        ...
        [registers]
        Stack:
         ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
         ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
         0000000000000002 0000000000000000 0000000000000000 0000000000000000
        Call Trace:
        ...
      
      This seems to be an artifact from long ago, and probably isn't needed
      anymore.  It generally just adds noise to the dump, and it can be
      actively harmful because it leaks kernel addresses.
      
      Linus says:
      
        "The stack dump actually goes back to forever, and it used to be
         useful back in 1992 or so. But it used to be useful mainly because
         stacks were simpler and we didn't have very good call traces anyway. I
         definitely remember having used them - I just do not remember having
         used them in the last ten+ years.
      
         Of course, it's still true that if you can trigger an oops, you've
         likely already lost the security game, but since the stack dump is so
         useless, let's aim to just remove it and make games like the above
         harder."
      
      This also removes the related 'kstack=' cmdline option and the
      'kstack_depth_to_print' sysctl.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0ee1dd9f
    • J
      x86/dumpstack: Remove kernel text addresses from stack dump · bb5e5ce5
      Josh Poimboeuf 提交于
      Printing kernel text addresses in stack dumps is of questionable value,
      especially now that address randomization is becoming common.
      
      It can be a security issue because it leaks kernel addresses.  It also
      affects the usefulness of the stack dump.  Linus says:
      
        "I actually spend time cleaning up commit messages in logs, because
        useless data that isn't actually information (random hex numbers) is
        actively detrimental.
      
        It makes commit logs less legible.
      
        It also makes it harder to parse dumps.
      
        It's not useful. That makes it actively bad.
      
        I probably look at more oops reports than most people. I have not
        found the hex numbers useful for the last five years, because they are
        just randomized crap.
      
        The stack content thing just makes code scroll off the screen etc, for
        example."
      
      The only real downside to removing these addresses is that they can be
      used to disambiguate duplicate symbol names.  However such cases are
      rare, and the context of the stack dump should be enough to be able to
      figure it out.
      
      There's now a 'faddr2line' script which can be used to convert a
      function address to a file name and line:
      
        $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
        write_sysrq_trigger+0x51/0x60:
        write_sysrq_trigger at drivers/tty/sysrq.c:1098
      
      Or gdb can be used:
      
        $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
        (gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).
      
      (But note that when there are duplicate symbol names, gdb will only show
      the first symbol it finds.  faddr2line is recommended over gdb because
      it handles duplicates and it also does function size checking.)
      
      Here's an example of what a stack dump looks like after this change:
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: sysrq_handle_crash+0x45/0x80
        PGD 36bfa067 [   29.650644] PUD 7aca3067
        Oops: 0002 [#1] PREEMPT SMP
        Modules linked in: ...
        CPU: 1 PID: 786 Comm: bash Tainted: G            E   4.9.0-rc1+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
        task: ffff880078582a40 task.stack: ffffc90000ba8000
        RIP: 0010:sysrq_handle_crash+0x45/0x80
        RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296
        RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001
        RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292
        RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000
        R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
        R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000
        FS:  00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0
        Stack:
         ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002
         0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20
         ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40
        Call Trace:
         __handle_sysrq+0x138/0x220
         ? __handle_sysrq+0x5/0x220
         write_sysrq_trigger+0x51/0x60
         proc_reg_write+0x42/0x70
         __vfs_write+0x37/0x140
         ? preempt_count_sub+0xa1/0x100
         ? __sb_start_write+0xf5/0x210
         ? vfs_write+0x183/0x1a0
         vfs_write+0xb8/0x1a0
         SyS_write+0x58/0xc0
         entry_SYSCALL_64_fastpath+0x1f/0xc2
        RIP: 0033:0x7ffb42f55940
        RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940
        RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001
        RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700
        R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10
        R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010
        Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
        RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8
        CR2: 0000000000000000
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bb5e5ce5
  9. 25 10月, 2016 2 次提交
    • A
      x86/quirks: Hide maybe-uninitialized warning · d320b9a5
      Arnd Bergmann 提交于
      gcc -Wmaybe-uninitialized detects that quirk_intel_brickland_xeon_ras_cap
      uses uninitialized data when CONFIG_PCI is not set:
      
        arch/x86/kernel/quirks.c: In function ‘quirk_intel_brickland_xeon_ras_cap’:
        arch/x86/kernel/quirks.c:641:13: error: ‘capid0’ is used uninitialized in this function [-Werror=uninitialized]
      
      However, the function is also not called in this configuration, so we
      can avoid the warning by moving the existing #ifdef to cover it as well.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-pci@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161024153325.2752428-1-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d320b9a5
    • J
      x86/unwind: Fix empty stack dereference in guess unwinder · 7fbe6ac0
      Josh Poimboeuf 提交于
      Vince Waver reported the following bug:
      
        WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
        CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
        Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
        Call Trace:
         <NMI>  ? dump_stack+0x46/0x59
         ? __warn+0xd5/0xee
         ? vmalloc_fault+0x58/0x1f0
         ? __do_page_fault+0x6d/0x48e
         ? perf_log_throttle+0xa4/0xf4
         ? trace_page_fault+0x22/0x30
         ? __unwind_start+0x28/0x42
         ? perf_callchain_kernel+0x75/0xac
         ? get_perf_callchain+0x13a/0x1f0
         ? perf_callchain+0x6a/0x6c
         ? perf_prepare_sample+0x71/0x2eb
         ? perf_event_output_forward+0x1a/0x54
         ? __default_send_IPI_shortcut+0x10/0x2d
         ? __perf_event_overflow+0xfb/0x167
         ? x86_pmu_handle_irq+0x113/0x150
         ? native_read_msr+0x6/0x34
         ? perf_event_nmi_handler+0x22/0x39
         ? perf_ibs_nmi_handler+0x4a/0x51
         ? perf_event_nmi_handler+0x22/0x39
         ? nmi_handle+0x4d/0xf0
         ? perf_ibs_handle_irq+0x3d1/0x3d1
         ? default_do_nmi+0x3c/0xd5
         ? do_nmi+0x92/0x102
         ? end_repeat_nmi+0x1a/0x1e
         ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
         ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
         ? entry_SYSCALL_64_after_swapgs+0x12/0x4a
         <EOE> ^A4---[ end trace 632723104d47d31a ]---
        BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
        kernel stack overflow (page fault): 0000 [#1] SMP
        ...
      
      The NMI hit in the entry code right after setting up the stack pointer
      from 'cpu_current_top_of_stack', so the kernel stack was empty.  The
      'guess' version of __unwind_start() attempted to dereference the "top of
      stack" pointer, which is not actually *on* the stack.
      
      Add a check in the guess unwinder to deal with an empty stack.  (The
      frame pointer unwinder already has such a check.)
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 7c7900f8 ("x86/unwind: Add new unwind interface and implementations")
      Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7fbe6ac0
  10. 24 10月, 2016 1 次提交
  11. 22 10月, 2016 1 次提交
    • V
      x86/boot/smp: Don't try to poke disabled/non-existent APIC · ff856051
      Ville Syrjälä 提交于
      Apparently trying to poke a disabled or non-existent APIC
      leads to a box that doesn't even boot. Let's not do that.
      
      No real clue if this is the right fix, but at least my
      P3 machine boots again.
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: dyoung@redhat.com
      Cc: kexec@lists.infradead.org
      Cc: stable@vger.kernel.org
      Fixes: 2a51fe08 ("arch/x86: Handle non enumerated CPU after physical hotplug")
      Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ff856051
  12. 21 10月, 2016 6 次提交
    • J
      x86/dumpstack: Print orig_ax in __show_regs() · 6fa81a12
      Josh Poimboeuf 提交于
      The value of regs->orig_ax contains potentially useful debugging data:
      For syscalls it contains the syscall number.  For interrupts it contains
      the (negated) vector number.  To reduce noise, print it only if it has a
      useful value (i.e., something other than -1).
      
      Here's what it looks like for a write syscall:
      
        RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940
        RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940
        RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001
        ...
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/93f0fe0307a4af884d3fca00edabcc8cff236002.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6fa81a12
    • J
      x86/dumpstack: Fix duplicate RIP address display in __show_regs() · 1141c3e3
      Josh Poimboeuf 提交于
      The RIP address is shown twice in __show_regs().  Before:
      
        RIP: 0010:[<ffffffff81070446>]  [<ffffffff81070446>] native_write_msr+0x6/0x30
      
      After:
      
        RIP: 0010:[<ffffffff81070446>] native_write_msr+0x6/0x30
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/b3fda66f36761759b000883b059cdd9a7649dcc1.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1141c3e3
    • J
      x86/dumpstack: Print any pt_regs found on the stack · 3b3fa11b
      Josh Poimboeuf 提交于
      Now that we can find pt_regs registers on the stack, print them.  Here's
      an example of what it looks like:
      
        Call Trace:
         <IRQ>
         [<ffffffff8144b793>] dump_stack+0x86/0xc3
         [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0
         [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60
         [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50
         [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0
        RIP: 0010:[<ffffffff818aef43>]  [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60
        RSP: 0018:ffff880079c4f760  EFLAGS: 00000202
        RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007
        RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000
        RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540
        R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000
         <EOI>
         [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250
         [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250
         [<ffffffff818a7b61>] __schedule+0x3e1/0xb20
         ...
         [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0
         [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
         [<ffffffff818b1dd8>] async_page_fault+0x28/0x30
        RIP: 0010:[<ffffffff8145b062>]  [<ffffffff8145b062>] __clear_user+0x42/0x70
        RSP: 0018:ffff880079c4fd38  EFLAGS: 00010202
        RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
        RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
        RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928
        R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
        R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400
         [<ffffffff8145b043>] ? __clear_user+0x23/0x70
         [<ffffffff8145b0fb>] clear_user+0x2b/0x40
         [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750
         [<ffffffff8129a591>] search_binary_handler+0xa1/0x200
         [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0
         [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0
         [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50
         [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0
         [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25
        RIP: 0033:[<00007fd2e2f2e537>]  [<00007fd2e2f2e537>] 0x7fd2e2f2e537
        RSP: 002b:00007ffc449c5fc8  EFLAGS: 00000246
        RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537
        RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029
        RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20
        R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40
        R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5cc2c512ec82cfba00dd22467644d4ed751a48c0.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3b3fa11b
    • J
      x86/dumpstack: Print stack identifier on its own line · 79439d8e
      Josh Poimboeuf 提交于
      show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
      newline so that any stack address printed after it will appear on the
      same line.  That causes the first stack address to be vertically
      misaligned with the rest, making it visually cluttered and slightly
      confusing:
      
        Call Trace:
         <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
         [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
         [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
         ...
         <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
         [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
         [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
      
      It will look worse once we start printing pt_regs registers found in the
      middle of the stack:
      
        <IRQ> RIP: 0010:[<ffffffff8189c6c3>]  [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
        RSP: 0018:ffff88007876f720  EFLAGS: 00000206
        RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
        ...
      
      Improve readability by adding a newline to the stack name:
      
        Call Trace:
         <IRQ>
         [<ffffffff814431c3>] dump_stack+0x86/0xc3
         [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
         [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
         ...
         <EOI>
         [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
         [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
         [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
      
      Now that "continued" lines are no longer needed, we can also remove the
      hack of using the empty string (aka KERN_CONT) and replace it with
      KERN_DEFAULT.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      79439d8e
    • J
      x86/unwind: Create stack frames for saved syscall registers · acb4608a
      Josh Poimboeuf 提交于
      The entry code doesn't encode the pt_regs pointer for syscalls.  But the
      pt_regs are always at the same location, so we can add a manual check
      for them.
      
      A later patch prints them as part of the oops stack dump.  They could be
      useful, for example, to determine the arguments to a system call.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e176aa9272930cd3f51fda0b94e2eae356677da4.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      acb4608a
    • J
      x86/entry/unwind: Create stack frames for saved interrupt registers · 946c1911
      Josh Poimboeuf 提交于
      With frame pointers, when a task is interrupted, its stack is no longer
      completely reliable because the function could have been interrupted
      before it had a chance to save the previous frame pointer on the stack.
      So the caller of the interrupted function could get skipped by a stack
      trace.
      
      This is problematic for live patching, which needs to know whether a
      stack trace of a sleeping task can be relied upon.  There's currently no
      way to detect if a sleeping task was interrupted by a page fault
      exception or preemption before it went to sleep.
      
      Another issue is that when dumping the stack of an interrupted task, the
      unwinder has no way of knowing where the saved pt_regs registers are, so
      it can't print them.
      
      This solves those issues by encoding the pt_regs pointer in the frame
      pointer on entry from an interrupt or an exception.
      
      This patch also updates the unwinder to be able to decode it, because
      otherwise the unwinder would be broken by this change.
      
      Note that this causes a change in the behavior of the unwinder: each
      instance of a pt_regs on the stack is now considered a "frame".  So
      callers of unwind_get_return_address() will now get an occasional
      'regs->ip' address that would have previously been skipped over.
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      946c1911
  13. 20 10月, 2016 8 次提交
  14. 19 10月, 2016 3 次提交
  15. 16 10月, 2016 3 次提交
    • D
      x86/e820: Don't merge consecutive E820_PRAM ranges · 23446cb6
      Dan Williams 提交于
      Commit:
      
        917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
      
      ... fixed up the broken manipulations of max_pfn in the presence of
      E820_PRAM ranges.
      
      However, it also broke the sanitize_e820_map() support for not merging
      E820_PRAM ranges.
      
      Re-introduce the enabling to keep resource boundaries between
      consecutive defined ranges. Otherwise, for example, an environment that
      boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0
      device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size.
      Reported-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhang Yi <yizhan@redhat.com>
      Cc: linux-nvdimm@lists.01.org
      Fixes: 917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
      Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      23446cb6
    • D
      kprobes: Unpoison stack in jprobe_return() for KASAN · 9f7d416c
      Dmitry Vyukov 提交于
      I observed false KSAN positives in the sctp code, when
      sctp uses jprobe_return() in jsctp_sf_eat_sack().
      
      The stray 0xf4 in shadow memory are stack redzones:
      
      [     ] ==================================================================
      [     ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
      [     ] Read of size 1 by task syz-executor/18535
      [     ] page:ffffea00017923c0 count:0 mapcount:0 mapping:          (null) index:0x0
      [     ] flags: 0x1fffc0000000000()
      [     ] page dumped because: kasan: bad access detected
      [     ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
      [     ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      [     ]  ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
      [     ]  ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
      [     ]  ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
      [     ] Call Trace:
      [     ]  [<ffffffff82d2b849>] dump_stack+0x12e/0x185
      [     ]  [<ffffffff817d3169>] kasan_report+0x489/0x4b0
      [     ]  [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20
      [     ]  [<ffffffff82d49529>] memcmp+0xe9/0x150
      [     ]  [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0
      [     ]  [<ffffffff817d2031>] save_stack+0xb1/0xd0
      [     ]  [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0
      [     ]  [<ffffffff817d05b8>] kfree+0xc8/0x2a0
      [     ]  [<ffffffff85b03f19>] skb_free_head+0x79/0xb0
      [     ]  [<ffffffff85b0900a>] skb_release_data+0x37a/0x420
      [     ]  [<ffffffff85b090ff>] skb_release_all+0x4f/0x60
      [     ]  [<ffffffff85b11348>] consume_skb+0x138/0x370
      [     ]  [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180
      [     ]  [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70
      [     ]  [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0
      [     ]  [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0
      [     ]  [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190
      [     ]  [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20
      [ ... ]
      [     ] Memory state around the buggy address:
      [     ]  ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]                    ^
      [     ]  ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] ==================================================================
      
      KASAN stack instrumentation poisons stack redzones on function entry
      and unpoisons them on function exit. If a function exits abnormally
      (e.g. with a longjmp like jprobe_return()), stack redzones are left
      poisoned. Later this leads to random KASAN false reports.
      
      Unpoison stack redzones in the frames we are going to jump over
      before doing actual longjmp in jprobe_return().
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: kasan-dev@googlegroups.com
      Cc: surovegin@google.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9f7d416c
    • D
      kprobes: Avoid false KASAN reports during stack copy · 9254139a
      Dmitry Vyukov 提交于
      Kprobes save and restore raw stack chunks with memcpy().
      With KASAN these chunks can contain poisoned stack redzones,
      as the result memcpy() interceptor produces false
      stack out-of-bounds reports.
      
      Use __memcpy() instead of memcpy() for stack copying.
      __memcpy() is not instrumented by KASAN and does not lead
      to the false reports.
      
      Currently there is a spew of KASAN reports during boot
      if CONFIG_KPROBES_SANITY_TEST is enabled:
      
      [   ] Kprobe smoke test: started
      [   ] ==================================================================
      [   ] BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0x17c/0x280 at addr ffff88085259fba8
      [   ] Read of size 64 by task swapper/0/1
      [   ] page:ffffea00214967c0 count:0 mapcount:0 mapping:          (null) index:0x0
      [   ] flags: 0x2fffff80000000()
      [   ] page dumped because: kasan: bad access detected
      [...]
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Tested-by: NCAI Qian <caiqian@redhat.com>
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kasan-dev@googlegroups.com
      [ Improved various details. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9254139a
  16. 14 10月, 2016 1 次提交
    • W
      x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt() · 1ec6ec14
      Wanpeng Li 提交于
       ===============================
       [ INFO: suspicious RCU usage. ]
       4.8.0+ #24 Not tainted
       -------------------------------
       ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
       
       other info that might help us debug this:
       
       RCU used illegally from idle CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       no locks held by swapper/1/0.
       
        [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140
        [<ffffffff9d06f860>] native_write_msr+0x20/0x30
        [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30
        [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30
        [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0
      
      Reschedule interrupt may be called in cpu idle state. This causes lockdep 
      check warning above. 
      
      Add irq_enter/exit() in smp_reschedule_interrupt(), irq_enter() tells the RCU 
      subsystems to end the extended quiescent state, so the following trace call in 
      ack_APIC_irq() works correctly.
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Link: http://lkml.kernel.org/r/1476409733-5133-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1ec6ec14