1. 28 10月, 2016 2 次提交
  2. 27 10月, 2016 3 次提交
  3. 26 10月, 2016 6 次提交
    • P
      x86/decoder: Use stderr if insn sanity test fails · bb12d674
      Paul Bolle 提交于
      If the instruction sanity test fails, it prints a "Failure" message to
      stdout. Make this program behave like the rest of the build and print
      that message to stderr.
      Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1477428965-20548-3-git-send-email-pebolle@tiscali.nlSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bb12d674
    • P
      x86/decoder: Use stdout if insn decoder test is successful · bdcc18b5
      Paul Bolle 提交于
      If the instruction decoder test ran successful it prints a message like
      this to stderr:
          Succeed: decoded and checked 1767380 instructions
      
      But, as described in "console mode programming user interface guidelines
      version 101" which doesn't exist, programs should use stderr for errors
      or warnings. We're told about a successful run here, so the instruction
      decoder test should use stdout.
      
      Let's fix the typo too, while we're at it.
      Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1477428965-20548-2-git-send-email-pebolle@tiscali.nlSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bdcc18b5
    • J
      mm/page_alloc: Remove kernel address exposure in free_reserved_area() · adb1fe9a
      Josh Poimboeuf 提交于
      Linus suggested we try to remove some of the low-hanging fruit related
      to kernel address exposure in dmesg.  The only leaks I see on my local
      system are:
      
        Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
        Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
        Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
        Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
        Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)
      
      Linus says:
      
        "I suspect we should just remove [the addresses in the 'Freeing'
         messages]. I'm sure they are useful in theory, but I suspect they
         were more useful back when the whole "free init memory" was
         originally done.
      
         These days, if we have a use-after-free, I suspect the init-mem
         situation is the easiest situation by far. Compared to all the dynamic
         allocations which are much more likely to show it anyway. So having
         debug output for that case is likely not all that productive."
      
      With this patch the freeing messages now look like this:
      
        Freeing SMP alternatives memory: 32K
        Freeing initrd memory: 10588K
        Freeing unused kernel memory: 3592K
        Freeing unused kernel memory: 1352K
        Freeing unused kernel memory: 632K
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      adb1fe9a
    • J
      x86/dumpstack: Remove raw stack dump · 0ee1dd9f
      Josh Poimboeuf 提交于
      For mostly historical reasons, the x86 oops dump shows the raw stack
      values:
      
        ...
        [registers]
        Stack:
         ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
         ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
         0000000000000002 0000000000000000 0000000000000000 0000000000000000
        Call Trace:
        ...
      
      This seems to be an artifact from long ago, and probably isn't needed
      anymore.  It generally just adds noise to the dump, and it can be
      actively harmful because it leaks kernel addresses.
      
      Linus says:
      
        "The stack dump actually goes back to forever, and it used to be
         useful back in 1992 or so. But it used to be useful mainly because
         stacks were simpler and we didn't have very good call traces anyway. I
         definitely remember having used them - I just do not remember having
         used them in the last ten+ years.
      
         Of course, it's still true that if you can trigger an oops, you've
         likely already lost the security game, but since the stack dump is so
         useless, let's aim to just remove it and make games like the above
         harder."
      
      This also removes the related 'kstack=' cmdline option and the
      'kstack_depth_to_print' sysctl.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0ee1dd9f
    • J
      x86/dumpstack: Remove kernel text addresses from stack dump · bb5e5ce5
      Josh Poimboeuf 提交于
      Printing kernel text addresses in stack dumps is of questionable value,
      especially now that address randomization is becoming common.
      
      It can be a security issue because it leaks kernel addresses.  It also
      affects the usefulness of the stack dump.  Linus says:
      
        "I actually spend time cleaning up commit messages in logs, because
        useless data that isn't actually information (random hex numbers) is
        actively detrimental.
      
        It makes commit logs less legible.
      
        It also makes it harder to parse dumps.
      
        It's not useful. That makes it actively bad.
      
        I probably look at more oops reports than most people. I have not
        found the hex numbers useful for the last five years, because they are
        just randomized crap.
      
        The stack content thing just makes code scroll off the screen etc, for
        example."
      
      The only real downside to removing these addresses is that they can be
      used to disambiguate duplicate symbol names.  However such cases are
      rare, and the context of the stack dump should be enough to be able to
      figure it out.
      
      There's now a 'faddr2line' script which can be used to convert a
      function address to a file name and line:
      
        $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
        write_sysrq_trigger+0x51/0x60:
        write_sysrq_trigger at drivers/tty/sysrq.c:1098
      
      Or gdb can be used:
      
        $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
        (gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).
      
      (But note that when there are duplicate symbol names, gdb will only show
      the first symbol it finds.  faddr2line is recommended over gdb because
      it handles duplicates and it also does function size checking.)
      
      Here's an example of what a stack dump looks like after this change:
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: sysrq_handle_crash+0x45/0x80
        PGD 36bfa067 [   29.650644] PUD 7aca3067
        Oops: 0002 [#1] PREEMPT SMP
        Modules linked in: ...
        CPU: 1 PID: 786 Comm: bash Tainted: G            E   4.9.0-rc1+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
        task: ffff880078582a40 task.stack: ffffc90000ba8000
        RIP: 0010:sysrq_handle_crash+0x45/0x80
        RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296
        RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001
        RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292
        RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000
        R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
        R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000
        FS:  00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0
        Stack:
         ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002
         0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20
         ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40
        Call Trace:
         __handle_sysrq+0x138/0x220
         ? __handle_sysrq+0x5/0x220
         write_sysrq_trigger+0x51/0x60
         proc_reg_write+0x42/0x70
         __vfs_write+0x37/0x140
         ? preempt_count_sub+0xa1/0x100
         ? __sb_start_write+0xf5/0x210
         ? vfs_write+0x183/0x1a0
         vfs_write+0xb8/0x1a0
         SyS_write+0x58/0xc0
         entry_SYSCALL_64_fastpath+0x1f/0xc2
        RIP: 0033:0x7ffb42f55940
        RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940
        RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001
        RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700
        R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10
        R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010
        Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
        RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8
        CR2: 0000000000000000
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bb5e5ce5
    • J
      scripts/faddr2line: Fix "size mismatch" error · efdb4167
      Josh Poimboeuf 提交于
      I'm not sure how we missed this problem before.  When I take a function
      address and size from an oops and give it to faddr2line, it usually
      complains about a size mismatch:
      
        $ scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
        skipping write_sysrq_trigger address at 0xffffffff815731a1 due to size mismatch (0x60 != 83)
        no match for write_sysrq_trigger+0x51/0x60
      
      The problem is caused by differences in how kallsyms and faddr2line
      determine a function's size.
      
      kallsyms calculates a function's size by parsing the output of 'nm -n'
      and subtracting the next function's address from the current function's
      address.  This means that nop instructions after the end of the function
      are included in the size.
      
      In contrast, faddr2line reads the size from the symbol table, which does
      *not* include the ending nops in the function's size.
      
      Change faddr2line to calculate the size from the output of 'nm -n' to be
      consistent with kallsyms and oops outputs.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/bd313ed7c4003f6b1fda63e825325c44a9d837de.1477405374.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      efdb4167
  4. 25 10月, 2016 1 次提交
  5. 21 10月, 2016 7 次提交
    • J
      x86/dumpstack: Print orig_ax in __show_regs() · 6fa81a12
      Josh Poimboeuf 提交于
      The value of regs->orig_ax contains potentially useful debugging data:
      For syscalls it contains the syscall number.  For interrupts it contains
      the (negated) vector number.  To reduce noise, print it only if it has a
      useful value (i.e., something other than -1).
      
      Here's what it looks like for a write syscall:
      
        RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940
        RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940
        RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001
        ...
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/93f0fe0307a4af884d3fca00edabcc8cff236002.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6fa81a12
    • J
      x86/dumpstack: Fix duplicate RIP address display in __show_regs() · 1141c3e3
      Josh Poimboeuf 提交于
      The RIP address is shown twice in __show_regs().  Before:
      
        RIP: 0010:[<ffffffff81070446>]  [<ffffffff81070446>] native_write_msr+0x6/0x30
      
      After:
      
        RIP: 0010:[<ffffffff81070446>] native_write_msr+0x6/0x30
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/b3fda66f36761759b000883b059cdd9a7649dcc1.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1141c3e3
    • J
      x86/dumpstack: Print any pt_regs found on the stack · 3b3fa11b
      Josh Poimboeuf 提交于
      Now that we can find pt_regs registers on the stack, print them.  Here's
      an example of what it looks like:
      
        Call Trace:
         <IRQ>
         [<ffffffff8144b793>] dump_stack+0x86/0xc3
         [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0
         [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60
         [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50
         [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0
        RIP: 0010:[<ffffffff818aef43>]  [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60
        RSP: 0018:ffff880079c4f760  EFLAGS: 00000202
        RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007
        RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000
        RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540
        R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000
         <EOI>
         [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250
         [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250
         [<ffffffff818a7b61>] __schedule+0x3e1/0xb20
         ...
         [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0
         [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
         [<ffffffff818b1dd8>] async_page_fault+0x28/0x30
        RIP: 0010:[<ffffffff8145b062>]  [<ffffffff8145b062>] __clear_user+0x42/0x70
        RSP: 0018:ffff880079c4fd38  EFLAGS: 00010202
        RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
        RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
        RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928
        R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
        R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400
         [<ffffffff8145b043>] ? __clear_user+0x23/0x70
         [<ffffffff8145b0fb>] clear_user+0x2b/0x40
         [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750
         [<ffffffff8129a591>] search_binary_handler+0xa1/0x200
         [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0
         [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0
         [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50
         [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0
         [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25
        RIP: 0033:[<00007fd2e2f2e537>]  [<00007fd2e2f2e537>] 0x7fd2e2f2e537
        RSP: 002b:00007ffc449c5fc8  EFLAGS: 00000246
        RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537
        RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029
        RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20
        R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40
        R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5cc2c512ec82cfba00dd22467644d4ed751a48c0.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3b3fa11b
    • J
      x86/dumpstack: Print stack identifier on its own line · 79439d8e
      Josh Poimboeuf 提交于
      show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
      newline so that any stack address printed after it will appear on the
      same line.  That causes the first stack address to be vertically
      misaligned with the rest, making it visually cluttered and slightly
      confusing:
      
        Call Trace:
         <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
         [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
         [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
         ...
         <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
         [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
         [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
      
      It will look worse once we start printing pt_regs registers found in the
      middle of the stack:
      
        <IRQ> RIP: 0010:[<ffffffff8189c6c3>]  [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
        RSP: 0018:ffff88007876f720  EFLAGS: 00000206
        RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
        ...
      
      Improve readability by adding a newline to the stack name:
      
        Call Trace:
         <IRQ>
         [<ffffffff814431c3>] dump_stack+0x86/0xc3
         [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
         [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
         ...
         <EOI>
         [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
         [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
         [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
      
      Now that "continued" lines are no longer needed, we can also remove the
      hack of using the empty string (aka KERN_CONT) and replace it with
      KERN_DEFAULT.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      79439d8e
    • J
      x86/unwind: Create stack frames for saved syscall registers · acb4608a
      Josh Poimboeuf 提交于
      The entry code doesn't encode the pt_regs pointer for syscalls.  But the
      pt_regs are always at the same location, so we can add a manual check
      for them.
      
      A later patch prints them as part of the oops stack dump.  They could be
      useful, for example, to determine the arguments to a system call.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e176aa9272930cd3f51fda0b94e2eae356677da4.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      acb4608a
    • J
      x86/entry/unwind: Create stack frames for saved interrupt registers · 946c1911
      Josh Poimboeuf 提交于
      With frame pointers, when a task is interrupted, its stack is no longer
      completely reliable because the function could have been interrupted
      before it had a chance to save the previous frame pointer on the stack.
      So the caller of the interrupted function could get skipped by a stack
      trace.
      
      This is problematic for live patching, which needs to know whether a
      stack trace of a sleeping task can be relied upon.  There's currently no
      way to detect if a sleeping task was interrupted by a page fault
      exception or preemption before it went to sleep.
      
      Another issue is that when dumping the stack of an interrupted task, the
      unwinder has no way of knowing where the saved pt_regs registers are, so
      it can't print them.
      
      This solves those issues by encoding the pt_regs pointer in the frame
      pointer on entry from an interrupt or an exception.
      
      This patch also updates the unwinder to be able to decode it, because
      otherwise the unwinder would be broken by this change.
      
      Note that this causes a change in the behavior of the unwinder: each
      instance of a pt_regs on the stack is now considered a "frame".  So
      callers of unwind_get_return_address() will now get an occasional
      'regs->ip' address that would have previously been skipped over.
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      946c1911
    • A
      entry/64: Remove unused ZERO_EXTRA_REGS macro · 29a6d796
      Alexander Kuleshov 提交于
      Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Link: http://lkml.kernel.org/r/20161020120704.24042-1-kuleshovmail@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      29a6d796
  6. 20 10月, 2016 16 次提交
  7. 19 10月, 2016 5 次提交
    • L
      Merge branch 'gup_flag-cleanups' · 63ae602c
      Linus Torvalds 提交于
      Merge the gup_flags cleanups from Lorenzo Stoakes:
       "This patch series adjusts functions in the get_user_pages* family such
        that desired FOLL_* flags are passed as an argument rather than
        implied by flags.
      
        The purpose of this change is to make the use of FOLL_FORCE explicit
        so it is easier to grep for and clearer to callers that this flag is
        being used.  The use of FOLL_FORCE is an issue as it overrides missing
        VM_READ/VM_WRITE flags for the VMA whose pages we are reading
        from/writing to, which can result in surprising behaviour.
      
        The patch series came out of the discussion around commit 38e08854
        ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
        which addressed a BUG_ON() being triggered when a page was faulted in
        with PROT_NONE set but having been overridden by FOLL_FORCE.
        do_numa_page() was run on the assumption the page _must_ be one marked
        for NUMA node migration as an actual PROT_NONE page would have been
        dealt with prior to this code path, however FOLL_FORCE introduced a
        situation where this assumption did not hold.
      
        See
      
            https://marc.info/?l=linux-mm&m=147585445805166
      
        for the patch proposal"
      
      Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
      FOLL_WRITE by me.
      
      [ This branch was rebased recently to add a few more acked-by's and
        reviewed-by's ]
      
      * gup_flag-cleanups:
        mm: replace access_process_vm() write parameter with gup_flags
        mm: replace access_remote_vm() write parameter with gup_flags
        mm: replace __access_remote_vm() write parameter with gup_flags
        mm: replace get_user_pages_remote() write/force parameters with gup_flags
        mm: replace get_user_pages() write/force parameters with gup_flags
        mm: replace get_vaddr_frames() write/force parameters with gup_flags
        mm: replace get_user_pages_locked() write/force parameters with gup_flags
        mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
        mm: remove write/force parameters from __get_user_pages_unlocked()
        mm: remove write/force parameters from __get_user_pages_locked()
        mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
      63ae602c
    • L
      mm: replace access_process_vm() write parameter with gup_flags · f307ab6d
      Lorenzo Stoakes 提交于
      This removes the 'write' argument from access_process_vm() and replaces
      it with 'gup_flags' as use of this function previously silently implied
      FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
      
      We make this explicit as use of FOLL_FORCE can result in surprising
      behaviour (and hence bugs) within the mm subsystem.
      Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f307ab6d
    • L
      mm: replace access_remote_vm() write parameter with gup_flags · 6347e8d5
      Lorenzo Stoakes 提交于
      This removes the 'write' argument from access_remote_vm() and replaces
      it with 'gup_flags' as use of this function previously silently implied
      FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
      
      We make this explicit as use of FOLL_FORCE can result in surprising
      behaviour (and hence bugs) within the mm subsystem.
      Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6347e8d5
    • L
      mm: replace __access_remote_vm() write parameter with gup_flags · 442486ec
      Lorenzo Stoakes 提交于
      This removes the 'write' argument from __access_remote_vm() and replaces
      it with 'gup_flags' as use of this function previously silently implied
      FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
      
      We make this explicit as use of FOLL_FORCE can result in surprising
      behaviour (and hence bugs) within the mm subsystem.
      Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      442486ec
    • L
      mm: replace get_user_pages_remote() write/force parameters with gup_flags · 9beae1ea
      Lorenzo Stoakes 提交于
      This removes the 'write' and 'force' from get_user_pages_remote() and
      replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
      callers as use of this flag can result in surprising behaviour (and
      hence bugs) within the mm subsystem.
      Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9beae1ea