1. 04 6月, 2015 1 次提交
    • I
      x86/asm/entry: Move entry_64.S and entry_32.S to arch/x86/entry/ · 905a36a2
      Ingo Molnar 提交于
      Create a new directory hierarchy for the low level x86 entry code:
      
          arch/x86/entry/*
      
      This will host all the low level glue that is currently scattered
      all across arch/x86/.
      
      Start with entry_64.S and entry_32.S.
      
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      905a36a2
  2. 02 6月, 2015 2 次提交
    • J
      x86/asm/entry/64: Fold identical code paths · 2f63b9db
      Jan Beulich 提交于
      retint_kernel doesn't require %rcx to be pointing to thread info
      (anymore?), and the code on the two alternative paths is - not
      really surprisingly - identical.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/556C664F020000780007FB64@mail.emea.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2f63b9db
    • I
      x86/debug: Remove perpetually broken, unmaintainable dwarf annotations · 131484c8
      Ingo Molnar 提交于
      So the dwarf2 annotations in low level assembly code have
      become an increasing hindrance: unreadable, messy macros
      mixed into some of the most security sensitive code paths
      of the Linux kernel.
      
      These debug info annotations don't even buy the upstream
      kernel anything: dwarf driven stack unwinding has caused
      problems in the past so it's out of tree, and the upstream
      kernel only uses the much more robust framepointers based
      stack unwinding method.
      
      In addition to that there's a steady, slow bitrot going
      on with these annotations, requiring frequent fixups.
      There's no tooling and no functionality upstream that
      keeps it correct.
      
      So burn down the sick forest, allowing new, healthier growth:
      
         27 files changed, 350 insertions(+), 1101 deletions(-)
      
      Someone who has the willingness and time to do this
      properly can attempt to reintroduce dwarf debuginfo in x86
      assembly code plus dwarf unwinding from first principles,
      with the following conditions:
      
       - it should be maximally readable, and maximally low-key to
         'ordinary' code reading and maintenance.
      
       - find a build time method to insert dwarf annotations
         automatically in the most common cases, for pop/push
         instructions that manipulate the stack pointer. This could
         be done for example via a preprocessing step that just
         looks for common patterns - plus special annotations for
         the few cases where we want to depart from the default.
         We have hundreds of CFI annotations, so automating most of
         that makes sense.
      
       - it should come with build tooling checks that ensure that
         CFI annotations are sensible. We've seen such efforts from
         the framepointer side, and there's no reason it couldn't be
         done on the dwarf side.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      131484c8
  3. 17 5月, 2015 1 次提交
    • D
      x86/asm/entry/64: Use shorter MOVs from segment registers · adeb5537
      Denys Vlasenko 提交于
      The "movw %ds,%cx" instruction needs a 0x66 prefix, while
      "movl %ds,%ecx" does not.
      
      The difference is that latter form (on 64-bit CPUs)
      overwrites the entire %ecx, not only its lower half.
      
      But subsequent code doesn't depend on the value of upper
      half of %ecx, so we can safely use the shorter instruction.
      
      The new code is also faster than the old one - now we don't
      depend on the old value of %ecx, but this code fragment is
      not performance-critical so it does not matter much.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1431722346-26585-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      adeb5537
  4. 08 5月, 2015 4 次提交
    • D
      x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code · 3a23208e
      Denys Vlasenko 提交于
      32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
      64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0).
      
      Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
      as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
      expression can be used in both 32-bit and 64-bit code.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3a23208e
    • D
      x86/entry: Stop using PER_CPU_VAR(kernel_stack) · 63332a84
      Denys Vlasenko 提交于
      PER_CPU_VAR(kernel_stack) is redundant:
      
        - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
        - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).
      
      PER_CPU_VAR(kernel_stack) will be deleted by a separate change.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      63332a84
    • D
      x86/asm/entry/64: Clean up usage of TEST insns · 03335e95
      Denys Vlasenko 提交于
      By the nature of TEST operation, it is often possible
      to test a narrower part of the operand:
      
          "testl $3, mem"  -> "testb $3, mem"
      
      This results in shorter insns, because TEST insn has no
      sign-entending byte-immediate forms unlike other ALU ops.
      
         text	   data	    bss	    dec	    hex	filename
        11674	      0	      0	  11674	   2d9a	entry_64.o.before
        11658	      0	      0	  11658	   2d8a	entry_64.o
      
      Changes in object code:
      
      -	f7 84 24 88 00 00 00 03 00 00 00 	testl  $0x3,0x88(%rsp)
      +	f6 84 24 88 00 00 00 03	         	testb  $0x3,0x88(%rsp)
      -	f7 44 24 68 03 00 00 00          	testl  $0x3,0x68(%rsp)
      +	f6 44 24 68 03                  	testb  $0x3,0x68(%rsp)
      -	f7 84 24 90 00 00 00 03 00 00 00	testl  $0x3,0x90(%rsp)
      +	f6 84 24 90 00 00 00 03         	testb  $0x3,0x90(%rsp)
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1430140912-7960-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      03335e95
    • D
      x86/asm/entry/64: Tidy up JZ insns after TESTs · dde74f2e
      Denys Vlasenko 提交于
      After TESTs, use logically correct JZ/JNZ mnemonics instead of
      JE/JNE. This doesn't change code.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1430140912-7960-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dde74f2e
  5. 27 4月, 2015 1 次提交
  6. 22 4月, 2015 2 次提交
    • D
      x86/asm/entry/64: Merge 32-bit execve stubs with x32 ones, as they are identical · ac7f5dfb
      Denys Vlasenko 提交于
      Run-tested.
      Suggested-by: NBrian Gerst <brgerst@gmail.com>
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429632194-13445-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ac7f5dfb
    • D
      x86/asm/entry/64: Implement better check for canonical addresses · 17be0aec
      Denys Vlasenko 提交于
      This change makes the check exact (no more false positives
      on "negative" addresses).
      
      Andy explains:
      
       "Canonical addresses either start with 17 zeros or 17 ones.
      
        In the old code, we checked that the top (64-47) = 17 bits were all
        zero.  We did this by shifting right by 47 bits and making sure that
        nothing was left.
      
        In the new code, we're shifting left by (64 - 48) = 16 bits and then
        signed shifting right by the same amount, this propagating the 17th
        highest bit to all positions to its left.  If we get the same value we
        started with, then we're good to go."
      
      While it isn't really important to be fully correct here -
      almost all addresses we'll ever see will be userspace ones,
      but OTOH it looks to be cheap enough: the new code uses
      two more ALU ops but preserves %rcx, allowing to not
      reload it from pt_regs->cx again.
      
      On disassembly level, the changes are:
      
        cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
        shr $0x2f,%rcx      -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
        mov 0x58(%rsp),%rcx -> (eliminated)
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429633649-20169-1-git-send-email-dvlasenk@redhat.com
      [ Changelog massage. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      17be0aec
  7. 09 4月, 2015 8 次提交
  8. 08 4月, 2015 3 次提交
    • D
      x86/asm/entry/64: Add forgotten CFI annotation · 8b3607b5
      Denys Vlasenko 提交于
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428424967-14460-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8b3607b5
    • D
      x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout · 3304c9c3
      Denys Vlasenko 提交于
      Interrupt entry points are handled with the following code,
      each 32-byte code block contains seven entry points:
      
      		...
      		[push][jump 22] // 4 bytes
      		[push][jump 18] // 4 bytes
      		[push][jump 14] // 4 bytes
      		[push][jump 10] // 4 bytes
      		[push][jump  6] // 4 bytes
      		[push][jump  2] // 4 bytes
      		[push][jump common_interrupt][padding] // 8 bytes
      
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump common_interrupt][padding]
      
      		[padding_2]
      	common_interrupt:
      
      And there is a table which holds pointers to every entry point,
      IOW: to every push.
      
      In cold cache, two jumps are still costlier than one, even
      though we get the benefit of them residing in the same
      cacheline.
      
      This change replaces short jumps with near ones to
      'common_interrupt', and pads every push+jump pair to 8 bytes. This
      way, each interrupt takes only one jump.
      
      This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before
      dispatch table with ".align 8" - we do not need anything
      stronger than that.
      
      The table of entry addresses (the interrupt[] array) is no
      longer necessary, the address of entries can be easily
      calculated as (irq_entries_start + i*8).
      
         text	   data	    bss	    dec	    hex	filename
        12546	      0	      0	  12546	   3102	entry_64.o.before
        11626	      0	      0	  11626	   2d6a	entry_64.o
      
      The size decrease is because 1656 bytes of .init.rodata are
      gone. That's initdata, though. The resident size does go up a
      bit.
      
      Run-tested (32 and 64 bits).
      Acked-and-Tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3304c9c3
    • D
      x86/asm/entry/64: Move opportunistic sysret code to syscall code path · fffbb5dc
      Denys Vlasenko 提交于
      This change does two things:
      
      Copy-pastes "retint_swapgs:" code into syscall handling code,
      the copy is under "syscall_return:" label. The code is unchanged
      apart from some label renames.
      
      Removes "opportunistic sysret" code from "retint_swapgs:" code
      block, since now it won't be reached by syscall return. This in
      fact removes most of the code in question.
      
         text	   data	    bss	    dec	    hex	filename
        12530	      0	      0	  12530	   30f2	entry_64.o.before
        12562	      0	      0	  12562	   3112	entry_64.o
      
      Run-tested.
      Acked-and-Tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427993219-7291-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fffbb5dc
  9. 06 4月, 2015 1 次提交
    • D
      x86/asm/entry: Clear EXTRA_REGS for all executable formats · fc3e958a
      Denys Vlasenko 提交于
      On failure, sys_execve() does not clobber EXTRA_REGS, so we can
      just return to userpsace without saving/restoring them.
      
      On success, ELF_PLAT_INIT() in sys_execve() clears all these
      registers.
      
      On other executable formats:
      
        - binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone
          else except sh) doesn't define it.
      
        - binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most
          others) doesn't define it.
      
        - There are no such hooks in binfmt_aout.c et al. We inherit
          EXTRA_REGS from the prior executable.
      
      This inconsistency was not intended.
      
      This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve,
      removes register clearing in ELF_PLAT_INIT(),
      and instead simply clears them on success in stub_execve.
      
      Run-tested.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fc3e958a
  10. 02 4月, 2015 2 次提交
  11. 01 4月, 2015 5 次提交
    • D
      x86/asm/entry/64: Use local label to skip around sycall dispatch · a6de5a21
      Denys Vlasenko 提交于
      Logically, we just want to jump around the following instruction
      and its prologue/epilogue:
      
        call *sys_call_table(,%rax,8)
      
      if the syscall number is too big - we do not specifically target
      the "int_ret_from_sys_call" label.
      
      Use a local, numerical label for this jump, for more clarity.
      
      This also makes the code smaller:
      
       -ffffffff8187756b:      0f 87 0f 00 00 00       ja     ffffffff81877580 <int_ret_from_sys_call>
       +ffffffff8187756b:      77 0f                   ja     ffffffff8187757c <int_ret_from_sys_call>
      
      because jumps to global labels are never translated to short jump
      instructions by GAS.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-9-git-send-email-dvlasenk@redhat.com
      [ Improved the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a6de5a21
    • D
      x86/asm/entry/64: Simplify looping around preempt_schedule_irq() · 36acef25
      Denys Vlasenko 提交于
      At the 'exit_intr' label we test whether interrupt/exception was in
      kernel. If it did, we jump to the preemption check. If preemption
      does happen (IOW if we call preempt_schedule_irq()), we go back to
      'exit_intr'.
      
      But it's pointless, we already know that the test succeeded last
      time, preemption doesn't change the fact that interrupt/exception
      was in the kernel.
      
      We can go back directly to checking PER_CPU_VAR(__preempt_count) instead.
      
      This makes the 'exit_intr' label unused, drop it.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-5-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      36acef25
    • D
      x86/asm/entry/64: Remove redundant DISABLE_INTERRUPTS() · 32a04077
      Denys Vlasenko 提交于
      At this location, we already have interrupts off, always.
      To be more specific, we already disabled them here:
      
          ret_from_intr:
      	    DISABLE_INTERRUPTS(CLBR_NONE)
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-4-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      32a04077
    • D
      x86/asm/entry/64: Simplify retint_kernel label usage, make retint_restore_args label local · 6ba71b76
      Denys Vlasenko 提交于
      Get rid of #define obfuscation of retint_kernel in
      CONFIG_PREEMPT case by defining retint_kernel label always, not
      only for CONFIG_PREEMPT.
      
      Strip retint_kernel of .global-ness (ENTRY macro) - it has no
      users outside of this file.
      
      This looks like cosmetics, but it is not:
      "je LABEL" can be optimized to short jump by assember
      only if LABEL is not global, for global labels jump is always
      a near one with relocation.
      
      Convert retint_restore_args to a local numeric label, making it
      clearer that it is not used elsewhere in the file.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-3-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6ba71b76
    • D
      x86/asm/entry/64: Do not TRACE_IRQS fast SYSRET64 path · 4416c5a6
      Denys Vlasenko 提交于
      SYSRET code path has a small irq-off block.
      On this code path, TRACE_IRQS_ON can't be called right before
      interrupts are enabled for real, we can't clobber registers
      there. So current code does it earlier, in a safe place.
      
      But with this, TRACE_IRQS_OFF/ON frames just two fast
      instructions, which is ridiculous: now most of irq-off block is
      _outside_ of the framing.
      
      Do the same thing that we do on SYSCALL entry: do not track this
      irq-off block, it is very small to ever cause noticeable irq
      latency.
      
      Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off"
      now does invoke TRACE_IRQS_OFF - move
      int_ret_from_sys_call_irqs_off label before TRACE_IRQS_OFF.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4416c5a6
  12. 31 3月, 2015 2 次提交
    • D
      x86/asm/entry/64: Do not GET_THREAD_INFO() too early · a3675b32
      Denys Vlasenko 提交于
      At exit_intr, we GET_THREAD_INFO(%rcx) and then jump to
      retint_kernel if saved CS was from kernel. But the code at
      retint_kernel doesn't need %rcx.
      
      Move GET_THREAD_INFO(%rcx) down, after CS check and branch.
      
      While at it, remove "has a correct top of stack" comment.
      After recent changes which eliminated FIXUP_TOP_OF_STACK,
      we always have a correct pt_regs layout.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427738975-7391-5-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a3675b32
    • D
      x86/asm/entry/64: Move retint_kernel code block closer to its user · 627276cb
      Denys Vlasenko 提交于
      The "retint_kernel" code block is misplaced. Since its logical
      continuation is "retint_restore_args", it is more natural to
      place it above that label. This also makes two jumps "short".
      
      This change only moves code block around, without changing
      logic.
      
      This enables the next simplification: making
      "retint_restore_args" label a local numeric one.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427738975-7391-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      627276cb
  13. 27 3月, 2015 3 次提交
    • D
      x86/asm/entry/64: Add missing CFI annotation · 27be87c5
      Denys Vlasenko 提交于
      This is a missing bit of the recent MOV-to-PUSH conversion.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427452582-21624-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      27be87c5
    • D
      x86/asm/entry/64: Use smaller instructions · 47eb582e
      Denys Vlasenko 提交于
      The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
      is a 32-bit constant, loading it with 32-bit MOV produces 5-byte
      insn instead of 10-byte MOVABS one.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427303896-24023-3-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      47eb582e
    • D
      x86/asm/entry/64: Use better label name, fix comments · 146b2b09
      Denys Vlasenko 提交于
      A named label "ret_from_sys_call" implies that there are jumps
      to this location from elsewhere, as happens with many other
      labels in this file.
      
      But this label is used only by the JMP a few insns above.
      To make that obvious, use local numeric label instead.
      
      Improve comments:
      
      "and return regs->ax" isn't too informative. We always return
      regs->ax.
      
      The comment suggesting that it'd be cool to use rip relative
      addressing for CALL is deleted. It's unclear why that would be
      an improvement - we aren't striving to use position-independent
      code here. PIC code here would require something like LEA
      sys_call_table(%rip),reg + CALL *(reg,%rax*8)...
      
      "iret frame is also incomplete" is no longer true, fix that too.
      
      Also fix typo in comment.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427303896-24023-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      146b2b09
  14. 25 3月, 2015 5 次提交