1. 03 4月, 2015 6 次提交
    • B
      x86/mm/KASLR: Propagate KASLR status to kernel proper · 78cac48c
      Borislav Petkov 提交于
      Commit:
      
        e2b32e67 ("x86, kaslr: randomize module base load address")
      
      made module base address randomization unconditional and didn't regard
      disabled KKASLR due to CONFIG_HIBERNATION and command line option
      "nokaslr". For more info see (now reverted) commit:
      
        f47233c2 ("x86/mm/ASLR: Propagate base load address calculation")
      
      In order to propagate KASLR status to kernel proper, we need a single bit
      in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the
      top-down allocated bits for bits supposed to be used by the bootloader.
      
      Originally-From: Jiri Kosina <jkosina@suse.cz>
      Suggested-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      78cac48c
    • B
      x86/asm/entry: Drop now unused ENABLE_INTERRUPTS_SYSEXIT32 · 47091e3c
      Borislav Petkov 提交于
      Commit:
      
        4214a16b ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER")
      
      removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the
      macro now too.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      47091e3c
    • A
      x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER · 4214a16b
      Andy Lutomirski 提交于
      SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
      with usergs and IRQs on.  That means that we rely on STI to
      correctly mask interrupts for one instruction.  This is okay by
      itself, but the semantics with respect to NMIs are unclear.
      
      Avoid the whole issue by using SYSRETL instead.  For background,
      Intel CPUs don't allow SYSCALL from compat mode, but they do
      allow SYSRETL back to compat mode.  Go figure.
      
      To avoid doing too much at once, this doesn't revamp the calling
      convention.  We still return with EBP, EDX, and ECX on the user
      stack.
      
      Oddly this seems to be 30 cycles or so faster.  Avoiding POPFQ
      and STI will account for under half of that, I think, so my best
      guess is that Intel just optimizes SYSRET much better than
      SYSEXIT.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4214a16b
    • A
      x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1 · cf9328cc
      Andy Lutomirski 提交于
      We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once,
      and we unnecessarily cache the value in tss.sp1.  We never
      read the cached value.
      
      Remove all of the caching.  It serves no purpose.
      Suggested-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cf9328cc
    • A
      x86/asm/entry/32: Improve a TOP_OF_KERNEL_STACK_PADDING comment · ff8287f3
      Andy Lutomirski 提交于
      At Denys' request, clean up the comment describing stack padding
      in the 32-bit sysenter path.
      
      No code changes.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/41fee7bb8490ae840fe7ef2699f9c2feb932e729.1428002830.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ff8287f3
    • R
      x86/asm: Add support for the CLWB instruction · d9dc64f3
      Ross Zwisler 提交于
      Add support for the new CLWB (cache line write back)
      instruction.  This instruction was announced in the document
      "Intel Architecture Instruction Set Extensions Programming
      Reference" with reference number 319433-022.
      
        https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
      
      The CLWB instruction is used to write back the contents of
      dirtied cache lines to memory without evicting the cache lines
      from the processor's cache hierarchy.  This should be used in
      favor of clflushopt or clflush in cases where you require the
      cache line to be written to memory but plan to access the data
      again in the near future.
      
      One of the main use cases for this is with persistent memory
      where CLWB can be used with PCOMMIT to ensure that data has been
      accepted to memory and is durable on the DIMM.
      
      This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
      and PCOMMIT with appropriate fencing:
      
      void flush_and_commit_buffer(void *vaddr, unsigned int size)
      {
      	void *vend = vaddr + size - 1;
      
      	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
      		clwb(vaddr);
      
      	/* Flush any possible final partial cacheline */
      	clwb(vend);
      
      	/*
      	 * Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
      	 * (MFENCE via mb() also works)
      	 */
      	wmb();
      
      	/* PCOMMIT and the required SFENCE for ordering */
      	pcommit_sfence();
      }
      
      After this function completes the data pointed to by vaddr is
      has been accepted to memory and will be durable if the vaddr
      points to persistent memory.
      
      Regarding the details of how the alternatives assembly is set
      up, we need one additional byte at the beginning of the CLFLUSH
      so that we can flip it into a CLFLUSHOPT by changing that byte
      into a 0x66 prefix.  Two options are to either insert a 1 byte
      ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no
      functional effect with the plain CLFLUSH, but I've been told
      that executing a CLFLUSH + prefix should be faster than
      executing a CLFLUSH + NOP.
      
      We had to hard code the assembly for CLWB because, lacking the
      ability to assemble the CLWB instruction itself, the next
      closest thing is to have an xsaveopt instruction with a 0x66
      prefix.  Unfortunately XSAVEOPT itself is also relatively new,
      and isn't included by all the GCC versions that the kernel needs
      to support.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d9dc64f3
  2. 02 4月, 2015 3 次提交
  3. 01 4月, 2015 7 次提交
    • D
      x86/asm/entry/64: Use local label to skip around sycall dispatch · a6de5a21
      Denys Vlasenko 提交于
      Logically, we just want to jump around the following instruction
      and its prologue/epilogue:
      
        call *sys_call_table(,%rax,8)
      
      if the syscall number is too big - we do not specifically target
      the "int_ret_from_sys_call" label.
      
      Use a local, numerical label for this jump, for more clarity.
      
      This also makes the code smaller:
      
       -ffffffff8187756b:      0f 87 0f 00 00 00       ja     ffffffff81877580 <int_ret_from_sys_call>
       +ffffffff8187756b:      77 0f                   ja     ffffffff8187757c <int_ret_from_sys_call>
      
      because jumps to global labels are never translated to short jump
      instructions by GAS.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-9-git-send-email-dvlasenk@redhat.com
      [ Improved the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a6de5a21
    • D
      x86/asm: Replace "MOVQ $imm, %reg" with MOVL · a734b4a2
      Denys Vlasenko 提交于
      There is no reason to use MOVQ to load a non-negative immediate
      constant value into a 64-bit register. MOVL does the same, since
      the upper 32 bits are zero-extended by the CPU.
      
      This makes the code a bit smaller, while leaving functionality
      unchanged.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-8-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a734b4a2
    • D
      x86/asm/entry/64: Simplify looping around preempt_schedule_irq() · 36acef25
      Denys Vlasenko 提交于
      At the 'exit_intr' label we test whether interrupt/exception was in
      kernel. If it did, we jump to the preemption check. If preemption
      does happen (IOW if we call preempt_schedule_irq()), we go back to
      'exit_intr'.
      
      But it's pointless, we already know that the test succeeded last
      time, preemption doesn't change the fact that interrupt/exception
      was in the kernel.
      
      We can go back directly to checking PER_CPU_VAR(__preempt_count) instead.
      
      This makes the 'exit_intr' label unused, drop it.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-5-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      36acef25
    • D
      x86/asm/entry/64: Remove redundant DISABLE_INTERRUPTS() · 32a04077
      Denys Vlasenko 提交于
      At this location, we already have interrupts off, always.
      To be more specific, we already disabled them here:
      
          ret_from_intr:
      	    DISABLE_INTERRUPTS(CLBR_NONE)
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-4-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      32a04077
    • D
      x86/asm/entry/64: Simplify retint_kernel label usage, make retint_restore_args label local · 6ba71b76
      Denys Vlasenko 提交于
      Get rid of #define obfuscation of retint_kernel in
      CONFIG_PREEMPT case by defining retint_kernel label always, not
      only for CONFIG_PREEMPT.
      
      Strip retint_kernel of .global-ness (ENTRY macro) - it has no
      users outside of this file.
      
      This looks like cosmetics, but it is not:
      "je LABEL" can be optimized to short jump by assember
      only if LABEL is not global, for global labels jump is always
      a near one with relocation.
      
      Convert retint_restore_args to a local numeric label, making it
      clearer that it is not used elsewhere in the file.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-3-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6ba71b76
    • D
      x86/asm/entry/32: Use smaller PUSH instructions instead of MOV, to build 'pt_regs' on stack · 4c9c0e91
      Denys Vlasenko 提交于
      This mimics the recent similar 64-bit change.
      Saves ~110 bytes of code.
      
      Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
      I also looked at the diff of entry_64.o disassembly, to have
      a different view of the changes.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4c9c0e91
    • D
      x86/asm/entry/64: Do not TRACE_IRQS fast SYSRET64 path · 4416c5a6
      Denys Vlasenko 提交于
      SYSRET code path has a small irq-off block.
      On this code path, TRACE_IRQS_ON can't be called right before
      interrupts are enabled for real, we can't clobber registers
      there. So current code does it earlier, in a safe place.
      
      But with this, TRACE_IRQS_OFF/ON frames just two fast
      instructions, which is ridiculous: now most of irq-off block is
      _outside_ of the framing.
      
      Do the same thing that we do on SYSCALL entry: do not track this
      irq-off block, it is very small to ever cause noticeable irq
      latency.
      
      Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off"
      now does invoke TRACE_IRQS_OFF - move
      int_ret_from_sys_call_irqs_off label before TRACE_IRQS_OFF.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427821211-25099-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4416c5a6
  4. 31 3月, 2015 3 次提交
    • I
      x86/asm/entry: Remove user_mode_ignore_vm86() · 55474c48
      Ingo Molnar 提交于
      user_mode_ignore_vm86() can be used instead of user_mode(), in
      places where we have already done a v8086_mode() security
      check of ptregs.
      
      But doing this check in the wrong place would be a bug that
      could result in security problems, and also the naming still
      isn't very clear.
      
      Furthermore, it only affects 32-bit kernels, while most
      development happens on 64-bit kernels.
      
      If we replace them with user_mode() checks then the cost is only
      a very minor increase in various slowpaths:
      
         text             data   bss     dec              hex    filename
         10573391         703562 1753042 13029995         c6d26b vmlinux.o.before
         10573423         703562 1753042 13030027         c6d28b vmlinux.o.after
      
      So lets get rid of this distinction once and for all.
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150329090233.GA1963@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      55474c48
    • D
      x86/asm/entry/64: Do not GET_THREAD_INFO() too early · a3675b32
      Denys Vlasenko 提交于
      At exit_intr, we GET_THREAD_INFO(%rcx) and then jump to
      retint_kernel if saved CS was from kernel. But the code at
      retint_kernel doesn't need %rcx.
      
      Move GET_THREAD_INFO(%rcx) down, after CS check and branch.
      
      While at it, remove "has a correct top of stack" comment.
      After recent changes which eliminated FIXUP_TOP_OF_STACK,
      we always have a correct pt_regs layout.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427738975-7391-5-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a3675b32
    • D
      x86/asm/entry/64: Move retint_kernel code block closer to its user · 627276cb
      Denys Vlasenko 提交于
      The "retint_kernel" code block is misplaced. Since its logical
      continuation is "retint_restore_args", it is more natural to
      place it above that label. This also makes two jumps "short".
      
      This change only moves code block around, without changing
      logic.
      
      This enables the next simplification: making
      "retint_restore_args" label a local numeric one.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1427738975-7391-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      627276cb
  5. 27 3月, 2015 9 次提交
  6. 25 3月, 2015 12 次提交
    • I
      Merge branch 'x86/urgent' into x86/asm, to resolve conflict · 06ab9c1b
      Ingo Molnar 提交于
      Conflicts:
      	arch/x86/kernel/entry_64.S
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      06ab9c1b
    • I
      x86/asm: Further improve segment.h readability · 72d64cc7
      Ingo Molnar 提交于
       - extend/clarify explanations where necessary
      
       - move comments from macro values to before the macro, to
         make them more consistent, and to reduce preprocessor overhead
      
       - sort GDT index and selector values likewise by number
      
       - use consistent, modern kernel coding style across the file
      
       - capitalize consistently
      
       - use consistent vertical spacing
      
       - remove the unused get_limit() method (noticed by Andy Lutomirski)
      
      No change in code (verified with objdump -d):
      
       64-bit defconfig+kvmconfig:
      
         815a129bc1f80de6445c1d8ca5b97cad  vmlinux.o.before.asm
         815a129bc1f80de6445c1d8ca5b97cad  vmlinux.o.after.asm
      
       32-bit defconfig+kvmconfig:
      
         e659ef045159ddf41a0771b33a34aae5  vmlinux.o.before.asm
         e659ef045159ddf41a0771b33a34aae5  vmlinux.o.after.asm
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      72d64cc7
    • A
      x86/asm/entry: Check for syscall exit work with IRQs disabled · b3494a4a
      Andy Lutomirski 提交于
      We currently have a race: if we're preempted during syscall
      exit, we can fail to process syscall return work that is queued
      up while we're preempted in ret_from_sys_call after checking
      ti.flags.
      
      Fix it by disabling interrupts before checking ti.flags.
      Reported-by: NStefan Seyfried <stefan.seyfried@googlemail.com>
      Reported-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Fixes: 96b6352c ("x86_64, entry: Remove the syscall exit audit")
      Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3494a4a
    • I
      x86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO() · dca5b52a
      Ingo Molnar 提交于
      The THREAD_INFO() macro has a somewhat confusingly generic name,
      defined in a generic .h C header file. It also does not make it
      clear that it constructs a memory operand for use in assembly
      code.
      
      Rename it to ASM_THREAD_INFO() to make it all glaringly
      obvious on first glance.
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dca5b52a
    • I
      x86/asm/entry/64: Merge the field offset into the THREAD_INFO() macro · f9d71854
      Ingo Molnar 提交于
      Before:
      
         TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
      
      After:
      
         movl    THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d
      
      to turn it into a clear thread_info accessor.
      
      No code changed:
      
       md5:
         fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.before.asm
         fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.after.asm
      
         e39f2958a5d1300158e276e4f7663263  entry_64.o.before.asm
         e39f2958a5d1300158e276e4f7663263  entry_64.o.after.asm
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f9d71854
    • I
      x86/asm/entry/64: Improve the THREAD_INFO() macro explanation · 1ddc6f3c
      Ingo Molnar 提交于
      Explain the background, and add a real example.
      Acked-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/20150324184311.GA14760@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1ddc6f3c
    • I
      x86/asm/entry/64: Always set up SYSENTER MSRs · d56fe4bf
      Ingo Molnar 提交于
      On CONFIG_IA32_EMULATION=y kernels we set up
      MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION
      kernels we leave them unchanged.
      
      Clear them to make sure the instruction is disabled properly.
      
      SYSCALL is set up properly in both cases.
      Acked-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d56fe4bf
    • D
      x86/asm: Deobfuscate segment.h · 84f53788
      Denys Vlasenko 提交于
      This file just defines a number of constants, and a few macros
      and inline functions. It is particularly badly written.
      
      For example, it is not trivial to see how descriptors are
      numbered (you'd expect that should be easy, right?).
      
      This change deobfuscates it via the following changes:
      
      Group all GDT_ENTRY_foo together (move intervening stuff away).
      
      Number them explicitly: use a number, not PREV_DEFINE+1, +2, +3:
      I want to immediately see that GDT_ENTRY_PNPBIOS_CS32 is 18.
      Seeing (GDT_ENTRY_KERNEL_BASE+6) instead is not useful.
      
      The above change allows to remove GDT_ENTRY_KERNEL_BASE
      and GDT_ENTRY_PNPBIOS_BASE, which weren't used anywhere else.
      
      After a group of GDT_ENTRY_foo, define all selector values.
      
      Remove or improve some comments. In particular:
      Comment deleted as stating the obvious:
          /*
           * The GDT has 32 entries
           */
          #define GDT_ENTRIES 32
      
      "The segment offset needs to contain a RPL. Grr. -AK"
          changed to
      "Selectors need to also have a correct RPL (+3 thingy)"
      
      "GDT layout to get 64bit syscall right (sysret hardcodes gdt
      offsets)" expanded into a description *how exactly* sysret
      hardcodes them.
      
      Patch was tested to compile and not change vmlinux.o
      on 32-bit and 64-bit builds (verified with objdump).
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      84f53788
    • D
      x86/asm/entry/64: Get rid of int_ret_from_sys_call_fixup · 65c23774
      Denys Vlasenko 提交于
      With the FIXUP_TOP_OF_STACK macro removed, this intermediate jump
      is unnecessary.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1426785469-15125-5-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      65c23774
    • D
      x86/asm/entry/64: Get rid of the FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK macros · a71ffdd7
      Denys Vlasenko 提交于
      The FIXUP_TOP_OF_STACK macro is only necessary because we don't save %r11
      to pt_regs->r11 on SYSCALL64 fast path, but we want ptrace to see it populated.
      
      Bite the bullet, add a single additional PUSH instruction, and remove
      the FIXUP_TOP_OF_STACK macro.
      
      The RESTORE_TOP_OF_STACK macro is already a nop. Remove it too.
      
      On SandyBridge CPU, it does not get slower:
      measured 54.22 ns per getpid syscall before and after last two
      changes on defconfig kernel.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1426785469-15125-4-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a71ffdd7
    • D
      x86/asm/entry/64: Use PUSH instructions to build pt_regs on stack · 9ed8e7d8
      Denys Vlasenko 提交于
      With this change, on SYSCALL64 code path we are now populating
      pt_regs->cs, pt_regs->ss and pt_regs->rcx unconditionally and
      therefore don't need to do that in FIXUP_TOP_OF_STACK.
      
      We lose a number of large instructions there:
      
          text    data     bss     dec     hex filename
         13298       0       0   13298    33f2 entry_64_before.o
         12978       0       0   12978    32b2 entry_64.o
      
      What's more important, we convert two "MOVQ $imm,off(%rsp)" to
      "PUSH $imm" (the ones which fill pt_regs->cs,ss).
      
      Before this patch, placing them on fast path was slowing it down
      by two cycles: this form of MOV is very large, 12 bytes, and
      this probably reduces decode bandwidth to one instruction per cycle
      when CPU sees them.
      
      Therefore they were living in FIXUP_TOP_OF_STACK instead (away
      from fast path).
      
      "PUSH $imm" is a small 2-byte instruction. Moving it to fast path does
      not slow it down in my measurements.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1426785469-15125-3-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9ed8e7d8
    • D
      x86/asm/entry: Get rid of KERNEL_STACK_OFFSET · ef593260
      Denys Vlasenko 提交于
      PER_CPU_VAR(kernel_stack) was set up in a way where it points
      five stack slots below the top of stack.
      
      Presumably, it was done to avoid one "sub $5*8,%rsp"
      in syscall/sysenter code paths, where iret frame needs to be
      created by hand.
      
      Ironically, none of them benefits from this optimization,
      since all of them need to allocate additional data on stack
      (struct pt_regs), so they still have to perform subtraction.
      
      This patch eliminates KERNEL_STACK_OFFSET.
      
      PER_CPU_VAR(kernel_stack) now points directly to top of stack.
      pt_regs allocations are adjusted to allocate iret frame as well.
      Hopefully we can merge it later with 32-bit specific
      PER_CPU_VAR(cpu_current_top_of_stack) variable...
      
      Net result in generated code is that constants in several insns
      are changed.
      
      This change is necessary for changing struct pt_regs creation
      in SYSCALL64 code path from MOV to PUSH instructions.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ef593260