1. 03 4月, 2015 1 次提交
    • R
      x86/asm: Add support for the CLWB instruction · d9dc64f3
      Ross Zwisler 提交于
      Add support for the new CLWB (cache line write back)
      instruction.  This instruction was announced in the document
      "Intel Architecture Instruction Set Extensions Programming
      Reference" with reference number 319433-022.
      
        https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
      
      The CLWB instruction is used to write back the contents of
      dirtied cache lines to memory without evicting the cache lines
      from the processor's cache hierarchy.  This should be used in
      favor of clflushopt or clflush in cases where you require the
      cache line to be written to memory but plan to access the data
      again in the near future.
      
      One of the main use cases for this is with persistent memory
      where CLWB can be used with PCOMMIT to ensure that data has been
      accepted to memory and is durable on the DIMM.
      
      This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
      and PCOMMIT with appropriate fencing:
      
      void flush_and_commit_buffer(void *vaddr, unsigned int size)
      {
      	void *vend = vaddr + size - 1;
      
      	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
      		clwb(vaddr);
      
      	/* Flush any possible final partial cacheline */
      	clwb(vend);
      
      	/*
      	 * Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
      	 * (MFENCE via mb() also works)
      	 */
      	wmb();
      
      	/* PCOMMIT and the required SFENCE for ordering */
      	pcommit_sfence();
      }
      
      After this function completes the data pointed to by vaddr is
      has been accepted to memory and will be durable if the vaddr
      points to persistent memory.
      
      Regarding the details of how the alternatives assembly is set
      up, we need one additional byte at the beginning of the CLFLUSH
      so that we can flip it into a CLFLUSHOPT by changing that byte
      into a 0x66 prefix.  Two options are to either insert a 1 byte
      ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no
      functional effect with the plain CLFLUSH, but I've been told
      that executing a CLFLUSH + prefix should be faster than
      executing a CLFLUSH + NOP.
      
      We had to hard code the assembly for CLWB because, lacking the
      ability to assemble the CLWB instruction itself, the next
      closest thing is to have an xsaveopt instruction with a 0x66
      prefix.  Unfortunately XSAVEOPT itself is also relatively new,
      and isn't included by all the GCC versions that the kernel needs
      to support.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d9dc64f3
  2. 02 4月, 2015 1 次提交
  3. 31 3月, 2015 1 次提交
    • I
      x86/asm/entry: Remove user_mode_ignore_vm86() · 55474c48
      Ingo Molnar 提交于
      user_mode_ignore_vm86() can be used instead of user_mode(), in
      places where we have already done a v8086_mode() security
      check of ptregs.
      
      But doing this check in the wrong place would be a bug that
      could result in security problems, and also the naming still
      isn't very clear.
      
      Furthermore, it only affects 32-bit kernels, while most
      development happens on 64-bit kernels.
      
      If we replace them with user_mode() checks then the cost is only
      a very minor increase in various slowpaths:
      
         text             data   bss     dec              hex    filename
         10573391         703562 1753042 13029995         c6d26b vmlinux.o.before
         10573423         703562 1753042 13030027         c6d28b vmlinux.o.after
      
      So lets get rid of this distinction once and for all.
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150329090233.GA1963@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      55474c48
  4. 27 3月, 2015 3 次提交
  5. 25 3月, 2015 7 次提交
    • I
      x86/asm: Further improve segment.h readability · 72d64cc7
      Ingo Molnar 提交于
       - extend/clarify explanations where necessary
      
       - move comments from macro values to before the macro, to
         make them more consistent, and to reduce preprocessor overhead
      
       - sort GDT index and selector values likewise by number
      
       - use consistent, modern kernel coding style across the file
      
       - capitalize consistently
      
       - use consistent vertical spacing
      
       - remove the unused get_limit() method (noticed by Andy Lutomirski)
      
      No change in code (verified with objdump -d):
      
       64-bit defconfig+kvmconfig:
      
         815a129bc1f80de6445c1d8ca5b97cad  vmlinux.o.before.asm
         815a129bc1f80de6445c1d8ca5b97cad  vmlinux.o.after.asm
      
       32-bit defconfig+kvmconfig:
      
         e659ef045159ddf41a0771b33a34aae5  vmlinux.o.before.asm
         e659ef045159ddf41a0771b33a34aae5  vmlinux.o.after.asm
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      72d64cc7
    • I
      x86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO() · dca5b52a
      Ingo Molnar 提交于
      The THREAD_INFO() macro has a somewhat confusingly generic name,
      defined in a generic .h C header file. It also does not make it
      clear that it constructs a memory operand for use in assembly
      code.
      
      Rename it to ASM_THREAD_INFO() to make it all glaringly
      obvious on first glance.
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dca5b52a
    • I
      x86/asm/entry/64: Merge the field offset into the THREAD_INFO() macro · f9d71854
      Ingo Molnar 提交于
      Before:
      
         TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
      
      After:
      
         movl    THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d
      
      to turn it into a clear thread_info accessor.
      
      No code changed:
      
       md5:
         fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.before.asm
         fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.after.asm
      
         e39f2958a5d1300158e276e4f7663263  entry_64.o.before.asm
         e39f2958a5d1300158e276e4f7663263  entry_64.o.after.asm
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f9d71854
    • I
      x86/asm/entry/64: Improve the THREAD_INFO() macro explanation · 1ddc6f3c
      Ingo Molnar 提交于
      Explain the background, and add a real example.
      Acked-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/20150324184311.GA14760@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1ddc6f3c
    • D
      x86/asm: Deobfuscate segment.h · 84f53788
      Denys Vlasenko 提交于
      This file just defines a number of constants, and a few macros
      and inline functions. It is particularly badly written.
      
      For example, it is not trivial to see how descriptors are
      numbered (you'd expect that should be easy, right?).
      
      This change deobfuscates it via the following changes:
      
      Group all GDT_ENTRY_foo together (move intervening stuff away).
      
      Number them explicitly: use a number, not PREV_DEFINE+1, +2, +3:
      I want to immediately see that GDT_ENTRY_PNPBIOS_CS32 is 18.
      Seeing (GDT_ENTRY_KERNEL_BASE+6) instead is not useful.
      
      The above change allows to remove GDT_ENTRY_KERNEL_BASE
      and GDT_ENTRY_PNPBIOS_BASE, which weren't used anywhere else.
      
      After a group of GDT_ENTRY_foo, define all selector values.
      
      Remove or improve some comments. In particular:
      Comment deleted as stating the obvious:
          /*
           * The GDT has 32 entries
           */
          #define GDT_ENTRIES 32
      
      "The segment offset needs to contain a RPL. Grr. -AK"
          changed to
      "Selectors need to also have a correct RPL (+3 thingy)"
      
      "GDT layout to get 64bit syscall right (sysret hardcodes gdt
      offsets)" expanded into a description *how exactly* sysret
      hardcodes them.
      
      Patch was tested to compile and not change vmlinux.o
      on 32-bit and 64-bit builds (verified with objdump).
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      84f53788
    • D
      x86/asm/entry: Get rid of KERNEL_STACK_OFFSET · ef593260
      Denys Vlasenko 提交于
      PER_CPU_VAR(kernel_stack) was set up in a way where it points
      five stack slots below the top of stack.
      
      Presumably, it was done to avoid one "sub $5*8,%rsp"
      in syscall/sysenter code paths, where iret frame needs to be
      created by hand.
      
      Ironically, none of them benefits from this optimization,
      since all of them need to allocate additional data on stack
      (struct pt_regs), so they still have to perform subtraction.
      
      This patch eliminates KERNEL_STACK_OFFSET.
      
      PER_CPU_VAR(kernel_stack) now points directly to top of stack.
      pt_regs allocations are adjusted to allocate iret frame as well.
      Hopefully we can merge it later with 32-bit specific
      PER_CPU_VAR(cpu_current_top_of_stack) variable...
      
      Net result in generated code is that constants in several insns
      are changed.
      
      This change is necessary for changing struct pt_regs creation
      in SYSCALL64 code path from MOV to PUSH instructions.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ef593260
    • D
      x86/asm/entry/64: Change the THREAD_INFO() definition to not depend on KERNEL_STACK_OFFSET · b3fe8ba3
      Denys Vlasenko 提交于
      This changes the THREAD_INFO() definition and all its callsites
      so that they do not count stack position from
      (top of stack - KERNEL_STACK_OFFSET), but from top of stack.
      
      Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
      are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
      - "calculate thread_info's address using information that
      rsp is SIZEOF_PTREGS bytes below top of stack".
      
      While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
      "((off)-THREAD_SIZE)(reg)". The form without parentheses
      falsely looks like we invoke THREAD_SIZE() macro.
      
      Improve comment atop THREAD_INFO macro definition.
      
      This patch does not change generated code (verified by objdump).
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3fe8ba3
  6. 23 3月, 2015 4 次提交
  7. 20 3月, 2015 1 次提交
    • R
      Revert "x86/PCI: Refine the way to release PCI IRQ resources" · 9e8ce4b9
      Rafael J. Wysocki 提交于
      Commit b4b55cda (Refine the way to release PCI IRQ resources)
      introduced a regression in the PCI IRQ resource management by causing
      the IRQ resource of a device, established when pci_enabled_device()
      is called on a fully disabled device, to be released when the driver
      is unbound from the device, regardless of the enable_cnt.
      
      This leads to the situation that an ill-behaved driver can now make a
      device unusable to subsequent drivers by an imbalance in their use of
      pci_enable/disable_device().  That is a serious problem for secondary
      drivers like vfio-pci, which are innocent of the transgressions of
      the previous driver.
      
      Since the solution of this problem is not immediate and requires
      further discussion, revert commit b4b55cda and the issue it was
      supposed to address (a bug related to xen-pciback) will be taken
      care of in a different way going forward.
      Reported-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9e8ce4b9
  8. 17 3月, 2015 9 次提交
  9. 16 3月, 2015 1 次提交
    • B
      Revert "x86/mm/ASLR: Propagate base load address calculation" · 69797daf
      Borislav Petkov 提交于
      This reverts commit:
      
        f47233c2 ("x86/mm/ASLR: Propagate base load address calculation")
      
      The main reason for the revert is that the new boot flag does not work
      at all currently, and in order to make this work, we need non-trivial
      changes to the x86 boot code which we didn't manage to get done in
      time for merging.
      
      And even if we did, they would've been too risky so instead of
      rushing things and break booting 4.1 on boxes left and right, we
      will be very strict and conservative and will take our time with
      this to fix and test it properly.
      Reported-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Junjie Mao <eternal.n08@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69797daf
  10. 13 3月, 2015 1 次提交
  11. 10 3月, 2015 2 次提交
    • D
      x86/asm/entry/64: Save user RSP in pt_regs->sp on SYSCALL64 fastpath · 263042e4
      Denys Vlasenko 提交于
      Prepare for the removal of 'usersp', by simplifying PER_CPU(old_rsp) usage:
      
        - use it only as temp storage
      
        - store the userspace stack pointer immediately in pt_regs->sp
          on syscall entry, instead of using it later, on syscall exit.
      
        - change C code to use pt_regs->sp only, instead of PER_CPU(old_rsp)
          and task->thread.usersp.
      
      FIXUP/RESTORE_TOP_OF_STACK are simplified as well.
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1425926364-9526-4-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      263042e4
    • D
      x86/asm/entry/64: Save R11 into pt_regs->flags on SYSCALL64 fastpath · 29722cd4
      Denys Vlasenko 提交于
      Before this patch, R11 was saved in pt_regs->r11.
      
      Which looks natural, but requires messy shuffling to/from iret
      frame whenever ptrace or e.g. sys_iopl() wants to modify flags -
      because that's how this register is used by SYSCALL/SYSRET.
      
      This patch saves R11 in pt_regs->flags, and uses that value for
      the SYSRET64 instruction. Shuffling is eliminated.
      
      FIXUP/RESTORE_TOP_OF_STACK are simplified.
      
      stub_iopl is no longer needed: pt_regs->flags needs no fixing up.
      
      Testing shows that syscall fast path is ~54.3 ns before
      and after the patch (on 2.7 GHz Sandy Bridge CPU).
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1425926364-9526-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      29722cd4
  12. 07 3月, 2015 1 次提交
  13. 06 3月, 2015 5 次提交
  14. 05 3月, 2015 3 次提交