1. 24 August 2016, 2 commits
    • sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame · 7b32aead
      Authored by Brian Gerst
      Add 'struct inactive_task_frame', which defines the layout of the stack for
      a sleeping process.  For now, the only defined field is the BP register
      (frame pointer).
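
      A minimal C sketch of such a structure, as described above (illustrative,
      not necessarily the final kernel definition):

        /*
         * Layout of the register area at the top of a sleeping task's stack.
         * Only the frame pointer is defined so far; later patches can add
         * the remaining callee-saved registers.
         */
        struct inactive_task_frame {
                unsigned long bp;       /* saved frame pointer (RBP/EBP) */
        };
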
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1471106302-10159-4-git-send-email-brgerst@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y) · e37e43a4
      Authored by Andy Lutomirski
      This allows x86_64 kernels to enable vmapped stacks by setting
      HAVE_ARCH_VMAP_STACK=y, which in turn enables the high-level
      CONFIG_VMAP_STACK=y Kconfig option.
      
      There are a couple of interesting bits:
      
      First, x86 lazily faults in top-level paging entries for the vmalloc
      area.  This won't work if we get a page fault while trying to access
      the stack: the CPU will promote it to a double-fault and we'll die.
      To avoid this problem, probe the new stack when switching stacks and
      forcibly populate the pgd entry for the stack when switching mms.
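
      A rough sketch of the probe, assuming a hypothetical helper name and the
      usual 'next->thread.sp' saved-stack-pointer field (illustrative only, not
      the exact patch):

        #include <linux/sched.h>        /* struct task_struct, READ_ONCE */

        /*
         * Touch the new task's stack before %rsp switches to it, so that any
         * missing vmalloc paging entry is faulted in now, while an ordinary
         * page fault can still be handled.
         */
        static inline void probe_new_stack(struct task_struct *next)
        {
        #ifdef CONFIG_VMAP_STACK
                READ_ONCE(*(unsigned char *)next->thread.sp);
        #endif
        }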
      
      Second, once we have guard pages around the stack, we'll want to
      detect and handle stack overflow.
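
      One way such detection could work, sketched here with made-up names (the
      real handling lands in later patches): a fault handler checks whether the
      faulting address falls in the guard page just below the vmapped stack.

        #include <linux/mm.h>   /* PAGE_SIZE */

        /* Purely illustrative overflow check for a vmapped stack with a
         * guard page immediately below it.
         */
        static bool hit_stack_guard(unsigned long fault_addr,
                                    unsigned long stack_low)
        {
                return fault_addr >= stack_low - PAGE_SIZE &&
                       fault_addr <  stack_low;
        }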
      
      I didn't enable it on x86_32.  We'd need to rework the double-fault
      code a bit and I'm concerned about running out of vmalloc virtual
      addresses under some workloads.
      
      This patch, by itself, will behave somewhat erratically when the
      stack overflows while RSP is still more than a few tens of bytes
      above the bottom of the stack.  Specifically, we'll get #PF and make
      it to no_context and then oops without reliably triggering a
      double-fault, and no_context doesn't know about stack overflows.
      The next patch will improve that case.
      
      Thank you to Nadav and Brian for helping me pay enough attention to
      the SDM to hopefully get this right.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c88f3e2920b18e6cc621d772a04a62c06869037e.1470907718.git.luto@kernel.org
      [ Minor edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 05 May 2016, 1 commit
  3. 25 September 2015, 1 commit
    • x86/sched/64: Don't save flags on context switch (reinstated) · 3f2c5085
      Authored by Andy Lutomirski
      This reinstates the following commit:
      
        2c7577a7 ("sched/x86_64: Don't save flags on context switch")
      
      which was reverted in:
      
        512255a2 ("Revert 'sched/x86_64: Don't save flags on context switch'")
      
      Historically, Linux has always saved and restored EFLAGS across
      context switches.  As far as I know, the only reason to do this
      is because of the NT flag.  In particular, if something calls
      switch_to() with the NT flag set, then we don't want to leak the
      NT flag into a different task that might try to IRET and fail
      because NT is set.
      
      Before this commit:
      
        8c7aa698 ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
      
      we could run system call bodies with NT set.  This would be a DoS or possibly
      privilege escalation hole if scheduling in such a system call would leak
      NT into a different task.
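
      A rough illustration of that entry-time filtering (the helper name is
      made up; the real filtering is done in the assembly entry path, and
      X86_EFLAGS_NT is bit 14 of RFLAGS):

        #define X86_EFLAGS_NT   (1UL << 14)     /* Nested Task flag */

        /* Clear RFLAGS.NT on kernel entry so the kernel never runs, and thus
         * never context-switches, with NT set.
         */
        static inline void clear_rflags_nt(void)
        {
                unsigned long flags;

                asm volatile("pushfq ; popq %0" : "=r" (flags) : : "memory");
                if (flags & X86_EFLAGS_NT) {
                        flags &= ~X86_EFLAGS_NT;
                        asm volatile("pushq %0 ; popfq" : : "r" (flags)
                                     : "cc", "memory");
                }
        }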
      
      Importantly, we don't need to worry about NT being set while
      preemptible or across page faults.  The only way we can schedule
      due to preemption or a page fault is in an interrupt entry that
      nests inside the SYSENTER prologue.  The CPU will clear NT when
      entering through an interrupt gate, so we won't schedule with NT
      set.
      
      The only other interesting flags are IOPL and AC.  Allowing
      switch_to() to change IOPL has no effect, as the value loaded
      during kernel execution doesn't matter at all except between a
      SYSENTER entry and the subsequent PUSHF, and anything that
      interrupts in that window will restore IOPL on return.
      
      If we call __switch_to() with AC set, we have bigger problems.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/d4440fdc2a89247bffb7c003d2a9a2952bd46827.1441146105.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 18 August 2015, 1 commit
  5. 28 October 2014, 1 commit
  6. 07 August 2013, 1 commit
  7. 29 March 2012, 1 commit