1. 10 3月, 2016 4 次提交
    • A
      x86/entry: Vastly simplify SYSENTER TF (single-step) handling · f2b37575
      Andy Lutomirski 提交于
      Due to a blatant design error, SYSENTER doesn't clear TF (single-step).
      
      As a result, if a user does SYSENTER with TF set, we will single-step
      through the kernel until something clears TF.  There is absolutely
      nothing we can do to prevent this short of turning off SYSENTER [1].
      
      Simplify the handling considerably with two changes:
      
        1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
           add TF to that list of flags to sanitize with no overhead whatsoever.
      
        2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue.
      
      That's all we need to do.
      
      Don't get too excited -- our handling is still buggy on 32-bit
      kernels.  There's nothing wrong with the SYSENTER code itself, but
      the #DB prologue has a clever fixup for traps on the very first
      instruction of entry_SYSENTER_32, and the fixup doesn't work quite
      correctly.  The next two patches will fix that.
      
      [1] We could probably prevent it by forcing BTF on at all times and
          making sure we clear TF before any branches in the SYSENTER
          code.  Needless to say, this is a bad idea.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f2b37575
    • A
      x86/entry/32: Restore FLAGS on SYSEXIT · c2c9b52f
      Andy Lutomirski 提交于
      We weren't restoring FLAGS at all on SYSEXIT.  Apparently no one cared.
      
      With this patch applied, native kernels should always honor
      task_pt_regs()->flags, which opens the door for some sys_iopl()
      cleanups.  I'll do those as a separate series, though, since getting
      it right will involve tweaking some paravirt ops.
      
      ( The short version is that, before this patch, sys_iopl(), invoked via
        SYSENTER, wasn't guaranteed to ever transfer the updated
        regs->flags, so sys_iopl() had to change the hardware flags register
        as well. )
      Reported-by: NBrian Gerst <brgerst@gmail.com>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3f98b207472dc9784838eb5ca2b89dcc845ce269.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c2c9b52f
    • A
      x86/entry/32: Filter NT and speed up AC filtering in SYSENTER · 67f590e8
      Andy Lutomirski 提交于
      This makes the 32-bit code work just like the 64-bit code.  It should
      speed up syscalls on 32-bit kernels on Skylake by something like 20
      cycles (by analogy to the 64-bit compat case).
      
      It also cleans up NT just like we do for the 64-bit case.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/07daef3d44bd1ed62a2c866e143e8df64edb40ee.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      67f590e8
    • A
      x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test · e7860411
      Andy Lutomirski 提交于
      CLAC is slow, and the SYSENTER code already has an unlikely path
      that runs if unusual flags are set.  Drop the CLAC and instead rely
      on the unlikely path to clear AC.
      
      This seems to save ~24 cycles on my Skylake laptop.  (Hey, Intel,
      make this faster please!)
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/90d6db2189f9add83bc7bddd75a0c19ebbd676b2.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e7860411
  2. 08 3月, 2016 1 次提交
    • A
      x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled · 58a5aac5
      Andy Lutomirski 提交于
      x86_64 has very clean espfix handling on paravirt: espfix64 is set
      up in native_iret, so paravirt systems that override iret bypass
      espfix64 automatically.  This is robust and straightforward.
      
      x86_32 is messier.  espfix is set up before the IRET paravirt patch
      point, so it can't be directly conditionalized on whether we use
      native_iret.  We also can't easily move it into native_iret without
      regressing performance due to a bizarre consideration.  Specifically,
      on 64-bit kernels, the logic is:
      
        if (regs->ss & 0x4)
                setup_espfix;
      
      On 32-bit kernels, the logic is:
      
        if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
            (regs->flags & X86_EFLAGS_VM) == 0)
                setup_espfix;
      
      The performance of setup_espfix itself is essentially irrelevant, but
      the comparison happens on every IRET so its performance matters.  On
      x86_64, there's no need for any registers except flags to implement
      the comparison, so we fold the whole thing into native_iret.  On
      x86_32, we don't do that because we need a free register to
      implement the comparison efficiently.  We therefore do espfix setup
      before restoring registers on x86_32.
      
      This patch gets rid of the explicit paravirt_enabled check by
      introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
      to skip espfix on paravirt systems where iret != native_iret.  This is
      also messy, but it's at least in line with other things we do.
      
      This improves espfix performance by removing a branch, but no one
      cares.  More importantly, it removes a paravirt_enabled user, which is
      good because paravirt_enabled is ill-defined and is going away.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: konrad.wilk@oracle.com
      Cc: lguest@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      58a5aac5
  3. 25 2月, 2016 1 次提交
  4. 24 2月, 2016 1 次提交
  5. 17 2月, 2016 1 次提交
  6. 01 2月, 2016 3 次提交
  7. 30 1月, 2016 2 次提交
  8. 29 1月, 2016 8 次提交
  9. 21 1月, 2016 1 次提交
  10. 19 1月, 2016 1 次提交
  11. 13 1月, 2016 1 次提交
  12. 12 1月, 2016 4 次提交
  13. 21 12月, 2015 2 次提交
  14. 19 12月, 2015 1 次提交
  15. 14 12月, 2015 1 次提交
  16. 11 12月, 2015 4 次提交
  17. 02 12月, 2015 1 次提交
  18. 24 11月, 2015 2 次提交
    • A
      x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots · 478dc89c
      Andy Lutomirski 提交于
      On CONFIG_CONTEXT_TRACKING kernels that have context tracking
      disabled at runtime (which includes most distro kernels), we
      still have the overhead of a call to enter_from_user_mode in
      interrupt and exception entries.
      
      If jump labels are available, this uses the jump label
      infrastructure to skip the call.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/73ee804fff48cd8c66b65b724f9f728a11a8c686.1447361906.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      478dc89c
    • A
      x86/entry/64: Fix irqflag tracing wrt context tracking · f1075053
      Andy Lutomirski 提交于
      Paolo pointed out that enter_from_user_mode could be called
      while irqflags were traced as though IRQs were on.
      
      In principle, this could confuse lockdep.  It doesn't cause any
      problems that I've seen in any configuration, but if I build
      with CONFIG_DEBUG_LOCKDEP=y, enable a nohz_full CPU, and add
      code like:
      
      	if (irqs_disabled()) {
      		spin_lock(&something);
      		spin_unlock(&something);
      	}
      
      to the top of enter_from_user_mode, then lockdep will complain
      without this fix.  It seems that lockdep's irqflags sanity
      checks are too weak to detect this bug without forcing the
      issue.
      
      This patch adds one byte to normal kernels, and it's IMO a bit
      ugly. I haven't spotted a better way to do this yet, though.
      The issue is that we can't do TRACE_IRQS_OFF until after SWAPGS
      (if needed), but we're also supposed to do it before calling C
      code.
      
      An alternative approach would be to call trace_hardirqs_off in
      enter_from_user_mode.  That would be less code and would not
      bloat normal kernels at all, but it would be harder to see how
      the code worked.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/86237e362390dfa6fec12de4d75a238acb0ae787.1447361906.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f1075053
  19. 23 11月, 2015 1 次提交