    x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup · 7536656f
    Andy Lutomirski committed
    Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
    so the exception handler is invoked on the temporary SYSENTER stack.
    
    Because the SYSENTER stack is very small, we have a fixup to switch
    off the stack quickly when this happens.  The old fixup had several issues:
    
     1. It checked the interrupt frame's CS and EIP.  This wasn't
        obviously correct on Xen or if vm86 mode was in use [1].
    
     2. In the NMI handler, it did some frightening digging into the
        stack frame.  I'm not convinced this digging was correct.
    
     3. The fixup didn't switch stacks and then switch back.  Instead, it
        synthesized a brand new stack frame that would redirect the IRET
        back to the SYSENTER code.  That frame was highly questionable.
        For one thing, if NMI nested inside #DB, we would effectively
        abort the #DB prologue, which was probably safe but was
        frightening.  For another, the code used PUSHFL to write the
        FLAGS portion of the frame, which was simply bogus -- by the time
        PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
        flags were clobbered.
    
    Simplify this considerably.  Instead of looking at the saved frame
    to see where we came from, check the hardware ESP register against
    the SYSENTER stack directly.  Malicious user code cannot spoof the
    kernel ESP register, and by moving the check after SAVE_ALL, we can
    use normal PER_CPU accesses to find all the relevant addresses.
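
The ESP check described above can be illustrated with a rough C sketch. This is not the patch itself, which does the work in assembly in entry_32.S. The function name, the stack size constant, and the idea of passing the stack end as a plain integer are all assumptions made for illustration. The sketch shows one common way to test whether a stack pointer lies inside a small fixed-size stack with a single unsigned comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size, for illustration only; the kernel defines the real one. */
#define SKETCH_SYSENTER_STACK_SIZE 256u

/*
 * Return true if esp lies in the half-open range
 * (stack_end - SKETCH_SYSENTER_STACK_SIZE, stack_end].
 *
 * stack_end is the address just past the top of the (assumed) SYSENTER
 * stack.  The subtraction is done in unsigned 32-bit arithmetic, so an
 * esp above stack_end wraps around to a very large value and fails the
 * comparison.  One compare therefore covers both bounds.
 */
static bool on_sysenter_stack(uint32_t esp, uint32_t stack_end)
{
    return (uint32_t)(stack_end - esp) < SKETCH_SYSENTER_STACK_SIZE;
}
```

Because the input is the live stack pointer rather than values read from the saved interrupt frame, nothing user code placed in the frame (CS, EIP, FLAGS) takes part in the decision.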
    
    With this patch applied, the improved syscall_nt_32 test finally
    passes on 32-bit kernels.
    
    [1] It isn't obviously correct, but it is nonetheless safe from vm86
        shenanigans as far as I can tell.  A user can't point EIP at
        entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
        like all kernel addresses, is greater than 0xffff and would thus
        violate the CS segment limit.
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Andrew Cooper <andrew.cooper3@citrix.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
entry_32.S 25.4 KB