    x86/boot/compressed: Register dummy NMI handler in EFI boot loader, to avoid kdump crashes · 28410785
    Committed by Zeng Heng
    hulk inclusion
    category: bugfix
    issue: https://gitee.com/openeuler/kernel/issues/I69VF6
    
    --------------------------------
    
    If kdump is enabled and mce_inject is used to inject errors, the EFI
    boot loader decompresses and loads the second kernel in order to save
    the vmcore file.
    
    For normal errors that is fine. However, in the MCE case, the panic
    CPU that first enters mce_panic() is running in NMI context, and the
    processor blocks delivery of subsequent NMIs until the next execution
    of the IRET instruction.
    
    When the panic CPU spends a long time in the panic processing path
    and the watchdog times out, the processor has by then already
    received an NMI in the background, which stays pending.
    
    In the reproducer sequence below, the panic CPU runs into the EFI
    loader and raises a page fault exception (for example, when accessing
    the `vidmem` variable while calling debug_putstr()). The CPU then
    executes an IRET instruction when it exits the page fault handler.
    
    That IRET unblocks NMI delivery, so the pending NMI is delivered.
    But the loader never registers a handler for the NMI vector in its
    IDT, and the missing vector handler causes a reboot, which interrupts
    the kdump procedure so that the vmcore file is not saved.
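
The sequence described above can be illustrated with a small user-space
model. This is only a sketch, not the kernel change itself: the names
`cpu_model`, `gate_present`, `raise_nmi` and `run_kdump_scenario` are
hypothetical, the CPU is reduced to two NMI flags plus a table of
present IDT gates, and delivery to a missing gate is simply treated as
a reset, matching the reboot described above.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 exception vector numbers used by this model. */
enum { VEC_NMI = 2, VEC_PF = 14, NR_VECTORS = 32 };

/* Hypothetical, heavily simplified CPU state (illustration only). */
struct cpu_model {
	bool nmi_blocked;              /* set on NMI entry, cleared by IRET */
	bool nmi_pending;              /* an NMI latched while NMIs are blocked */
	bool gate_present[NR_VECTORS]; /* does the current IDT have a handler? */
	bool reset;                    /* models the reboot */
	int  nmis_handled;
};

static void deliver(struct cpu_model *cpu, int vec);

/* IRET unblocks NMIs; a latched NMI is then delivered immediately. */
static void iret(struct cpu_model *cpu)
{
	cpu->nmi_blocked = false;
	if (cpu->nmi_pending) {
		cpu->nmi_pending = false;
		deliver(cpu, VEC_NMI);
	}
}

/* Deliver a vector through the IDT; a missing gate is modelled as a reset. */
static void deliver(struct cpu_model *cpu, int vec)
{
	assert(vec >= 0 && vec < NR_VECTORS);
	if (cpu->reset)
		return;
	if (!cpu->gate_present[vec]) {
		cpu->reset = true;
		return;
	}
	if (vec == VEC_NMI) {
		cpu->nmi_blocked = true;
		cpu->nmis_handled++;
	}
	iret(cpu); /* the handler body is empty; it just returns with IRET */
}

/* An NMI that arrives while NMIs are blocked is latched, not lost. */
static void raise_nmi(struct cpu_model *cpu)
{
	if (cpu->nmi_blocked)
		cpu->nmi_pending = true;
	else
		deliver(cpu, VEC_NMI);
}

/* Replay the sequence from the changelog, with or without an NMI gate. */
struct cpu_model run_kdump_scenario(bool loader_has_nmi_gate)
{
	struct cpu_model cpu = {0};

	cpu.nmi_blocked = true; /* mce_panic() is running in NMI context */
	raise_nmi(&cpu);        /* watchdog NMI arrives and is latched */

	/* The boot loader installs its own IDT. */
	cpu.gate_present[VEC_PF] = true;
	cpu.gate_present[VEC_NMI] = loader_has_nmi_gate;

	deliver(&cpu, VEC_PF);  /* page fault; its IRET unblocks NMIs */
	return cpu;
}
```

In this model, `run_kdump_scenario(false)` ends with `reset` set, while
`run_kdump_scenario(true)` handles exactly one NMI and carries on, which
is the effect a dummy NMI handler is meant to have.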
    
    Here are the steps to reproduce the above issue (it's sporadic):
    
      1. # cat uncorrected
         CPU 1 BANK 4
         STATUS uncorrected 0xc0
         MCGSTATUS  EIPV MCIP
         ADDR 0x1234
         RIP 0xdeadbabe
         RAISINGCPU 0
         MCGCAP SER CMCI TES 0x6
      2. # modprobe mce_inject
      3. # mce-inject uncorrected
    
    There are two ways to increase the probability of reproducing this
    issue:
    
      1. modify the watchdog threshold value (to increase NMI frequency);
      2. and/or add delays before panic() in mce_panic() and modify the
         PANIC_TIMEOUT macro.
    
    Fixes: ca0e22d4 ("x86/boot/compressed/64: Always switch to own page table")
    Signed-off-by: Zeng Heng <zengheng4@huawei.com>
    [ Tidy up changelog, add comments. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230110102745.2514694-1-zengheng4@huawei.com
idt_64.c 1.3 KB