1. 07 8月, 2019 4 次提交
    • Z
      xen/pv: Fix a boot up hang revealed by int3 self test · 11cb9f87
      Zhenzhong Duan 提交于
      [ Upstream commit b23e5844dfe78a80ba672793187d3f52e4b528d7 ]
      
      Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
      selftest") is used to ensure there is a gap setup in int3 exception stack
      which could be used for inserting call return address.
      
      This gap is missed in XEN PV int3 exception entry path, then below panic
      triggered:
      
      [    0.772876] general protection fault: 0000 [#1] SMP NOPTI
      [    0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
      [    0.772893] RIP: e030:int3_magic+0x0/0x7
      [    0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246
      [    0.773334] Call Trace:
      [    0.773334]  alternative_instructions+0x3d/0x12e
      [    0.773334]  check_bugs+0x7c9/0x887
      [    0.773334]  ? __get_locked_pte+0x178/0x1f0
      [    0.773334]  start_kernel+0x4ff/0x535
      [    0.773334]  ? set_init_arg+0x55/0x55
      [    0.773334]  xen_start_kernel+0x571/0x57a
      
      For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
      %rcx/%r11 on the stack. To convert back to "normal" looking exceptions,
      the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'.
      
      E.g. Extracting 'xen_pv_trap xenint3' we have:
      xen_xenint3:
       pop %rcx;
       pop %r11;
       jmp xenint3
      
      As xenint3 and int3 entry code are same except xenint3 doesn't generate
      a gap, we can fix it by using int3 and drop useless xenint3.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      11cb9f87
    • A
      x86: math-emu: Hide clang warnings for 16-bit overflow · 1b84e674
      Arnd Bergmann 提交于
      [ Upstream commit 29e7e9664aec17b94a9c8c5a75f8d216a206aa3a ]
      
      clang warns about a few parts of the math-emu implementation
      where a 16-bit integer becomes negative during assignment:
      
      arch/x86/math-emu/poly_tan.c:88:35: error: implicit conversion from 'int' to 'short' changes value from 49216 to -16320 [-Werror,-Wconstant-conversion]
                                            (0x41 + EXTENDED_Ebias) | SIGN_Negative);
                                            ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
      arch/x86/math-emu/fpu_emu.h:180:58: note: expanded from macro 'setexponent16'
       #define setexponent16(x,y)  { (*(short *)&((x)->exp)) = (y); }
                                                            ~  ^
      arch/x86/math-emu/reg_constant.c:37:32: error: implicit conversion from 'int' to 'short' changes value from 49085 to -16451 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_PI2extra = MAKE_REG(NEG, -66,
                                     ^~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:48:28: error: implicit conversion from 'int' to 'short' changes value from 65535 to -1 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_QNaN = MAKE_REG(NEG, EXP_OVER, 0x00000000, 0xC0000000);
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The code is correct as is, so add a typecast to shut up the warnings.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190712090816.350668-1-arnd@arndb.deSigned-off-by: NSasha Levin <sashal@kernel.org>
      1b84e674
    • Q
      x86/apic: Silence -Wtype-limits compiler warnings · 242666b2
      Qian Cai 提交于
      [ Upstream commit ec6335586953b0df32f83ef696002063090c7aef ]
      
      There are many compiler warnings like this,
      
      In file included from ./arch/x86/include/asm/smp.h:13,
                       from ./arch/x86/include/asm/mmzone_64.h:11,
                       from ./arch/x86/include/asm/mmzone.h:5,
                       from ./include/linux/mmzone.h:969,
                       from ./include/linux/gfp.h:6,
                       from ./include/linux/mm.h:10,
                       from arch/x86/kernel/apic/io_apic.c:34:
      arch/x86/kernel/apic/io_apic.c: In function 'check_timer':
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro
      'apic_printk'
        apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X "
        ^~~~~~~~~~~
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro
      'apic_printk'
          apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
          ^~~~~~~~~~~
      
      APIC_QUIET is 0, so silence them by making apic_verbosity type int.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pwSigned-off-by: NSasha Levin <sashal@kernel.org>
      242666b2
    • A
      x86: kvm: avoid constant-conversion warning · 80f58147
      Arnd Bergmann 提交于
      [ Upstream commit a6a6d3b1f867d34ba5bd61aa7bb056b48ca67cff ]
      
      clang finds a contruct suspicious that converts an unsigned
      character to a signed integer and back, causing an overflow:
      
      arch/x86/kvm/mmu.c:4605:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Werror,-Wconstant-conversion]
                      u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
                         ~~                               ^~
      arch/x86/kvm/mmu.c:4607:38: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -241 to 15 [-Werror,-Wconstant-conversion]
                      u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
                         ~~                              ^~
      arch/x86/kvm/mmu.c:4609:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -171 to 85 [-Werror,-Wconstant-conversion]
                      u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
                         ~~                               ^~
      
      Add an explicit cast to tell clang that everything works as
      intended here.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Link: https://github.com/ClangBuiltLinux/linux/issues/95Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      80f58147
  2. 31 7月, 2019 2 次提交
  3. 28 7月, 2019 2 次提交
  4. 26 7月, 2019 11 次提交
  5. 21 7月, 2019 7 次提交
    • J
      x86/entry/32: Fix ENDPROC of common_spurious · d173ce09
      Jiri Slaby 提交于
      [ Upstream commit 1cbec37b3f9cff074a67bef4fc34b30a09958a0a ]
      
      common_spurious is currently ENDed erroneously. common_interrupt is used
      in its ENDPROC. So fix this mistake.
      
      Found by my asm macros rewrite patchset.
      
      Fixes: f8a8fe61fec8 ("x86/irq: Seperate unused system vectors from spurious entry again")
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190709063402.19847-1-jslaby@suse.czSigned-off-by: NSasha Levin <sashal@kernel.org>
      d173ce09
    • T
      x86/irq: Seperate unused system vectors from spurious entry again · fc6975ee
      Thomas Gleixner 提交于
      commit f8a8fe61fec8006575699559ead88b0b833d5cad upstream
      
      Quite some time ago the interrupt entry stubs for unused vectors in the
      system vector range got removed and directly mapped to the spurious
      interrupt vector entry point.
      
      Sounds reasonable, but it's subtly broken. The spurious interrupt vector
      entry point pushes vector number 0xFF on the stack which makes the whole
      logic in __smp_spurious_interrupt() pointless.
      
      As a consequence any spurious interrupt which comes from a vector != 0xFF
      is treated as a real spurious interrupt (vector 0xFF) and not
      acknowledged. That subsequently stalls all interrupt vectors of equal and
      lower priority, which brings the system to a grinding halt.
      
      This can happen because even on 64-bit the system vector space is not
      guaranteed to be fully populated. A full compile time handling of the
      unused vectors is not possible because quite some of them are conditonally
      populated at runtime.
      
      Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
      but gains the proper handling back. There is no point to selectively spare
      some of the stubs which are known at compile time as the required code in
      the IDT management would be way larger and convoluted.
      
      Do not route the spurious entries through common_interrupt and do_IRQ() as
      the original code did. Route it to smp_spurious_interrupt() which evaluates
      the vector number and acts accordingly now that the real vector numbers are
      handed in.
      
      Fixup the pr_warn so the actual spurious vector (0xff) is clearly
      distiguished from the other vectors and also note for the vectored case
      whether it was pending in the ISR or not.
      
       "Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
       "Spurious interrupt vector 0xed on CPU#1. Acked."
       "Spurious interrupt vector 0xee on CPU#1. Not pending!."
      
      Fixes: 2414e021 ("x86: Avoid building unused IRQ entry stubs")
      Reported-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      fc6975ee
    • T
      x86/irq: Handle spurious interrupt after shutdown gracefully · 9494cd39
      Thomas Gleixner 提交于
      commit b7107a67f0d125459fe41f86e8079afd1a5e0b15 upstream
      
      Since the rework of the vector management, warnings about spurious
      interrupts have been reported. Robert provided some more information and
      did an initial analysis. The following situation leads to these warnings:
      
         CPU 0                  CPU 1               IO_APIC
      
                                                    interrupt is raised
                                                    sent to CPU1
      			  Unable to handle
      			  immediately
      			  (interrupts off,
      			   deep idle delay)
         mask()
         ...
         free()
           shutdown()
           synchronize_irq()
           clear_vector()
                                do_IRQ()
                                  -> vector is clear
      
      Before the rework the vector entries of legacy interrupts were statically
      assigned and occupied precious vector space while most of them were
      unused. Due to that the above situation was handled silently because the
      vector was handled and the core handler of the assigned interrupt
      descriptor noticed that it is shut down and returned.
      
      While this has been usually observed with legacy interrupts, this situation
      is not limited to them. Any other interrupt source, e.g. MSI, can cause the
      same issue.
      
      After adding proper synchronization for level triggered interrupts, this
      can only happen for edge triggered interrupts where the IO-APIC obviously
      cannot provide information about interrupts in flight.
      
      While the spurious warning is actually harmless in this case it worries
      users and driver developers.
      
      Handle it gracefully by marking the vector entry as VECTOR_SHUTDOWN instead
      of VECTOR_UNUSED when the vector is freed up.
      
      If that above late handling happens the spurious detector will not complain
      and switch the entry to VECTOR_UNUSED. Any subsequent spurious interrupt on
      that line will trigger the spurious warning as before.
      
      Fixes: 464d1230 ("x86/vector: Switch IOAPIC to global reservation mode")
      Reported-by: NRobert Hodaszi <Robert.Hodaszi@digi.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>-
      Tested-by: NRobert Hodaszi <Robert.Hodaszi@digi.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/20190628111440.459647741@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      9494cd39
    • T
      x86/ioapic: Implement irq_get_irqchip_state() callback · 7897f5a4
      Thomas Gleixner 提交于
      commit dfe0cf8b51b07e56ded571e3de0a4a9382517231 upstream
      
      When an interrupt is shut down in free_irq() there might be an inflight
      interrupt pending in the IO-APIC remote IRR which is not yet serviced. That
      means the interrupt has been sent to the target CPUs local APIC, but the
      target CPU is in a state which delays the servicing.
      
      So free_irq() would proceed to free resources and to clear the vector
      because synchronize_hardirq() does not see an interrupt handler in
      progress.
      
      That can trigger a spurious interrupt warning, which is harmless and just
      confuses users, but it also can leave the remote IRR in a stale state
      because once the handler is invoked the interrupt resources might be freed
      already and therefore acknowledgement is not possible anymore.
      
      Implement the irq_get_irqchip_state() callback for the IO-APIC irq chip. The
      callback is invoked from free_irq() via __synchronize_hardirq(). Check the
      remote IRR bit of the interrupt and return 'in flight' if it is set and the
      interrupt is configured in level mode. For edge mode the remote IRR has no
      meaning.
      
      As this is only meaningful for level triggered interrupts this won't cure
      the potential spurious interrupt warning for edge triggered interrupts, but
      the edge trigger case does not result in stale hardware state. This has to
      be addressed at the vector/interrupt entry level seperately.
      
      Fixes: 464d1230 ("x86/vector: Switch IOAPIC to global reservation mode")
      Reported-by: NRobert Hodaszi <Robert.Hodaszi@digi.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/20190628111440.370295517@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      7897f5a4
    • K
      x86/boot/64: Add missing fixup_pointer() for next_early_pgt access · 94968c37
      Kirill A. Shutemov 提交于
      [ Upstream commit c1887159eb48ba40e775584cfb2a443962cf1a05 ]
      
      __startup_64() uses fixup_pointer() to access global variables in a
      position-independent fashion. Access to next_early_pgt was wrapped into the
      helper, but one instance in the 5-level paging branch was missed.
      
      GCC generates a R_X86_64_PC32 PC-relative relocation for the access which
      doesn't trigger the issue, but Clang emmits a R_X86_64_32S which leads to
      an invalid memory access and system reboot.
      
      Fixes: 187e91fe ("x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'")
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Link: https://lkml.kernel.org/r/20190620112422.29264-1-kirill.shutemov@linux.intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      94968c37
    • K
      x86/boot/64: Fix crash if kernel image crosses page table boundary · 729d25f4
      Kirill A. Shutemov 提交于
      [ Upstream commit 81c7ed296dcd02bc0b4488246d040e03e633737a ]
      
      A kernel which boots in 5-level paging mode crashes in a small percentage
      of cases if KASLR is enabled.
      
      This issue was tracked down to the case when the kernel image unpacks in a
      way that it crosses an 1G boundary. The crash is caused by an overrun of
      the PMD page table in __startup_64() and corruption of P4D page table
      allocated next to it. This particular issue is not visible with 4-level
      paging as P4D page tables are not used.
      
      But the P4D and the PUD calculation have similar problems.
      
      The PMD index calculation is wrong due to operator precedence, which fails
      to confine the PMDs in the PMD array on wrap around.
      
      The P4D calculation for 5-level paging and the PUD calculation calculate
      the first index correctly, but then blindly increment it which causes the
      same issue when a kernel image is located across a 512G and for 5-level
      paging across a 46T boundary.
      
      This wrap around mishandling was introduced when these parts moved from
      assembly to C.
      
      Restore it to the correct behaviour.
      
      Fixes: c88d7150 ("x86/boot/64: Rewrite startup_64() in C")
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20190620112345.28833-1-kirill.shutemov@linux.intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      729d25f4
    • C
      x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz · 2a6ee369
      Colin Ian King 提交于
      [ Upstream commit ea136a112d89bade596314a1ae49f748902f4727 ]
      
      The left shift of unsigned int cpu_khz will overflow for large values of
      cpu_khz, so cast it to a long long before shifting it to avoid overvlow.
      For example, this can happen when cpu_khz is 4194305, i.e. ~4.2 GHz.
      
      Addresses-Coverity: ("Unintentional integer overflow")
      Fixes: 8c3ba8d0 ("x86, apic: ack all pending irqs when crashed/on kexec")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: kernel-janitors@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190619181446.13635-1-colin.king@canonical.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      2a6ee369
  6. 14 7月, 2019 3 次提交
  7. 10 7月, 2019 5 次提交
    • W
      KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC · f25c0695
      Wanpeng Li 提交于
      commit bb34e690e9340bc155ebed5a3d75fc63ff69e082 upstream.
      
      Thomas reported that:
      
       | Background:
       |
       |    In preparation of supporting IPI shorthands I changed the CPU offline
       |    code to software disable the local APIC instead of just masking it.
       |    That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
       |    register.
       |
       | Failure:
       |
       |    When the CPU comes back online the startup code triggers occasionally
       |    the warning in apic_pending_intr_clear(). That complains that the IRRs
       |    are not empty.
       |
       |    The offending vector is the local APIC timer vector who's IRR bit is set
       |    and stays set.
       |
       | It took me quite some time to reproduce the issue locally, but now I can
       | see what happens.
       |
       | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1
       | (and hardware support) it behaves correctly.
       |
       | Here is the series of events:
       |
       |     Guest CPU
       |
       |     goes down
       |
       |       native_cpu_disable()
       |
       | 			apic_soft_disable();
       |
       |     play_dead()
       |
       |     ....
       |
       |     startup()
       |
       |       if (apic_enabled())
       |         apic_pending_intr_clear()	<- Not taken
       |
       |      enable APIC
       |
       |         apic_pending_intr_clear()	<- Triggers warning because IRR is stale
       |
       | When this happens then the deadline timer or the regular APIC timer -
       | happens with both, has fired shortly before the APIC is disabled, but the
       | interrupt was not serviced because the guest CPU was in an interrupt
       | disabled region at that point.
       |
       | The state of the timer vector ISR/IRR bits:
       |
       |     	     	       	        ISR     IRR
       | before apic_soft_disable()    0	      1
       | after apic_soft_disable()     0	      1
       |
       | On startup		      		 0	      1
       |
       | Now one would assume that the IRR is cleared after the INIT reset, but this
       | happens only on CPU0.
       |
       | Why?
       |
       | Because our CPU0 hotplug is just for testing to make sure nothing breaks
       | and goes through an NMI wakeup vehicle because INIT would send it through
       | the boots-trap code which is not really working if that CPU was not
       | physically unplugged.
       |
       | Now looking at a real world APIC the situation in that case is:
       |
       |     	     	       	      	ISR     IRR
       | before apic_soft_disable()    0	      1
       | after apic_soft_disable()     0	      1
       |
       | On startup		      		 0	      0
       |
       | Why?
       |
       | Once the dying CPU reenables interrupts the pending interrupt gets
       | delivered as a spurious interupt and then the state is clear.
       |
       | While that CPU0 hotplug test case is surely an esoteric issue, the APIC
       | emulation is still wrong, Even if the play_dead() code would not enable
       | interrupts then the pending IRR bit would turn into an ISR .. interrupt
       | when the APIC is reenabled on startup.
      
      From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled
      * Pending interrupts in the IRR and ISR registers are held and require
        masking or handling by the CPU.
      
      In Thomas's testing, hardware cpu will not respect soft disable LAPIC
      when IRR has already been set or APICv posted-interrupt is in flight,
      so we can skip soft disable APIC checking when clearing IRR and set ISR,
      continue to respect soft disable APIC when attempting to set IRR.
      Reported-by: NRong Chen <rong.a.chen@intel.com>
      Reported-by: NFeng Tang <feng.tang@intel.com>
      Reported-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rong Chen <rong.a.chen@intel.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f25c0695
    • P
      KVM: x86: degrade WARN to pr_warn_ratelimited · f6472f50
      Paolo Bonzini 提交于
      commit 3f16a5c318392cbb5a0c7a3d19dff8c8ef3c38ee upstream.
      
      This warning can be triggered easily by userspace, so it should certainly not
      cause a panic if panic_on_warn is set.
      
      Reported-by: syzbot+c03f30b4f4c46bdf8575@syzkaller.appspotmail.com
      Suggested-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6472f50
    • K
      x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting · 6bf96773
      Kirill A. Shutemov 提交于
      [ Upstream commit 45b13b424faafb81c8c44541f093a682fdabdefc ]
      
      RDMSR in the trampoline code overwrites EDX but that register is used
      to indicate whether 5-level paging has to be enabled and if clobbered,
      leads to failure to boot on a 5-level paging machine.
      
      Preserve EDX on the stack while we are dealing with EFER.
      
      Fixes: b677dfae5aa1 ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode")
      Reported-by: NKyle D Pelton <kyle.d.pelton@intel.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: dave.hansen@linux.intel.com
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Huang <wei@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190206115253.1907-1-kirill.shutemov@linux.intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      6bf96773
    • P
      ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code() · c854d9b6
      Petr Mladek 提交于
      commit d5b844a2cf507fc7642c9ae80a9d585db3065c28 upstream.
      
      The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text
      permissions race") causes a possible deadlock between register_kprobe()
      and ftrace_run_update_code() when ftrace is using stop_machine().
      
      The existing dependency chain (in reverse order) is:
      
      -> #1 (text_mutex){+.+.}:
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             __mutex_lock+0x88/0x908
             mutex_lock_nested+0x32/0x40
             register_kprobe+0x254/0x658
             init_kprobes+0x11a/0x168
             do_one_initcall+0x70/0x318
             kernel_init_freeable+0x456/0x508
             kernel_init+0x22/0x150
             ret_from_fork+0x30/0x34
             kernel_thread_starter+0x0/0xc
      
      -> #0 (cpu_hotplug_lock.rw_sem){++++}:
             check_prev_add+0x90c/0xde0
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             cpus_read_lock+0x62/0xd0
             stop_machine+0x2e/0x60
             arch_ftrace_update_code+0x2e/0x40
             ftrace_run_update_code+0x40/0xa0
             ftrace_startup+0xb2/0x168
             register_ftrace_function+0x64/0x88
             klp_patch_object+0x1a2/0x290
             klp_enable_patch+0x554/0x980
             do_one_initcall+0x70/0x318
             do_init_module+0x6e/0x250
             load_module+0x1782/0x1990
             __s390x_sys_finit_module+0xaa/0xf0
             system_call+0xd8/0x2d0
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(text_mutex);
                                     lock(cpu_hotplug_lock.rw_sem);
                                     lock(text_mutex);
        lock(cpu_hotplug_lock.rw_sem);
      
      It is similar problem that has been solved by the commit 2d1e38f5
      ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
      To be on the safe side, text_mutex must become a low level lock taken
      after cpu_hotplug_lock.rw_sem.
      
      This can't be achieved easily with the current ftrace design.
      For example, arm calls set_all_modules_text_rw() already in
      ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
      This functions is called:
      
        + outside stop_machine() from ftrace_run_update_code()
        + without stop_machine() from ftrace_module_enable()
      
      Fortunately, the problematic fix is needed only on x86_64. It is
      the only architecture that calls set_all_modules_text_rw()
      in ftrace path and supports livepatching at the same time.
      
      Therefore it is enough to move text_mutex handling from the generic
      kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:
      
         ftrace_arch_code_modify_prepare()
         ftrace_arch_code_modify_post_process()
      
      This patch basically reverts the ftrace part of the problematic
      commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module
      text permissions race"). And provides x86_64 specific-fix.
      
      Some refactoring of the ftrace code will be needed when livepatching
      is implemented for arm or nds32. These architectures call
      set_all_modules_text_rw() and use stop_machine() at the same time.
      
      Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com
      
      Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race")
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Reported-by: NMiroslav Benes <mbenes@suse.cz>
      Reviewed-by: NMiroslav Benes <mbenes@suse.cz>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      [
        As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
        ftrace_run_update_code() as it is a void function.
      ]
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c854d9b6
    • K
      x86/CPU: Add more Icelake model numbers · 5284327f
      Kan Liang 提交于
      [ Upstream commit e35faeb64146f2015f2aec14b358ae508e4066db ]
      
      Add the CPUID model numbers of Icelake (ICL) desktop and server
      processors to the Intel family list.
      
       [ Qiuxu: Sort the macros by model number. ]
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
      Cc: rui.zhang@intel.com
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190603134122.13853-1-kan.liang@linux.intel.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      5284327f
  8. 03 7月, 2019 4 次提交
  9. 25 6月, 2019 1 次提交
  10. 22 6月, 2019 1 次提交
    • F
      x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor · a35e7822
      Frank van der Linden 提交于
      [ Upstream commit 2ac44ab608705948564791ce1d15d43ba81a1e38 ]
      
      For F17h AMD CPUs, the CPB capability ('Core Performance Boost') is forcibly set,
      because some versions of that chip incorrectly report that they do not have it.
      
      However, a hypervisor may filter out the CPB capability, for good
      reasons. For example, KVM currently does not emulate setting the CPB
      bit in MSR_K7_HWCR, and unchecked MSR access errors will be thrown
      when trying to set it as a guest:
      
      	unchecked MSR access error: WRMSR to 0xc0010015 (tried to write 0x0000000001000011) at rIP: 0xffffffff890638f4 (native_write_msr+0x4/0x20)
      
      	Call Trace:
      	boost_set_msr+0x50/0x80 [acpi_cpufreq]
      	cpuhp_invoke_callback+0x86/0x560
      	sort_range+0x20/0x20
      	cpuhp_thread_fun+0xb0/0x110
      	smpboot_thread_fn+0xef/0x160
      	kthread+0x113/0x130
      	kthread_create_worker_on_cpu+0x70/0x70
      	ret_from_fork+0x35/0x40
      
      To avoid this issue, don't forcibly set the CPB capability for a CPU
      when running under a hypervisor.
      Signed-off-by: NFrank van der Linden <fllinden@amazon.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bp@alien8.de
      Cc: jiaxun.yang@flygoat.com
      Fixes: 0237199186e7 ("x86/CPU/AMD: Set the CPB bit unconditionally on F17h")
      Link: http://lkml.kernel.org/r/20190522221745.GA15789@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com
      [ Minor edits to the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a35e7822