1. 13 Jul, 2019 13 commits
  2. 11 Jul, 2019 2 commits
  3. 10 Jul, 2019 2 commits
    • x86/pgtable/32: Fix LOWMEM_PAGES constant · 26515699
      Committed by Arnd Bergmann
      clang points out that the computation of LOWMEM_PAGES causes a signed
      integer overflow on 32-bit x86:
      
      arch/x86/kernel/head32.c:83:20: error: signed shift result (0x100000000) requires 34 bits to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow]
                      (PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT);
                                       ^~~~~~~~~~~~
      arch/x86/include/asm/pgtable_32.h:109:27: note: expanded from macro 'LOWMEM_PAGES'
       #define LOWMEM_PAGES ((((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT))
                               ~^ ~~
      arch/x86/include/asm/pgtable_32.h:98:34: note: expanded from macro 'PAGE_TABLE_SIZE'
       #define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
      
      Use the _ULL() macro to make it a 64-bit constant.
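
      A minimal sketch of the arithmetic, not the kernel header itself. The
      DEMO_* names and the values 0xC0000000 and 12 are assumptions chosen for
      illustration, and the plain ULL suffix stands in for the kernel's _ULL()
      macro.

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones come from the kernel config. */
#define DEMO_PAGE_OFFSET 0xC0000000ULL
#define DEMO_PAGE_SHIFT  12

/*
 * 2 << 31 equals 0x100000000 and does not fit in a 32-bit int, which is what
 * clang reports. Starting from an unsigned 64-bit constant keeps it exact.
 */
#define DEMO_LOWMEM_PAGES (((2ULL << 31) - DEMO_PAGE_OFFSET) >> DEMO_PAGE_SHIFT)

/* Number of low-memory pages under the assumed constants above. */
uint64_t demo_lowmem_pages(void)
{
    return DEMO_LOWMEM_PAGES;
}
```

      Under these assumed constants the subtraction leaves 0x40000000 bytes,
      which is 0x40000 pages of 4 KiB each.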
      
      Fixes: 1e620f9b ("x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190710130522.1802800-1-arnd@arndb.de
    • x86/alternatives: Fix int3_emulate_call() selftest stack corruption · ecc60610
      Committed by Peter Zijlstra
      KASAN shows the following splat during boot:
      
        BUG: KASAN: unknown-crash in unwind_next_frame+0x3f6/0x490
        Read of size 8 at addr ffffffff84007db0 by task swapper/0
      
        CPU: 0 PID: 0 Comm: swapper Tainted: G                T 5.2.0-rc6-00013-g7457c0da #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         dump_stack+0x19/0x1b
         print_address_description+0x1b0/0x2b2
         __kasan_report+0x10f/0x171
         kasan_report+0x12/0x1c
         __asan_load8+0x54/0x81
         unwind_next_frame+0x3f6/0x490
         unwind_next_frame+0x1b/0x23
         arch_stack_walk+0x68/0xa5
         stack_trace_save+0x7b/0xa0
         save_trace+0x3c/0x93
         mark_lock+0x1ef/0x9b1
         lock_acquire+0x122/0x221
         __mutex_lock+0xb6/0x731
         mutex_lock_nested+0x16/0x18
         _vm_unmap_aliases+0x141/0x183
         vm_unmap_aliases+0x14/0x16
         change_page_attr_set_clr+0x15e/0x2f2
         set_memory_4k+0x2a/0x2c
         check_bugs+0x11fd/0x1298
         start_kernel+0x793/0x7eb
         x86_64_start_reservations+0x55/0x76
         x86_64_start_kernel+0x87/0xaa
         secondary_startup_64+0xa4/0xb0
      
        Memory state around the buggy address:
         ffffffff84007c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
         ffffffff84007d00: f1 00 00 00 00 00 00 00 00 00 f2 f2 f2 f3 f3 f3
        >ffffffff84007d80: f3 79 be 52 49 79 be 00 00 00 00 00 00 00 00 f1
      
      It turns out that int3_selftest() is corrupting the stack.  The problem is
      that the KASAN-ified version of int3_magic() is much less trivial than the
      C code appears.  It clobbers several unexpected registers.  So when the
      selftest's INT3 is converted to an emulated call to int3_magic(), the
      registers are clobbered and Bad Things happen when the function returns.
      
      Fix this by converting int3_magic() to the trivial ASM function it should
      be, avoiding all calling convention issues. Also add ASM_CALL_CONSTRAINT to
      the INT3 ASM, since it contains a 'CALL'.
      
      [peterz: cribbed changelog from josh]
      
      Fixes: 7457c0da ("x86/alternatives: Add int3_emulate_call() selftest")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Debugged-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Link: https://lkml.kernel.org/r/20190709125744.GB3402@hirez.programming.kicks-ass.net
  4. 09 Jul, 2019 4 commits
  5. 07 Jul, 2019 2 commits
    • x86/fpu: Inline fpu__xstate_clear_all_cpu_caps() · 7891bc0a
      Committed by Sebastian Andrzej Siewior
      All fpu__xstate_clear_all_cpu_caps() does is to invoke one simple
      function since commit
      
        73e3a7d2 ("x86/fpu: Remove the explicit clearing of XSAVE dependent features")
      
      so invoke that function directly and remove the wrapper.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190704060743.rvew4yrjd6n33uzx@linutronix.de
    • x86/fpu: Make 'no387' and 'nofxsr' command line options useful · 9838e3bf
      Committed by Sebastian Andrzej Siewior
      The command line option `no387' is designed to disable the FPU
      entirely. This only 'works' with CONFIG_MATH_EMULATION enabled.
      
      But on 64bit this cannot work because user space expects SSE to work, which
      requires basic FPU support. MATH_EMULATION does not help because SSE is not
      emulated.
      
      The command line option `nofxsr' should also be limited to 32bit because
      FXSR is part of the required flags on 64bit so turning it off is not
      possible.
      
      Clearing X86_FEATURE_FPU without emulation enabled will not work anyway and
      hang in fpu__init_system_early_generic() before the console is enabled.
      
      Setting additional dependencies ensures that the kernel still boots on a
      modern CPU. Otherwise, dropping FPU will leave FXSR enabled causing the
      kernel to crash early in fpu__init_system_mxcsr().
      
      With XSAVE support it will crash in fpu__init_cpu_xstate(). The problem is
      that xsetbv() with XMM set and SSE cleared is not allowed.  That means
      XSAVE has to be disabled. The XSAVE support is disabled in
      fpu__init_system_xstate_size_legacy() but it is too late. It can be
      removed, it has been added in commit
      
        1f999ab5 ("x86, xsave: Disable xsave in i387 emulation mode")
      
      to use `no387' on a CPU with XSAVE support.
      
      All this happens before console output.
      
      After that, the next possible crash is in RAID6 detect code because MMX
      remained enabled. With a 3DNOW enabled config it will explode in memcpy()
      for instance due to kernel_fpu_begin() but this is unconditionally enabled.
      
      This is enough to boot a Debian Wheezy on a 32bit qemu "host" CPU which
      supports everything up to XSAVES, AVX2 without 3DNOW. Later, Debian
      increased the minimum requirements to i686 which means it does not boot
      userland at least due to CMOV.
      
      After masking the additional features it still keeps SSE4A and 3DNOW*
      enabled (if present on the host) but those are unused in the kernel.
      
      Restrict `no387' and `nofxsr' options to 32bit only. Add dependencies for
      FPU, FXSR to additionally mask CMOV, MMX, XSAVE if FXSR or FPU is cleared.
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190703083247.57kjrmlxkai3vpw3@linutronix.de
  6. 06 Jul, 2019 1 commit
  7. 05 Jul, 2019 2 commits
    • docs: s390: unify and update s390dbf kdocs at debug.c · 0328e519
      Committed by Steffen Maier
      For non-static-inlines, debug.c already had non-compliant function
      header docs. So move the pure prototype kdocs of
      ("s390: include/asm/debug.h add kerneldoc markups")
      from debug.h to debug.c and merge them with the old function docs.
      Also, I had the impression that kdoc typically is at the implementation
      in the compile unit rather than at the prototype in the header file.
      
      While at it, update the short kdoc description to distinguish the
      different functions. And a few more consistency cleanups.
      
      Added a new kdoc for debug_set_critical() since debug.h comments it
      as part of the API.
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Message-Id: <1562149189-1417-3-git-send-email-maier@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • KVM: arm64/sve: Fix vq_present() macro to yield a bool · e644fa18
      Committed by Zhang Lei
      The original implementation of vq_present() relied on aggressive
      inlining in order for the compiler to know that the code is
      correct, due to some const-casting issues.  This was causing sparse
      and clang to complain, while GCC compiled cleanly.
      
      Commit 0c529ff7 addressed this problem, but since vq_present()
      is no longer a function, there is now no implicit casting of the
      returned value to the return type (bool).
      
      In set_sve_vls(), this uncast bit value is compared against a bool,
      and so may spuriously compare as unequal when both are nonzero.  As
      a result, KVM may reject valid SVE vector length configurations as
      invalid, and vice versa.
      
      Fix it by forcing the returned value to a bool.
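
      The effect can be reproduced with a plain bitmask. This is a sketch
      only: the DEMO_* macros and helper functions are hypothetical and are not
      the KVM definitions, which operate on a vector-length bitmap.

```c
#include <stdbool.h>

/* Hypothetical macros, not the KVM ones: test bit n of a plain bitmask. */
#define DEMO_BIT_RAW(map, n)     ((map) & (1u << (n)))      /* 0 or (1u << n) */
#define DEMO_BIT_PRESENT(map, n) (!!((map) & (1u << (n))))  /* 0 or 1 */

/* Compares the raw bit value with a bool; wrong when the bit is set and n > 0. */
bool demo_raw_matches(unsigned int map, unsigned int n, bool expected)
{
    return DEMO_BIT_RAW(map, n) == expected;
}

/* Compares the normalized 0/1 value with a bool; correct for every n. */
bool demo_bool_matches(unsigned int map, unsigned int n, bool expected)
{
    return DEMO_BIT_PRESENT(map, n) == expected;
}
```

      With map = 0x4 and n = 2 the raw macro yields 4, which does not equal
      true (1), so the raw comparison reports a mismatch although the bit is
      set. The double negation collapses any nonzero value to 1.
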
      Signed-off-by: Zhang Lei <zhang.lei@jp.fujitsu.com>
      Fixes: 0c529ff7 ("KVM: arm64: Implement vq_present() as a macro")
      Signed-off-by: Dave Martin <Dave.Martin@arm.com> [commit message rewrite]
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 04 Jul, 2019 2 commits
  9. 03 Jul, 2019 12 commits
    • x86/fsgsbase: Revert FSGSBASE support · 049331f2
      Committed by Thomas Gleixner
      The FSGSBASE series turned out to have serious bugs and there is still an
      open issue which is not fully understood yet.
      
      The confidence in those changes has become close to zero especially as the
      test cases which have been shipped with that series were obviously never
      run before sending the final series out to LKML.
      
        ./fsgsbase_64 >/dev/null
        Segmentation fault
      
      As the merge window is close, the only sane decision is to revert FSGSBASE
      support. The revert is necessary as this branch has been merged into
      perf/core already and rebasing all of that a few days before the merge
      window is not the most brilliant idea.
      
      I could definitely slap myself for not noticing the test case fail when
      merging that series, but TBH my expectations weren't that low back
      then. Won't happen again.
      
      Revert the following commits:
      539bca53 ("x86/entry/64: Fix and clean up paranoid_exit")
      2c7b5ac5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
      f987c955 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
      2032f1f9 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
      5bf0cab6 ("x86/entry/64: Document GSBASE handling in the paranoid path")
      708078f6 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
      79e1932f ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      1d07316b ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
      f60a83df ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
      1ab5f3f7 ("x86/process/64: Use FSBSBASE in switch_to() if available")
      a86b4625 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
      8b71340d ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
      b64ed19b ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
    • crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR · 7367bfeb
      Committed by Ard Biesheuvel
      This implements 5-way interleaving for ECB, CBC decryption and CTR,
      resulting in a speedup of ~11% on Marvell ThunderX2, which has a
      very deep pipeline and therefore a high issue latency for NEON
      instructions operating on the same registers.
      
      Note that XTS is left alone: implementing 5-way interleave there
      would either involve spilling of the calculated tweaks to the
      stack, or recalculating them after the encryption operation, and
      doing either of those would most likely penalize low end cores.
      
      For ECB, this is not a concern at all, given that we have plenty
      of spare registers. For CTR and CBC decryption, we take advantage
      of the fact that v16 is not used by the CE version of the code
      (which is the only one targeted by the optimization), and so we
      can reshuffle the code a bit and avoid having to spill to memory
      (with the exception of one extra reload in the CBC routine)
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/aes-ce - add 5 way interleave routines · e2174139
      Committed by Ard Biesheuvel
      In preparation of tweaking the accelerated AES chaining mode routines
      to be able to use a 5-way stride, implement the core routines to
      support processing 5 blocks of input at a time. While at it, drop
      the 2 way versions, which have been unused for a while now.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_K shift by 0 · 6fa632e7
      Committed by Luke Nelson
      The current x32 BPF JIT does not correctly compile shift operations when
      the immediate shift amount is 0. The expected behavior is for this to
      be a no-op.
      
      The following program demonstrates the bug. The expected result is 1,
      but the current JITed code returns 2.
      
        r0 = 1
        r1 = 1
        r1 <<= 0
        if r1 == 1 goto end
        r0 = 2
      end:
        exit
      
      This patch simplifies the code and fixes the bug.
      
      Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32")
      Co-developed-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_X shift by 0 · 68a8357e
      Committed by Luke Nelson
      The current x32 BPF JIT for shift operations is not correct when the
      shift amount in a register is 0. The expected behavior is a no-op, whereas
      the current implementation changes bits in the destination register.
      
      The following example demonstrates the bug. The expected result of this
      program is 1, but the current JITed code returns 2.
      
        r0 = 1
        r1 = 1
        r2 = 0
        r1 <<= r2
        if r1 == 1 goto end
        r0 = 2
      end:
        exit
      
      The bug is caused by an incorrect assumption by the JIT that a shift by
      32 clears the register. On x32 however, shifts use the lower 5 bits of
      the source, making a shift by 32 equivalent to a shift by 0.
      
      This patch fixes the bug using double-precision shifts, which also
      simplifies the code.
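
      The masking effect can be modelled in portable C. This is a sketch under
      stated assumptions, not the JIT's actual instruction sequence: all
      function names are hypothetical, the naive variant merely illustrates the
      kind of assumption described above, and only shift amounts 0..31 are
      covered.

```c
#include <stdint.h>

/* Model of 32-bit x86 SHL/SHR: only the low 5 bits of the count are used. */
static uint32_t shl32(uint32_t v, unsigned int cnt) { return v << (cnt & 31); }
static uint32_t shr32(uint32_t v, unsigned int cnt) { return v >> (cnt & 31); }

/*
 * Model of a double-precision left shift (SHLD): shift dst left and fill
 * the vacated low bits from the top of src. A masked count of 0 leaves dst
 * unchanged.
 */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned int cnt)
{
    cnt &= 31;
    if (cnt == 0)
        return dst;
    return (dst << cnt) | (src >> (32 - cnt));
}

/* Naive 64-bit left shift for n in 0..31: assumes lo >> 32 yields 0. */
uint64_t lsh64_naive(uint32_t hi, uint32_t lo, unsigned int n)
{
    /* For n == 0 the count 32 is masked to 0, so lo leaks into hi. */
    uint32_t new_hi = shl32(hi, n) | shr32(lo, 32 - n);
    uint32_t new_lo = shl32(lo, n);
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* 64-bit left shift for n in 0..31 built on the double-precision shift. */
uint64_t lsh64_shld(uint32_t hi, uint32_t lo, unsigned int n)
{
    uint32_t new_hi = shld32(hi, lo, n);
    uint32_t new_lo = shl32(lo, n);
    return ((uint64_t)new_hi << 32) | new_lo;
}
```

      Shifting the value 1 by 0 with the naive variant copies the low word
      into the high word, while the double-precision variant returns the value
      unchanged.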
      
      Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32")
      Co-developed-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic · dd2cb348
      Committed by Michael Kelley
      Continue consolidating Hyper-V clock and timer code into an ISA
      independent Hyper-V clocksource driver.
      
      Move the existing clocksource code under drivers/hv and arch/x86 to the new
      clocksource driver while separating out the ISA dependencies. Update
      Hyper-V initialization to call initialization and cleanup routines since
      the Hyper-V synthetic clock is not independently enumerated in ACPI.
      
      Update Hyper-V clocksource users in KVM and VDSO to get definitions from
      the new include file.
      
      No behavior is changed and no new functionality is added.
      Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: "bp@alien8.de" <bp@alien8.de>
      Cc: "will.deacon@arm.com" <will.deacon@arm.com>
      Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
      Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
      Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: "sashal@kernel.org" <sashal@kernel.org>
      Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
      Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
      Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
      Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
      Cc: "arnd@arndb.de" <arnd@arndb.de>
      Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
      Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
      Cc: "paul.burton@mips.com" <paul.burton@mips.com>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "salyzyn@android.com" <salyzyn@android.com>
      Cc: "pcc@google.com" <pcc@google.com>
      Cc: "shuah@kernel.org" <shuah@kernel.org>
      Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
      Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
      Cc: "huw@codeweavers.com" <huw@codeweavers.com>
      Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
      Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
      Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
      Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
      Link: https://lkml.kernel.org/r/1561955054-1838-3-git-send-email-mikelley@microsoft.com
    • clocksource/drivers: Make Hyper-V clocksource ISA agnostic · fd1fea68
      Committed by Michael Kelley
      Hyper-V clock/timer code and data structures are currently mixed
      in with other code in the ISA independent drivers/hv directory as
      well as the ISA dependent Hyper-V code under arch/x86.
      
      Consolidate this code and data structures into a Hyper-V clocksource driver
      to better follow the Linux model. In doing so, separate out the ISA
      dependent portions so the new clocksource driver works for x86 and for the
      in-process Hyper-V on ARM64 code.
      
      To start, move the existing clockevents code to create the new clocksource
      driver. Update the VMbus driver to call initialization and cleanup routines
      since the Hyper-V synthetic timers are not independently enumerated in
      ACPI.
      
      No behavior is changed and no new functionality is added.
      Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: "bp@alien8.de" <bp@alien8.de>
      Cc: "will.deacon@arm.com" <will.deacon@arm.com>
      Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
      Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
      Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: "sashal@kernel.org" <sashal@kernel.org>
      Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
      Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
      Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
      Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
      Cc: "arnd@arndb.de" <arnd@arndb.de>
      Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
      Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
      Cc: "paul.burton@mips.com" <paul.burton@mips.com>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "salyzyn@android.com" <salyzyn@android.com>
      Cc: "pcc@google.com" <pcc@google.com>
      Cc: "shuah@kernel.org" <shuah@kernel.org>
      Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
      Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
      Cc: "huw@codeweavers.com" <huw@codeweavers.com>
      Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
      Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
      Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
      Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
      Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
    • x86/irq: Seperate unused system vectors from spurious entry again · f8a8fe61
      Committed by Thomas Gleixner
      Quite some time ago the interrupt entry stubs for unused vectors in the
      system vector range got removed and directly mapped to the spurious
      interrupt vector entry point.
      
      Sounds reasonable, but it's subtly broken. The spurious interrupt vector
      entry point pushes vector number 0xFF on the stack which makes the whole
      logic in __smp_spurious_interrupt() pointless.
      
      As a consequence any spurious interrupt which comes from a vector != 0xFF
      is treated as a real spurious interrupt (vector 0xFF) and not
      acknowledged. That subsequently stalls all interrupt vectors of equal and
      lower priority, which brings the system to a grinding halt.
      
      This can happen because even on 64-bit the system vector space is not
      guaranteed to be fully populated. A full compile time handling of the
      unused vectors is not possible because quite some of them are conditionally
      populated at runtime.
      
      Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
      but gains the proper handling back. There is no point to selectively spare
      some of the stubs which are known at compile time as the required code in
      the IDT management would be way larger and convoluted.
      
      Do not route the spurious entries through common_interrupt and do_IRQ() as
      the original code did. Route it to smp_spurious_interrupt() which evaluates
      the vector number and acts accordingly now that the real vector numbers are
      handed in.
      
      Fix up the pr_warn so the actual spurious vector (0xff) is clearly
      distinguished from the other vectors and also note for the vectored case
      whether it was pending in the ISR or not.
      
       "Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
       "Spurious interrupt vector 0xed on CPU#1. Acked."
       "Spurious interrupt vector 0xee on CPU#1. Not pending!."
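
      The three cases can be modelled as follows. This is a hypothetical
      sketch, not the kernel handler: demo_spurious() and its parameters are
      invented for illustration, and only the message texts are taken from the
      examples above.

```c
#include <stdbool.h>
#include <stdio.h>

#define DEMO_SPURIOUS_VECTOR 0xFFu

/*
 * Formats the warning for a spurious interrupt into buf and returns true
 * if the modelled handler would acknowledge the interrupt.
 */
bool demo_spurious(char *buf, size_t len, unsigned int vector, bool in_isr,
                   unsigned int cpu)
{
    if (vector == DEMO_SPURIOUS_VECTOR) {
        /* The real spurious vector is never acknowledged. */
        snprintf(buf, len,
                 "Spurious APIC interrupt (vector 0xFF) on CPU#%u, should never happen.",
                 cpu);
        return false;
    }
    if (in_isr) {
        /* Pending in the ISR: acknowledge so lower priorities are not stalled. */
        snprintf(buf, len, "Spurious interrupt vector 0x%x on CPU#%u. Acked.",
                 vector, cpu);
        return true;
    }
    snprintf(buf, len, "Spurious interrupt vector 0x%x on CPU#%u. Not pending!.",
             vector, cpu);
    return false;
}
```

      The key point is that the decision depends on the real vector number,
      which is only available once each unused vector has its own entry stub.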
      
      Fixes: 2414e021 ("x86: Avoid building unused IRQ entry stubs")
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.de
    • x86/irq: Handle spurious interrupt after shutdown gracefully · b7107a67
      Committed by Thomas Gleixner
      Since the rework of the vector management, warnings about spurious
      interrupts have been reported. Robert provided some more information and
      did an initial analysis. The following situation leads to these warnings:
      
         CPU 0                  CPU 1               IO_APIC
      
                                                    interrupt is raised
                                                    sent to CPU1
                                Unable to handle
                                immediately
                                (interrupts off,
                                 deep idle delay)
         mask()
         ...
         free()
           shutdown()
           synchronize_irq()
           clear_vector()
                                do_IRQ()
                                  -> vector is clear
      
      Before the rework the vector entries of legacy interrupts were statically
      assigned and occupied precious vector space while most of them were
      unused. Due to that the above situation was handled silently because the
      vector was handled and the core handler of the assigned interrupt
      descriptor noticed that it is shut down and returned.
      
      While this has been usually observed with legacy interrupts, this situation
      is not limited to them. Any other interrupt source, e.g. MSI, can cause the
      same issue.
      
      After adding proper synchronization for level triggered interrupts, this
      can only happen for edge triggered interrupts where the IO-APIC obviously
      cannot provide information about interrupts in flight.
      
      While the spurious warning is actually harmless in this case it worries
      users and driver developers.
      
      Handle it gracefully by marking the vector entry as VECTOR_SHUTDOWN instead
      of VECTOR_UNUSED when the vector is freed up.
      
      If that above late handling happens the spurious detector will not complain
      and switch the entry to VECTOR_UNUSED. Any subsequent spurious interrupt on
      that line will trigger the spurious warning as before.
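
      The state transitions described above can be sketched as a tiny state
      machine. The enum and function names are hypothetical stand-ins, not the
      kernel's vector management code.

```c
#include <stdbool.h>

/* Hypothetical per-vector slot states modelled on the description above. */
enum demo_vec_state { DEMO_VEC_UNUSED, DEMO_VEC_SHUTDOWN, DEMO_VEC_ASSIGNED };

/* Freeing a vector marks it as shut down instead of unused. */
void demo_vec_free(enum demo_vec_state *slot)
{
    *slot = DEMO_VEC_SHUTDOWN;
}

/* An interrupt arrives; returns true if a spurious warning would be printed. */
bool demo_vec_arrive(enum demo_vec_state *slot)
{
    switch (*slot) {
    case DEMO_VEC_ASSIGNED:
        return false;               /* normal delivery to the handler */
    case DEMO_VEC_SHUTDOWN:
        *slot = DEMO_VEC_UNUSED;    /* tolerate one late interrupt silently */
        return false;
    default:
        return true;                /* unused vector: complain as before */
    }
}
```

      The first late interrupt after a free is absorbed without a warning, and
      any further interrupt on the same vector is reported.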
      
      Fixes: 464d1230 ("x86/vector: Switch IOAPIC to global reservation mode")
      Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/20190628111440.459647741@linutronix.de
    • x86/ioapic: Implement irq_get_irqchip_state() callback · dfe0cf8b
      Committed by Thomas Gleixner
      When an interrupt is shut down in free_irq() there might be an inflight
      interrupt pending in the IO-APIC remote IRR which is not yet serviced. That
      means the interrupt has been sent to the target CPUs local APIC, but the
      target CPU is in a state which delays the servicing.
      
      So free_irq() would proceed to free resources and to clear the vector
      because synchronize_hardirq() does not see an interrupt handler in
      progress.
      
      That can trigger a spurious interrupt warning, which is harmless and just
      confuses users, but it also can leave the remote IRR in a stale state
      because once the handler is invoked the interrupt resources might be freed
      already and therefore acknowledgement is not possible anymore.
      
      Implement the irq_get_irqchip_state() callback for the IO-APIC irq chip. The
      callback is invoked from free_irq() via __synchronize_hardirq(). Check the
      remote IRR bit of the interrupt and return 'in flight' if it is set and the
      interrupt is configured in level mode. For edge mode the remote IRR has no
      meaning.
      
      As this is only meaningful for level triggered interrupts this won't cure
      the potential spurious interrupt warning for edge triggered interrupts, but
      the edge trigger case does not result in stale hardware state. This has to
      be addressed at the vector/interrupt entry level separately.
      
      Fixes: 464d1230 ("x86/vector: Switch IOAPIC to global reservation mode")
      Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/20190628111440.370295517@linutronix.de
    • ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare() · 074376ac
      Committed by Jiri Kosina
      
      ftrace_arch_code_modify_prepare() is acquiring text_mutex, while the
      corresponding release is happening in ftrace_arch_code_modify_post_process().
      
      This has already been documented in the code, but let's also make the fact
      that this is intentional clear to the semantic analysis tools such as sparse.
      
      Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1906292321170.27227@cbobk.fhfr.pm
      
      Fixes: 39611265 ("ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()")
      Fixes: d5b844a2 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC · bb34e690
      Committed by Wanpeng Li
      Thomas reported that:
      
       | Background:
       |
       |    In preparation of supporting IPI shorthands I changed the CPU offline
       |    code to software disable the local APIC instead of just masking it.
       |    That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
       |    register.
       |
       | Failure:
       |
       |    When the CPU comes back online the startup code triggers occasionally
       |    the warning in apic_pending_intr_clear(). That complains that the IRRs
       |    are not empty.
       |
       |    The offending vector is the local APIC timer vector whose IRR bit is set
       |    and stays set.
       |
       | It took me quite some time to reproduce the issue locally, but now I can
       | see what happens.
       |
       | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1
       | (and hardware support) it behaves correctly.
       |
       | Here is the series of events:
       |
       |     Guest CPU
       |
       |     goes down
       |
       |       native_cpu_disable()
       |
       |         apic_soft_disable();
       |
       |     play_dead()
       |
       |     ....
       |
       |     startup()
       |
       |       if (apic_enabled())
       |         apic_pending_intr_clear()  <- Not taken
       |
       |      enable APIC
       |
       |         apic_pending_intr_clear()  <- Triggers warning because IRR is stale
       |
       | When this happens then the deadline timer or the regular APIC timer -
       | happens with both, has fired shortly before the APIC is disabled, but the
       | interrupt was not serviced because the guest CPU was in an interrupt
       | disabled region at that point.
       |
       | The state of the timer vector ISR/IRR bits:
       |
       |                              ISR     IRR
       | before apic_soft_disable()    0       1
       | after apic_soft_disable()     0       1
       |
       | On startup                    0       1
       |
       | Now one would assume that the IRR is cleared after the INIT reset, but this
       | happens only on CPU0.
       |
       | Why?
       |
       | Because our CPU0 hotplug is just for testing to make sure nothing breaks
       | and goes through an NMI wakeup vehicle because INIT would send it through
       | the boots-trap code which is not really working if that CPU was not
       | physically unplugged.
       |
       | Now looking at a real world APIC the situation in that case is:
       |
       |                              ISR     IRR
       | before apic_soft_disable()    0       1
       | after apic_soft_disable()     0       1
       |
       | On startup                    0       0
       |
       | Why?
       |
       | Once the dying CPU reenables interrupts the pending interrupt gets
       | delivered as a spurious interrupt and then the state is clear.
       |
       | While that CPU0 hotplug test case is surely an esoteric issue, the APIC
       | emulation is still wrong. Even if the play_dead() code would not enable
       | interrupts then the pending IRR bit would turn into an ISR .. interrupt
       | when the APIC is reenabled on startup.
      
      From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled
      * Pending interrupts in the IRR and ISR registers are held and require
        masking or handling by the CPU.
      
      In Thomas's testing, hardware cpu will not respect soft disable LAPIC
      when IRR has already been set or APICv posted-interrupt is in flight,
      so we can skip soft disable APIC checking when clearing IRR and set ISR,
      continue to respect soft disable APIC when attempting to set IRR.
      Reported-by: Rong Chen <rong.a.chen@intel.com>
      Reported-by: Feng Tang <feng.tang@intel.com>
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rong Chen <rong.a.chen@intel.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>