提交 · 72c3c0fe54a3f3ddea8f5ca468ddf9deaf2100b7 · openeuler / Kernel

24 7月, 2020 11 次提交

x86/kvm: Use generic xfer to guest work function · 72c3c0fe

由 Thomas Gleixner 提交于 7月 23, 2020

Use the generic infrastructure to check for and handle pending work before
transitioning into guest mode.

This now handles TIF_NOTIFY_RESUME as well which was ignored so
far. Handling it is important as this covers task work and task work will
be used to offload the heavy lifting of POSIX CPU timers to thread context.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220520.979724969@linutronix.de

72c3c0fe

x86/entry: Cleanup idtentry_enter/exit · a27a0a55

由 Thomas Gleixner 提交于 7月 23, 2020

Remove the temporary defines and fixup all references.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220520.855839271@linutronix.de

a27a0a55

x86/entry: Use generic interrupt entry/exit code · bdcd178a

由 Thomas Gleixner 提交于 7月 23, 2020

Replace the x86 code with the generic variant. Use temporary defines for
idtentry_* which will be cleaned up in the next step.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220520.711492752@linutronix.de

bdcd178a

x86/entry: Cleanup idtentry_entry/exit_user · 517e4992

由 Thomas Gleixner 提交于 7月 23, 2020

Cleanup the temporary defines and use irqentry_ instead of idtentry_.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220520.602603691@linutronix.de

517e4992

x86/entry: Use generic syscall exit functionality · 167fd210

由 Thomas Gleixner 提交于 7月 23, 2020

Replace the x86 variant with the generic version. Provide the relevant
architecture specific helper functions and defines.

Use a temporary define for idtentry_exit_user which will be cleaned up
seperately.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220520.494648601@linutronix.de

167fd210

x86/entry: Use generic syscall entry function · 27d6b4d1

由 Thomas Gleixner 提交于 7月 23, 2020

Replace the syscall entry work handling with the generic version. Provide
the necessary helper inlines to handle the real architecture specific
parts, e.g. ptrace.

Use a temporary define for idtentry_enter_user which will be cleaned up
seperately.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220520.376213694@linutronix.de

27d6b4d1

x86/ptrace: Provide pt_regs helper for entry/exit · 0bf019ea

由 Thomas Gleixner 提交于 7月 23, 2020

As a preparatory step for moving the syscall and interrupt entry/exit
handling into generic code, provide a pt_regs helper which retrieves the
interrupt state from pt_regs. This is required to check whether interrupts
are reenabled by return from interrupt/exception.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220520.258511584@linutronix.de

0bf019ea

x86/entry: Move user return notifier out of loop · a377ac1c

由 Thomas Gleixner 提交于 7月 23, 2020

Guests and user space share certain MSRs. KVM sets these MSRs to guest
values once and does not set them back to user space values on every VM
exit to spare the costly MSR operations.

User return notifiers ensure that these MSRs are set back to the correct
values before returning to user space in exit_to_usermode_loop().

There is no reason to evaluate the TIF flag indicating that user return
notifiers need to be invoked in the loop. The important point is that they
are invoked before returning to user space.

Move the invocation out of the loop into the section which does the last
preperatory steps before returning to user space. That section is not
preemptible and runs with interrupts disabled until the actual return.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220520.159112003@linutronix.de

a377ac1c

x86/entry: Consolidate 32/64 bit syscall entry · 0b085e68

由 Thomas Gleixner 提交于 7月 23, 2020

64bit and 32bit entry code have the same open coded syscall entry handling
after the bitwidth specific bits.

Move it to a helper function and share the code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220520.051234096@linutronix.de

0b085e68

x86/entry: Consolidate check_user_regs() · 8d5ea35c

由 Thomas Gleixner 提交于 7月 23, 2020

The user register sanity check is sprinkled all over the place. Move it
into enter_from_user_mode().
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220519.943016204@linutronix.de

8d5ea35c

x86: Correct noinstr qualifiers · 7f6fa101

由 Ira Weiny 提交于 7月 23, 2020

The noinstr qualifier is to be specified before the return type in the
same way inline is used.

These 2 cases were missed by previous patches.
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200723161405.852613-1-ira.weiny@intel.com

7f6fa101

19 7月, 2020 2 次提交

x86/boot: Don't add the EFI stub to targets · da05b143

由 Arvind Sankar 提交于 7月 14, 2020

vmlinux-objs-y is added to targets, which currently means that the EFI
stub gets added to the targets as well. It shouldn't be added since it
is built elsewhere.

This confuses Makefile.build which interprets the EFI stub as a target
	$(obj)/$(objtree)/drivers/firmware/efi/libstub/lib.a
and will create drivers/firmware/efi/libstub/ underneath
arch/x86/boot/compressed, to hold this supposed target, if building
out-of-tree. [0]

Fix this by pulling the stub out of vmlinux-objs-y into efi-obj-y.

[0] See scripts/Makefile.build near the end:
    # Create directories for object files if they do not exist
Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NMasahiro Yamada <masahiroy@kernel.org>
Acked-by: NArd Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20200715032631.1562882-1-nivedita@alum.mit.edu

da05b143

x86/entry: Actually disable stack protector · 58ac3154

由 Kees Cook 提交于 6月 26, 2020

Some builds of GCC enable stack protector by default. Simply removing
the arguments is not sufficient to disable stack protector, as the stack
protector for those GCC builds must be explicitly disabled. Remove the
argument removals and add -fno-stack-protector. Additionally include
missed x32 argument updates, and adjust whitespace for readability.

Fixes: 20355e5f ("x86/entry: Exclude low level entry code from sanitizing")
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/202006261333.585319CA6B@keescook

58ac3154

18 7月, 2020 2 次提交

x86/ioperm: Fix io bitmap invalidation on Xen PV · cadfad87

由 Andy Lutomirski 提交于 7月 17, 2020

tss_invalidate_io_bitmap() wasn't wired up properly through the pvop
machinery, so the TSS and Xen's io bitmap would get out of sync
whenever disabling a valid io bitmap.

Add a new pvop for tss_invalidate_io_bitmap() to fix it.

This is XSA-329.

Fixes: 22fe5b04 ("x86/ioperm: Move TSS bitmap update to exit to user work")
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/d53075590e1f91c19f8af705059d3ff99424c020.1595030016.git.luto@kernel.org

cadfad87

genirq/affinity: Handle affinity setting on inactive interrupts correctly · baedb87d

由 Thomas Gleixner 提交于 7月 17, 2020

Setting interrupt affinity on inactive interrupts is inconsistent when
hierarchical irq domains are enabled. The core code should just store the
affinity and not call into the irq chip driver for inactive interrupts
because the chip drivers may not be in a state to handle such requests.

X86 has a hacky workaround for that but all other irq chips have not which
causes problems e.g. on GIC V3 ITS.

Instead of adding more ugly hacks all over the place, solve the problem in
the core code. If the affinity is set on an inactive interrupt then:

    - Store it in the irq descriptors affinity mask
    - Update the effective affinity to reflect that so user space has
      a consistent view
    - Don't call into the irq chip driver

This is the core equivalent of the X86 workaround and works correctly
because the affinity setting is established in the irq chip when the
interrupt is activated later on.

Note, that this is only effective when hierarchical irq domains are enabled
by the architecture. Doing it unconditionally would break legacy irq chip
implementations.

For hierarchial irq domains this works correctly as none of the drivers can
have a dependency on affinity setting in inactive state by design.

Remove the X86 workaround as it is not longer required.

Fixes: 02edee15 ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Reported-by: NAli Saidi <alisaidi@amazon.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NAli Saidi <alisaidi@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com
Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de

baedb87d

16 7月, 2020 4 次提交

x86: math-emu: Fix up 'cmp' insn for clang ias · 81e96851

由 Arnd Bergmann 提交于 5月 27, 2020

The clang integrated assembler requires the 'cmp' instruction to
have a length prefix here:

arch/x86/math-emu/wm_sqrt.S:212:2: error: ambiguous instructions require an explicit suffix (could be 'cmpb', 'cmpw', or 'cmpl')
 cmp $0xffffffff,-24(%ebp)
 ^

Make this a 32-bit comparison, which it was clearly meant to be.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20200527135352.1198078-1-arnd@arndb.de

81e96851

x86/entry: Fix vectors to IDTENTRY_SYSVEC for CONFIG_HYPERV · 5769fe26

由 Sedat Dilek 提交于 7月 14, 2020

When assembling with Clang via `make LLVM_IAS=1` and CONFIG_HYPERV enabled,
we observe the following error:

<instantiation>:9:6: error: expected absolute expression
 .if HYPERVISOR_REENLIGHTENMENT_VECTOR == 3
     ^
<instantiation>:1:1: note: while in macro instantiation
idtentry HYPERVISOR_REENLIGHTENMENT_VECTOR asm_sysvec_hyperv_reenlightenment sysvec_hyperv_reenlightenment has_error_code=0
^
./arch/x86/include/asm/idtentry.h:627:1: note: while in macro instantiation
idtentry_sysvec HYPERVISOR_REENLIGHTENMENT_VECTOR sysvec_hyperv_reenlightenment;
^
<instantiation>:9:6: error: expected absolute expression
 .if HYPERVISOR_STIMER0_VECTOR == 3
     ^
<instantiation>:1:1: note: while in macro instantiation
idtentry HYPERVISOR_STIMER0_VECTOR asm_sysvec_hyperv_stimer0 sysvec_hyperv_stimer0 has_error_code=0
^
./arch/x86/include/asm/idtentry.h:628:1: note: while in macro instantiation
idtentry_sysvec HYPERVISOR_STIMER0_VECTOR sysvec_hyperv_stimer0;

This is caused by typos in arch/x86/include/asm/idtentry.h:

HYPERVISOR_REENLIGHTENMENT_VECTOR -> HYPERV_REENLIGHTENMENT_VECTOR
HYPERVISOR_STIMER0_VECTOR         -> HYPERV_STIMER0_VECTOR

For more details see ClangBuiltLinux issue #1088.

Fixes: a16be368 ("x86/entry: Convert various hypervisor vectors to IDTENTRY_SYSVEC")
Suggested-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
Reviewed-by: NWei Liu <wei.liu@kernel.org>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1088
Link: https://github.com/ClangBuiltLinux/linux/issues/1043
Link: https://lore.kernel.org/patchwork/patch/1272115/
Link: https://lkml.kernel.org/r/20200714194740.4548-1-sedat.dilek@gmail.com

5769fe26

x86/entry: Add compatibility with IAS · 6ee93f8d

由 Jian Cai 提交于 7月 14, 2020

Clang's integrated assembler does not allow symbols with non-absolute
values to be reassigned. Modify the interrupt entry loop macro to be
compatible with IAS by using a label and an offset.
Reported-by: NNick Desaulniers <ndesaulniers@google.com>
Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
Suggested-by: NNick Desaulniers <ndesaulniers@google.com>
Suggested-by: NBrian Gerst <brgerst@gmail.com>
Suggested-by: NArvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: NJian Cai <caij2003@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> #
Link: https://github.com/ClangBuiltLinux/linux/issues/1043
Link: https://lkml.kernel.org/r/20200714233024.1789985-1-caij2003@gmail.com

6ee93f8d

x86/idtentry: Remove stale comment · 790ce3b4

由 Thomas Gleixner 提交于 7月 14, 2020

Stack switching for interrupt handlers happens in C now for both 64 and
32bit. Remove the stale comment which claims the contrary.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

790ce3b4

14 7月, 2020 1 次提交

irqdomain/treewide: Keep firmware node unconditionally allocated · e3beca48

由 Thomas Gleixner 提交于 7月 09, 2020

Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type
IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after
creating the irqdomain. The only purpose of these FW nodes is to convey
name information. When this was introduced the core code did not store the
pointer to the node in the irqdomain. A recent change stored the firmware
node pointer in irqdomain for other reasons and missed to notice that the
usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence
are broken by this. Storing a dangling pointer is dangerous itself, but in
case that the domain is destroyed later on this leads to a double free.

Remove the freeing of the firmware node after creating the irqdomain from
all affected call sites to cure this.

Fixes: 711419e5 ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode")
Reported-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NMarc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de

e3beca48

10 7月, 2020 1 次提交

KVM: nVMX: fixes for preemption timer migration · 83d31e52

由 Paolo Bonzini 提交于 7月 09, 2020

Commit 850448f3 ("KVM: nVMX: Fix VMX preemption timer migration",
2020-06-01) accidentally broke nVMX live migration from older version
by changing the userspace ABI.  Restore it and, while at it, ensure
that vmx->nested.has_preemption_timer_deadline is always initialized
according to the KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE flag.

Cc: Makarand Sonare <makarandsonare@google.com>
Fixes: 850448f3 ("KVM: nVMX: Fix VMX preemption timer migration")
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

83d31e52

09 7月, 2020 3 次提交

x86/entry/common: Make prepare_exit_to_usermode() static · bd87e6f6

由 Thomas Gleixner 提交于 7月 08, 2020

No users outside this file anymore.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200708192934.301116609@linutronix.de

bd87e6f6

x86/entry: Mark check_user_regs() noinstr · 006e1ced

由 Thomas Gleixner 提交于 7月 08, 2020

It's called from the non-instrumentable section.

Fixes: c9c26150 ("x86/entry: Assert that syscalls are on the right stack")
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200708192934.191497962@linutronix.de

006e1ced

x86/traps: Disable interrupts in exc_aligment_check() · bce9b042

由 Thomas Gleixner 提交于 7月 08, 2020

exc_alignment_check() fails to disable interrupts before returning to the
entry code.

Fixes: ca4c6a98 ("x86/traps: Make interrupt enable/disable symmetric in C code")
Reported-by: syzbot+0889df9502bc0f112b31@syzkaller.appspotmail.com
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200708192934.076519438@linutronix.de

bce9b042

07 7月, 2020 1 次提交

x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit() · b037b09b

由 Andy Lutomirski 提交于 7月 03, 2020

They were originally called _cond_rcu because they were special versions
with conditional RCU handling. Now they're the standard entry and exit
path, so the _cond_rcu part is just confusing. Drop it.

Also change the signature to make them more extensible and more foolproof.

No functional change -- it's pure refactoring.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/247fc67685263e0b673e1d7f808182d28ff80359.1593795633.git.luto@kernel.org

b037b09b

06 7月, 2020 2 次提交

x86/ldt: use "pr_info_once()" instead of open-coding it badly · bb5a93aa

由 Linus Torvalds 提交于 7月 05, 2020

Using a mutex for "print this warning only once" is so overdesigned as
to be actively offensive to my sensitive stomach.

Just use "pr_info_once()" that already does this, although in a
(harmlessly) racy manner that can in theory cause the message to be
printed twice if more than one CPU races on that "is this the first
time" test.

[ If somebody really cares about that harmless data race (which sounds
  very unlikely indeed), that person can trivially fix printk_once() by
  using a simple atomic access, preferably with an optimistic non-atomic
  test first before even bothering to treat the pointless "make sure it
  is _really_ just once" case.

  A mutex is most definitely never the right primitive to use for
  something like this. ]

Yes, this is a small and meaningless detail in a code path that hardly
matters.  But let's keep some code quality standards here, and not
accept outrageously bad code.

Link: https://lore.kernel.org/lkml/CAHk-=wgV9toS7GU3KmNpj8hCS9SeF+A0voHS8F275_mgLhL4Lw@mail.gmail.com/
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bb5a93aa

x86/entry/32: Fix XEN_PV build dependency · a4c0e91d

由 Ingo Molnar 提交于 7月 05, 2020

xenpv_exc_nmi() and xenpv_exc_debug() are only defined on 64-bit kernels,
but they snuck into the 32-bit build via <asm/identry.h>, causing the link
to fail:

  ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_nmi':
  (.entry.text+0x817): undefined reference to `xenpv_exc_nmi'

  ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_debug':
  (.entry.text+0x827): undefined reference to `xenpv_exc_debug'

Only use them on 64-bit kernels.

Fixes: f41f0824: ("x86/entry/xen: Route #DB correctly on Xen PV")
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a4c0e91d

05 7月, 2020 5 次提交

x86/ldt: Disable 16-bit segments on Xen PV · cc801833

由 Andy Lutomirski 提交于 7月 03, 2020

Xen PV doesn't implement ESPFIX64, so they don't work right. Disable
them. Also print a warning the first time anyone tries to use a
16-bit segment on a Xen PV guest that would otherwise allow it
to help people diagnose this change in behavior.

This gets us closer to having all x86 selftests pass on Xen PV.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/92b2975459dfe5929ecf34c3896ad920bd9e3f2d.1593795633.git.luto@kernel.org

cc801833

x86/entry/32: Fix #MC and #DB wiring on x86_32 · 13cbc0cd

由 Andy Lutomirski 提交于 7月 03, 2020

DEFINE_IDTENTRY_MCE and DEFINE_IDTENTRY_DEBUG were wired up as non-RAW
on x86_32, but the code expected them to be RAW.

Get rid of all the macro indirection for them on 32-bit and just use
DECLARE_IDTENTRY_RAW and DEFINE_IDTENTRY_RAW directly.

Also add a warning to make sure that we only hit the _kernel paths
in kernel mode.
Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9e90a7ee8e72fd757db6d92e1e5ff16339c1ecf9.1593795633.git.luto@kernel.org

13cbc0cd

x86/entry/xen: Route #DB correctly on Xen PV · f41f0824

由 Andy Lutomirski 提交于 7月 03, 2020

On Xen PV, #DB doesn't use IST. It still needs to be correctly routed
depending on whether it came from user or kernel mode.

Get rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too hard to follow the
logic. Instead, route #DB and NMI through DECLARE/DEFINE_IDTENTRY_RAW on
Xen, and do the right thing for #DB. Also add more warnings to the
exc_debug* handlers to make this type of failure more obvious.

This fixes various forms of corruption that happen when usermode
triggers #DB on Xen PV.

Fixes: 4c0dcd83 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/4163e733cce0b41658e252c6c6b3464f33fdff17.1593795633.git.luto@kernel.org

f41f0824

x86/entry, selftests: Further improve user entry sanity checks · 3c73b81a

由 Andy Lutomirski 提交于 7月 03, 2020

Chasing down a Xen bug caused me to realize that the new entry sanity
checks are still fairly weak. Add some more checks.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/881de09e786ab93ce56ee4a2437ba2c308afe7a9.1593795633.git.luto@kernel.org

3c73b81a

x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER · db5b2c5a

由 Andy Lutomirski 提交于 7月 03, 2020

Move the clearing of the high bits of RAX after Xen PV joins the SYSENTER
path so that Xen PV doesn't skip it.

Arguably this code should be deleted instead, but that would belong in the
merge window.

Fixes: ffae641f ("x86/entry/64/compat: Fix Xen PV SYSENTER frame setup")
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9d33b3f3216dcab008070f1c28b6091ae7199969.1593795633.git.luto@kernel.org

db5b2c5a

04 7月, 2020 4 次提交

vmalloc: fix the owner argument for the new __vmalloc_node_range callers · a3a66c38

由 Christoph Hellwig 提交于 7月 03, 2020

Fix the recently added new __vmalloc_node_range callers to pass the
correct values as the owner for display in /proc/vmallocinfo.

Fixes: 800e26b8 ("x86/hyperv: allocate the hypercall page with only read and execute bits")
Fixes: 10d5e97c ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page")
Fixes: 7a0e27b2 ("mm: remove vmalloc_exec")
Reported-by: NArd Biesheuvel <ardb@kernel.org>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200627075649.2455097-1-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a3a66c38

KVM: VMX: Use KVM_POSSIBLE_CR*_GUEST_BITS to initialize guest/host masks · fa71e952

由 Sean Christopherson 提交于 7月 02, 2020

Use the "common" KVM_POSSIBLE_CR*_GUEST_BITS defines to initialize the
CR0/CR4 guest host masks instead of duplicating most of the CR4 mask and
open coding the CR0 mask. SVM doesn't utilize the masks, i.e. the masks
are effectively VMX specific even if they're not named as such. This
avoids duplicate code, better documents the guest owned CR0 bit, and
eliminates the need for a build-time assertion to keep VMX and x86
synchronized.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703040422.31536-3-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fa71e952

KVM: x86: Mark CR4.TSD as being possibly owned by the guest · 7c83d096

由 Sean Christopherson 提交于 7月 02, 2020

Mark CR4.TSD as being possibly owned by the guest as that is indeed the
case on VMX.  Without TSD being tagged as possibly owned by the guest, a
targeted read of CR4 to get TSD could observe a stale value.  This bug
is benign in the current code base as the sole consumer of TSD is the
emulator (for RDTSC) and the emulator always "reads" the entirety of CR4
when grabbing bits.

Add a build-time assertion in to ensure VMX doesn't hand over more CR4
bits without also updating x86.

Fixes: 52ce3c21 ("x86,kvm,vmx: Don't trap writes to CR4.TSD")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703040422.31536-2-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7c83d096

KVM: x86: Inject #GP if guest attempts to toggle CR4.LA57 in 64-bit mode · d74fcfc1

由 Sean Christopherson 提交于 7月 02, 2020

Inject a #GP on MOV CR4 if CR4.LA57 is toggled in 64-bit mode, which is
illegal per Intel's SDM:

  CR4.LA57
    57-bit linear addresses (bit 12 of CR4) ... blah blah blah ...
    This bit cannot be modified in IA-32e mode.

Note, the pseudocode for MOV CR doesn't call out the fault condition,
which is likely why the check was missed during initial development.
This is arguably an SDM bug and will hopefully be fixed in future
release of the SDM.

Fixes: fd8cb433 ("KVM: MMU: Expose the LA57 feature to VM.")
Cc: stable@vger.kernel.org
Reported-by: NSebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703021714.5549-1-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d74fcfc1

01 7月, 2020 3 次提交

x86/entry/64/compat: Fix Xen PV SYSENTER frame setup · ffae641f

由 Andy Lutomirski 提交于 6月 26, 2020

The SYSENTER frame setup was nonsense. It worked by accident because the
normal code into which the Xen asm jumped (entry_SYSENTER_32/compat) threw
away SP without touching the stack. entry_SYSENTER_compat was recently
modified such that it relied on having a valid stack pointer, so now the
Xen asm needs to invoke it with a valid stack.

Fix it up like SYSCALL: use the Xen-provided frame and skip the bare
metal prologue.

Fixes: 1c3e5d3f ("x86/entry: Make entry_64_compat.S objtool clean")
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lkml.kernel.org/r/947880c41ade688ff4836f665d0c9fcaa9bd1201.1593191971.git.luto@kernel.org

ffae641f

x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into C · d1721250

由 Andy Lutomirski 提交于 6月 26, 2020

The SYSENTER asm (32-bit and compat) contains fixups for regs->sp and
regs->flags. Move the fixups into C and fix some comments while at it.

This is a valid cleanup all by itself, and it also simplifies the
subsequent patch that will fix Xen PV SYSENTER.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/fe62bef67eda7fac75b8f3dbafccf571dc4ece6b.1593191971.git.luto@kernel.org

d1721250

x86/entry: Assert that syscalls are on the right stack · c9c26150

由 Andy Lutomirski 提交于 6月 26, 2020

Now that the entry stack is a full page, it's too easy to regress the
system call entry code and end up on the wrong stack without noticing.
Assert that all system calls (SYSCALL64, SYSCALL32, SYSENTER, and INT80)
are on the right stack and have pt_regs in the right place.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/52059e42bb0ab8551153d012d68f7be18d72ff8e.1593191971.git.luto@kernel.org

c9c26150

30 6月, 2020 1 次提交

x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted · 009bce1d

由 Sean Christopherson 提交于 6月 05, 2020

Choo! Choo!  All aboard the Split Lock Express, with direct service to
Wreckage!

Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
exists, and is writable, on many generations of CPUs.  Writing the MSR,
even with '0', can result in bizarre, undocumented behavior.

This fixes a crash on Haswell when resuming from suspend with a live KVM
guest.  Because APs use the standard SMP boot flow for resume, they will
go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
which runs even when sld_state==sld_off to ensure SLD is disabled.  On
Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
succeed and _may_ take the SMT _sibling_ out of VMX root mode.

When KVM has an active guest, KVM performs VMXON as part of CPU onlining
(see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
flow is effectively:

  on_each_ap_cpu() {
     WRMSR(MSR_TEST_CTRL, 0)
     VMXON
  }

As a result, the WRMSR can disable VMX on a different CPU that has
already done VMXON.  This ultimately results in a #UD on VMPTRLD when
KVM regains control and attempt run its vCPUs.

The above voodoo was confirmed by reworking KVM's VMXON flow to write
MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
Further verification of the insanity was done by redoing VMXON on all
APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
which should VM-Fail, occasionally succeeded, and also eliminated the
unexpected #UD on VMPTRLD.

The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
almost always hang (even when fudging VMXON), a #UD on a random Jcc was
observed, suspend/resume stability is qualitatively poor, and so on and
so forth.

  kernel BUG at arch/x86/kvm/x86.c:386!
  CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G      D
  Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
  RIP: 0010:kvm_spurious_fault+0xf/0x20
  Call Trace:
   vmx_vcpu_load_vmcs+0x1fb/0x2b0
   vmx_vcpu_load+0x3e/0x160
   kvm_arch_vcpu_load+0x48/0x260
   finish_task_switch+0x140/0x260
   __schedule+0x460/0x720
   _cond_resched+0x2d/0x40
   kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
   kvm_vcpu_ioctl+0x363/0x5c0
   ksys_ioctl+0x88/0xa0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x4c/0x170
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: dbaba470 ("x86/split_lock: Rework the initialization flow of split lock detection")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200605192605.7439-1-sean.j.christopherson@intel.com

009bce1d

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功