提交 · bf241682936293291dcf40fd93cdd0f5e6222902 · openeuler / Kernel

22 4月, 2019 9 次提交

csky: Reconstruct signal processing · bf241682

由 Guo Ren 提交于 5年前

Linux kernel has provided some apis for arch signal's implementation.
For example:
	restore_saved_sigmask()
	set_current_blocked()
	restore_altstack()

But in last version of csky signal.c didn't use them and some codes are
confusing, so reconstruct signal.c with reference to riscv's code.

Now csky signal.c implementation are very close to riscv and we can
get the following benefits:
 - Clear code structure
 - The signal code of riscv and csky can be reviewed together
 - Promoting the unification of arch's signal implementation

Also modified the related code in entry.S
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
Cc: Arnd Bergmann <arnd@arndb.de>

bf241682

csky: Use in_syscall & forget_syscall instead of r11_sig · f4625ee0

由 Guo Ren 提交于 5年前

We could use regs->sr 16-24 bits to detect syscall: VEC_TRAP0 and
r11_sig is no necessary for current implementation.

In this patch, we implement the in_syscall and forget_syscall which are
inspired from arm & nds32, but csky pt_regs has no syscall_num element
and we just set zero to regs->sr's vector-bits-field instead.

For ret_from_fork, current task was forked from parent which is in syscall
progress and its regs->sr has been already setted with VEC_TRAP0. See:
arch/csky/kernel/process.c: copy_thread()
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>

f4625ee0

csky: Add non-uapi asm/ptrace.h namespace · f335b10f

由 Guo Ren 提交于 5年前

Move #ifdef __KERNEL__ code in the uapi namespace to non-uapi
include/asm/ptrace.h namespace and remove #ifdef __KERNEL__ in
include/asm/ptrace.h. Seperate ptrace.h in uapi and non-uapi is more
common and clear.
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>

f335b10f

csky: mm/fault.c: Remove duplicate header · ce63cd5b

由 Jagadeesh Pagadala 提交于 5年前

Remove duplicate header which is included twice.
Signed-off-by: NJagadeesh Pagadala <jagdsh.linux@gmail.com>
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>

ce63cd5b

csky: remove redundant generic-y · 1b2707fb

由 Masahiro Yamada 提交于 5年前

Since commit 7cbbbb8b ("kbuild: warn redundant generic-y"),
redundant generic-y is reported. I missed to delete this one.

scripts/Makefile.asm-generic:25: redundant generic-y found in arch/csky/include/asm/Kbuild: ftrace.h

In this case, csky-specific implementation exists in
arch/csky/include/asm/ftrace.h
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>

1b2707fb

csky: Update syscall_trace_enter/exit implementation · 2f7932b0

由 Guo Ren 提交于 5年前

Previous syscall_trace implementation couldn't support AUDITSYSCALL and
SYSCALL_TRACEPOINTS. Now we redesign it to support audit_syscall
and syscall_tracepoints just like other archs'.
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>

2f7932b0

csky: Add perf callchain support · cfa4d93b

由 Mao Han 提交于 6年前

This patch add support for perf callchain sampling on csky platform.
As fp is used to unwind the stack, the program being sampled and the
C library need to be compiled with -mbacktrace for user callchains,
kernel callchains require CONFIG_STACKTRACE = y.

Changelog:
 - Coding convention with Christoph's advice for riscv's.
Signed-off-by: NMao Han <han_mao@c-sky.com>
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>
Cc: Christoph Hellwig <hch@infradead.org>

cfa4d93b

csky/ftrace: Add dynamic function tracer (include graph tracer) · 28bb030f

由 Guo Ren 提交于 5年前

Support dynamic ftrace including dynamic graph tracer. Gcc-csky with -pg
will produce call site in every function prologue and we can use these
call site to hook trace function.

gcc with -pg origin call site:
	push	lr
	jbsr	_mcount
	nop32
	nop32

If the (callee - caller)'s offset is in range of bsr instruction, we'll
modify code with:
	push	lr
	bsr	_mcount
	nop32
	nop32
Else if the (callee - caller)'s offset is out of bsr instrunction, we'll
modify code with:
	push	lr
	movih	r26, ...
	ori	r26, ...
	jsr	r26

(r26 is reserved for jsr link reg in csky abiv2 spec.)
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>

28bb030f

csky: Fixup vdsp&fpu issues in kernel · 3dfc242f

由 Guo Ren 提交于 6年前

This fixup is continue to commit 35ff802a (csky: fixup remove
vdsp implement for kernel.) and in that patch I didn't finish the
job. We must forbid gcc to generate any vdsp & fpu instructions
and remove vdsp asm in memmove.S.

eg: For GCC it's -mcpu=ck860 and For AS it's -Wa,-mcpu=ck860fv
Signed-off-by: NGuo Ren <ren_guo@c-sky.com>

3dfc242f

20 4月, 2019 1 次提交

x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority · 2ee27796

由 Hans de Goede 提交于 6年前

The "ENERGY_PERF_BIAS: Set to 'normal', was 'performance'" message triggers
on pretty much every Intel machine. The purpose of log messages with
a warning level is to notify the user of something which potentially is
a problem, or at least somewhat unexpected.

This message clearly does not match those criteria, so lower its log
priority from warning to info.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181230172715.17469-1-hdegoede@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

2ee27796

19 4月, 2019 4 次提交

x86/cpu/bugs: Use __initconst for 'const' init data · 1de7edbb

由 Andi Kleen 提交于 5年前

Some of the recently added const tables use __initdata which causes section
attribute conflicts.

Use __initconst instead.

Fixes: fa1202ef ("x86/speculation: Add command line control")
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190330004743.29541-9-andi@firstfloor.org

1de7edbb

x86/kprobes: Avoid kretprobe recursion bug · b191fa96

由 Masami Hiramatsu 提交于 5年前

Avoid kretprobe recursion loop bg by setting a dummy
kprobes to current_kprobe per-CPU variable.

This bug has been introduced with the asm-coded trampoline
code, since previously it used another kprobe for hooking
the function return placeholder (which only has a nop) and
trampoline handler was called from that kprobe.

This revives the old lost kprobe again.

With this fix, we don't see deadlock anymore.

And you can see that all inner-called kretprobe are skipped.

  event_1                                  235               0
  event_2                                19375           19612

The 1st column is recorded count and the 2nd is missed count.
Above shows (event_1 rec) + (event_2 rec) ~= (event_2 missed)
(some difference are here because the counter is racy)
Reported-by: NAndrea Righi <righi.andrea@gmail.com>
Tested-by: NAndrea Righi <righi.andrea@gmail.com>
Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: c9becf58 ("[PATCH] kretprobe: kretprobe-booster")
Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devboxSigned-off-by: NIngo Molnar <mingo@kernel.org>

b191fa96

x86/kprobes: Verify stack frame on kretprobe · 3ff9c075

由 Masami Hiramatsu 提交于 5年前

Verify the stack frame pointer on kretprobe trampoline handler,
If the stack frame pointer does not match, it skips the wrong
entry and tries to find correct one.

This can happen if user puts the kretprobe on the function
which can be used in the path of ftrace user-function call.
Such functions should not be probed, so this adds a warning
message that reports which function should be blacklisted.
Tested-by: NAndrea Righi <righi.andrea@gmail.com>
Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devboxSigned-off-by: NIngo Molnar <mingo@kernel.org>

3ff9c075

arm64: futex: Restore oldval initialization to work around buggy compilers · ff8acf92

由 Nathan Chancellor 提交于 5年前

Commit 045afc24 ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with
non-zero result value") removed oldval's zero initialization in
arch_futex_atomic_op_inuser because it is not necessary. Unfortunately,
Android's arm64 GCC 4.9.4 [1] does not agree:

../kernel/futex.c: In function 'do_futex':
../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized
in this function [-Wmaybe-uninitialized]
   return oldval == cmparg;
                 ^
In file included from ../kernel/futex.c:73:0:
../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here
  int oldval, ret, tmp;
      ^

GCC fails to follow that when ret is non-zero, futex_atomic_op_inuser
returns right away, avoiding the uninitialized use that it claims.
Restoring the zero initialization works around this issue.

[1]: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/

Cc: stable@vger.kernel.org
Fixes: 045afc24 ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value")
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

ff8acf92

18 4月, 2019 2 次提交

perf/x86/amd: Add event map for AMD Family 17h · 3fe3331b

由 Kim Phillips 提交于 5年前

Family 17h differs from prior families by:

 - Does not support an L2 cache miss event
 - It has re-enumerated PMC counters for:
   - L2 cache references
   - front & back end stalled cycles

So we add a new amd_f17h_perfmon_event_map[] so that the generic
perf event names will resolve to the correct h/w events on
family 17h and above processors.

Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2):

  https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdfSigned-off-by: NKim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
[ Improved the formatting a bit. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

3fe3331b

x86/mm/KASLR: Fix the size of the direct mapping section · ec393710

由 Baoquan He 提交于 5年前

kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is obtained from the smaller one of the below two
values:

  (actual system RAM size + padding size) vs (max system RAM size supported)

This calculation is wrong since commit

  b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").

In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
the kernel is using 4-level or 5-level page tables. Thus, it will always
use 4 PB as the maximum amount of system RAM, even in 4-level paging
mode where it should actually be 64 TB.

Thus, the size of the direct mapping section will always
be the sum of the actual system RAM size plus the padding size.

Even when the amount of system RAM is 64 TB, the following layout will
still be used. Obviously KALSR will be weakened significantly.

   |____|_______actual RAM_______|_padding_|______the rest_______|
   0            64TB                                            ~120TB

Instead, it should be like this:

   |____|_______actual RAM_______|_________the rest______________|
   0            64TB                                            ~120TB

The size of padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.

The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.

Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.

 [ bp: Massage commit message. ]

Fixes: b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: NBaoquan He <bhe@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NThomas Garnier <thgarnie@google.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frank.ramsay@hpe.com
Cc: herbert@gondor.apana.org.au
Cc: kirill@shutemov.name
Cc: mike.travis@hpe.com
Cc: thgarnie@google.com
Cc: x86-ml <x86@kernel.org>
Cc: yamada.masahiro@socionext.com
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv

ec393710

17 4月, 2019 1 次提交

s390: correct some inline assembly constraints · 35af0d46

由 Vasily Gorbik 提交于 5年前

Inline assembly code changed in this patch should really use "Q"
constraint "Memory reference without index register and with short
displacement". The kernel build with kasan instrumentation enabled
might occasionally break otherwise (due to stack instrumentation).
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

35af0d46

16 4月, 2019 23 次提交

KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing · 7a223e06

由 Vitaly Kuznetsov 提交于 5年前

In __apic_accept_irq() interface trig_mode is int and actually on some code
paths it is set above u8:

kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode
is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to
(1 << 15) & e->msi.data

kvm_apic_local_deliver sets it to reg & (1 << 15).

Fix the immediate issue by making 'tm' into u16. We may also want to adjust
__apic_accept_irq() interface and use proper sizes for vector, level,
trig_mode but this is not urgent.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a223e06

KVM: fix spectrev1 gadgets · 1d487e9b

由 Paolo Bonzini 提交于 5年前

These were found with smatch, and then generalized when applicable.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1d487e9b

KVM: x86: fix warning Using plain integer as NULL pointer · be43c440

由 Hariprasad Kelam 提交于 5年前

Changed passing argument as "0 to NULL" which resolves below sparse warning

arch/x86/kvm/x86.c:3096:61: warning: Using plain integer as NULL pointer
Signed-off-by: NHariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

be43c440

KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels · b68f3cc7

由 Sean Christopherson 提交于 5年前

Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.

KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.

SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hid long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).

Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allow it to continue running.

Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b68f3cc7

KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU · 8f4dc2e7

由 Sean Christopherson 提交于 5年前

Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
state area, i.e. don't save/restore EFER across SMM transitions. KVM
somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
guest doesn't support long mode. But during RSM, KVM unconditionally
clears EFER so that it can get back to pure 32-bit mode in order to
start loading CRs with their actual non-SMM values.

Clear EFER only when it will be written when loading the non-SMM state
so as to preserve bits that can theoretically be set on 32-bit vCPUs,
e.g. KVM always emulates EFER_SCE.

And because CR4.PAE is cleared only to play nice with EFER, wrap that
code in the long mode check as well. Note, this may result in a
compiler warning about cr4 being consumed uninitialized. Re-read CR4
even though it's technically unnecessary, as doing so allows for more
readable code and RSM emulation is not a performance critical path.

8f4dc2e7

KVM: x86: clear SMM flags before loading state while leaving SMM · 9ec19493

由 Sean Christopherson 提交于 5年前

RSM emulation is currently broken on VMX when the interrupted guest has
CR4.VMXE=1.  Stop dancing around the issue of HF_SMM_MASK being set when
loading SMSTATE into architectural state, e.g. by toggling it for
problematic flows, and simply clear HF_SMM_MASK prior to loading
architectural state (from SMRAM save state area).
Reported-by: NJon Doron <arilou@gmail.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 5bea5123 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9ec19493

KVM: x86: Open code kvm_set_hflags · c5833c7a

由 Sean Christopherson 提交于 5年前

Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM
save state map, i.e. kvm_smm_changed() needs to be called after state
has been loaded and so cannot be done automatically when setting
hflags from RSM.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c5833c7a

KVM: x86: Load SMRAM in a single shot when leaving SMM · ed19321f

由 Sean Christopherson 提交于 5年前

RSM emulation is currently broken on VMX when the interrupted guest has
CR4.VMXE=1.  Rather than dance around the issue of HF_SMM_MASK being set
when loading SMSTATE into architectural state, ideally RSM emulation
itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
architectural state.

Ostensibly, the only motivation for having HF_SMM_MASK set throughout
the loading of state from the SMRAM save state area is so that the
memory accesses from GET_SMSTATE() are tagged with role.smm.  Load
all of the SMRAM save state area from guest memory at the beginning of
RSM emulation, and load state from the buffer instead of reading guest
memory one-by-one.

This paves the way for clearing HF_SMM_MASK prior to loading state,
and also aligns RSM with the enter_smm() behavior, which fills a
buffer and writes SMRAM save state in a single go.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed19321f

KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU · e51bfdb6

由 Liran Alon 提交于 5年前

Issue was discovered when running kvm-unit-tests on KVM running as L1 on
top of Hyper-V.

When vmx_instruction_intercept unit-test attempts to run RDPMC to test
RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC
handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU.
Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC.

The reason vmx_instruction_intercept unit-test attempts to run RDPMC
even though Hyper-V doesn't support PMU is because L1 expose to L2
support for RDPMC-exiting. Which is reasonable to assume that is
supported only in case CPU supports PMU to being with.

Above issue can easily be simulated by modifying
vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
"-cpu host,+vmx,-pmu" and run unit-test.

To handle issue, change KVM to expose RDPMC-exiting only when guest
supports PMU.
Reported-by: NSaar Amar <saaramar@microsoft.com>
Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e51bfdb6

KVM: x86: Raise #GP when guest vCPU do not support PMU · 672ff6cf

由 Liran Alon 提交于 5年前

Before this change, reading a VMware pseduo PMC will succeed even when
PMU is not supported by guest. This can easily be seen by running
kvm-unit-test vmware_backdoors with "-cpu host,-pmu" option.
Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: NLiran Alon <liran.alon@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

672ff6cf

x86/kvm: move kvm_load/put_guest_xcr0 into atomic context · 1811d979

由 WANG Chao 提交于 5年前

guest xcr0 could leak into host when MCE happens in guest mode. Because
do_machine_check() could schedule out at a few places.

For example:

kvm_load_guest_xcr0
...
kvm_x86_ops->run(vcpu) {
  vmx_vcpu_run
    vmx_complete_atomic_exit
      kvm_machine_check
        do_machine_check
          do_memory_failure
            memory_failure
              lock_page

In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
out, host cpu has guest xcr0 loaded (0xff).

In __switch_to {
     switch_fpu_finish
       copy_kernel_to_fpregs
         XRSTORS

If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
and tries to reinitialize fpu by restoring init fpu state. Same story as
last #GP, except we get DOUBLE FAULT this time.

Cc: stable@vger.kernel.org
Signed-off-by: NWANG Chao <chao.wang@ucloud.cn>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1811d979

KVM: x86: svm: make sure NMI is injected after nmi_singlestep · 99c22179

由 Vitaly Kuznetsov 提交于 5年前

I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P,
the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing
shows that we're sometimes able to deliver a few but never all.

When we're trying to inject an NMI we may fail to do so immediately for
various reasons, however, we still need to inject it so enable_nmi_window()
arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
for pending NMIs before entering the guest and unless there's a different
event to process, the NMI will never get delivered.

Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
pending NMIs are checked and possibly injected.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

99c22179

svm/avic: Fix invalidate logical APIC id entry · e44e3eac

由 Suthikulpanit, Suravee 提交于 5年前

Only clear the valid bit when invalidate logical APIC id entry.
The current logic clear the valid bit, but also set the rest of
the bits (including reserved bits) to 1.

Fixes: 98d90582 ('svm: Fix AVIC DFR and LDR handling')
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e44e3eac

Revert "svm: Fix AVIC incomplete IPI emulation" · 4a58038b

由 Suthikulpanit, Suravee 提交于 5年前

This reverts commit bb218fbc.

As Oren Twaig pointed out the old discussion:

  https://patchwork.kernel.org/patch/8292231/

that the change coud potentially cause an extra IPI to be sent to
the destination vcpu because the AVIC hardware already set the IRR bit
before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
Since writting to ICR and ICR2 will also set the IRR. If something triggers
the destination vcpu to get scheduled before the emulation finishes, then
this could result in an additional IPI.

Also, the issue mentioned in the commit bb218fbc was misdiagnosed.

Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: NOren Twaig <oren@scalemp.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4a58038b

kvm: mmu: Fix overflow on kvm mmu page limit calculation · bc8a3d89

由 Ben Gardon 提交于 5年前

KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32 bit unsigned integers. This can result in overflow
if a VM has more guest pages that can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.

Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
	introduced no new failures.
Signed-off-by: NBen Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bc8a3d89

KVM: nVMX: always use early vmcs check when EPT is disabled · 2b27924b

由 Paolo Bonzini 提交于 5年前

The remaining failures of vmx.flat when EPT is disabled are caused by
incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
from the vmcs01.

For simplicity let's just always use hardware VMCS checks when EPT is
disabled.  This way, nested_vmx_restore_host_state is not reached at
all (or at least shouldn't be reached).
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2b27924b

x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness" · a943245a

由 Colin Ian King 提交于 5年前

The Kconfig text contains a spelling mistake, fix it.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20190416105751.18899-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a943245a

perf/x86: Fix incorrect PEBS_REGS · 9d5dcc93

由 Kan Liang 提交于 5年前

PEBS_REGS used as mask for the supported registers for large PEBS.
However, the mask cannot filter the sample_regs_user/sample_regs_intr
correctly.

(1ULL << PERF_REG_X86_*) should be used to replace PERF_REG_X86_*, which
is only the index.

Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general
purpose registers.
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Fixes: 2fe1bc1f ("perf/x86: Enable free running PEBS for REGS_USER/INTR")
Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com
[ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

9d5dcc93

KVM: nVMX: allow tests to use bad virtual-APIC page address · 69090810

由 Paolo Bonzini 提交于 5年前

As mentioned in the comment, there are some special cases where we can simply
clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
Handle them so that we can remove some XFAILs from vmx.flat.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

69090810

x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info" · 780e0106

由 Peter Zijlstra 提交于 5年前

Revert the following commit:

  515ab7c4: ("x86/mm: Align TLB invalidation info")

I found out (the hard way) that under some .config options (notably L1_CACHE_SHIFT=7)
and compiler combinations this on-stack alignment leads to a 320 byte
stack usage, which then triggers a KASAN stack warning elsewhere.

Using 320 bytes of stack space for a 40 byte structure is ludicrous and
clearly not right.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NNadav Amit <namit@vmware.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 515ab7c4 ("x86/mm: Align TLB invalidation info")
Link: http://lkml.kernel.org/r/20190416080335.GM7905@worktop.programming.kicks-ass.net
[ Minor changelog edits. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

780e0106

x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T · 0082517f

由 Jian-Hong Pan 提交于 5年前

Upon reboot, the Acer TravelMate X514-51T laptop appears to complete the
shutdown process, but then it hangs in BIOS POST with a black screen.

The problem is intermittent - at some points it has appeared related to
Secure Boot settings or different kernel builds, but ultimately we have
not been able to identify the exact conditions that trigger the issue to
come and go.

Besides, the EFI mode cannot be disabled in the BIOS of this model.

However, after extensive testing, we observe that using the EFI reboot
method reliably avoids the issue in all cases.

So add a boot time quirk to use EFI reboot on such systems.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203119Signed-off-by: NJian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: NDaniel Drake <drake@endlessm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux@endlessm.com
Link: http://lkml.kernel.org/r/20190412080152.3718-1-jian-hong@endlessm.com
[ Fix !CONFIG_EFI build failure, clarify the code and the changelog a bit. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

0082517f

x86/mm: Prevent bogus warnings with "noexec=off" · 510bb96f

由 Thomas Gleixner 提交于 5年前

Xose Vazquez Perez reported boot warnings when NX is disabled on the kernel command line.

__early_set_fixmap() triggers this warning:

  attempted to set unsupported pgprot:    8000000000000163
			       bits:      8000000000000000
			       supported: 7fffffffffffffff

  WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgtable.h:537
			    __early_set_fixmap+0xa2/0xff

because it uses __default_kernel_pte_mask to mask out unsupported bits.

Use __supported_pte_mask instead.

Disabling NX on the command line also triggers the NX warning in the page
table mapping check:

  WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:262 note_page+0x2ae/0x650
  ....

Make the warning depend on NX set in __supported_pte_mask.
Reported-by: NXose Vazquez Perez <xose.vazquez@gmail.com>
Tested-by: NXose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1904151037530.1729@nanos.tec.linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

510bb96f

x86/build/lto: Fix truncated .bss with -fdata-sections · 6a03469a

由 Sami Tolvanen 提交于 5年前

With CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y, we compile the kernel with
-fdata-sections, which also splits the .bss section.

The new section, with a new .bss.* name, which pattern gets missed by the
main x86 linker script which only expects the '.bss' name. This results
in the discarding of the second part and a too small, truncated .bss
section and an unhappy, non-working kernel.

Use the common BSS_MAIN macro in the linker script to properly capture
and merge all the generated BSS sections.
Signed-off-by: NSami Tolvanen <samitolvanen@google.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190415164956.124067-1-samitolvanen@google.com
[ Extended the changelog. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

6a03469a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功