提交 · 729c15c20f1a7c9ad1d09a603ad1aa7fb60b1f88 · openeuler / Kernel

28 9月, 2020 14 次提交

KVM: x86: rename KVM_REQ_GET_VMCS12_PAGES · 729c15c2

由 Paolo Bonzini 提交于 9月 22, 2020

We are going to use it for SVM too, so use a more generic name.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

729c15c2

KVM: x86: Introduce MSR filtering · 1a155254

由 Alexander Graf 提交于 9月 25, 2020

It's not desireable to have all MSRs always handled by KVM kernel space. Some
MSRs would be useful to handle in user space to either emulate behavior (like
uCode updates) or differentiate whether they are valid based on the CPU model.

To allow user space to specify which MSRs it wants to see handled by KVM,
this patch introduces a new ioctl to push filter rules with bitmaps into
KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access.
With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the
denied MSR events to user space to operate on.

If no filter is populated, MSR handling stays identical to before.
Signed-off-by: NAlexander Graf <graf@amazon.com>

Message-Id: <20200925143422.21718-8-graf@amazon.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1a155254

KVM: x86: Add infrastructure for MSR filtering · 51de8151

由 Alexander Graf 提交于 9月 25, 2020

In the following commits we will add pieces of MSR filtering.
To ensure that code compiles even with the feature half-merged, let's add
a few stubs and struct definitions before the real patches start.
Signed-off-by: NAlexander Graf <graf@amazon.com>

Message-Id: <20200925143422.21718-4-graf@amazon.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

51de8151

KVM: x86: Allow deflecting unknown MSR accesses to user space · 1ae09954

由 Alexander Graf 提交于 9月 25, 2020

MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not the first category, but still handled
by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.
Signed-off-by: NAlexander Graf <graf@amazon.com>
Reviewed-by: NJim Mattson <jmattson@google.com>

Message-Id: <20200925143422.21718-3-graf@amazon.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ae09954

KVM: x86: Rename "shared_msrs" to "user_return_msrs" · 7e34fbd0

由 Sean Christopherson 提交于 9月 23, 2020

Rename the "shared_msrs" mechanism, which is used to defer restoring
MSRs that are only consumed when running in userspace, to a more banal
but less likely to be confusing "user_return_msrs".

The "shared" nomenclature is confusing as it's not obvious who is
sharing what, e.g. reasonable interpretations are that the guest value
is shared by vCPUs in a VM, or that the MSR value is shared/common to
guest and host, both of which are wrong.

"shared" is also misleading as the MSR value (in hardware) is not
guaranteed to be shared/reused between VMs (if that's indeed the correct
interpretation of the name), as the ability to share values between VMs
is simply a side effect (albiet a very nice side effect) of deferring
restoration of the host value until returning from userspace.

"user_return" avoids the above confusion by describing the mechanism
itself instead of its effects.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-2-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7e34fbd0

KVM: x86: Add intr/vectoring info and error code to kvm_exit tracepoint · 235ba74f

由 Sean Christopherson 提交于 9月 23, 2020

Extend the kvm_exit tracepoint to align it with kvm_nested_vmexit in
terms of what information is captured. On SVM, add interrupt info and
error code, while on VMX it add IDT vectoring and error code. This
sets the stage for macrofying the kvm_exit tracepoint definition so that
it can be reused for kvm_nested_vmexit without loss of information.

Opportunistically stuff a zero for VM_EXIT_INTR_INFO if the VM-Enter
failed, as the field is guaranteed to be invalid. Note, it'd be
possible to further filter the interrupt/exception fields based on the
VM-Exit reason, but the helper is intended only for tracepoints, i.e.
an extra VMREAD or two is a non-issue, the failed VM-Enter case is just
low hanging fruit.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923201349.16097-5-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

235ba74f

KVM: VMX: Rename RDTSCP secondary exec control name to insert "ENABLE" · 7f3603b6

由 Sean Christopherson 提交于 9月 23, 2020

Rename SECONDARY_EXEC_RDTSCP to SECONDARY_EXEC_ENABLE_RDTSCP in
preparation for consolidating the logic for adjusting secondary exec
controls based on the guest CPUID model.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923165048.20486-4-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7f3603b6

KVM: x86: Add kvm_x86_ops hook to short circuit emulation · 09e3e2a1

由 Sean Christopherson 提交于 9月 15, 2020

Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a
more generic is_emulatable(), and unconditionally call the new function
in x86_emulate_instruction().

KVM will use the generic hook to support multiple security related
technologies that prevent emulation in one way or another. Similar to
the existing AMD #NPF case where emulation of the current instruction is
not possible due to lack of information, AMD's SEV-ES and Intel's SGX
and TDX will introduce scenarios where emulation is impossible due to
the guest's register state being inaccessible. And again similar to the
existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(),
i.e. outside of the control of vendor-specific code.

While the cause and architecturally visible behavior of the various
cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean
resume or complete shutdown, and SEV-ES and TDX "return" an error, the
impact on the common emulation code is identical: KVM must stop
emulation immediately and resume the guest.

Query is_emulatable() in handle_ud() as well so that the
force_emulation_prefix code doesn't incorrectly modify RIP before
calling emulate_instruction() in the absurdly unlikely scenario that
KVM encounters forced emulation in conjunction with "do not emulate".

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

09e3e2a1

KVM: SVM: Add new intercept word in vmcb_control_area · 4c44e8d6

由 Babu Moger 提交于 9月 11, 2020

The new intercept bits have been added in vmcb control area to support
few more interceptions. Here are the some of them.
 - INTERCEPT_INVLPGB,
 - INTERCEPT_INVLPGB_ILLEGAL,
 - INTERCEPT_INVPCID,
 - INTERCEPT_MCOMMIT,
 - INTERCEPT_TLBSYNC,

Add a new intercept word in vmcb_control_area to support these instructions.
Also update kvm_nested_vmrun trace function to support the new addition.

AMD documentation for these instructions is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593
Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4c44e8d6

KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors · c62e2e94

由 Babu Moger 提交于 9月 11, 2020

Convert all the intercepts to one array of 32 bit vectors in
vmcb_control_area. This makes it easy for future intercept vector
additions. Also update trace functions.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c62e2e94

KVM: SVM: Modify intercept_exceptions to generic intercepts · 9780d51d

由 Babu Moger 提交于 9月 11, 2020

Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to
set/clear/test the intercept_exceptions bits.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9780d51d

KVM: SVM: Change intercept_dr to generic intercepts · 30abaa88

由 Babu Moger 提交于 9月 11, 2020

Modify intercept_dr to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
to set/clear/test the intercept_dr bits.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

30abaa88

KVM: SVM: Change intercept_cr to generic intercepts · 03bfeeb9

由 Babu Moger 提交于 9月 11, 2020

Change intercept_cr to generic intercepts in vmcb_control_area.
Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
where applicable.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu>
[Change constant names. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

03bfeeb9

KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) · c45ad722

由 Babu Moger 提交于 9月 11, 2020

This is in preparation for the future intercept vector additions.

Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
using kernel APIs __set_bit, __clear_bit and test_bit espectively.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c45ad722

08 9月, 2020 3 次提交

KVM: SVM: Use __packed shorthand · 976bc5e2

由 Borislav Petkov 提交于 9月 07, 2020

Use the shorthand to make it more readable.

No functional changes.
Signed-off-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-5-joro@8bytes.org

976bc5e2

KVM: SVM: Add GHCB Accessor functions · 3702c2f4

由 Joerg Roedel 提交于 9月 07, 2020

Building a correct GHCB for the hypervisor requires setting valid bits
in the GHCB. Simplify that process by providing accessor functions to
set values and to update the valid bitmap and to check the valid bitmap
in KVM.
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-4-joro@8bytes.org

3702c2f4

KVM: SVM: Add GHCB definitions · d07f46f9

由 Tom Lendacky 提交于 9月 07, 2020

Extend the vmcb_safe_area with SEV-ES fields and add a new
'struct ghcb' which will be used for guest-hypervisor communication.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-3-joro@8bytes.org

d07f46f9

04 9月, 2020 2 次提交

x86/entry: Fix AC assertion · 662a0221

由 Peter Zijlstra 提交于 9月 02, 2020

The WARN added in commit 3c73b81a ("x86/entry, selftests: Further
improve user entry sanity checks") unconditionally triggers on a IVB
machine because it does not support SMAP.

For !SMAP hardware the CLAC/STAC instructions are patched out and thus if
userspace sets AC, it is still have set after entry.

Fixes: 3c73b81a ("x86/entry, selftests: Further improve user entry sanity checks")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NDaniel Thompson <daniel.thompson@linaro.org>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200902133200.666781610@infradead.org

662a0221

tracing/kprobes, x86/ptrace: Fix regs argument order for i386 · 2356bb4b

由 Vamshi K Sthambamkadi 提交于 8月 28, 2020

On i386, the order of parameters passed on regs is eax,edx,and ecx
(as per regparm(3) calling conventions).

Change the mapping in regs_get_kernel_argument(), so that arg1=ax
arg2=dx, and arg3=cx.

Running the selftests testcase kprobes_args_use.tc shows the result
as passed.

Fixes: 3c88ee19 ("x86: ptrace: Add function argument access API")
Signed-off-by: NVamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200828113242.GA1424@cosmos

2356bb4b

30 8月, 2020 1 次提交

x86/cpufeatures: Enumerate TSX suspend load address tracking instructions · 18ec63fa

由 Kyung Min Park 提交于 8月 25, 2020

Intel TSX suspend load tracking instructions aim to give a way to choose
which memory accesses do not need to be tracked in the TSX read set. Add
TSX suspend load tracking CPUID feature flag TSXLDTRK for enumeration.

A processor supports Intel TSX suspend load address tracking if
CPUID.0x07.0x0:EDX[16] is present. Two instructions XSUSLDTRK, XRESLDTRK
are available when this feature is present.

The CPU feature flag is shown as "tsxldtrk" in /proc/cpuinfo.
Signed-off-by: NKyung Min Park <kyung.min.park@intel.com>
Signed-off-by: NCathy Zhang <cathy.zhang@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1598316478-23337-2-git-send-email-cathy.zhang@intel.com

18ec63fa

26 8月, 2020 1 次提交

cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic · bf9282dc

由 Peter Zijlstra 提交于 8月 12, 2020

This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: NMarco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org

bf9282dc

22 8月, 2020 1 次提交

KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd

由 Will Deacon 提交于 8月 11, 2020

The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: NWill Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fdfe7cbd

20 8月, 2020 1 次提交

efi/x86: Move 32-bit code into efi_32.c · 39ada88f

由 Ard Biesheuvel 提交于 8月 13, 2020

Now that the old memmap code has been removed, some code that was left
behind in arch/x86/platform/efi/efi.c is only used for 32-bit builds,
which means it can live in efi_32.c as well. So move it over.
Signed-off-by: NArd Biesheuvel <ardb@kernel.org>

39ada88f

19 8月, 2020 1 次提交

x86/cpu: Fix typos and improve the comments in sync_core() · 40eb0cb4

由 Ingo Molnar 提交于 8月 18, 2020

- Fix typos.

- Move the compiler barrier comment to the top, because it's valid for the
  whole function, not just the legacy branch.
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200818053130.GA3161093@gmail.comReviewed-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>

40eb0cb4

18 8月, 2020 1 次提交

x86/cpu: Use XGETBV and XSETBV mnemonics in fpu/internal.h · 86109813

由 Uros Bizjak 提交于 7月 07, 2020

Current minimum required version of binutils is 2.23, which supports
XGETBV and XSETBV instruction mnemonics.

Replace the byte-wise specification of XGETBV and XSETBV with these
proper mnemonics.
Signed-off-by: NUros Bizjak <ubizjak@gmail.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200707174722.58651-1-ubizjak@gmail.com

86109813

17 8月, 2020 1 次提交

x86/cpu: Use SERIALIZE in sync_core() when available · bf9c912f

由 Ricardo Neri 提交于 8月 06, 2020

The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.
Suggested-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200807032833.17484-1-ricardo.neri-calderon@linux.intel.com

bf9c912f

13 8月, 2020 1 次提交

uaccess: remove segment_eq · 428e2976

由 Christoph Hellwig 提交于 8月 11, 2020

segment_eq is only used to implement uaccess_kernel.  Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NGreentime Hu <green.hu@gmail.com>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

428e2976

11 8月, 2020 2 次提交

x86/hyperv: Make hv_setup_sched_clock inline · b9d8cf2e

由 Michael Kelley 提交于 8月 09, 2020

Make hv_setup_sched_clock inline so the reference to pv_ops works
correctly with objtool updates to detect noinstr violations.
See https://lore.kernel.org/patchwork/patch/1283635/Signed-off-by: NMichael Kelley <mikelley@microsoft.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1597022991-24088-1-git-send-email-mikelley@microsoft.comSigned-off-by: NWei Liu <wei.liu@kernel.org>

b9d8cf2e

x86/xen: remove 32-bit Xen PV guest support · a13f2ef1

由 Juergen Gross 提交于 6月 29, 2020

Xen is requiring 64-bit machines today and since Xen 4.14 it can be
built without 32-bit PV guest support. There is no need to carry the
burden of 32-bit PV guest support in the kernel any longer, as new
guests can be either HVM or PVH, or they can use a 64 bit kernel.

Remove the 32-bit Xen PV support from the kernel.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>

a13f2ef1

08 8月, 2020 4 次提交

asm-generic: pgalloc: provide generic pgd_free() · f9cb654c

由 Mike Rapoport 提交于 8月 06, 2020

Most architectures define pgd_free() as a wrapper for free_page().

Provide a generic version in asm-generic/pgalloc.h and enable its use for
most architectures.
Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f9cb654c

asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one() · d9e8b929

由 Mike Rapoport 提交于 8月 06, 2020

Several architectures define pud_alloc_one() as a wrapper for
__get_free_page() and pud_free() as a wrapper for free_page().

Provide a generic implementation in asm-generic/pgalloc.h and use it where
appropriate.
Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-6-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d9e8b929

asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one() · 1355c31e

由 Mike Rapoport 提交于 8月 06, 2020

For most architectures that support >2 levels of page tables,
pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with
__GFP_ZERO and sometimes followed by memset(0) instead.

More elaborate versions on arm64 and x86 account memory for the user page
tables and call to pgtable_pmd_page_ctor() as the part of PMD page
initialization.

Move the arm64 version to include/asm-generic/pgalloc.h and use the
generic version on several architectures.

The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
not enabled, so there is no functional change for most architectures
except of the addition of __GFP_ACCOUNT for allocation of user page
tables.

The pmd_free() is a wrapper for free_page() in all the cases, so no
functional change here.
Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1355c31e

mm: remove unneeded includes of <asm/pgalloc.h> · ca15ca40

由 Mike Rapoport 提交于 8月 06, 2020

Patch series "mm: cleanup usage of <asm/pgalloc.h>"

Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table.  These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.

In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>

In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.

This patch (of 8):

In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory.  Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.

As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.

The process was somewhat automated using

	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                $(grep -L -w -f /tmp/xx \
                        $(git grep -E -l '[<"]asm/pgalloc\.h'))

where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.

[rppt@linux.ibm.com: fix powerpc warning]
Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ca15ca40

07 8月, 2020 1 次提交

Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings" · 7b4ea945

由 Linus Torvalds 提交于 8月 06, 2020

This reverts commit 8bb9bf24.

It seems the vmalloc page tables aren't always preallocated in all
situations, because Jason Donenfeld reports an oops with this commit:

  BUG: unable to handle page fault for address: ffffe8ffffd00608
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 2 PID: 22 Comm: kworker/2:0 Not tainted 5.8.0+ #154
  RIP: process_one_work+0x2c/0x2d0
  Code: 41 56 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 06 4c 8b 67 40 49 89 c6 45 30 f6 a8 04 b8 00 00 00 00 4c 0f 44 f0 <49> 8b 46 08 44 8b a8 00 01 05
  Call Trace:
   worker_thread+0x4b/0x3b0
   ? rescuer_thread+0x360/0x360
   kthread+0x116/0x140
   ? __kthread_create_worker+0x110/0x110
   ret_from_fork+0x1f/0x30
  CR2: ffffe8ffffd00608

and that page fault address is right in that vmalloc space, and we
clearly don't have a PGD/P4D entry for it.

Looking at the "Code:" line, the actual fault seems to come from the
'pwq->wq' dereference at the top of the process_one_work() function:

        struct pool_workqueue *pwq = get_work_pwq(work);
        struct worker_pool *pool = worker->pool;
        bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;

so 'struct pool_workqueue *pwq' is the allocation that hasn't been
synchronized across CPUs.

Just revert for now, while Joerg figures out the cause.
Reported-and-bisected-by: NJason A. Donenfeld <Jason@zx2c4.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7b4ea945

06 8月, 2020 3 次提交

locking/seqlock, headers: Untangle the spaghetti monster · 0cd39f46

由 Peter Zijlstra 提交于 8月 06, 2020

By using lockdep_assert_*() from seqlock.h, the spaghetti monster
attacked.

Attack back by reducing seqlock.h dependencies from two key high level headers:

 - <linux/seqlock.h>:               -Remove <linux/ww_mutex.h>
 - <linux/time.h>:                  -Remove <linux/seqlock.h>
 - <linux/sched.h>:                 +Add    <linux/seqlock.h>

The price was to add it to sched.h ...

Core header fallout, we add direct header dependencies instead of gaining them
parasitically from higher level headers:

 - <linux/dynamic_queue_limits.h>:  +Add <asm/bug.h>
 - <linux/hrtimer.h>:               +Add <linux/seqlock.h>
 - <linux/ktime.h>:                 +Add <asm/bug.h>
 - <linux/lockdep.h>:               +Add <linux/smp.h>
 - <linux/sched.h>:                 +Add <linux/seqlock.h>
 - <linux/videodev2.h>:             +Add <linux/kernel.h>

Arch headers fallout:

 - PARISC: <asm/timex.h>:           +Add <asm/special_insns.h>
 - SH:     <asm/io.h>:              +Add <asm/page.h>
 - SPARC:  <asm/timer_64.h>:        +Add <uapi/asm/asi.h>
 - SPARC:  <asm/vvar.h>:            +Add <asm/processor.h>, <asm/barrier.h>
                                    -Remove <linux/seqlock.h>
 - X86:    <asm/fixmap.h>:          +Add <asm/pgtable_types.h>
                                    -Remove <asm/acpi.h>

There's also a bunch of parasitic header dependency fallout in .c files, not listed
separately.

[ mingo: Extended the changelog, split up & fixed the original patch. ]
Co-developed-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200804133438.GK2674@hirez.programming.kicks-ass.net

0cd39f46

x86/headers: Remove APIC headers from <asm/smp.h> · 13c01139

由 Ingo Molnar 提交于 8月 06, 2020

The APIC headers are relatively complex and bring in additional
header dependencies - while smp.h is a relatively simple header
included from high level headers.

Remove the dependency and add in the missing #include's in .c
files where they gained it indirectly before.
Signed-off-by: NIngo Molnar <mingo@kernel.org>

13c01139

vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter() · 4c5a116a

由 Thomas Gleixner 提交于 8月 04, 2020

MIPS already uses and S390 will need the vdso data pointer in
__arch_get_hw_counter().

This works nicely as long as the architecture does not support time
namespaces in the VDSO. With time namespaces enabled the regular
accessor to the vdso data pointer __arch_get_vdso_data() will return the
namespace specific VDSO data page for tasks which are part of a
non-root time namespace. This would cause the architectures which need
the vdso data pointer in __arch_get_hw_counter() to access the wrong
vdso data page.

Add a vdso_data pointer argument to __arch_get_hw_counter() and hand it in
from the call sites in the core code. For architectures which do not need
the data pointer in their counter accessor function the compiler will just
optimize it out.

Fix up all existing architecture implementations and make MIPS utilize the
pointer instead of invoking the accessor function.

No functional change and no change in the resulting object code (except
MIPS).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/draft-87wo2ekuzn.fsf@nanos.tec.linutronix.de

4c5a116a

03 8月, 2020 1 次提交

xen: hypercall.h: fix duplicated word · 4e722d4f

由 Randy Dunlap 提交于 7月 25, 2020

Change the repeated word "as" to "as a".
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Link: https://lore.kernel.org/r/20200726001731.19540-1-rdunlap@infradead.orgSigned-off-by: NJuergen Gross <jgross@suse.com>

4e722d4f

31 7月, 2020 2 次提交

x86: Add support for ZSTD compressed kernel · fb46d057

由 Nick Terrell 提交于 7月 30, 2020

- Add support for zstd compressed kernel

- Define __DISABLE_EXPORTS in Makefile

- Remove __DISABLE_EXPORTS definition from kaslr.c

- Bump the heap size for zstd.

- Update the documentation.

Integrates the ZSTD decompression code to the x86 pre-boot code.

Zstandard requires slightly more memory during the kernel decompression
on x86 (192 KB vs 64 KB), and the memory usage is independent of the
window size.

__DISABLE_EXPORTS is now defined in the Makefile, which covers both
the existing use in kaslr.c, and the use needed by the zstd decompressor
in misc.c.

This patch has been boot tested with both a zstd and gzip compressed
kernel on i386 and x86_64 using buildroot and QEMU.

Additionally, this has been tested in production on x86_64 devices.
We saw a 2 second boot time reduction by switching kernel compression
from xz to zstd.
Signed-off-by: NNick Terrell <terrelln@fb.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-7-nickrterrell@gmail.com

fb46d057

KVM: x86: Specify max TDP level via kvm_configure_mmu() · 83013059

由 Sean Christopherson 提交于 7月 15, 2020

Capture the max TDP level during kvm_configure_mmu() instead of using a
kvm_x86_ops hook to do it at every vCPU creation.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200716034122.5998-10-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

83013059

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功