Commits · e080a3b6b056a2729d07ce26b51c28b926c144c1 · openeuler / Kernel

17 Sep, 2022 1 commit

mm: page_table_check: add hooks to public helpers · e080a3b6

Committed by 知远pimo on Sep 17, 2022

mainline inclusion
from mainline-v6.0
commit de8c8e52
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I56FG1?from=project-issue
CVE: N/A

Reference:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de8c8e52836d0082188508548d0b939f49f7f0e6

-------------------------------------------------
Move ptep_clear() to include/linux/pgtable.h and add page table check
related hooks to some helpers, in preparation for supporting the page
table check feature on new architectures.

Optimize the implementation of ptep_clear(): the page table helpers now
call the page table check stubs, and the enable/disable control belongs in
the stubs, so there is no rationale for doing an IS_ENABLED() check here.

Architectures that do not enable CONFIG_PAGE_TABLE_CHECK will call the
fallback page table check stubs[1] when using the page table helpers[2]
in include/linux/pgtable.h.

[1] page table check stubs defined in include/linux/page_table_check.h
[2] ptep_clear(), ptep_get_and_clear(), pmdp_huge_get_and_clear(),
pudp_huge_get_and_clear()
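
The stub arrangement described in this message can be sketched in plain userspace C. This is an illustration only, not the kernel's code: SKETCH_PAGE_TABLE_CHECK, sketch_pte_t, sketch_page_table_check_pte_clear() and the sketch_ptep_*() helpers are hypothetical names. The point is that the generic helper calls the hook unconditionally, and the on/off decision lives entirely in which version of the hook gets compiled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a page table entry. */
typedef struct { uint64_t val; } sketch_pte_t;

/* Counts how often the real check hook ran (illustration only). */
static int sketch_check_calls;

#ifdef SKETCH_PAGE_TABLE_CHECK
/* Feature enabled: the hook does real work. */
static inline void sketch_page_table_check_pte_clear(sketch_pte_t old)
{
    (void)old;
    sketch_check_calls++;
}
#else
/* Feature disabled: the fallback stub does nothing. */
static inline void sketch_page_table_check_pte_clear(sketch_pte_t old)
{
    (void)old;
}
#endif

/* Generic helper: calls the hook unconditionally, no IS_ENABLED() test. */
static inline sketch_pte_t sketch_ptep_get_and_clear(sketch_pte_t *ptep)
{
    sketch_pte_t old = *ptep;

    ptep->val = 0;
    sketch_page_table_check_pte_clear(old);
    return old;
}

static inline void sketch_ptep_clear(sketch_pte_t *ptep)
{
    (void)sketch_ptep_get_and_clear(ptep);
}
```

Built without SKETCH_PAGE_TABLE_CHECK, the helper still clears the entry but the counter never moves, which mirrors how an architecture without the feature pays nothing for the hook.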

Link: https://lkml.kernel.org/r/20220507110114.4128854-4-tongtiangen@huawei.com
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit de8c8e52)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zeng Zhimin <im_zzm@126.com>

16 Sep, 2022 2 commits

mm: page_table_check: move pxx_user_accessible_page into x86 · 21bdebbb

Committed by 知远pimo on Sep 16, 2022

mainline inclusion
from mainline-v6.0
commit e5a55401
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I56FG1?from=project-issue
CVE: N/A

Reference:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5a554014618308f046af99ab9c950165ed6cb11

-------------------------------------------------
The pxx_user_accessible_page() helpers check PTE bits, which is
architecture-specific, so move them into x86's pgtable.h.

These helpers are being moved out to make the page table check framework
platform independent.

Link: https://lkml.kernel.org/r/20220507110114.4128854-3-tongtiangen@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit e5a55401)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zeng Zhimin <im_zzm@126.com>

x86: mm: add x86_64 support for page table check · ec9e357e

Committed by 知远pimo on Sep 16, 2022

mainline inclusion
from mainline-v6.0
commit d283d422
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I56FG1?from=project-issue
CVE: N/A

Reference:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d283d422c6c4f0264fe8ecf5ae80036bf73f4594

-------------------------------------------------

Add page table check hooks into routines that modify user page tables.

Link: https://lkml.kernel.org/r/20211221154650.1047963-5-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zeng Zhimin <im_zzm@126.com>

01 Sep, 2022 1 commit

KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host · 73788673

Committed by Wei Huang on Aug 18, 2021

mainline inclusion
from mainline-5.15
commit cb0f722a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

When the 5-level page table CPU flag is set in the host, but the guest
has CR4.LA57=0 (including the case of a 32-bit guest), the top level of
the shadow NPT page tables will be fixed, consisting of one pointer to
a lower-level table and 511 non-present entries.  Extend the existing
code that creates the fixed PML4 or PDP table, to provide a fixed PML5
table if needed.

This is not needed on EPT because the number of layers in the tables
is specified in the EPTP instead of depending on the host CR4.
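
As a rough illustration of the "one pointer plus 511 non-present entries" shape described above, the following userspace sketch builds such a fixed top-level table. The names (sketch_init_fixed_root(), SKETCH_PRESENT_BIT) and the use of bit 0 as a "present" flag are assumptions made for the example and do not reflect KVM's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ENTRIES_PER_TABLE 512
#define SKETCH_PRESENT_BIT       1ULL   /* hypothetical "present" flag */

/* Fill a top-level table with one link to the next level; the rest stay non-present. */
static void sketch_init_fixed_root(uint64_t *root, uint64_t lower_table_pa)
{
    for (int i = 0; i < SKETCH_ENTRIES_PER_TABLE; i++)
        root[i] = 0;
    root[0] = lower_table_pa | SKETCH_PRESENT_BIT;
}

/* Count entries that carry the "present" flag. */
static int sketch_count_present(const uint64_t *table)
{
    int n = 0;

    for (int i = 0; i < SKETCH_ENTRIES_PER_TABLE; i++)
        if (table[i] & SKETCH_PRESENT_BIT)
            n++;
    return n;
}
```

The same routine could be stacked once more (a fixed PML5 pointing at a fixed PML4) to model the extra level this commit adds.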
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Message-Id: <20210818165549.3771014-3-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

31 Aug, 2022 8 commits

KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255 · f54c63c4

Committed by Suravee Suthikulpanit on Feb 10, 2022

mainline inclusion
from mainline-5.18-rc1
commit 4a204f78
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined
by the architecture.  The number of bits consumed by hardware is model
specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM
to enumerate the "true" size.  So, KVM must allow using all bits, else it
risks rejecting completely legal x2APIC IDs on newer CPUs.
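
To make the effect of widening the mask concrete, here is a small sketch comparing an 8-bit and a 12-bit host physical ID mask. The macro and function names are hypothetical; the 8-bit "old" width is an assumption inferred from the commit title (IDs above 255 were not usable before).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HOST_ID_MASK_8BIT  0xFFu    /* assumed older, narrower mask */
#define SKETCH_HOST_ID_MASK_12BIT 0xFFFu   /* bits 11:0, the full architectural field */

/* An ID is representable only if it has no bits set outside the mask. */
static bool sketch_host_id_fits(uint32_t apic_id, uint32_t mask)
{
    return (apic_id & ~mask) == 0;
}
```

With the narrow mask an APIC ID such as 300 would be rejected, while the 12-bit mask accepts it and still rejects anything at or above 4096.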

This means KVM relies on hardware to not assign x2APIC IDs that exceed the
"true" width of the field, but presumably hardware is smart enough to tie
the width to the max x2APIC ID.  KVM also relies on hardware to support at
least 8 bits, as the legacy xAPIC ID is writable by software.  But, those
assumptions are unavoidable due to the lack of any way to enumerate the
"true" width.

Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220211000851.185799-1-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

KVM: x86: SVM: move avic definitions from AMD's spec to svm.h · bd57e6ac

Committed by Maxim Levitsky on Feb 07, 2022

mainline inclusion
from mainline-v5.17
commit 39150352
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

asm/svm.h is the correct place for all values that are defined in
the SVM spec, and that includes AVIC.

Also add some values from the spec that were not defined before
and will be soon useful.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging · bc1d5726

Committed by Sean Christopherson on May 05, 2021

mainline inclusion
from mainline-v5.13
commit 03ca4589
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

Disallow loading KVM SVM if 5-level paging is supported.  In theory, NPT
for L1 should simply work, but there are unknowns with respect to how the
guest's MAXPHYADDR will be handled by hardware.

Nested NPT is more problematic, as running an L1 VMM that is using
2-level page tables requires stacking single-entry PDP and PML4 tables in
KVM's NPT for L2, as there are no equivalent entries in L1's NPT to
shadow.  Barring hardware magic, for 5-level paging, KVM would need to
stack another layer to handle PML5.

Opportunistically rename the lm_root pointer, which is used for the
aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to
call out that it's specifically for PML4.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210505204221.1934471-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration · 84649e0a

Committed by Yazen Ghannam on Dec 16, 2021

mainline inclusion
from mainline-v5.17
commit 91f75eb4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     CS
    3      SMU    RAZ

Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     NBIO
    3      SMU    RAZ

Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.

Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.

Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.

Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.
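
The move from a single system-wide bank-type table to a per-CPU one can be pictured with the sketch below, which encodes the "future systems" example table from this message. The enum, array and function names are hypothetical and only loosely modeled on the idea behind smca_get_bank_type(); the real kernel structures differ.

```c
#include <assert.h>

enum sketch_bank_type { SK_RAZ, SK_LS, SK_UMC, SK_CS, SK_SMU, SK_NBIO };

#define SKETCH_NR_CPUS  2
#define SKETCH_NR_BANKS 4

/* One row per CPU, mirroring the second example table in the message. */
static const enum sketch_bank_type sketch_bank_types[SKETCH_NR_CPUS][SKETCH_NR_BANKS] = {
    /* CPUx */ { SK_LS, SK_RAZ, SK_CS,   SK_SMU },
    /* CPUy */ { SK_LS, SK_UMC, SK_NBIO, SK_RAZ },
};

/* The lookup needs the CPU as well as the bank number. */
static enum sketch_bank_type sketch_get_bank_type(unsigned int cpu, unsigned int bank)
{
    if (cpu >= SKETCH_NR_CPUS || bank >= SKETCH_NR_BANKS)
        return SK_RAZ;
    return sketch_bank_types[cpu][bank];
}
```

Bank 2 is the interesting case: it resolves to CS on CPUx and NBIO on CPUy, which a single shared table could not express.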
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

x86/MCE/AMD, EDAC/amd64: Move address translation to AMD64 EDAC · 2a673e92

Committed by Yazen Ghannam on Oct 28, 2021

mainline inclusion
from mainline-v5.17
commit 0b746e8c
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

The address translation code used for current AMD systems is
non-architectural. So move it to EDAC.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-2-yazen.ghannam@amd.com
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

x86/MCE/AMD: Export smca_get_bank_type symbol · 29547565

Committed by Mukul Joshi on Mar 27, 2021

mainline inclusion
from mainline-v5.16
commit f38ce910
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

Export smca_get_bank_type for use in the AMD GPU
driver to determine MCA bank while handling correctable
and uncorrectable errors in GPU UMC.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types · 80fbd8c5

Committed by Yazen Ghannam on Dec 16, 2021

mainline inclusion
from mainline-v5.17
commit 5176a93a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.

The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.

  [ bp: Remove useless comments over hwid types. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types · bde2c0fb

Committed by Muralidhara M K on May 26, 2021

mainline inclusion
from mainline-v5.14
commit 94a311ce
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5NGRU
CVE: NA

-------------------------------------------------

Add the (HWID, MCATYPE) tuples and names for new SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd. Also while at it, optimize the string names for some SMCA
banks.

 [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ]
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com
Signed-off-by: Xie Haocheng <haocheng.xie@amd.com>

29 Aug, 2022 4 commits

KVM: VMX: enable IPI virtualization · ba83444c

Committed by Chao Gao on Apr 19, 2022

mainline inclusion
from mainline-v6.0-rc1
commit d588bb9b
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d588bb9be1da6aa750aa64875fe57369db983d8b

Intel-SIG: commit d588bb9b ("KVM: VMX: enable IPI virtualization")

-------------------------------------

KVM: VMX: enable IPI virtualization

With IPI virtualization enabled, the processor emulates writes to
APIC registers that would send IPIs. The processor sets the bit
corresponding to the vector in target vCPU's PIR and may send a
notification (IPI) specified by NDST and NV fields in target vCPU's
Posted-Interrupt Descriptor (PID). It is similar to what IOMMU
engine does when dealing with posted interrupt from devices.

A PID-pointer table is used by the processor to locate the PID of a
vCPU with the vCPU's APIC ID. The table size depends on maximum APIC
ID assigned for current VM session from userspace. Allocating memory
for PID-pointer table is deferred to vCPU creation, because irqchip
mode and VM-scope maximum APIC ID is settled at that point. KVM can
skip PID-pointer table allocation if !irqchip_in_kernel().
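
A rough sketch of why the table size follows from the maximum APIC ID: if the table holds one pointer per possible APIC ID, its size is a simple function of the largest ID. The function names are hypothetical, and the 64-bit entry size and zero-based indexing are assumptions made for the example rather than details stated in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 64-bit PID pointer per possible APIC ID, IDs starting at 0. */
static size_t sketch_pid_table_bytes(uint32_t max_apic_id)
{
    return ((size_t)max_apic_id + 1) * sizeof(uint64_t);
}

/* Look up the PID address for a vCPU by APIC ID; 0 means "no entry". */
static uint64_t sketch_pid_lookup(const uint64_t *table, uint32_t max_apic_id,
                                  uint32_t apic_id)
{
    if (apic_id > max_apic_id)
        return 0;
    return table[apic_id];
}
```

Deferring the allocation until the maximum ID is known avoids sizing the table for the worst case on every VM.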

Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its
notification vector to wakeup vector. This can ensure that when an IPI
for blocked vCPUs arrives, VMM can get control and wake up blocked
vCPUs. And if a VCPU is preempted, its posted interrupt notification
is suppressed.

Note that IPI virtualization can only virtualize physical-addressing,
flat mode, unicast IPIs. Sending other IPIs would still cause a
trap-like APIC-write VM-exit and need to be handled by VMM.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154510.11938-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

KVM: x86: Allow userspace to set maximum VCPU id for VM · c731a570

Committed by Zeng Guang on Apr 19, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 35875316
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=35875316384b71d23dc2a45a969732fc8cab16af

Intel-SIG: commit 35875316 ("KVM: x86: Allow userspace to set maximum VCPU id for VM")

-------------------------------------

KVM: x86: Allow userspace to set maximum VCPU id for VM

Introduce new max_vcpu_ids in KVM for x86 architecture. Userspace
can assign maximum possible vcpu id for current VM session using
KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl().

This is done for x86 only because the sole use case is to guide
memory allocation for PID-pointer table, a structure needed to
enable VMX IPI.

By default, max_vcpu_ids is set to KVM_MAX_VCPU_ID.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154444.11888-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config · df66c9d1

Committed by Robert Hoo on Apr 19, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 1ad4e543
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1ad4e5438c67a01620ed67cea959de89f4430515

Intel-SIG: commit 1ad4e543 ("KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config")

-------------------------------------

KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config

Check VMX features on tertiary execution control in VMCS config setup.
Sub-features in tertiary execution control to be enabled are adjusted
according to hardware capabilities although no sub-feature is enabled
in this patch.

EVMCSv1 doesn't support tertiary VM-execution control, so disable it
when EVMCSv1 is in use. And define the auxiliary functions for Tertiary
control field here, using the new BUILD_CONTROLS_SHADOW().
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153400.11642-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

x86/cpu: Add new VMX feature, Tertiary VM-Execution control · 581829c2

Committed by Robert Hoo on Apr 19, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 465932db
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=465932db25f3664893b66152c7b190afd28c32db

Intel-SIG: commit 465932db ("x86/cpu: Add new VMX feature, Tertiary VM-Execution control")

-------------------------------------

x86/cpu: Add new VMX feature, Tertiary VM-Execution control

A new 64-bit control field "tertiary processor-based VM-execution
controls", is defined [1]. It's controlled by bit 17 of the primary
processor-based VM-execution controls.

Different from its brother VM-execution fields, this tertiary VM-
execution controls field is 64 bit. So it occupies 2 vmx_feature_leafs,
TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.

Its companion VMX capability reporting MSR, MSR_IA32_VMX_PROCBASED_CTLS3
(0x492), is also semantically different from its brothers, whose 64 bits
consist of all allow-1, rather than 32-bit allow-0 and 32-bit allow-1 [1][2].
Therefore, its init_vmx_capabilities() is a little different from others.

[1] ISE 6.2 "VMCS Changes"
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

[2] SDM Vol3. Appendix A.3
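
The semantic difference between the two kinds of capability MSR can be sketched as follows. For the 32-bit controls, the low half of the MSR marks bits that must be 1 and the high half marks bits that may be 1; for the tertiary controls, all 64 bits are "may be 1" bits. The function names are hypothetical and this is not the kernel's init_vmx_capabilities(); it only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit control: MSR low half = bits that must be 1, high half = bits that may be 1. */
static uint32_t sketch_adjust_ctl32(uint32_t wanted, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);

    return (wanted & may_be_one) | must_be_one;
}

/* Tertiary control: all 64 MSR bits are "may be 1" bits. */
static uint64_t sketch_adjust_ctl64(uint64_t wanted, uint64_t cap_msr)
{
    return wanted & cap_msr;
}
```

Because the tertiary MSR has no "must be 1" half, the adjusted value is just the intersection of what is wanted and what the hardware reports.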
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419153240.11549-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

19 Aug, 2022 2 commits

x86/cpu: Fix core name for Sapphire Rapids · f6a3d7dc

Committed by Andi Kleen on May 13, 2021

mainline inclusion
from mainline-v5.14-rc2
commit 28188cc4
category: feature
feature: SPR PMU uncore support
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5BECO

Intel-SIG: commit 28188cc4 x86/cpu: Fix core name for Sapphire
Rapids
This commit is backported as a dependency for SPR PMU uncore support.

-------------------------------------

Sapphire Rapids uses Golden Cove, not Willow Cove.

Fixes: 53375a5a ("x86/cpu: Resort and comment Intel models")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210513163904.3083274-1-ak@linux.intel.com
Signed-off-by: Yunying Sun <yunying.sun@intel.com>

x86/cpu: Resort and comment Intel models · 47870730

Committed by Peter Zijlstra on Mar 15, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 53375a5a
category: feature
feature: SPR PMU uncore support
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5BECO

Intel-SIG: commit 53375a5a x86/cpu: Resort and comment Intel models
This commit is backported as a dependency for SPR PMU uncore support.

-------------------------------------

The INTEL_FAM6 list has become a mess again. Try and bring some sanity
back into it.

Where previously we had one microarch per year and a number of SKUs
within that, this no longer seems to be the case. We now get different
uarch names that share a 'core' design.

Add the core name starting at skylake and reorder to keep the cores
in chronological order. Furthermore, Intel marketed the names {Amber,
Coffee, Whiskey} Lake, but those are in fact steppings of Kaby Lake, add
comments for them.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/YE+HhS8i0gshHD3W@hirez.programming.kicks-ass.net
Signed-off-by: Yunying Sun <yunying.sun@intel.com>

18 Aug, 2022 1 commit

uaccess: add generic fallback version of copy_mc_to_user() · 146db387

Committed by Tong Tiangen on Aug 17, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GB28
CVE: NA

-------------------------------

x86 and powerpc have their own implementations of copy_mc_to_user(). Add a
generic fallback in include/linux/uaccess.h to prepare for other
architectures to enable CONFIG_ARCH_HAS_COPY_MC.
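
The "generic fallback" pattern can be sketched in userspace C as below. An architecture with its own machine-check-safe copy would define the name as a macro first, and the generic version is only compiled when nothing did. sketch_copy_mc_to_user() is a hypothetical name, and memcpy() stands in for the real user-copy primitive, so this shows only the override mechanism and the "bytes not copied" return convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * An architecture-specific header would define the macro
 * sketch_copy_mc_to_user before this point. Nothing does so here,
 * so the generic fallback below is used.
 */
#ifndef sketch_copy_mc_to_user
static inline unsigned long sketch_copy_mc_to_user(void *dst, const void *src,
                                                   size_t cnt)
{
    memcpy(dst, src, cnt);   /* plain copy, no machine-check recovery */
    return 0;                /* number of bytes NOT copied */
}
#endif
```

Callers can then use the same name everywhere and check for a non-zero return without caring which implementation was selected.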
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>

15 Aug, 2022 2 commits

x86/traps: Handle #DB for bus lock · e03f129e

Committed by Fenghua Yu on Mar 22, 2021

mainline inclusion
from mainline-v5.12-rc4
commit ebb1064e
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5G10C
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebb1064e

Intel-SIG: commit ebb1064e x86/traps: Handle #DB for bus lock

--------------------------------

Bus locks degrade performance for the whole system, not just for the CPU
that requested the bus lock. Two CPU features "#AC for split lock" and
"#DB for bus lock" provide hooks so that the operating system may choose
one of several mitigation strategies.

#AC for split lock is already implemented. Add code to use the #DB for
bus lock feature to cover additional situations with new options to
mitigate.

split_lock_detect=
             #AC for split lock              #DB for bus lock

off          Do nothing                      Do nothing

warn         Kernel OOPs                     Warn once per task and
             Warn once per task and          continue to run.
             disable future checking
             When both features are
             supported, warn in #AC

fatal        Kernel OOPs                     Send SIGBUS to user.
             Send SIGBUS to user
             When both features are
             supported, fatal in #AC

ratelimit:N  Do nothing                      Limit bus lock rate to
                                             N per second in the
                                             current non-root user.

Default option is "warn".

Hardware only generates #DB for bus lock detect when CPL>0 to avoid
nested #DB from multiple bus locks while the first #DB is being handled.
So no need to handle #DB for bus lock detected in the kernel.

#DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
while #AC for split lock is enabled by split lock detection bit 29 in
TEST_CTRL MSR.

Both breakpoint and bus lock in the same instruction can trigger one #DB.
The bus lock is handled before the breakpoint in the #DB handler.

Delivery of #DB for bus lock in userspace clears DR6[11], which is set by
the #DB handler right after reading DR6.
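
The inverted polarity of DR6[11] described in the last paragraph can be sketched like this: a bus lock is reported by the bit being clear, and the handler sets it again after reading. The macro and function names are hypothetical, and the sample DR6 values in the usage are illustrative rather than taken from hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DR6_BUS_LOCK ((uint64_t)1 << 11)

/* Raw DR6 reports a bus lock by *clearing* bit 11. */
static bool sketch_is_bus_lock(uint64_t raw_dr6)
{
    return !(raw_dr6 & SKETCH_DR6_BUS_LOCK);
}

/* After reading DR6, set bit 11 again so the next #DB starts from a clean state. */
static uint64_t sketch_rearm_dr6(uint64_t raw_dr6)
{
    return raw_dr6 | SKETCH_DR6_BUS_LOCK;
}
```

A handler following this shape would test for the bus lock first and only then look at breakpoint bits, matching the ordering stated in the message.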
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com

(cherry picked from commit ebb1064e)
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

x86/cpufeatures: Enumerate #DB for bus lock detection · cb3763f7

Committed by Fenghua Yu on Mar 22, 2021

mainline inclusion
from mainline-v5.12-rc4
commit f21d4d3b
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5G10C
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f21d4d3b

Intel-SIG: commit f21d4d3b x86/cpufeatures: Enumerate #DB for bus lock detection

--------------------------------

A bus lock is acquired through either a split locked access to writeback
(WB) memory or any locked access to non-WB memory. This is typically >1000
cycles slower than an atomic operation within a cache line. It also
disrupts performance on other cores.

Some CPUs have the ability to notify the kernel by a #DB trap after a user
instruction acquires a bus lock and is executed. This allows the kernel to
enforce user application throttling or mitigation. Both breakpoint and bus
lock can trigger the #DB trap in the same instruction and the ordering of
handling them is the kernel #DB handler's choice.

The CPU feature flag to be shown in /proc/cpuinfo will be "bus_lock_detect".
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-2-fenghua.yu@intel.com

(cherry picked from commit f21d4d3b)
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

05 Aug, 2022 2 commits

perf/x86/intel: Add perf core PMU support for Sapphire Rapids · 299583ee

Committed by Kan Liang on Jan 28, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 61b985e3
category: feature
feature: SPR PMU core event enhancement
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I596BF

Intel-SIG: commit 61b985e3 ("perf/x86/intel: Add perf core PMU support
for Sapphire Rapids")

-------------------------------------

Add perf core PMU support for the Intel Sapphire Rapids server, which is
the successor of the Intel Ice Lake server. The enabling code is based
on Ice Lake, but there are several new features introduced.

The event encoding is changed and simplified, e.g., the event codes
which are below 0x90 are restricted to counters 0-3. The event codes
which are above 0x90 are likely to have no restrictions. The event
constraints, extra_regs(), and hardware cache events table are changed
accordingly.

A new Precise Distribution (PDist) facility is introduced, which
further minimizes the skid when a precise event is programmed on the GP
counter 0. Enable the Precise Distribution (PDist) facility with :ppp
event. For this facility to work, the period must be initialized with a
value larger than 127. Add spr_limit_period() to apply the limit for
:ppp event.
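
The period limit for PDist can be sketched as a simple clamp. The function name and constant are hypothetical (this is not the kernel's spr_limit_period()), and the sketch assumes that a ":ppp" event corresponds to precise level 3.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PDIST_MIN_PERIOD 128   /* smallest value larger than 127 */

/* Raise too-small periods for precise level 3 (":ppp") events; leave others alone. */
static uint64_t sketch_limit_period(int precise_level, uint64_t period)
{
    if (precise_level == 3 && period < SKETCH_PDIST_MIN_PERIOD)
        return SKETCH_PDIST_MIN_PERIOD;
    return period;
}
```

Only the :ppp case is adjusted, so events with lower precision keep whatever period the user asked for.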

Two new data source fields, data block & address block, are added in the
PEBS Memory Info Record for the load latency event. To enable the
feature,
- An auxiliary event has to be enabled together with the load latency
  event on Sapphire Rapids. A new flag PMU_FL_MEM_LOADS_AUX is
  introduced to indicate the case. A new event, mem-loads-aux, is
  exposed to sysfs for the user tool.
  Add a check in hw_config(). If the auxiliary event is not detected,
  return a unique error -ENODATA.
- The union perf_mem_data_src is extended to support the new fields.
- Ice Lake and earlier models do not support block information, but the
  fields may be set by HW on some machines. Add pebs_no_block to
  explicitly indicate the previous platforms which don't support the new
  block fields. Accessing the new block fields is ignored on those
  platforms.

A new store latency facility is introduced, which leverages the PEBS
facility where it can provide additional information about sampled
stores. The additional information includes the data address, memory
auxiliary info (e.g. Data Source, STLB miss) and the latency of the
store access. To enable the facility, the new event (0x02cd) has to be
programmed on the GP counter 0. A new flag PERF_X86_EVENT_PEBS_STLAT is
introduced to indicate the event. The store_latency_data() is introduced
to parse the memory auxiliary info.

The layout of access latency field of PEBS Memory Info Record has been
changed. Two latencies, instruction latency (bit 15:0) and cache access
latency (bit 47:32), are recorded.
- The cache access latency is similar to previous memory access latency.
  For loads, the latency starts by the actual cache access until the
  data is returned by the memory subsystem.
  For stores, the latency starts when the demand write accesses the L1
  data cache and lasts until the cacheline write is completed in the
  memory subsystem.
  The cache access latency is stored in low 32bits of the sample type
  PERF_SAMPLE_WEIGHT_STRUCT.
- The instruction latency starts by the dispatch of the load operation
  for execution and lasts until completion of the instruction it belongs
  to.
  Add a new flag PMU_FL_INSTR_LATENCY to indicate the instruction
  latency support. The instruction latency is stored in the bit 47:32
  of the sample type PERF_SAMPLE_WEIGHT_STRUCT.
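
The two layouts mentioned above (the PEBS record field and the sample weight) place the latencies at different bit positions, which the following sketch makes explicit. The function names are hypothetical and the code only mirrors the bit ranges stated in the message; it is not the kernel's PEBS parsing code.

```c
#include <assert.h>
#include <stdint.h>

/* PEBS record layout per the text: instruction latency in bits 15:0. */
static uint16_t sketch_pebs_instr_latency(uint64_t field)
{
    return (uint16_t)(field & 0xFFFF);
}

/* PEBS record layout per the text: cache access latency in bits 47:32. */
static uint16_t sketch_pebs_cache_latency(uint64_t field)
{
    return (uint16_t)((field >> 32) & 0xFFFF);
}

/*
 * Sample weight layout per the text: cache latency in the low 32 bits,
 * instruction latency in bits 47:32.
 */
static uint64_t sketch_pack_weight(uint64_t pebs_field)
{
    uint64_t cache = sketch_pebs_cache_latency(pebs_field);
    uint64_t instr = sketch_pebs_instr_latency(pebs_field);

    return cache | (instr << 32);
}
```

Note that the two values effectively swap places between the hardware record and the weight exposed to user space.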

Extends the PERF_METRICS MSR to feature TMA method level 2 metrics. The
lower half of the register is the TMA level 1 metrics (legacy). The
upper half is also divided into four 8-bit fields for the new level 2
metrics. Expose all eight Topdown metrics events to user space.
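
Reading the message as describing eight consecutive 8-bit fields in a 64-bit register (four level 1 metrics in the lower half, four level 2 metrics in the upper half), a field can be extracted as sketched below. The function name and the index-to-metric mapping are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_TOPDOWN_FIELDS 8

/* Return the 8-bit metric field at index idx (0-3: level 1, 4-7: level 2). */
static uint8_t sketch_topdown_metric(uint64_t perf_metrics, unsigned int idx)
{
    if (idx >= SKETCH_NR_TOPDOWN_FIELDS)
        return 0;
    return (uint8_t)(perf_metrics >> (8 * idx));
}
```

On a platform that only implements the lower half, indices 4 to 7 would simply not be exposed, which is what the num_topdown_events filtering in the companion commit is for.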

The full description for the SPR features can be found at Intel
Architecture Instruction Set Extensions and Future Features
Programming Reference, 319433-041.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1611873611-156687-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jun Tian <jun.j.tian@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

perf/x86/intel: Filter unsupported Topdown metrics event · 14f42d56

Committed by Kan Liang on Jan 28, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 1ab5f235
category: feature
feature: SPR PMU core event enhancement
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I596BF

Intel-SIG: commit 1ab5f235 ("perf/x86/intel: Filter unsupported
Topdown metrics event")

-------------------------------------

Intel Sapphire Rapids server will introduce 8 metrics events. Intel
Ice Lake only supports 4 metrics events. A perf tool user may mistakenly
use the unsupported events via RAW format on Ice Lake. The user can
still get a value from the unsupported Topdown metrics event once the
following Sapphire Rapids enabling patch is applied.

To enable the 8 metrics events on Intel Sapphire Rapids, the
INTEL_TD_METRIC_MAX has to be updated, which impacts the
is_metric_event(). The is_metric_event() is a generic function.
On Ice Lake, the newly added SPR metrics events will be mistakenly
accepted as metric events on creation. At runtime, the unsupported
Topdown metrics events will be updated.

Add a variable num_topdown_events in x86_pmu to indicate the number of
Topdown metrics events available on the platform. Apply the number
in is_metric_event(). Only the supported Topdown metrics events
should be created as metrics events.
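
The filtering idea can be sketched as follows. The event encoding (a base code plus a fixed step per metric) and all names other than num_topdown_events are assumptions made for this illustration and may not match the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: metric i uses event code BASE + i * STEP. */
#define TD_METRIC_BASE	0x8000u
#define TD_METRIC_STEP	0x0100u
#define TD_METRIC_MAX	8u	/* Sapphire Rapids exposes eight metrics */

/* Accept a config as a metric event only if the platform supports it. */
static bool is_supported_metric(uint64_t config, unsigned int num_topdown_events)
{
	uint64_t idx;

	assert(num_topdown_events <= TD_METRIC_MAX);

	if (config < TD_METRIC_BASE ||
	    (config - TD_METRIC_BASE) % TD_METRIC_STEP)
		return false;

	idx = (config - TD_METRIC_BASE) / TD_METRIC_STEP;
	return idx < num_topdown_events;
}
```

With num_topdown_events set to 4, as on Ice Lake, the fifth metric code is rejected at creation time instead of being silently accepted.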

Apply the num_topdown_events in icl_update_topdown_event() as well. The
function can be reused by the following patch.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1611873611-156687-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Jun Tian <jun.j.tian@intel.com>
Signed-off-by: Jason Zeng <jason.zeng@intel.com>

14f42d56

04 Aug, 2022 · 17 commits

x86/cpu: Load microcode during restore_processor_state() · 161ff738

Committed by Borislav Petkov on Aug 04, 2022

stable inclusion
from stable-v5.10.114
commit 2ab14625b879eec22854f1dbe61d51570b427513
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5IY1V

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2ab14625b879eec22854f1dbe61d51570b427513

--------------------------------

commit f9e14dbb upstream.

When resuming from system sleep state, restore_processor_state()
restores the boot CPU MSRs. These MSRs could be emulated by microcode.
If microcode is not loaded yet, writing to emulated MSRs leads to an
unchecked MSR access error:

  ...
  PM: Calling lapic_suspend+0x0/0x210
  unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr)
  Call Trace:
    <TASK>
    ? restore_processor_state
    x86_acpi_suspend_lowlevel
    acpi_suspend_enter
    suspend_devices_and_enter
    pm_suspend.cold
    state_store
    kobj_attr_store
    sysfs_kf_write
    kernfs_fop_write_iter
    new_sync_write
    vfs_write
    ksys_write
    __x64_sys_write
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
   RIP: 0033:0x7fda13c260a7

To ensure microcode-emulated MSRs are available for restoration, load
the microcode on the boot CPU before restoring these MSRs.

  [ Pawan: write commit message and productize it. ]

Fixes: e2a1256b ("x86/speculation: Restore speculation related MSRs during S3 resume")
Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841
Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

161ff738

x86/cpufeatures: Put the AMX macros in the word 18 block · f7d8e961

Committed by Jim Mattson on Feb 03, 2022

mainline inclusion
from mainline-v5.18-rc1
commit fa31a4d6
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit fa31a4d6 x86/cpufeatures: Put the AMX macros in the word 18 block.

--------------------------------

These macros are for bits in CPUID.(EAX=7,ECX=0):EDX, not for bits in
CPUID(EAX=7,ECX=1):EAX. Put them with their brethren.
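
To make the word/bit relationship concrete, here is a small sketch of a word-times-32-plus-bit feature encoding. The macro and function names are illustrative rather than the kernel's, and the EDX bit positions (22, 24, 25) are taken from Intel's CPUID documentation as an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FEATURE(word, bit)	((word) * 32 + (bit))
#define FEATURE_WORD(f)		((f) / 32)
#define FEATURE_BIT(f)		((f) % 32)

/* Word 18 is taken to mirror CPUID.(EAX=7,ECX=0):EDX, as stated above. */
#define FEAT_AMX_BF16	FEATURE(18, 22)
#define FEAT_AMX_TILE	FEATURE(18, 24)
#define FEAT_AMX_INT8	FEATURE(18, 25)

/* Test a word 18 feature against a raw EDX value from CPUID leaf 7. */
static bool cpuid7_edx_has(uint32_t edx, unsigned int feature)
{
	assert(FEATURE_WORD(feature) == 18);
	return (edx >> FEATURE_BIT(feature)) & 1u;
}
```

Keeping each macro in the block of the register it describes means the word number alone tells a reader which CPUID output the bit comes from.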

  [ bp: Sort word 18 bits properly, as caught by Like Xu
    <like.xu.linux@gmail.com> ]
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20220203194308.2469117-1-jmattson@google.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

f7d8e961

x86/fpu: Optimize out sigframe xfeatures when in init state · 2b81aa3d

Committed by Dave Hansen on Nov 02, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 30d02551
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 30d02551 x86/fpu: Optimize out sigframe xfeatures when in init state.

--------------------------------

tl;dr: AMX state is ~8k. Signal frames can have space for this
~8k and each signal entry writes out all 8k even if it is zeros.
Skip writing zeros for AMX to speed up signal delivery by about
4% overall when AMX is in its init state.

This is a user-visible change to the sigframe ABI.

== Hardware XSAVE Background ==

XSAVE state components may be tracked by the processor as being
in their initial configuration. Software can detect which
features are in this configuration by looking at the XSTATE_BV
field in an XSAVE buffer or with the XGETBV(1) instruction.

Both the XSAVE and XSAVEOPT instructions enumerate features as
being in the initial configuration via the XSTATE_BV field in the
XSAVE header. However, XSAVEOPT declines to actually write
features in their initial configuration to the buffer. XSAVE
writes the feature unconditionally, regardless of whether it is
in the initial configuration or not.

Basically, XSAVE users never need to inspect XSTATE_BV to
determine if the feature has been written to the buffer.
XSAVEOPT users *do* need to inspect XSTATE_BV. They might also
need to clear out the buffer if they want to make an isolated
change to the state, like modifying one register.
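
The XSTATE_BV inspection that XSAVEOPT-style consumers need can be sketched as below. The helper name is hypothetical; the component numbers 17 and 18 for the AMX state are the ones used elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_XTILE_CFG	17
#define XFEATURE_XTILE_DATA	18

/*
 * A clear bit in XSTATE_BV means the component is in its initial
 * configuration, so an XSAVEOPT-style buffer may not contain its data.
 */
static bool xfeature_in_init_state(uint64_t xstate_bv, unsigned int nr)
{
	assert(nr < 64);
	return !(xstate_bv & (1ULL << nr));
}
```

A reader of such a buffer would call this before touching a component's area and treat the area as all zeros (or otherwise undefined) when it returns true.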

== Software Signal / XSAVE Background ==

Signal frames have historically been written with XSAVE itself.
Each state is written in its entirety, regardless of being in its
initial configuration.

In other words, the signal frame ABI uses the XSAVE behavior, not
the XSAVEOPT behavior.

== Problem ==

This means that any application which has acquired permission to
use AMX via ARCH_REQ_XCOMP_PERM will write 8k of state to the
signal frame. This 8k write will occur even when AMX was in its
initial configuration and software *knows* this because of
XSTATE_BV.

This problem also exists to a lesser degree with AVX-512 and its
2k of state. However, AVX-512 use does not require
ARCH_REQ_XCOMP_PERM and is more likely to have existing users
which would be impacted by any change in behavior.

== Solution ==

Stop writing out AMX xfeatures which are in their initial state
to the signal frame. This effectively makes the signal frame
XSAVE buffer look as if it were written with a combination of
XSAVEOPT and XSAVE behavior. Userspace which handles XSAVEOPT-
style buffers should be able to handle this naturally.

For now, include only the AMX xfeatures: XTILE and XTILEDATA in
this new behavior. These require new ABI to use anyway, which
makes their users very unlikely to be broken. This XSAVEOPT-like
behavior should be expected for all future dynamic xfeatures. It
may also be extended to legacy features like AVX-512 in the
future.

Only attempt this optimization on systems with dynamic features.
Disable dynamic feature support (XFD) if XGETBV1 is unavailable
by adding a CPUID dependency.

This has been measured to reduce the *overall* cycle cost of
signal delivery by about 4%.

Fixes: 2308ee57 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "Chang S. Bae" <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/r/20211102224750.FA412E26@davehans-spike.ostc.intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

2b81aa3d

x86/fpu/amx: Enable the AMX feature in 64-bit mode · 987f0402

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 2308ee57
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 2308ee57 x86/fpu/amx: Enable the AMX feature in 64-bit mode.

--------------------------------

Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the
TILE_DATA component to the dynamic states and update the permission check
table accordingly.

This is only effective on 64 bit kernels as for 32bit kernels
XFEATURE_MASK_TILE is defined as 0.

TILE_DATA is caller-saved state and the only dynamic state. Add a build-time
sanity check to ensure the assumption that every dynamic feature is
caller-saved.

Make AMX state depend on XFD as it is a dynamic feature.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

987f0402

x86/fpu/amx: Define AMX state components and have it used for boot-time checks · af5963c9

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit eec2113e
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit eec2113e x86/fpu/amx: Define AMX state components and have it used for boot-time checks.

--------------------------------

The XSTATE initialization uses check_xstate_against_struct() to sanity
check the size of XSTATE-enabled features. AMX is an XSAVE-enabled feature,
and its size is not hard-coded but discoverable at run-time via CPUID.

The AMX state is composed of state components 17 and 18, which are all user
state components. The first component is the XTILECFG state of a 64-byte
tile-related control register. The state component 18, called XTILEDATA,
contains the actual tile data, and the state size varies on
implementations. The architectural maximum, as defined in the CPUID(0x1d,
1): EAX[15:0], is a byte less than 64KB. The first implementation supports
8KB.

Check the XTILEDATA state size dynamically. The feature introduces the new
tile register, TMM. Define one register struct only and read the number of
registers from CPUID. Cross-check the overall size with CPUID again.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

af5963c9

x86/fpu/xstate: Add fpstate_realloc()/free() · cb0b181e

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 500afbf6
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 500afbf6 x86/fpu/xstate: Add fpstate_realloc()/free().

--------------------------------

The fpstate embedded in struct fpu is the default state for storing the FPU
registers. It's sized so that the default supported features can be stored.
For dynamically enabled features the register buffer is too small.

The #NM handler detects first use of a feature which is disabled in the
XFD MSR. After handling permission checks it recalculates the size for
kernel space and user space state and invokes fpstate_realloc() which
tries to reallocate fpstate and install it.

Provide the allocator function which checks whether the current buffer size
is sufficient and if not allocates one. If allocation is successful the new
fpstate is initialized with the new features and sizes and the now enabled
features are removed from the task's XFD mask.

fpstate_realloc() uses vzalloc(). If use of this mechanism grows to
re-allocate buffers larger than 64KB, a more sophisticated allocation
scheme that includes purpose-built reclaim capability might be justified.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

cb0b181e

x86/fpu/xstate: Add XFD #NM handler · b2cd63e9

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 783e87b4
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 783e87b4 x86/fpu/xstate: Add XFD #NM handler.

--------------------------------

If the XFD MSR has feature bits set then #NM will be raised when user space
attempts to use an instruction related to one of these features.

When the task has no permissions to use that feature, raise SIGILL, which
is the same behavior as #UD.

If the task has permissions, calculate the new buffer size for the extended
feature set and allocate a larger fpstate. In the unlikely case that
vzalloc() fails, SIGSEGV is raised.
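
The decision logic described above can be modelled in a few lines. This is a user-space sketch with invented names, not the handler itself: the permission bitmap and the allocation result are passed in as plain arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum nm_outcome { NM_FEATURE_ENABLED, NM_RAISE_SIGILL, NM_RAISE_SIGSEGV };

/*
 * xfd_err:   features the faulting instruction tried to use
 * permitted: features the task has permission for
 * alloc_ok:  whether allocating the larger fpstate succeeded
 */
static enum nm_outcome xfd_nm_outcome(uint64_t xfd_err, uint64_t permitted,
				      bool alloc_ok)
{
	assert(xfd_err != 0);	/* the trap reports at least one feature */

	if (xfd_err & ~permitted)
		return NM_RAISE_SIGILL;		/* same behavior as #UD */
	if (!alloc_ok)
		return NM_RAISE_SIGSEGV;	/* allocation failed */
	return NM_FEATURE_ENABLED;
}
```

The permission check comes first, so a task without permission never triggers an allocation attempt.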

The allocation function will be added in the next step. Provide a stub
which fails for now.

  [ tglx: Updated serialization ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

b2cd63e9

x86/fpu: Add XFD state to fpstate · aef806ba

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 8bf26758
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 8bf26758 x86/fpu: Add XFD state to fpstate.

--------------------------------

Add storage for XFD register state to struct fpstate. It holds the XFD MSR
state and is used for switching the XFD MSR when FPU content is
restored.

Add a per-CPU variable to cache the current MSR value so the MSR has only
to be written when the values are different.
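
The caching idea can be sketched in user space as follows. A plain static variable stands in for the per-CPU cache and a counter stands in for the MSR write; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t xfd_cached;	/* stand-in for the per-CPU cached value */
static unsigned int msr_writes;	/* counts simulated MSR writes */

/* Simulated MSR write; a real implementation would write the XFD MSR. */
static void fake_wrmsr_xfd(uint64_t val)
{
	(void)val;
	msr_writes++;
}

/* Write the MSR only when the wanted value differs from the cache. */
static void xfd_update(uint64_t wanted)
{
	if (xfd_cached != wanted) {
		fake_wrmsr_xfd(wanted);
		xfd_cached = wanted;
	}
}

static void xfd_cache_selfcheck(void)
{
	xfd_cached = 0;
	msr_writes = 0;

	xfd_update(1ULL << 18);	/* value changes: one write */
	xfd_update(1ULL << 18);	/* same value: write skipped */
	xfd_update(0);		/* value changes again: second write */
	assert(msr_writes == 2);
}
```

Switching between tasks with the same XFD value then costs a compare instead of an MSR write.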
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-15-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

aef806ba

x86/msr-index: Add MSRs for XFD · 9b2245e9

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit dae1bd58
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit dae1bd58 x86/msr-index: Add MSRs for XFD.

--------------------------------

XFD introduces two MSRs:

    - IA32_XFD to enable/disable a feature controlled by XFD

    - IA32_XFD_ERR to expose to the #NM trap handler which feature
      a task attempted to use for the first time.

Both use the same xstate-component bitmap format as XCR0.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-14-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

9b2245e9

x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit · 9736b120

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit c3511016
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit c3511016 x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit.

--------------------------------

Intel's eXtended Feature Disable (XFD) feature is an extension of the XSAVE
architecture. XFD allows the kernel to enable a feature state in XCR0 and
to receive a #NM trap when a task uses instructions accessing that state.

This is going to be used to postpone the allocation of a larger XSTATE
buffer for a task to the point where it is actually using a related
instruction after the permission to use that facility has been granted.

XFD is not used by the kernel, but only applied to userspace. This is a
matter of policy as the kernel knows how a fpstate is reallocated and the
XFD state.

The compacted XSAVE format is adjustable for dynamic features. Make XFD
depend on XSAVES.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-13-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

9736b120

x86/fpu: Prepare fpu_clone() for dynamically enabled features · e8da61bf

Committed by Thomas Gleixner on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 9e798e9a
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 9e798e9a x86/fpu: Prepare fpu_clone() for dynamically enabled features.

--------------------------------

The default portion of the parent's FPU state is saved in a child task.
With dynamic features enabled, the non-default portion is not saved in a
child's fpstate because these register states are defined to be
caller-saved. The new task's fpstate is therefore the default buffer.

Fork inherits the permission of the parent.

Also, do not use memcpy() when TIF_NEED_FPU_LOAD is set because it is
invalid when the parent has dynamic features.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-11-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

e8da61bf

x86/fpu: Add basic helpers for dynamically enabled features · ac0b4078

Committed by Thomas Gleixner on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 23686ef2
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 23686ef2 x86/fpu: Add basic helpers for dynamically enabled features.

--------------------------------

To allow building up the infrastructure required to support dynamically
enabled FPU features, add:

 - XFEATURES_MASK_DYNAMIC

   This constant will hold xfeatures which can be dynamically enabled.

 - fpu_state_size_dynamic()

   A static branch for 64-bit and a simple 'return false' for 32-bit.

   This helper allows adding dynamic-feature-specific changes to common
   code which is shared between 32-bit and 64-bit without #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-8-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

ac0b4078

x86/arch_prctl: Add controls for dynamic XSTATE components · e4e4c815

Committed by Chang S. Bae on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit db8268df
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit db8268df x86/arch_prctl: Add controls for dynamic XSTATE components.

--------------------------------

Dynamically enabled XSTATE features are by default disabled for all
processes. A process has to request permission to use such a feature.

To support this, implement an architecture-specific prctl() with the options:

- ARCH_GET_XCOMP_SUPP

Copies the supported feature bitmap into the user space provided
u64 storage. The pointer is handed in via arg2

- ARCH_GET_XCOMP_PERM

Copies the process wide permitted feature bitmap into the user space
provided u64 storage. The pointer is handed in via arg2

- ARCH_REQ_XCOMP_PERM

Request permission for a feature set. A feature set can be mapped to a
facility, e.g. AMX, and can require one or more XSTATE components to
be enabled.

The feature argument is the number of the highest XSTATE component
which is required for a facility to work.

The request argument is not a user supplied bitmap because that makes
filtering harder (think seccomp) and even impossible because to
support 32bit tasks the argument would have to be a pointer.

The permission mechanism works this way:

Task asks for permission for a facility and kernel checks whether that's
supported. If supported it does:

1) Check whether permission has already been granted

2) Compute the required sizes of the kernel buffer and the user space
buffer (sigframe).

3) Validate that no task has a sigaltstack installed
which is smaller than the resulting sigframe size

4) Add the requested feature bit(s) to the permission bitmap of
current->group_leader->fpu and store the sizes in the group
leader's fpu struct as well.

If that is successful then the feature is still not enabled for any of the
tasks. The first usage of a related instruction will result in a #NM
trap. The trap handler validates the permission bit of the task's group
leader and, if permitted, installs a larger kernel buffer and transfers
the permission and size info to the new fpstate container, which makes all
the FPU functions which require per-task information aware of the extended
feature set.
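
From user space, the three options might be exercised roughly as in the sketch below. The option values (0x1021 to 0x1023) are the ones in the mainline uapi header as far as known here; treat them, the wrapper names, and the use of the raw syscall(2) interface as assumptions of this example. On other architectures the wrappers simply report -ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define ARCH_GET_XCOMP_SUPP	0x1021
#define ARCH_GET_XCOMP_PERM	0x1022
#define ARCH_REQ_XCOMP_PERM	0x1023
#define XFEATURE_XTILE_DATA	18

/* Fetch the supported or permitted bitmap; 0 on success, -errno on error. */
static int get_xcomp_bitmap(int option, uint64_t *bitmap)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
	return syscall(SYS_arch_prctl, option, bitmap) == 0 ? 0 : -errno;
#else
	(void)option;
	(void)bitmap;
	return -ENOSYS;
#endif
}

/* Request permission; feature_nr is the highest required component. */
static int request_xcomp_perm(unsigned long feature_nr)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
	return syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, feature_nr) == 0 ?
	       0 : -errno;
#else
	(void)feature_nr;
	return -ENOSYS;
#endif
}

/* Check whether component nr is set in a bitmap returned by the kernel. */
static bool xcomp_bitmap_has(uint64_t bitmap, unsigned int nr)
{
	assert(nr < 64);
	return bitmap & (1ULL << nr);
}
```

A program wanting AMX would call request_xcomp_perm(XFEATURE_XTILE_DATA) once, early, and then confirm the result with get_xcomp_bitmap(ARCH_GET_XCOMP_PERM, ...) and xcomp_bitmap_has(). Passing a component number rather than a bitmap pointer matches the seccomp and 32-bit argument described above.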

[ tglx: Adapted to new base code, added missing serialization,
massaged namings, comments and changelog ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

e4e4c815

x86/fpu: Add fpu_state_config::legacy_features · 300f972d

Committed by Thomas Gleixner on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit c33f0a81
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit c33f0a81 x86/fpu: Add fpu_state_config::legacy_features.

--------------------------------

The upcoming prctl() which is required to request the permission for a
dynamically enabled feature will also provide an option to retrieve the
supported features. If the CPU does not support XSAVE, the supported
features would be 0 even when the CPU supports FP and SSE.

Provide separate storage for the legacy feature set to avoid that and fill
in the bits in the legacy init function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-6-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

300f972d

x86/fpu: Add members to struct fpu to cache permission information · 40388d86

Committed by Thomas Gleixner on Oct 21, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 6f6a7c09
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 6f6a7c09 x86/fpu: Add members to struct fpu to cache permission information.

--------------------------------

Dynamically enabled features can be requested by any thread of a running
process at any time. The request neither enables the feature nor
allocates larger buffers. It just stores the permission to use the feature
by adding the features to the permission bitmap and by calculating the
required sizes for kernel and user space.

The reallocation of the kernel buffer happens when the feature is used
for the first time which is caught by an exception. The permission
bitmap is then checked and if the feature is permitted, then it becomes
fully enabled. If not, the task dies similarly to a task which uses an
undefined instruction.

The size information is precomputed to allow proper sigaltstack size checks
once the feature is permitted but not yet in use, because otherwise this
would open race windows where too-small stacks could be installed, causing
a later failure on signal delivery.

Initialize them to the default feature set and sizes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-5-chang.seok.bae@intel.com
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

40388d86

x86/fpu: Remove old KVM FPU interface · f163f065

Committed by Thomas Gleixner on Oct 22, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 582b01b6
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 582b01b6 x86/fpu: Remove old KVM FPU interface.

--------------------------------

No more users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211022185313.074853631@linutronix.de
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

f163f065

x86/kvm: Convert FPU handling to a single swap buffer · 55807a74

Committed by Thomas Gleixner on Oct 22, 2021

mainline inclusion
from mainline-v5.16-rc1
commit d69c1382
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit d69c1382 x86/kvm: Convert FPU handling to a single swap buffer.

--------------------------------

For the upcoming AMX support it's necessary to do a proper integration with
KVM. Currently KVM allocates two FPU structs which are used for saving the user
state of the vCPU thread and restoring the guest state when entering
vcpu_run() and doing the reverse operation before leaving vcpu_run().

With the new fpstate mechanism this can be reduced to one extra buffer by
swapping the fpstate pointer in current::thread::fpu. This makes the
upcoming support for AMX and XFD simpler because then fpstate information
(features, sizes, xfd) is always consistent and it does not require any
nasty workarounds.

Convert the KVM FPU code over to this new scheme.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211022185313.019454292@linutronix.de
Signed-off-by: Lin Wang <lin.x.wang@intel.com>

55807a74

openeuler / Kernel: synced successfully about 2 years ago
