Commits · c2fe3cd4604ac87c587db05d41843d667dc43815 · openeuler / Kernel

15 Nov, 2020 5 commits

KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · c2fe3cd4

Authored by Sean Christopherson on Oct 06, 2020

Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4(). This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.

Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.

Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics. This will be remedied in a future patch.

Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
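
The ordering problem described in this commit can be sketched in plain C: run every CR4 check, including the vendor hook, before any vCPU state is written, so that a rejected value leaves the vCPU untouched. Everything prefixed `toy_` is hypothetical and is not KVM's actual code. The vendor rule (refuse VMXE while in SMM) is only a stand-in chosen for illustration; the bit position of CR4.VMXE (bit 13) is architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CR4_VMXE (1ULL << 13)   /* architectural position of CR4.VMXE */

/* Hypothetical, heavily reduced vCPU state. */
struct toy_vcpu {
    uint64_t cr0;
    uint64_t cr4;
    uint64_t cr4_rsvd_bits;  /* bits the guest may never set */
    bool in_smm;             /* input to the stand-in vendor rule */
};

/* Hypothetical ops table holding the vendor hook. */
struct toy_ops {
    bool (*is_valid_cr4)(const struct toy_vcpu *vcpu, uint64_t cr4);
};

/* Stand-in vendor rule: VMXE may not be set while in SMM. */
static bool toy_vmx_is_valid_cr4(const struct toy_vcpu *vcpu, uint64_t cr4)
{
    return !((cr4 & TOY_CR4_VMXE) && vcpu->in_smm);
}

/*
 * Common check. Returns 0 when CR4 is valid and nonzero otherwise, which
 * mirrors the "different return types and inverted semantics" noted in
 * the commit message: the hook returns a bool, the wrapper an int.
 */
static int toy_valid_cr4(const struct toy_vcpu *vcpu,
                         const struct toy_ops *ops, uint64_t cr4)
{
    if (cr4 & vcpu->cr4_rsvd_bits)
        return -1;
    return ops->is_valid_cr4(vcpu, cr4) ? 0 : -1;
}

/* Validate first, then commit; nothing is written on failure. */
static int toy_set_sregs(struct toy_vcpu *vcpu, const struct toy_ops *ops,
                         uint64_t cr0, uint64_t cr4)
{
    if (toy_valid_cr4(vcpu, ops, cr4))
        return -1;
    vcpu->cr0 = cr0;
    vcpu->cr4 = cr4;
    return 0;
}
```

Because the check runs up front, a caller never sees a success return paired with a CR4 value that was silently dropped.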

KVM: SVM: Drop VMXE check from svm_set_cr4() · 311a0659

Authored by Sean Christopherson on Oct 06, 2020

Drop svm_set_cr4()'s explicit check on CR4.VMXE now that common x86 handles
the check by incorporating VMXE into the CR4 reserved bits, via
kvm_cpu_caps.  SVM obviously does not set X86_FEATURE_VMX.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4() · a447e38a

Authored by Sean Christopherson on Oct 06, 2020

Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, via kvm_cpu_caps.  X86_FEATURE_VMX is set in kvm_cpu_caps
(by vmx_set_cpu_caps()), if and only if 'nested' is true.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4() · d3a9e414

Authored by Sean Christopherson on Oct 06, 2020

Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, i.e. in cr4_guest_rsvd_bits.  This fixes a bug where KVM
incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
before KVM_SET_CPUID{,2}.

Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Reported-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9

Authored by Paolo Bonzini on Nov 15, 2020

In some cases where shadow paging is in use, the root page will
be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
not have an associated struct kvm_mmu_page, because it is allocated
with alloc_page instead of kvm_mmu_alloc_page.

Just return false quickly from is_tdp_mmu_root if the TDP MMU is
not in use, which also includes the case where shadow paging is
enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

13 Nov, 2020 6 commits

KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests · 96308b06

Authored by Babu Moger on Nov 12, 2020

For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
the memory encryption bit in reserved bits.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch · 0107973a

Authored by Babu Moger on Nov 12, 2020

SEV guests fail to boot on a system that supports the PCID feature.

While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set. The
reserved bit range is 63:cpuid_maxphyaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.

Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
cache the reserved bits in the CR3 value. This will be initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).

If the architecture has any special bits (like the AMD SEV encryption
bit) that need to be masked from the reserved bits, they should be
cleared in the vendor-specific kvm_x86_ops.vcpu_after_set_cpuid handler.

Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
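
The reserved-bit arithmetic in this commit can be illustrated with a small sketch. The helper builds a mask covering bits `s` through `e`, which is what the message describes as `rsvd_bits(cpuid_maxphyaddr(vcpu), 63)`. The `toy_` names are hypothetical, and the concrete numbers used in the note below (a 43-bit physical address width and an encryption bit at position 47) are assumptions picked only to make the failure visible.

```c
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; empty when e < s. */
static uint64_t toy_rsvd_bits(int s, int e)
{
    if (e < s)
        return 0;
    return ((2ULL << (e - s)) - 1) << s;
}

/* Nonzero if the CR3 value has any cached reserved bit set. */
static int toy_cr3_has_rsvd(uint64_t cr3, uint64_t cr3_lm_rsvd_bits)
{
    return (cr3 & cr3_lm_rsvd_bits) != 0;
}

/* Vendor-side adjustment: stop treating one bit as reserved. */
static uint64_t toy_unreserve_bit(uint64_t rsvd, int bit)
{
    return rsvd & ~(1ULL << bit);
}
```

With the assumed values, `toy_rsvd_bits(43, 63)` covers bit 47, so a CR3 value carrying the encryption bit is rejected until the vendor hook clears that bit from the cached mask.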

KVM: x86: clflushopt should be treated as a no-op by emulation · 51b958e5

Authored by David Edmondson on Nov 03, 2020

The instruction emulator ignores clflush instructions, yet fails to
support clflushopt. Treat both similarly.

Fixes: 13e457e0 ("KVM: x86: Emulator does not decode clflush well")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: Handle SCXTNUM_ELx traps · ed4ffaf4

Authored by Marc Zyngier on Nov 10, 2020

As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
will trap to EL2. Let's handle that as gracefully as possible
by injecting an UNDEF exception into the guest. This is consistent
with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org

KVM: arm64: Unify trap handlers injecting an UNDEF · 338b1793

Authored by Marc Zyngier on Nov 10, 2020

A large number of system register trap handlers only inject an
UNDEF exception, and yet each class of sysreg seems to provide its
own, identical function.

Let's unify them all, saving us introducing yet another one later.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org

KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace · 23711a5e

Authored by Marc Zyngier on Nov 10, 2020

We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
that are immune to Spectre-v2, but that don't have this field set,
most likely because they predate the specification.

However, this prevents the migration of guests that have started on
a host that doesn't fake this CSV2 setting to one that does, as KVM
rejects the write to ID_AA64PFR0_EL1 on the grounds that it isn't
what is already there.

In order to fix this, allow userspace to set this field as long as
this doesn't result in promising more than what is already there
(setting CSV2 to 0 is acceptable, but setting it to 1 when it is
already set to 0 isn't).

Fixes: e1026237 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
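
The acceptance rule in this commit ("never promise more than what is already there") can be sketched as a comparison on one field of the register. The sketch assumes CSV2 occupies bits [59:56] of ID_AA64PFR0_EL1; the `toy_` function, its signature, and the rule that all other fields must match exactly are simplifications for illustration and are not the kernel's actual implementation.

```c
#include <stdint.h>

#define TOY_CSV2_SHIFT 56
#define TOY_CSV2_MASK  (0xFULL << TOY_CSV2_SHIFT)

/*
 * current_val: the value currently exposed to the guest.
 * new_val:     the value userspace wants to write.
 * Returns 0 and updates *guest_val on success, -1 on rejection.
 */
static int toy_set_id_aa64pfr0(uint64_t *guest_val, uint64_t current_val,
                               uint64_t new_val)
{
    uint64_t cur_csv2 = (current_val & TOY_CSV2_MASK) >> TOY_CSV2_SHIFT;
    uint64_t new_csv2 = (new_val & TOY_CSV2_MASK) >> TOY_CSV2_SHIFT;

    /* Lowering CSV2 is fine; raising it would promise too much. */
    if (new_csv2 > cur_csv2)
        return -1;

    /* Every field other than CSV2 must match what is already there. */
    if ((new_val ^ current_val) & ~TOY_CSV2_MASK)
        return -1;

    *guest_val = new_val;
    return 0;
}
```

Under this rule a guest started with CSV2=0 can be migrated to a host that would otherwise expose CSV2=1, while the reverse write is refused.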

12 Nov, 2020 1 commit

arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for ENETC · df392aef

Authored by Michael Walle on Nov 09, 2020

Since commit 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX") the
network port of the Kontron sl28 board is broken. After the migration to
phylink the device tree has to specify the in-band-mode property. Add
it.

Fixes: 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201109110436.5906-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

08 Nov, 2020 6 commits

KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs · 2cdef91c

Authored by Pankaj Gupta on Nov 05, 2020

Windows2016 guest tries to enable LBR by setting the corresponding bits
in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and
spams the host kernel logs with error messages like:

	kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop

This patch fixes this by enabling error logging only with
'report_ignored_msrs=1'.

Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: request masterclock update any time guest uses different msr · 1e293d1a

Authored by Oliver Upton on Oct 27, 2020

Commit 5b9bb0eb ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME)
emulation in helper fn", 2020-10-21) subtly changed the behavior of guest
writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update
the masterclock any time the guest uses a different msr than before.

Fixes: 5b9bb0eb ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21)
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: ensure pv_cpuid.features is initialized when enabling cap · 01b4f510

Authored by Oliver Upton on Oct 27, 2020

Make the paravirtual cpuid enforcement mechanism idempotent to ioctl()
ordering by updating pv_cpuid.features whenever userspace requests the
capability. Extract this update out of kvm_update_cpuid_runtime() into a
new helper function and move its other call site into
kvm_vcpu_after_set_cpuid() where it more likely belongs.

Fixes: 66570e96 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: reads of restricted pv msrs should also result in #GP · 1930e5dd

Authored by Oliver Upton on Oct 27, 2020

commit 66570e96 ("kvm: x86: only provide PV features if enabled in
guest's CPUID") only protects against disallowed guest writes to KVM
paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing
KVM_CPUID_FEATURES for msr reads as well.

Fixes: 66570e96 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-4-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: use positive error values for msr emulation that causes #GP · cc4cb017

Authored by Maxim Levitsky on Nov 01, 2020

The recent introduction of userspace msr filtering added code that uses
negative error codes for cases that either result in #GP delivery to
the guest or are handled by the userspace msr filtering.

This breaks an assumption that a negative error code returned from the
msr emulation code is a semi-fatal error which should be returned
to userspace via KVM_RUN ioctl and usually kill the guest.

Fix this by reusing the already existing KVM_MSR_RET_INVALID error code,
and by adding a new KVM_MSR_RET_FILTERED error code for the
userspace filtered msrs.

Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
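
The return-code convention this commit restores can be sketched as a small classifier: negative values stay reserved for semi-fatal errors that go back to userspace through KVM_RUN, while positive values describe outcomes the guest can survive. The `toy_` names are hypothetical, the numeric values of the two codes are assumptions, and the real code has more cases than this sketch shows.

```c
/* Assumed numeric values; only the sign convention matters here. */
#define TOY_MSR_RET_INVALID  2
#define TOY_MSR_RET_FILTERED 3

enum toy_action {
    TOY_CONTINUE,     /* access emulated successfully */
    TOY_INJECT_GP,    /* deliver #GP to the guest */
    TOY_USER_FILTER,  /* hand the access to the userspace msr filter */
    TOY_EXIT_ERROR    /* semi-fatal: return the error from KVM_RUN */
};

/* Map an msr emulation return value to what the caller should do next. */
static enum toy_action toy_classify_msr_ret(int ret)
{
    if (ret < 0)
        return TOY_EXIT_ERROR;
    if (ret == 0)
        return TOY_CONTINUE;
    if (ret == TOY_MSR_RET_FILTERED)
        return TOY_USER_FILTER;
    return TOY_INJECT_GP;
}
```

Keeping the guest-visible failures positive means a caller can test `ret < 0` and still be sure it is looking at an error that should normally stop the guest.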

KVM: x86/mmu: fix counting of rmap entries in pte_list_add · c6c4f961

Authored by Li RongQing on Sep 27, 2020

Fix an off-by-one style bug in pte_list_add() where it failed to
account the last full set of SPTEs, i.e. when desc->sptes is full
and desc->more is NULL.

Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid
an extra comparison.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <1601196297-24104-1-git-send-email-lirongqing@baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
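
The counting problem can be reproduced with a toy chained descriptor list. This is a sketch under assumptions, not the kernel function: `toy_desc`, `TOY_EXT`, and the use of `calloc` are hypothetical stand-ins. The loop counts every full descriptor it passes, including the last one in the chain (the case where the descriptor is full and `more` is NULL), and uses a single "last slot occupied" test.

```c
#include <stdlib.h>

#define TOY_EXT 3   /* entries per descriptor; arbitrary for the sketch */

struct toy_desc {
    void *sptes[TOY_EXT];
    struct toy_desc *more;
};

/*
 * Append spte to the chain starting at desc.
 * Returns the number of entries that were already present,
 * or -1 if a new descriptor could not be allocated.
 */
static int toy_list_add(struct toy_desc *desc, void *spte)
{
    int i, count = 0;

    /* Single "is this descriptor full?" test per iteration. */
    while (desc->sptes[TOY_EXT - 1]) {
        count += TOY_EXT;   /* also counts the final full descriptor */
        if (!desc->more) {
            desc->more = calloc(1, sizeof(*desc));
            if (!desc->more)
                return -1;
            desc = desc->more;
            break;
        }
        desc = desc->more;
    }

    /* desc now has at least one free slot. */
    for (i = 0; desc->sptes[i]; ++i)
        ++count;
    desc->sptes[i] = spte;
    return count;
}
```

If the loop instead required both a full descriptor and a non-NULL `more` before counting, the fourth insertion with `TOY_EXT` set to 3 would report 0 existing entries rather than 3.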

07 Nov, 2020 8 commits

x86/platform/uv: Recognize UV5 hubless system identifier · 801284f9

Authored by Mike Travis on Nov 05, 2020

Testing shows a problem in that UV5 hubless systems were not being
recognized.  Add them to the list of OEM IDs checked.

Fixes: 6c779442 ("Add UV5 direct references")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-4-mike.travis@hpe.com

x86/platform/uv: Remove spaces from OEM IDs · 1aee505e

Authored by Mike Travis on Nov 05, 2020

Testing shows that trailing spaces caused problems with the OEM_ID and
the OEM_TABLE_ID.  One is that the OEM_ID would not string compare
correctly.  Another is that the OEM_ID and OEM_TABLE_ID would be
concatenated in the printout.  Remove any trailing spaces.

Fixes: 1e61f5a9 ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com
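
Why trailing spaces break the string comparison can be shown with a small helper that copies an ID into a fixed buffer and trims it. This is a minimal sketch: `toy_copy_trimmed` is a hypothetical name, the input is assumed to be NUL-terminated (which a raw firmware table field need not be), and the kernel's own helper may be shaped differently.

```c
#include <string.h>

/*
 * Copy 'from' into 'to' (capacity to_len, always NUL-terminated)
 * and strip any trailing spaces from the result.
 */
static void toy_copy_trimmed(char *to, size_t to_len, const char *from)
{
    size_t n;

    if (to_len == 0)
        return;

    strncpy(to, from, to_len - 1);
    to[to_len - 1] = '\0';

    n = strlen(to);
    while (n > 0 && to[n - 1] == ' ')
        to[--n] = '\0';
}
```

After trimming, a space-padded value such as `"HPE   "` compares equal to `"HPE"` with `strcmp()`, and two IDs printed back to back no longer carry the padding between them.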

x86/platform/uv: Fix missing OEM_TABLE_ID · 1aec69ae

Authored by Mike Travis on Nov 05, 2020

Testing shows a problem in that the OEM_TABLE_ID was missing for
hubless systems. This is used to determine the APIC type (legacy or
extended). Add the OEM_TABLE_ID to the early hubless processing.

Fixes: 1e61f5a9 ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-2-mike.travis@hpe.com

KVM: arm64: Remove AA64ZFR0_EL1 accessors · c512298e

Authored by Andrew Jones on Nov 05, 2020

The AA64ZFR0_EL1 accessors are just the general accessors with
their visibility function open-coded. They also skip the if-else
chain in read_id_reg, but there's no reason not to go there.
Indeed, consolidating ID register accessors and removing lines
of code make it worthwhile.

Remove the AA64ZFR0_EL1 accessors, replacing them with the
general accessors for sanitized ID registers.

No functional change intended.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-5-drjones@redhat.com

KVM: arm64: Check RAZ visibility in ID register accessors · 912dee57

Authored by Andrew Jones on Nov 05, 2020

The instruction encodings of ID registers are preallocated. Until an
encoding is assigned a purpose the register is RAZ. KVM's general ID
register accessor functions already support both paths, RAZ or not.
If for each ID register we can determine if it's RAZ or not, then all
ID registers can build on the general functions. The register visibility
function allows us to check whether a register should be completely
hidden or not. Extending it to also report when the register should
be RAZ allows us to use it for ID registers as well.

Check for RAZ visibility in the ID register accessor functions,
allowing the RAZ case to be handled in a generic way for all system
registers.

The new REG_RAZ flag will be used in a later patch. This patch has
no intended functional change.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-4-drjones@redhat.com
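
The idea of letting one visibility callback report both "hidden" and "RAZ" can be sketched as follows. All `toy_` names, the flag values, and the reduced vCPU structure are hypothetical; the sketch only shows how a generic read accessor can handle the RAZ case for any register that supplies a visibility function.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_REG_HIDDEN (1u << 0)  /* register is not exposed at all */
#define TOY_REG_RAZ    (1u << 1)  /* register reads as zero */

struct toy_vcpu {
    bool has_sve;
};

struct toy_reg {
    uint64_t val;
    /* Optional: returns a mask of TOY_REG_* flags for this vCPU. */
    unsigned int (*visibility)(const struct toy_vcpu *vcpu);
};

/* An SVE-dependent ID register reads as zero when SVE is absent. */
static unsigned int toy_sve_id_visibility(const struct toy_vcpu *vcpu)
{
    return vcpu->has_sve ? 0 : TOY_REG_RAZ;
}

/* Generic accessor: no per-register special case is needed. */
static uint64_t toy_read_id_reg(const struct toy_vcpu *vcpu,
                                const struct toy_reg *reg)
{
    unsigned int vis = reg->visibility ? reg->visibility(vcpu) : 0;

    return (vis & TOY_REG_RAZ) ? 0 : reg->val;
}
```

A register without a visibility callback is treated as fully visible, so existing registers keep their behaviour.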

KVM: arm64: Consolidate REG_HIDDEN_GUEST/USER · 01fe5ace

Authored by Andrew Jones on Nov 05, 2020

REG_HIDDEN_GUEST and REG_HIDDEN_USER are always used together.
Consolidate them into a single REG_HIDDEN flag. We can always
add another flag later if some register needs to expose itself
differently to the guest than it does to userspace.

No functional change intended.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-3-drjones@redhat.com

KVM: arm64: Don't hide ID registers from userspace · f81cb2c3

Authored by Andrew Jones on Nov 05, 2020

ID registers are RAZ until they've been allocated a purpose, but
that doesn't mean they should be removed from the KVM_GET_REG_LIST
list. So far we only have one register, SYS_ID_AA64ZFR0_EL1, that
is hidden from userspace when its function, SVE, is not present.

Expose SYS_ID_AA64ZFR0_EL1 to userspace as RAZ when SVE is not
implemented. Removing the userspace visibility checks is enough
to reexpose it, as it will already return zero to userspace when
SVE is not present. The register already behaves as RAZ for the
guest when SVE is not present.

Fixes: 73433762 ("KVM: arm64/sve: System register context switch and access support")
Reported-by: 张东旭 <xu910121@sina.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org # v5.2+
Link: https://lore.kernel.org/r/20201105091022.15373-2-drjones@redhat.com

KVM: arm64: Fix build error in user_mem_abort() · faf00039

Authored by Gavin Shan on Nov 03, 2020

The PUD and PMD are folded into PGD when the following options are
enabled. In that case, PUD_SHIFT is equal to PMD_SHIFT and we fail
to build with the indicated errors:

   CONFIG_ARM64_VA_BITS_42=y
   CONFIG_ARM64_PAGE_SHIFT=16
   CONFIG_PGTABLE_LEVELS=3

   arch/arm64/kvm/mmu.c: In function ‘user_mem_abort’:
   arch/arm64/kvm/mmu.c:798:2: error: duplicate case value
     case PMD_SHIFT:
     ^~~~
   arch/arm64/kvm/mmu.c:791:2: note: previously used here
     case PUD_SHIFT:
     ^~~~

This fixes the issue by skipping the check on PUD huge page when PUD
and PMD are folded into PGD.

Fixes: 2f40c460 ("KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes")
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201103003009.32955-1-gshan@redhat.com
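
The "duplicate case value" error and the shape of the fix can be reproduced outside the kernel. In this sketch the `TOY_` macros and the shift values are arbitrary assumptions, not the arm64 definitions. When the folded configuration is selected, both shifts expand to the same constant, so the PUD case has to be compiled out for the `switch` to build.

```c
/* Define TOY_PMD_FOLDED (e.g. -DTOY_PMD_FOLDED) to model the folded case. */
#define TOY_PMD_SHIFT 21
#ifdef TOY_PMD_FOLDED
#define TOY_PUD_SHIFT TOY_PMD_SHIFT   /* folded: same value as the PMD */
#else
#define TOY_PUD_SHIFT 30
#endif

enum toy_level { TOY_LEVEL_PTE, TOY_LEVEL_PMD, TOY_LEVEL_PUD };

/* Pick a mapping level from the shift of the backing mapping. */
static enum toy_level toy_level_for_shift(int shift)
{
    switch (shift) {
#ifndef TOY_PMD_FOLDED
    /* Skipped when folded; otherwise this label would duplicate the next. */
    case TOY_PUD_SHIFT:
        return TOY_LEVEL_PUD;
#endif
    case TOY_PMD_SHIFT:
        return TOY_LEVEL_PMD;
    default:
        return TOY_LEVEL_PTE;
    }
}
```

Removing the `#ifndef` guard and building with `-DTOY_PMD_FOLDED` produces the same class of diagnostic quoted in the commit message.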

06 Nov, 2020 9 commits

RISC-V: Fix the VDSO symbol generaton for binutils-2.35+ · c2c81bb2

Authored by Palmer Dabbelt on Oct 23, 2020

We were relying on GNU ld's ability to re-link executable files in order
to extract our VDSO symbols.  This behavior was deemed a bug as of
binutils-2.35 (specifically the binutils-gdb commit a87e1817a4 ("Have
the linker fail if any attempt to link in an executable is made.")), but
as that has been backported to at least Debian's binutils-2.34 it may
manifest in other places.

The previous version of this was a bit of a mess: we were linking a
static executable version of the VDSO, containing only a subset of the
input symbols, which we then linked into the kernel.  This worked, but
certainly wasn't a supported path through the toolchain.  Instead this
new version parses the textual output of nm to produce a symbol table.
Both rely on near-zero addresses being linkable, but as we rely on weak
undefined symbols being linkable elsewhere I don't view this as a major
issue.

Fixes: e2c0cdfb ("RISC-V: User-facing API")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

RISC-V: Use non-PGD mappings for early DTB access · 1074dd44

Authored by Anup Patel on Nov 04, 2020

Currently, we use PGD mappings for early DTB mapping in early_pgd
but this breaks Linux kernel on SiFive Unleashed because on SiFive
Unleashed PMP checks don't work correctly for PGD mappings.

To fix early DTB mappings on SiFive Unleashed, we use non-PGD
mappings (i.e. PMD) for early DTB access.

Fixes: 8f3a2b4a ("RISC-V: Move DT mapping outof fixmap")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Tested-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

riscv: uaccess: fix __put_kernel_nofault() · 635e3f3e

Authored by Changbin Du on Nov 02, 2020

copy_to_kernel_nofault() is broken on riscv because the 'dst' and
'src' are mistakenly reversed in the __put_kernel_nofault() macro.

copy_to_kernel_nofault:
...
0xffffffe0003159b8 <+30>:    sd      a4,0(a1) # a1 aka 'src'

Fixes: d464118c ("riscv: implement __get_kernel_nofault and __put_user_nofault")
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
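
How reversed macro arguments turn a "put" into a store through the source pointer can be shown with two toy macros. These are hypothetical illustrations only: the real macro goes through the architecture's fault-handling user-access helpers rather than a plain pointer dereference, and the exact form of the original mistake is not reproduced here beyond "dst and src reversed".

```c
/* Buggy shape: the roles of dst and src are swapped in the expansion. */
#define toy_put_buggy(dst, src, type) \
    do { *(type *)(src) = *(type *)(dst); } while (0)

/* Fixed shape: load from src, store to dst. */
#define toy_put_fixed(dst, src, type) \
    do { *(type *)(dst) = *(type *)(src); } while (0)
```

With `long d = 1, s = 2;`, the buggy macro leaves `d` unchanged and overwrites `s`, which matches the disassembly above where the store goes through the register holding 'src'.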

riscv: fix pfn_to_virt err in do_page_fault(). · bcacf5f6

Authored by Liu Shaohua on Oct 26, 2020

The argument to pfn_to_virt() should be the pfn, not the value of CSR_SATP.

Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: liush <liush@allwinnertech.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

powerpc/numa: Fix build when CONFIG_NUMA=n · 3fb4a8fa

Authored by Scott Cheloha on Nov 05, 2020

Add a non-NUMA definition for of_drconf_to_nid_single() to topology.h
so we have one even if powerpc/mm/numa.c is not compiled. On a
non-NUMA kernel the appropriate node id is always first_online_node.

Fixes: 72cdd117 ("pseries/hotplug-memory: hot-add: skip redundant LMB lookup")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201105223040.3612663-1-cheloha@linux.ibm.com

riscv: Set text_offset correctly for M-Mode · 79605f13

Authored by Sean Anderson on Oct 22, 2020

M-Mode Linux is loaded at the start of RAM, not 2MB later. Perhaps this
should be calculated based on PAGE_OFFSET somehow? Even better would be to
deprecate text_offset and instead introduce something absolute.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

arm64: kexec_file: try more regions if loading segments fails · 108aa503

Authored by Benjamin Gwin on Nov 03, 2020

It's possible that the first region picked for the new kernel will make
it impossible to fit the other segments in the required 32GB window,
especially if we have a very large initrd.

Instead of giving up, we can keep testing other regions for the kernel
until we find one that works.

Suggested-by: Ryan O'Leary <ryanoleary@google.com>
Signed-off-by: Benjamin Gwin <bgwin@google.com>
Link: https://lore.kernel.org/r/20201103201106.2397844-1-bgwin@google.com
Signed-off-by: Will Deacon <will@kernel.org>

x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP · 1978b3a5

Authored by Anand K Mistry on Nov 05, 2020

On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and

  spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

At the same time, IBPB can be set to conditional.

However, this leads to the case where it's impossible to turn on IBPB
for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the

  spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

condition leads to a return before the task flag is set. Similarly,
ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
conditional.

More generally, the following cases are possible:

1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
   X86_FEATURE_AMD_STIBP_ALWAYS_ON

The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.

At a high level, this change does one thing. If either STIBP or IBPB
is set to conditional, allow the prctl to change the task flag.
Also, reflect that capability when querying the state. This isn't
perfect since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.

 [ bp: Massage commit message and comment; space out statements for
   better readability. ]

Fixes: 21998a35 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
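
The high-level rule stated in this commit ("if either STIBP or IBPB is set to conditional, allow the prctl to change the task flag") can be sketched as a single predicate. The enum and the `toy_` functions are hypothetical stand-ins; "conditional" is assumed here to mean the prctl and seccomp modes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-mitigation user mode. */
enum toy_mode {
    TOY_NONE,
    TOY_STRICT,
    TOY_STRICT_PREFERRED,
    TOY_PRCTL,
    TOY_SECCOMP
};

/* Assumed meaning of "conditional": controlled per task. */
static bool toy_is_conditional(enum toy_mode m)
{
    return m == TOY_PRCTL || m == TOY_SECCOMP;
}

/* The prctl may change the task flag if either mitigation is conditional. */
static bool toy_ib_user_controlled(enum toy_mode ibpb, enum toy_mode stibp)
{
    return toy_is_conditional(ibpb) || toy_is_conditional(stibp);
}
```

Both listed cases pass the predicate: conditional STIBP with IBPB always on, and always-on STIBP (strict preferred) with conditional IBPB. Only the combination where neither is conditional is refused.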

RISC-V: Remove any memblock representing unusable memory area · 1bd14a66

Authored by Atish Patra on Oct 07, 2020

RISC-V limits the physical memory size by -PAGE_OFFSET. Any memory beyond
that size from DRAM start is unusable. Just remove any memblock pointing
to those memory regions without worrying about computing the maximum size.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
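
The expression `-PAGE_OFFSET` is an unsigned negation: it yields the distance from PAGE_OFFSET to the top of the 64-bit address space, which bounds how much physical memory can be mapped. The sketch below works that out for an assumed PAGE_OFFSET of 0xffffffe000000000 and clamps a region to the usable window. The `toy_` names and the region structure are hypothetical and do not use the memblock API.

```c
#include <stdint.h>

#define TOY_PAGE_OFFSET 0xffffffe000000000ULL   /* assumed value */

struct toy_region {
    uint64_t base;
    uint64_t size;
};

/* Unsigned wrap-around: 2^64 - PAGE_OFFSET. */
static uint64_t toy_max_mem_size(void)
{
    return 0ULL - TOY_PAGE_OFFSET;
}

/*
 * Size of the part of region r that lies inside
 * [dram_start, dram_start + max size); 0 if it lies entirely beyond.
 */
static uint64_t toy_usable_size(struct toy_region r, uint64_t dram_start)
{
    uint64_t limit = dram_start + toy_max_mem_size();

    if (r.base >= limit)
        return 0;
    if (r.base + r.size > limit)
        return limit - r.base;
    return r.size;
}
```

With the assumed PAGE_OFFSET the limit works out to 0x2000000000 bytes (128 GiB) from the start of DRAM.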

05 Nov, 2020 5 commits

powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry · 33fe43cf

Authored by Christophe Leroy on Oct 12, 2020

When _PAGE_ACCESSED is not set, a minor fault is expected.
To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED
into the L2 entry valid bit.

To simplify the processing and reduce the number of instructions in
TLB miss exceptions, manage it as an APG bit and get it next to
_PAGE_GUARDED bit to allow a copy in one go. Then declare the
corresponding groups as handling all accesses as user accesses.
As the PP bits always define user as No Access, it will generate
a fault.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu

powerpc/8xx: Always fault when _PAGE_ACCESSED is not set · 29daf869

Authored by Christophe Leroy on Oct 12, 2020

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

This adds at least 3 instructions to the TLB miss exception
handlers fast path. Following patch will reduce this overhead.

Also update the rotation instruction to the correct number of bits
to reflect all changes done to _PAGE_ACCESSED over time.

Fixes: d069cb43 ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d9 ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu

powerpc/40x: Always fault when _PAGE_ACCESSED is not set · 0540b0d2

Authored by Christophe Leroy on Oct 10, 2020

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

Fixes: 2c74e258 ("powerpc/40x: Rework 40x PTE access and TLB miss")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu

powerpc/603: Always fault when _PAGE_ACCESSED is not set · 11522448

Authored by Christophe Leroy on Oct 10, 2020

The kernel expects pte_young() to work regardless of CONFIG_SWAP.

Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.

Fixes: 84de6ab0 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu

powerpc: Use asm_goto_volatile for put_user() · 1344a232

Authored by Michael Ellerman on Nov 04, 2020

Andreas reported that commit ee0a49a6 ("powerpc/uaccess: Switch
__put_user_size_allowed() to __put_user_asm_goto()") broke
CLONE_CHILD_SETTID.

Further inspection showed that the put_user() in schedule_tail() was
missing entirely, the store not emitted by the compiler.

  <.schedule_tail>:
    mflr    r0
    std     r0,16(r1)
    stdu    r1,-112(r1)
    bl      <.finish_task_switch>
    ld      r9,2496(r3)
    cmpdi   cr7,r9,0
    bne     cr7,<.schedule_tail+0x60>
    ld      r3,392(r13)
    ld      r9,1392(r3)
    cmpdi   cr7,r9,0
    beq     cr7,<.schedule_tail+0x3c>
    li      r4,0
    li      r5,0
    bl      <.__task_pid_nr_ns>
    nop
    bl      <.calculate_sigpending>
    nop
    addi    r1,r1,112
    ld      r0,16(r1)
    mtlr    r0
    blr
    nop
    nop
    nop
    bl      <.__balance_callback>
    b       <.schedule_tail+0x1c>

Notice there are no stores other than to the stack. There should be a
stw in there for the store to current->set_child_tid.

This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and
4.9.4), and only when CONFIG_PPC_KUAP is disabled.

When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync()
and mtspr() inlined via allow_user_access() seems to be enough to
avoid the bug.

We already have a macro to work around this (or a similar bug), called
asm_volatile_goto which includes an empty asm block to tickle the
compiler into generating the right code. So use that.

With this applied the code generation looks more like it will work:

  <.schedule_tail>:
    mflr    r0
    std     r31,-8(r1)
    std     r0,16(r1)
    stdu    r1,-144(r1)
    std     r3,112(r1)
    bl      <._mcount>
    nop
    ld      r3,112(r1)
    bl      <.finish_task_switch>
    ld      r9,2624(r3)
    cmpdi   cr7,r9,0
    bne     cr7,<.schedule_tail+0xa0>
    ld      r3,2408(r13)
    ld      r31,1856(r3)
    cmpdi   cr7,r31,0
    beq     cr7,<.schedule_tail+0x80>
    li      r4,0
    li      r5,0
    bl      <.__task_pid_nr_ns>
    nop
    li      r9,-1
    clrldi  r9,r9,12
    cmpld   cr7,r31,r9
    bgt     cr7,<.schedule_tail+0x80>
    lis     r9,16
    rldicr  r9,r9,32,31
    subf    r9,r31,r9
    cmpldi  cr7,r9,3
    ble     cr7,<.schedule_tail+0x80>
    li      r9,0
    stw     r3,0(r31)				<-- stw
    nop
    bl      <.calculate_sigpending>
    nop
    addi    r1,r1,144
    ld      r0,16(r1)
    ld      r31,-8(r1)
    mtlr    r0
    blr
    nop
    bl      <.__balance_callback>
    b       <.schedule_tail+0x30>

Fixes: ee0a49a6 ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201104111742.672142-1-mpe@ellerman.id.au
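
The workaround described in this commit pairs an `asm goto` statement with an empty asm block. A sketch of the shape such a macro can take is shown below. The `toy_` names are hypothetical, the example uses an empty asm template that never actually branches, and it needs a compiler with GNU `asm goto` support (GCC or a recent Clang); it illustrates the construct only and does not reproduce the miscompilation.

```c
/*
 * asm goto followed by an empty asm statement. The trailing empty
 * block is the part intended to nudge affected compilers into
 * generating the right code.
 */
#define toy_asm_volatile_goto(x...) \
    do { __asm__ goto(x); __asm__ (""); } while (0)

/*
 * Store v through p. The asm template is empty, so the 'fail'
 * label is never reached at run time; it exists only to give the
 * asm goto a jump target, as a fault fixup path would.
 */
static int toy_store_or_fail(int *p, int v)
{
    toy_asm_volatile_goto("" : : "r"(p) : "memory" : fail);
    *p = v;
    return 0;
fail:
    return -1;
}
```

In the real user-access helpers the asm body contains the store itself and the label is the fault handler, which is why a compiler that mishandles the construct can drop the store entirely.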

openeuler / Kernel synced successfully nearly 2 years ago
