- 16 May 2020, 7 commits
-
-
Submitted by Marc Zyngier
By the time we start using the has_vhe() helper, we have long since discovered whether we are running VHE or not. It thus makes sense to use cpus_have_final_cap() instead of cpus_have_const_cap(), which leads to a small text size reduction.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20200513103828.74580-1-maz@kernel.org
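A minimal sketch of the resulting helper, assuming it uses the ARM64_HAS_VIRT_HOST_EXTN capability (illustrative, not the exact upstream diff):

    static inline bool has_vhe(void)
    {
        /*
         * VHE support is discovered during early boot and never changes
         * afterwards, so the cheaper "final" capability check is enough.
         */
        if (cpus_have_final_cap(ARM64_HAS_VIRT_HOST_EXTN))
            return true;

        return false;
    }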
-
Submitted by Marc Zyngier
Now that this function isn't constrained by the 32bit PCS, let's simplify it by taking a single 64bit offset instead of two 32bit parameters.
Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Submitted by Fuad Tabba
Consolidate references to the CONFIG_KVM configuration item to encompass entire folders rather than per line.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200505154520.194120-5-tabba@google.com
-
Submitted by Will Deacon
Changing CONFIG_KVM to be a 'menuconfig' entry in Kconfig means that we can straightforwardly enumerate optional features, such as the virtual PMU device, as dependent options.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200505154520.194120-4-tabba@google.com
-
Submitted by Will Deacon
arm64 KVM supports 16k pages since 02e0b760 ("arm64: kvm: Add support for 16K pages"), so update the Kconfig help text accordingly.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200505154520.194120-3-tabba@google.com
-
Submitted by Will Deacon
CONFIG_KVM_ARM_HOST is just a proxy for CONFIG_KVM, so remove it in favour of the latter.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200505154520.194120-2-tabba@google.com
-
Submitted by Marc Zyngier
Now that the 32bit KVM/arm host is a distant memory, let's move the whole of the KVM/arm64 code into the arm64 tree. As they said in the song: Welcome Home (Sanitarium).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200513104034.74741-1-maz@kernel.org
-
- 08 May 2020, 1 commit
-
-
Submitted by Janakarajan Natarajan
When trying to lock read-only pages, sev_pin_memory() fails because FOLL_WRITE is used as the flag for get_user_pages_fast(). Commit 73b0140b ("mm/gup: change GUP fast to use flags rather than a write 'bool'") updated the get_user_pages_fast() call sites to use flags, but incorrectly updated the call in sev_pin_memory(). As the original coding of this call was correct, revert the change made by that commit.
Fixes: 73b0140b ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
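A hedged sketch of the calling convention the revert restores; the wrapper name below is hypothetical, only the flag derivation matters: FOLL_WRITE is requested only when the caller actually needs write access.

    /* Hypothetical wrapper illustrating the flag derivation. */
    static int pin_user_range(unsigned long uaddr, int npages,
                              struct page **pages, bool write)
    {
        unsigned int gup_flags = write ? FOLL_WRITE : 0;

        return get_user_pages_fast(uaddr, npages, gup_flags, pages);
    }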
-
- 07 May 2020, 2 commits
-
-
Submitted by Mark Rutland
The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL:
|   CC      arch/arm64/mm/pageattr.o
|   CC      arch/arm64/mm/hugetlbpage.o
| from arch/arm64/mm/hugetlbpage.c:10:
| arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’:
| ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference]
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
| arch/arm64/mm/hugetlbpage.c:232:10:
| ./arch/arm64/include/asm/pgtable-types.h:28:24:
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
This can only occur when the kernel cannot allocate a page, and so is unlikely to happen in practice before other systems start failing. We can avoid this by bailing out if pmd_alloc() fails, as we do earlier in the function if pud_alloc() fails.
Fixes: 66b3923a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Kyrill Tkachov <kyrylo.tkachov@arm.com>
Cc: <stable@vger.kernel.org> # 4.5.x-
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
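The shape of the fix, as a heavily trimmed sketch of the relevant branch (variable names assumed from the surrounding code, not the exact diff):

    pmdp = pmd_alloc(mm, pudp, addr);
    if (!pmdp)
        return NULL;    /* bail out, mirroring the existing pud_alloc() error path */

    ptep = pte_alloc_map(mm, pmdp, addr);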
-
Submitted by Thomas Gleixner
Stephen reported the following build warning on an ARM multi_v7_defconfig build with GCC 9.2.1:
kernel/futex.c: In function 'do_futex':
kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
 1676 |   return oldval == cmparg;
      |          ~~~~~~~^~~~~~~~~
kernel/futex.c:1652:6: note: 'oldval' was declared here
 1652 |  int oldval, ret;
      |      ^~~~~~
introduced by commit a08971e9 ("futex: arch_futex_atomic_op_inuser() calling conventions change"). While that change should not make any difference, it confuses GCC, which fails to work out that oldval is not referenced when the return value is not zero. GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the early return; the issue is with the assembly macros. GCC fails to detect that those either set 'ret' to 0 and set oldval, or set 'ret' to -EFAULT, which makes oldval uninteresting. The store to the callsite-supplied oldval pointer is conditional on ret == 0. The straightforward way to solve this is to make the store unconditional. Aside from addressing the build warning, this makes sense anyway because it removes the conditional from the fastpath. In the error case the stored value is uninteresting and the extra store does not matter at all.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87pncao2ph.fsf@nanos.tec.linutronix.de
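A simplified, arch-neutral sketch of the pattern change; do_futex_asm_op() below is a hypothetical stand-in for the per-arch assembly macros:

    static int futex_atomic_op_sketch(int op, u32 oparg, int *oval, u32 __user *uaddr)
    {
        int oldval = 0, ret;

        ret = do_futex_asm_op(op, oparg, &oldval, uaddr);   /* hypothetical: 0 or -EFAULT */

        /*
         * Store unconditionally: on error the value is uninteresting, and
         * dropping the "if (!ret)" removes a branch from the fastpath and
         * lets GCC see that *oval is always written.
         */
        *oval = oldval;

        return ret;
    }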
-
- 06 May 2020, 4 commits
-
-
Submitted by Peter Xu
KVM_CAP_SET_GUEST_DEBUG should be supported for x86, however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though, to not break guest debug on old kernels.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200505154750.126300-1-peterx@redhat.com>
[Do the same for PPC and s390. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
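A userspace-side sketch of the runtime check this enables (a minimal example, not QEMU code): query the capability with KVM_CHECK_EXTENSION on the /dev/kvm fd instead of relying on a compile-time #ifdef, since the build host may differ from the runtime kernel.

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Returns non-zero if the running kernel advertises KVM_CAP_SET_GUEST_DEBUG. */
    static int guest_debug_supported(int kvm_fd)
    {
        return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SET_GUEST_DEBUG) > 0;
    }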
-
Submitted by Paolo Bonzini
Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP, which we are already handling) and false negatives: in particular, with this patch we no longer allow a KVM guest to set CR4.PKE when CR4.PKE is clear on the host.
Fixes: b9dd21e1 ("KVM: x86: simplify handling of PKRU")
Reported-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the RSB has always clobbered RFLAGS; its current incarnation just happens to clear CF and ZF in the process. Relying on the macro to clear CF and ZF is extremely fragile, e.g. commit 089dd8e5 ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such that the ZF flag is always set.
Reported-by: Qian Cai <cai@lca.pw>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Fixes: f2fde6a5 ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Atish Patra
This patch removes the unused functions set_kernel_text_rw/ro. Currently they are not invoked from anywhere, and no other architecture (except arm) uses this code. Even in ARM, these functions are not invoked from anywhere currently.
Fixes: d27c3c90 ("riscv: add STRICT_KERNEL_RWX support")
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
- 05 May 2020, 9 commits
-
-
Submitted by Christian Borntraeger
In LPAR we will only get an intercept for FC==3 for the PQAP instruction. Running nested under z/VM can result in other intercepts as well, as ECA_APIE is an effective bit: if one hypervisor layer has turned this bit off, the end result will be that we will get intercepts for all function codes. Usually the first one will be a query like PQAP(QCI). So the WARN_ON_ONCE is not right. Let us simply remove it.
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.3+
Fixes: e5282de9 ("s390: ap: kvm: add PQAP interception for AQIC")
Link: https://lore.kernel.org/kvm/20200505083515.2720-1-borntraeger@de.ibm.com
Reported-by: Qian Cai <cailca@icloud.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Zong Li
Put __cpu_up_stack_pointer and __cpu_up_task_pointer in the data section. Currently these two variables are put in the bss section, and there is a potential risk that secondary harts get the uninitialized value before the main hart finishes the bss clearing. In this case, all secondary harts would pass the waiting loop and enable the MMU before the main hart sets up the page table. This issue happens on random booting of multiple harts, which means it will manifest for BBL and OpenSBI v0.6 (or older versions). In OpenSBI v0.7 (or higher versions) we have the HSM extension, so all the secondary harts are brought up by the Linux kernel in an orderly fashion. This means we don't need this change for OpenSBI v0.7 (or higher versions).
Signed-off-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
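A minimal sketch of the idea using the plain GCC section attribute (the exact kernel annotation and array sizing may differ from the real patch):

    /*
     * Keep the early SMP handshake pointers out of .bss so that a secondary
     * hart never reads them while the boot hart is still zeroing .bss.
     */
    void *__cpu_up_stack_pointer[NR_CPUS] __attribute__((__section__(".data")));
    void *__cpu_up_task_pointer[NR_CPUS]  __attribute__((__section__(".data")));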
-
Submitted by Andreas Schwab
The Linux note in the vdso allows glibc to check the running kernel version without having to issue the uname syscall.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Vincent Chen
The current max_pfn equals zero. I found this causes users to be unable to get some page information through /proc, such as kpagecount, in the v5.6 kernel because of new sanity checks. The following messages are displayed by the stress-ng test suite with the command "stress-ng --verbose --physpage 1 -t 1" on a HiFive Unleashed board.
# stress-ng --verbose --physpage 1 -t 1
stress-ng: debug: [109] 4 processors online, 4 processors configured
stress-ng: info: [109] dispatching hogs: 1 physpage
stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0
stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0
stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found
stress-ng: debug: [109] cache allocate: default cache size: 2048K
stress-ng: debug: [109] starting stressors
stress-ng: debug: [109] 1 stressor spawned
stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0)
stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success)
stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
...
stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0)
stress-ng: debug: [109] process [110] terminated
stress-ng: info: [109] successful run completed in 1.00s
#
After applying this patch, the kernel can pass the test.
# stress-ng --verbose --physpage 1 -t 1
stress-ng: debug: [104] 4 processors online, 4 processors configured
stress-ng: info: [104] dispatching hogs: 1 physpage
stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs
stress-ng: debug: [104] cache allocate: default cache size: 2048K
stress-ng: debug: [104] starting stressors
stress-ng: debug: [104] 1 stressor spawned
stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0)
stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0)
stress-ng: debug: [104] process [105] terminated
stress-ng: info: [104] successful run completed in 1.01s
#
Cc: stable@vger.kernel.org
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Yash Shah <yash.shah@sifive.com>
Tested-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Anup Patel
The RISC-V N-extension is still in draft state hence remove N-extension related defines from asm/csr.h.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Anup Patel
This patch adds the riscv_isa bitmap, which represents the Host ISA features common across all Host CPUs. riscv_isa is not the same as elf_hwcap, because elf_hwcap will only have ISA features relevant for user-space apps, whereas riscv_isa will have ISA features relevant to both kernel and user-space apps. One of the use-cases for the riscv_isa bitmap is in the KVM hypervisor, where we will use it to do the following operations:
1. Check whether the hypervisor extension is available
2. Find ISA features that need to be virtualized (e.g. floating point support, vector extension, etc.)
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Anup Patel
The riscv_cpuid_to_hartid_mask() API should be exported to allow building KVM RISC-V as a loadable module.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Paolo Bonzini
Commit f458d039 ("kvm: ioapic: Lazy update IOAPIC EOI") introduces the following infinite loop:
BUG: stack guard page was hit at 000000008f595917 (stack is 00000000bdefe5a4..00000000ae2b06f5)
kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
Call Trace:
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
 ....
The re-entrancy happens because the irq state is the OR of the interrupt state and the resamplefd state. That is, we don't want to show the state as 0 until we've had a chance to set the resamplefd. But if the interrupt has _not_ gone low then ioapic_set_irq is invoked again, causing an infinite loop. This can only happen for a level-triggered interrupt, otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high and then low. Fortunately, in the case of level-triggered interrupts the VMEXIT already happens because TMR is set. Thus, fix the bug by restricting the lazy invocation of the ack notifier to edge-triggered interrupts, the only ones that need it.
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reported-by: borisvk@bstnet.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://www.spinics.net/lists/kvm/msg213512.html
Fixes: f458d039 ("kvm: ioapic: Lazy update IOAPIC EOI")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Suravee Suthikulpanit
The current logic incorrectly uses the enum ioapic_irq_destination_types to check the posted interrupt destination types. However, the value was set using the APIC_DM_XXX macros, which are left-shifted by 8 bits. Fix this by using APIC_DM_FIXED and APIC_DM_LOWEST instead.
Fixes: fdcf7562 ("KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes")
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 04 May 2020, 2 commits
-
-
Submitted by Paolo Bonzini
The corresponding code was added for VMX in commit 42dbaa5a ("KVM: x86: Virtualize debug registers", 2008-12-15) but never for AMD. Fix this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Use BUG() in the impossible-to-hit default case when switching on the scope of INVEPT to squash a warning with clang 11 due to clang treating the BUG_ON() as conditional:
>> arch/x86/kvm/vmx/nested.c:5246:3: warning: variable 'roots_to_free' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           BUG_ON(1);
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: ce8fe7b7 ("KVM: nVMX: Free only the affected contexts when emulating INVEPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200504153506.28898-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
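The pattern, as a trimmed sketch (the case labels are the architectural INVEPT scopes; the surrounding function body is omitted):

    switch (type) {
    case VMX_EPT_EXTENT_GLOBAL:
    case VMX_EPT_EXTENT_CONTEXT:
        /* ... compute roots_to_free for the defined scopes ... */
        break;
    default:
        /* Unconditional BUG(): clang can now prove roots_to_free is never used uninitialized. */
        BUG();
    }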
-
- 03 May 2020, 1 commit
-
-
Submitted by Josh Poimboeuf
Fix the following warnings seen with !CONFIG_MODULES:
arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable]
   29 | static struct orc_entry *cur_orc_table = __start_orc_unwind;
      |                          ^~~~~~~~~~~~~
arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable]
   28 | static int *cur_orc_ip_table = __start_orc_unwind_ip;
      |             ^~~~~~~~~~~~~~~~
Fixes: 153eb222 ("x86/unwind/orc: Convert global variables to static")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linux Next Mailing List <linux-next@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble
-
- 02 May 2020, 2 commits
-
-
Submitted by Thomas Gleixner
Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a lockdep splat due to a lock order violation between hrtimer_base::lock and console_sem, when the 'once' condition is reset via /sys/kernel/debug/clear_warn_once after boot. The initial printk cannot trigger this because that happens during boot when the local APIC timer is set up on the boot CPU. Prevent it by moving the printk to a place which is guaranteed to be only called once during boot. Mark the deadline timer check related functions and data __init while at it.
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de
-
Submitted by Konstantin Khlebnikov
The refactoring of the SYSCALL_DEFINE0() macros removed the ABI stubs and simply defines __abi_sys_$NAME as an alias of __do_sys_$NAME. As a result kallsyms_lookup() returns "__do_sys_$NAME", which does not match the declared trace event name. See also commit 1c758a22 ("tracing/x86: Update syscall trace events to handle new prefixed syscall func names"). Add __do_sys_ to the valid prefixes which are checked in arch_syscall_match_sym_name().
Fixes: d2b5de49 ("x86/entry: Refactor SYSCALL_DEFINE0 macros")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/158636958997.7900.16485049455470033557.stgit@buzz
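An illustrative sketch of the prefix handling, written as a standalone helper rather than the exact kernel arch_syscall_match_sym_name() implementation:

    #include <stdbool.h>
    #include <string.h>

    static bool syscall_sym_matches(const char *sym, const char *name)
    {
        /* Accept the new __do_sys_ prefix alongside the older spellings. */
        if (!strncmp(sym, "__do_sys_", 9))
            sym += 9;
        else if (!strncmp(sym, "__x64_sys_", 10))
            sym += 10;
        else if (!strncmp(sym, "sys_", 4))
            sym += 4;

        if (!strncmp(name, "sys_", 4))
            name += 4;

        return strcmp(sym, name) == 0;
    }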
-
- 01 May 2020, 3 commits
-
-
Submitted by Marc Zyngier
In the unlikely event that a 32bit vcpu traps into the hypervisor on an instruction that is located right at the end of the 32bit range, the emulation of that instruction is going to increment PC past the 32bit range. This isn't great, as userspace can then observe this value and get a bit confused. Conversely, userspace can do things like (in the context of a 64bit guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0, set PC to a 64bit value, change PSTATE to AArch32-USR, and observe that PC hasn't been truncated. More confusion. Fix both by:
- truncating PC increments for 32bit guests
- sanitizing all 32bit regs every time a core reg is changed by userspace and PSTATE indicates a 32bit mode.
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
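A sketch of the first part of the fix, using the arm64 KVM accessors the emulation code already provides (placement is illustrative, not the exact diff):

    /* After the emulated PC increment for a 32bit guest ... */
    if (vcpu_mode_is_32bit(vcpu))
        *vcpu_pc(vcpu) = lower_32_bits(*vcpu_pc(vcpu));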
-
Submitted by Rick Edgecombe
As an optimization, cpa_flush() was changed to optionally only flush the range in @cpa if it was small enough. However, this range does not include any direct map aliases changed in cpa_process_alias(). So small set_memory_() calls that touch that alias don't get the direct map changes flushed. This situation can happen when the virtual address taking variants are passed an address in vmalloc or modules space. In these cases, force a full TLB flush. Note this issue does not extend to cases where the set_memory_() calls are passed a direct map address, or page array, etc, as the primary target. In those cases the direct map would be flushed.
Fixes: 935f5839 ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200424105343.GA20730@hirez.programming.kicks-ass.net
-
Submitted by Vincenzo Frascino
On arm64 Linux, gcc uses -fasynchronous-unwind-tables -funwind-tables by default since gcc-8, so now the de facto platform ABI is to allow unwinding from async signal handlers. However, on bare metal targets (aarch64-none-elf), and on old gcc, async and sync unwind tables are not enabled by default to avoid runtime memory costs. This means that if Linux is built with a baremetal toolchain, the vdso.so may not have unwind tables, which breaks the gcc platform ABI guarantee in userspace. Add -fasynchronous-unwind-tables explicitly to the vgettimeofday.o cflags to address the ABI change.
Fixes: 28b1a824 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Cc: Will Deacon <will@kernel.org>
Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 30 April 2020, 4 commits
-
-
Submitted by Marc Zyngier
We currently save/restore sp_el0 in C code. This is a bit unsafe, as a lot of the C code expects 'current' to be accessible from there (and the opportunity to run kernel code in HYP is especially great with VHE). Instead, let's move the save/restore of sp_el0 to the assembly code (in __guest_enter), making sure that sp_el0 is correct very early on when we exit the guest, and is preserved as long as possible to its host value when we enter the guest.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Submitted by Fangrui Song
SYM_CODE_START defines \label, so it is redundant to define \label again. A redefinition at the same place is accepted by GNU as (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=159fbb6088f17a341bcaaac960623cab881b4981) but rejected by the clang integrated assembler.
Fixes: 617a2f39 ("arm64: kvm: Annotate assembly using modern annoations")
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/988
Link: https://lore.kernel.org/r/20200413231016.250737-1-maskray@google.com
-
Submitted by Jason A. Donenfeld
Rather than chunking via PAGE_SIZE, this commit changes the arch implementations to chunk in explicit 4k parts, so that calculations on maximum acceptable latency don't suddenly become invalid on platforms where PAGE_SIZE isn't 4k, such as arm64.
Fixes: 0f961f9f ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Fixes: 012c8238 ("crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305")
Fixes: a00fa0c8 ("crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305")
Fixes: 16aae359 ("crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Jason A. Donenfeld
The initial Zinc patchset, after some mailing list discussion, contained code to ensure that kernel_fpu_enable would not be kept on for more than a 4k chunk, since it disables preemption. The choice of 4k isn't totally scientific, but it's not a bad guess either, and it's what's used in the x86 poly1305, blake2s, and nhpoly1305 code already (in the form of PAGE_SIZE, which this commit corrects to be explicitly 4k for the former two). Ard did some back of the envelope calculations and found that at 5 cycles/byte (an overestimate) on a 1 GHz processor (pretty slow), 4k means we have a maximum preemption-disabled window of 20us, which Sebastian confirmed was probably a good limit. Unfortunately the chunking appears to have been left out of the final patchset that added the glue code. So, this commit adds it back in.
Fixes: 84e03fa3 ("crypto: x86/chacha - expose SIMD ChaCha routine as library function")
Fixes: b3aad5ba ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function")
Fixes: a44a3430 ("crypto: arm/chacha - expose ARM ChaCha routine as library function")
Fixes: d7d7b853 ("crypto: x86/poly1305 - wire up faster implementations for kernel")
Fixes: f569ca16 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: a6b803b3 ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: ed0356ed ("crypto: blake2s - x86_64 SIMD implementation")
Cc: Eric Biggers <ebiggers@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
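A sketch of the chunking pattern being (re)added, with a hypothetical worker name; kernel_fpu_begin()/kernel_fpu_end() is the x86 spelling, while arm/arm64 use kernel_neon_begin()/kernel_neon_end():

    static void simd_process(u8 *dst, const u8 *src, size_t len)
    {
        while (len) {
            size_t todo = min_t(size_t, len, SZ_4K);    /* never hold SIMD for more than 4 KiB */

            kernel_fpu_begin();
            simd_process_chunk(dst, src, todo);         /* hypothetical SIMD worker */
            kernel_fpu_end();

            dst += todo;
            src += todo;
            len -= todo;
        }
    }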
-
- 25 April 2020, 5 commits
-
-
Submitted by Josh Poimboeuf
The following execution path is possible:
 fsnotify()
 [ realign the stack and store previous SP in R10 ]
 <IRQ>
 [ only IRET regs saved ]
 common_interrupt()
 interrupt_entry()
 <NMI>
 [ full pt_regs saved ]
 ...
 [ unwind stack ]
When the unwinder goes through the NMI and the IRQ on the stack, and then sees fsnotify(), it doesn't have access to the value of R10, because it only has the five IRET registers. So the unwind stops prematurely. However, because the interrupt_entry() code is careful not to clobber R10 before saving the full regs, the unwinder should be able to read R10 from the previously saved full pt_regs associated with the NMI. Handle this case properly. When encountering an IRET regs frame immediately after a full pt_regs frame, use the pt_regs as a backup which can be used to get the C register values. Also, note that a call frame resets the 'prev_regs' value, because a function is free to clobber the registers. For this fix to work, the IRET and full regs frames must be adjacent, with no FUNC frames in between. So replace the FUNC hint in interrupt_entry() with an IRET_REGS hint.
Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
-
Submitted by Josh Poimboeuf
If the ORC entry type is unknown, nothing else can be done other than reporting an error. Exit the function instead of breaking out of the switch statement.
Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
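A sketch of the control-flow change; the orc_warn() helper and the err label are assumed from the surrounding unwinder function, and the rest of the switch is omitted:

    default:
        orc_warn("unknown .orc_unwind entry type %d\n", orc->type);
        goto err;    /* previously: break, which continued unwinding with bogus state */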
-
Submitted by Josh Poimboeuf
If the unwinder is called before the ORC data has been initialized, orc_find() returns NULL, and it tries to fall back to using frame pointers. This can cause some unexpected warnings during boot. Move the 'orc_init' check from orc_find() to __unwind_start(), so that it doesn't even try to unwind from an uninitialized state.
Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
-
Submitted by Miroslav Benes
When unwinding an inactive task, the ORC unwinder skips the first frame by default. If both the 'regs' and 'first_frame' parameters of unwind_start() are NULL, 'state->sp' and 'first_frame' are later initialized to the same value for an inactive task. Given there is a "less than or equal to" comparison used at the end of __unwind_start() for skipping stack frames, the first frame is skipped. Drop the equal part of the comparison and make the behavior equivalent to the frame pointer unwinder.
Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
-
Submitted by Josh Poimboeuf
There's some daring kernel code out there which dumps the stack of another task without first making sure the task is inactive. If the task happens to be running while the unwinder is reading the stack, unusual unwinder warnings can result. There's no race-free way for the unwinder to know whether such a warning is legitimate, so just disable unwinder warnings for all non-current tasks.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
-