- 15 11月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(), and invoke the new hook from kvm_valid_cr4(). This fixes an issue where KVM_SET_SREGS would return success while failing to actually set CR4. Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return in __set_sregs() is not a viable option as KVM has already stuffed a variety of vCPU state. Note, kvm_valid_cr4() and is_valid_cr4() have different return types and inverted semantics. This will be remedied in a future patch. Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 11月, 2020 1 次提交
-
-
由 Babu Moger 提交于
SEV guests fail to boot on a system that supports the PCID feature. While emulating the RSM instruction, KVM reads the guest CR3 and calls kvm_set_cr3(). If the vCPU is in the long mode, kvm_set_cr3() does a sanity check for the CR3 value. In this case, it validates whether the value has any reserved bits set. The reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory encryption is enabled, the memory encryption bit is set in the CR3 value. The memory encryption bit may fall within the KVM reserved bit range, causing the KVM emulation failure. Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will cache the reserved bits in the CR3 value. This will be initialized to rsvd_bits(cpuid_maxphyaddr(vcpu), 63). If the architecture has any special bits(like AMD SEV encryption bit) that needs to be masked from the reserved bits, should be cleared in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler. Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3") Signed-off-by: NBabu Moger <babu.moger@amd.com> Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 10月, 2020 1 次提交
-
-
由 Joe Perches 提交于
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.plSigned-off-by: NJoe Perches <joe@perches.com> Reviewed-by: NNick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: NMiguel Ojeda <ojeda@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 10月, 2020 1 次提交
-
-
由 Rasmus Villemoes 提交于
Quoting https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html: You can define a local register variable and associate it with a specified register... The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm). This may be necessary if the constraints for a particular machine don't provide sufficient control to select the desired register. On 32-bit x86, this is used to ensure that gcc will put an 8-byte value into the %edx:%eax pair, while all other cases will just use the single register %eax (%rax on x86-64). While the _ASM_AX actually just expands to "%eax", note this comment next to get_user() which does something very similar: * The use of _ASM_DX as the register specifier is a bit of a * simplification, as gcc only cares about it as the starting point * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits * (%ecx being the next register in gcc's x86 register sequence), and * %rdx on 64 bits. However, getting this to work requires that there is no code between the assignment to the local register variable and its use as an input to the asm() which can possibly clobber any of the registers involved - including evaluation of the expressions making up other inputs. In the current code, the ptr expression used directly as an input may cause such code to be emitted. For example, Sean Christopherson observed that with KASAN enabled and ptr being current->set_child_tid (from chedule_tail()), the load of current->set_child_tid causes a call to __asan_load8() to be emitted immediately prior to the __put_user_4 call, and Naresh Kamboju reports that various mmstress tests fail on KASAN-enabled builds. It's also possible to synthesize a broken case without KASAN if one uses "foo()" as the ptr argument, with foo being some "extern u64 __user *foo(void);" (though I don't know if that appears in real code). Fix it by making sure ptr gets evaluated before the assignment to __val_pu, and add a comment that __val_pu must be the last thing computed before the asm() is entered. Cc: Sean Christopherson <sean.j.christopherson@intel.com> Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Fixes: d55564cf ("x86: Make __put_user() generate an out-of-line call") Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 10月, 2020 1 次提交
-
-
由 Ben Gardon 提交于
Attach struct kvm_mmu_pages to every page in the TDP MMU to track metadata, facilitate NX reclaim, and enable inproved parallelism of MMU operations in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-12-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2020 6 次提交
-
-
由 Ben Gardon 提交于
The TDP MMU must be able to allocate paging structure root pages and track the usage of those pages. Implement a similar, but separate system for root page allocation to that of the x86 shadow paging implementation. When future patches add synchronization model changes to allow for parallel page faults, these pages will need to be handled differently from the x86 shadow paging based MMU's root pages. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ben Gardon 提交于
The TDP MMU offers an alternative mode of operation to the x86 shadow paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU will require new fields that need to be initialized and torn down. Add hooks into the existing KVM MMU initialization process to do that initialization / cleanup. Currently the initialization and cleanup fucntions do not do very much, however more operations will be added in future patches. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-4-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
This will be used to signal an error to the userspace, in case the vendor code failed during handling of this msr. (e.g -ENOMEM) Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
As vcpu->arch.cpuid_entries is now allocated dynamically, the only remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/ KVM_SET_CPUID2 input for sanity. Since it was reported that the current limit (80) is insufficient for some CPUs, bump KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new limit. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201001130541.1398392-4-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80) is reported to be insufficient but before we bump it let's switch to allocating vcpu->arch.cpuid_entries[] array dynamically. Currently, 'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is 3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch' but having it pre-allocated (for all vCPUs which we also pre-allocate) gives us no real benefits. Another plus of the dynamic allocation is that we now do kvm_check_cpuid() check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries so no changes are made in case the check fails. Opportunistically remove unneeded 'out' labels from kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return directly whenever possible. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201001130541.1398392-3-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
-
由 Oliver Upton 提交于
KVM unconditionally provides PV features to the guest, regardless of the configured CPUID. An unwitting guest that doesn't check KVM_CPUID_FEATURES before use could access paravirt features that userspace did not intend to provide. Fix this by checking the guest's CPUID before performing any paravirtual operations. Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the aforementioned enforcement. Migrating a VM from a host w/o this patch to a host with this patch could silently change the ABI exposed to the guest, warranting that we default to the old behavior and opt-in for the new one. Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Signed-off-by: NOliver Upton <oupton@google.com> Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98 Message-Id: <20200818152429.1923996-4-oupton@google.com> Reviewed-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 10月, 2020 1 次提交
-
-
由 Michael Kelley 提交于
On ARM64, Hyper-V now specifies the interrupt to be used by VMbus in the ACPI DSDT. This information is not used on x86 because the interrupt vector must be hardcoded. But update the generic VMbus driver to do the parsing and pass the information to the architecture specific code that sets up the Linux IRQ. Update consumers of the interrupt to get it from an architecture specific function. Signed-off-by: NMichael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1597434304-40631-1-git-send-email-mikelley@microsoft.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
-
- 14 10月, 2020 1 次提交
-
-
由 Dan Williams 提交于
Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5. The device-dax facility allows an address range to be directly mapped through a chardev, or optionally hotplugged to the core kernel page allocator as System-RAM. It is the mechanism for converting persistent memory (pmem) to be used as another volatile memory pool i.e. the current Memory Tiering hot topic on linux-mm. In the case of pmem the nvdimm-namespace-label mechanism can sub-divide it, but that labeling mechanism is not available / applicable to soft-reserved ("EFI specific purpose") memory [3]. This series provides a sysfs-mechanism for the daxctl utility to enable provisioning of volatile-soft-reserved memory ranges. The motivations for this facility are: 1/ Allow performance differentiated memory ranges to be split between kernel-managed and directly-accessed use cases. 2/ Allow physical memory to be provisioned along performance relevant address boundaries. For example, divide a memory-side cache [4] along cache-color boundaries. 3/ Parcel out soft-reserved memory to VMs using device-dax as a security / permissions boundary [5]. Specifically I have seen people (ab)using memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the device-dax interface on custom address ranges. A follow-on for the VM use case is to teach device-dax to dynamically allocate 'struct page' at runtime to reduce the duplication of 'struct page' space in both the guest and the host kernel for the same physical pages. [2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com [3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com [4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com [5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com This patch (of 23): In preparation for adding a new numa= option clean up the existing ones to avoid ifdefs in numa_setup(), and provide feedback when the option is numa=fake= option is invalid due to kernel config. The same does not need to be done for numa=noacpi, since the capability is already hard disabled at compile-time. Suggested-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/160106109960.30709.7379926726669669398.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 10月, 2020 3 次提交
-
-
由 Nick Desaulniers 提交于
Clang-11 shipped support for outputs to asm goto statments along the fallthrough path. Double up some of the get_user() and related macros to be able to take advantage of this extended GNU C extension. This should help improve the generated code's performance for these accesses. Cc: Bill Wendling <morbo@google.com> Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Instead of inlining the stac/mov/clac sequence (which also requires individual exception table entries and several asm instruction alternatives entries), just generate "call __put_user_nocheck_X" for the __put_user() cases, the same way we changed __get_user earlier. Unlike the get_user() case, we didn't have the same nice infrastructure to just generate the call with a single case, so this actually has to change some of the infrastructure in order to do this. But that only cleans up the code further. So now, instead of using a case statement for the sizes, we just do the same thing we've done on the get_user() side for a long time: use the size as an immediate constant to the asm, and generate the asm that way directly. In order to handle the special case of 64-bit data on a 32-bit kernel, I needed to change the calling convention slightly: the data is passed in %eax[:%edx], the pointer in %ecx, and the return value is also returned in %ecx. It used to be returned in %eax, but because of how %eax can now be a double register input, we don't want mix that with a single-register output. The actual low-level asm is easier to handle: we'll just share the code between the checking and non-checking case, with the non-checking case jumping into the middle of the function. That may sound a bit too special, but this code is all very very special anyway, so... Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Instead of inlining the whole stac/lfence/mov/clac sequence (which also requires individual exception table entries and several asm instruction alternatives entries), just generate "call __get_user_nocheck_X" for the __get_user() cases. We can use all the same infrastructure that we already do for the regular "get_user()", and the end result is simpler source code, and much simpler code generation. It also means that when I introduce asm goto with input for "unsafe_get_user()", there are no nasty interactions with the __get_user() code. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 10月, 2020 1 次提交
-
-
由 Borislav Petkov 提交于
Add asm/mce.h to asm/asm-prototypes.h so that that asm symbol's checksum can be generated in order to support CONFIG_MODVERSIONS with it and fix: WARNING: modpost: EXPORT symbol "copy_mc_fragile" [vmlinux] version \ generation failed, symbol will not be versioned. For reference see: 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") 334bb773 ("x86/kbuild: enable modversions for symbols exported from asm") Fixes: ec6347bb ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()") Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201007111447.GA23257@zn.tnic
-
- 07 10月, 2020 13 次提交
-
-
由 Dave Jiang 提交于
Currently, the MOVDIR64B instruction is used to atomically submit 64-byte work descriptors to devices. Although it can encounter errors like device queue full, command not accepted, device not ready, etc when writing to a device MMIO, MOVDIR64B can not report back on errors from the device itself. This means that MOVDIR64B users need to separately interact with a device to see if a descriptor was successfully queued, which slows down device interactions. ENQCMD and ENQCMDS also atomically submit 64-byte work descriptors to devices. But, they *can* report back errors directly from the device, such as if the device was busy, or device not enabled or does not support the command. This immediate feedback from the submission instruction itself reduces the number of interactions with the device and can greatly increase efficiency. ENQCMD can be used at any privilege level, but can effectively only submit work on behalf of the current process. ENQCMDS is a ring0-only instruction and can explicitly specify a process context instead of being tied to the current process or needing to reprogram the IA32_PASID MSR. Use ENQCMDS for work submission within the kernel because a Process Address ID (PASID) is setup to translate the kernel virtual address space. This PASID is provided to ENQCMDS from the descriptor structure submitted to the device and not retrieved from IA32_PASID MSR, which is setup for the current user address space. See Intel Software Developer’s Manual for more information on the instructions. [ bp: - Make operand constraints like movdir64b() because both insns are basically doing the same thing, more or less. - Fixup comments and cleanup. ] Link: https://lkml.kernel.org/r/20200924180041.34056-3-dave.jiang@intel.comSigned-off-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20201005151126.657029-3-dave.jiang@intel.com
-
由 Dave Jiang 提交于
Carve out the MOVDIR64B inline asm primitive into a generic helper so that it can be used by other functions. Move it to special_insns.h and have iosubmit_cmds512() call it. [ bp: Massage commit message. ] Suggested-by: NMichael Matz <matz@suse.de> Signed-off-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201005151126.657029-2-dave.jiang@intel.com
-
由 Tony Luck 提交于
All instructions copying data between kernel and user memory are tagged with either _ASM_EXTABLE_UA or _ASM_EXTABLE_CPY entries in the exception table. ex_fault_handler_type() returns EX_HANDLER_UACCESS for both of these. Recovery is only possible when the machine check was triggered on a read from user memory. In this case the same strategy for recovery applies as if the user had made the access in ring3. If the fault was in kernel memory while copying to user there is no current recovery plan. For MOV and MOVZ instructions a full decode of the instruction is done to find the source address. For MOVS instructions the source address is in the %rsi register. The function fault_in_kernel_space() determines whether the source address is kernel or user, upgrade it from "static" so it can be used here. Co-developed-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-7-tony.luck@intel.com
-
由 Youquan Song 提交于
_ASM_EXTABLE_UA is a general exception entry to record the exception fixup for all exception spots between kernel and user space access. To enable recovery from machine checks while coping data from user addresses it is necessary to be able to distinguish the places that are looping copying data from those that copy a single byte/word/etc. Add a new macro _ASM_EXTABLE_CPY and use it in place of _ASM_EXTABLE_UA in the copy functions. Record the exception reason number to regs->ax at ex_handler_uaccess which is used to check MCE triggered. The new fixup routine ex_handler_copy() is almost an exact copy of ex_handler_uaccess() The difference is that it sets regs->ax to the trap number. Following patches use this to avoid trying to copy remaining bytes from the tail of the copy and possibly hitting the poison again. New mce.kflags bit MCE_IN_KERNEL_COPYIN will be used by mce_severity() calculation to indicate that a machine check is recoverable because the kernel was copying from user space. Signed-off-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-4-tony.luck@intel.com
-
由 Tony Luck 提交于
Avoid a proliferation of ex_has_*_handler() functions by having just one function that returns the type of the handler (if any). Drop the __visible attribute for this function. It is not called from assembler so the attribute is not necessary. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-3-tony.luck@intel.com
-
由 Mike Travis 提交于
Add Copyrights to those files that have been updated for UV5 changes. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201005203929.148656-14-mike.travis@hpe.com
-
由 Mike Travis 提交于
The UV NMI MMR addresses and fields moved between UV4 and UV5 necessitating a rewrite of the UV NMI handler. Adjust references to accommodate those changes. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-13-mike.travis@hpe.com
-
由 Mike Travis 提交于
Update check of BIOS TSC sync status to include both possible "invalid" states provided by newer UV5 BIOS. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-12-mike.travis@hpe.com
-
由 Mike Travis 提交于
When the UV BIOS starts the kernel it passes the UVsystab info struct to the kernel which contains information elements more specific than ACPI, and generally pertinent only to the MMRs. These are read only fields so information is passed one way only. A new field starting with UV5 is the UV architecture type so the ACPI OEM_ID field can be used for other purposes going forward. The UV Arch Type selects the entirety of the MMRs available, with their addresses and fields defined in uv_mmrs.h. Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-7-mike.travis@hpe.com
-
由 Mike Travis 提交于
Add new references to UV5 (and UVY class) system MMR addresses and fields primarily caused by the expansion from 46 to 52 bits of physical memory address. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-6-mike.travis@hpe.com
-
由 Mike Travis 提交于
Update UV MMRs in uv_mmrs.h for UV5 based on Verilog output from the UV Hub hardware design files. This is the next UV architecture with a new class (UVY) being defined for 52 bit physical address masks. Uses a bitmask for UV arch identification so a single test can cover multiple versions. Includes other adjustments to match the uv_mmrs.h file to keep from encountering compile errors. New UV5 functionality is added in the patches that follow. [ Fix W=1 build warnings. ] Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-5-mike.travis@hpe.com
-
由 Mike Travis 提交于
UV class systems no longer use System Controller for monitoring of CPU activity provided by this driver. Other methods have been developed for BIOS and the management controller (BMC). Remove that supporting code. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-3-mike.travis@hpe.com
-
由 Mike Travis 提交于
The Broadcast Assist Unit (BAU) TLB shootdown handler is being rewritten to become the UV BAU APIC driver. It is designed to speed up sending IPIs to selective CPUs within the system. Remove the current TLB shutdown handler (tlb_uv.c) file and a couple of kernel hooks in the interim. Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Link: https://lkml.kernel.org/r/20201005203929.148656-2-mike.travis@hpe.com
-
- 06 10月, 2020 3 次提交
-
-
由 Dan Williams 提交于
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
-
由 Christoph Hellwig 提交于
Most of dma-debug.h is not required by anything outside of kernel/dma. Move the four declarations needed by dma-mappin.h or dma-ops providers into dma-mapping.h and dma-map-ops.h, and move the remainder of the file to kernel/dma/debug.h. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Merge dma-contiguous.h into dma-map-ops.h, after removing the comment describing the contiguous allocator into kernel/dma/contigous.c. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 03 10月, 2020 1 次提交
-
-
由 Jonathan Cameron 提交于
In common with memoryless domains only register GI domains if the proximity node is not online. If a domain is already a memory containing domain, or a memoryless domain there is nothing to do just because it also contains a Generic Initiator. Signed-off-by: NJonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 01 10月, 2020 1 次提交
-
-
由 Arvind Sankar 提交于
The CRn accessor functions use __force_order as a dummy operand to prevent the compiler from reordering CRn reads/writes with respect to each other. The fact that the asm is volatile should be enough to prevent this: volatile asm statements should be executed in program order. However GCC 4.9.x and 5.x have a bug that might result in reordering. This was fixed in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x, may reorder volatile asm statements with respect to each other. There are some issues with __force_order as implemented: - It is used only as an input operand for the write functions, and hence doesn't do anything additional to prevent reordering writes. - It allows memory accesses to be cached/reordered across write functions, but CRn writes affect the semantics of memory accesses, so this could be dangerous. - __force_order is not actually defined in the kernel proper, but the LLVM toolchain can in some cases require a definition: LLVM (as well as GCC 4.9) requires it for PIE code, which is why the compressed kernel has a definition, but also the clang integrated assembler may consider the address of __force_order to be significant, resulting in a reference that requires a definition. Fix this by: - Using a memory clobber for the write functions to additionally prevent caching/reordering memory accesses across CRn writes. - Using a dummy input operand with an arbitrary constant address for the read functions, instead of a global variable. This will prevent reads from being reordered across writes, while allowing memory loads to be cached/reordered across CRn reads, which should be safe. Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com> Tested-by: NNathan Chancellor <natechancellor@gmail.com> Tested-by: NSedat Dilek <sedat.dilek@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/ Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu
-
- 28 9月, 2020 5 次提交
-
-
由 Steven Rostedt (VMware) 提交于
7f47d8cc ("x86, tracing, perf: Add trace point for MSR accesses") added tracing of msr read and write, but because of complexity in having tracepoints in headers, and even more so for a core header like msr.h, not to mention the bloat a tracepoint adds to inline functions, a helper function is needed to be called from the header. Use the new tracepoint_enabled() macro in tracepoint-defs.h to test if the tracepoint is active before calling the helper function, instead of open coding the same logic, which requires knowing the internals of a tracepoint. Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Paolo Bonzini 提交于
We are going to use it for SVM too, so use a more generic name. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Alexander Graf 提交于
It's not desireable to have all MSRs always handled by KVM kernel space. Some MSRs would be useful to handle in user space to either emulate behavior (like uCode updates) or differentiate whether they are valid based on the CPU model. To allow user space to specify which MSRs it wants to see handled by KVM, this patch introduces a new ioctl to push filter rules with bitmaps into KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access. With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the denied MSR events to user space to operate on. If no filter is populated, MSR handling stays identical to before. Signed-off-by: NAlexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-8-graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Alexander Graf 提交于
In the following commits we will add pieces of MSR filtering. To ensure that code compiles even with the feature half-merged, let's add a few stubs and struct definitions before the real patches start. Signed-off-by: NAlexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-4-graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Alexander Graf 提交于
MSRs are weird. Some of them are normal control registers, such as EFER. Some however are registers that really are model specific, not very interesting to virtualization workloads, and not performance critical. Others again are really just windows into package configuration. Out of these MSRs, only the first category is necessary to implement in kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against certain CPU models and MSRs that contain information on the package level are much better suited for user space to process. However, over time we have accumulated a lot of MSRs that are not the first category, but still handled by in-kernel KVM code. This patch adds a generic interface to handle WRMSR and RDMSR from user space. With this, any future MSR that is part of the latter categories can be handled in user space. Furthermore, it allows us to replace the existing "ignore_msrs" logic with something that applies per-VM rather than on the full system. That way you can run productive VMs in parallel to experimental ones where you don't care about proper MSR handling. Signed-off-by: NAlexander Graf <graf@amazon.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <20200925143422.21718-3-graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-