- 28 9月, 2018 1 次提交
-
-
由 Kairui Song 提交于
Commit 1958b5fc ("x86/boot: Add early boot support when running with SEV active") can occasionally cause system resets when kexec-ing a second kernel even if SEV is not active. That's because get_sev_encryption_bit() uses 32-bit rIP-relative addressing to read the value of enc_bit - a variable which caches a previously detected encryption bit position - but kexec may allocate the early boot code to a higher location, beyond the 32-bit addressing limit. In this case, garbage will be read and get_sev_encryption_bit() will return the wrong value, leading to accessing memory with the wrong encryption setting. Therefore, remove enc_bit, and thus get rid of the need to do 32-bit rIP-relative addressing in the first place. [ bp: massage commit message heavily. ] Fixes: 1958b5fc ("x86/boot: Add early boot support when running with SEV active") Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NKairui Song <kasong@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com> Cc: linux-kernel@vger.kernel.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: brijesh.singh@amd.com Cc: kexec@lists.infradead.org Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: ghook@redhat.com Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
-
- 21 9月, 2018 1 次提交
-
-
由 Feng Tang 提交于
We met a kernel panic when enabling earlycon, which is due to the fixmap address of earlycon is not statically setup. Currently the static fixmap setup in head_64.S only covers 2M virtual address space, while it actually could be in 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables. Fixes: 1ad83c85 ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NFeng Tang <feng.tang@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: Nkernel test robot <rong.a.chen@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts) Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
-
- 19 9月, 2018 10 次提交
-
-
由 Dan Carpenter 提交于
The first argument to WARN_ONCE() is a condition. Fixes: 5800dc5c ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJuergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: virtualization@lists.linux-foundation.org Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda
-
由 Reinette Chatre 提交于
In order to determine a sane default cache allocation for a new CAT/CDP resource group, all resource groups are checked to determine which cache portions are available to share. At this time all possible CLOSIDs that can be supported by the resource is checked. This is problematic if the resource supports more CLOSIDs than another CAT/CDP resource. In this case, the number of CLOSIDs that could be allocated are fewer than the number of CLOSIDs that can be supported by the resource. Limit the check of closids to that what is supported by the system based on the minimum across all resources. Fixes: 95f0b77e ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-10-git-send-email-fenghua.yu@intel.com
-
由 Reinette Chatre 提交于
It is possible for a resource group to consist out of MBA as well as CAT/CDP resources. The "exclusive" resource mode only applies to the CAT/CDP resources since MBA allocations cannot be specified to overlap or not. When a user requests a resource group to become "exclusive" then it can only be successful if there are CAT/CDP resources in the group and none of their CBMs associated with the group's CLOSID overlaps with any other resource group. Fix the "exclusive" mode setting by failing if there isn't any CAT/CDP resource in the group and ensuring that the CBM checking is only done on CAT/CDP resources. Fixes: 49f7b4ef ("x86/intel_rdt: Enable setting of exclusive mode") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-9-git-send-email-fenghua.yu@intel.com
-
由 Reinette Chatre 提交于
A loop is used to check if a CAT resource's CBM of one CLOSID overlaps with the CBM of another CLOSID of the same resource. The loop is run over all CLOSIDs supported by the resource. The problem with running the loop over all CLOSIDs supported by the resource is that its number of supported CLOSIDs may be more than the number of supported CLOSIDs on the system, which is the minimum number of CLOSIDs supported across all resources. Fix the loop to only consider the number of system supported CLOSIDs, not all that are supported by the resource. Fixes: 49f7b4ef ("x86/intel_rdt: Enable setting of exclusive mode") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-8-git-send-email-fenghua.yu@intel.com
-
由 Reinette Chatre 提交于
A system supporting pseudo-locking may have MBA as well as CAT resources of which only the CAT resources could support cache pseudo-locking. When the schemata to be pseudo-locked is provided it should be checked that that schemata does not attempt to pseudo-lock a MBA resource. Fixes: e0bdfe8e ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-7-git-send-email-fenghua.yu@intel.com
-
由 Reinette Chatre 提交于
When a new resource group is created, it is initialized with sane defaults that currently assume the resource being initialized is a CAT resource. This code path is also followed by a MBA resource that is not allocated the same as a CAT resource and as a result we encounter the following unchecked MSR access error: unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000 000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20) Call Trace: mba_wrmsr+0x41/0x80 update_domains+0x125/0x130 rdtgroup_mkdir+0x270/0x500 Fix the above by ensuring the initial allocation is only attempted on a CAT resource. Fixes: 95f0b77e ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-6-git-send-email-fenghua.yu@intel.com
-
由 Reinette Chatre 提交于
When multiple resources are managed by RDT, the number of CLOSIDs used is the minimum of the CLOSIDs supported by each resource. In the function rdt_bit_usage_show(), the annotated bitmask is created to depict how the CAT supporting caches are being used. During this annotated bitmask creation, each resource group is queried for its mode that is used as a label in the annotated bitmask. The maximum number of resource groups is currently assumed to be the number of CLOSIDs supported by the resource for which the information is being displayed. This is incorrect since the number of active CLOSIDs is the minimum across all resources. If information for a cache instance with more CLOSIDs than another is being generated we thus encounter a warning like: invalid mode for closid 8 WARNING: CPU: 88 PID: 1791 at [SNIP]/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c :827 rdt_bit_usage_show+0x221/0x2b0 Fix this by ensuring that only the number of supported CLOSIDs are considered. Fixes: e6519011 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-5-git-send-email-fenghua.yu@intel.com
-
由 Reinette Chatre 提交于
The number of CLOSIDs supported by a system is the minimum number of CLOSIDs supported by any of its resources. Care should be taken when iterating over the CLOSIDs of a resource since it may be that the number of CLOSIDs supported on the system is less than the number of CLOSIDs supported by the resource. Introduce a helper function that can be used to query the number of CLOSIDs that is supported by all resources, irrespective of how many CLOSIDs are supported by a particular resource. Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-4-git-send-email-fenghua.yu@intel.com
-
由 Reinette Chatre 提交于
Chen Yu reported a divide-by-zero error when accessing the 'size' resctrl file when a MBA resource is enabled. divide error: 0000 [#1] SMP PTI CPU: 93 PID: 1929 Comm: cat Not tainted 4.19.0-rc2-debug-rdt+ #25 RIP: 0010:rdtgroup_cbm_to_size+0x7e/0xa0 Call Trace: rdtgroup_size_show+0x11a/0x1d0 seq_read+0xd8/0x3b0 Quoting Chen Yu's report: This is because for MB resource, the r->cache.cbm_len is zero, thus calculating size in rdtgroup_cbm_to_size() will trigger the exception. Fix this issue in the 'size' file by getting correct memory bandwidth value which is in MBps when MBA software controller is enabled or in percentage when MBA software controller is disabled. Fixes: d9b48c86 ("x86/intel_rdt: Display resource groups' allocations in bytes") Reported-by: NChen Yu <yu.c.chen@intel.com> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NChen Yu <yu.c.chen@intel.com> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20180904174614.26682-1-yu.c.chen@intel.com Link: https://lkml.kernel.org/r/1537048707-76280-3-git-send-email-fenghua.yu@intel.com
-
由 Xiaochen Shen 提交于
Each resource is associated with a parsing callback to parse the data provided from user space when writing schemata file. The 'data' parameter in the callbacks is defined as a void pointer which is error prone due to lack of type check. parse_bw() processes the 'data' parameter as a string while its caller actually passes the parameter as a pointer to struct rdt_cbm_parse_data. Thus, parse_bw() takes wrong data and causes failure of parsing MBA throttle value. To fix the issue, the 'data' parameter in all parsing callbacks is defined and handled as a pointer to struct rdt_parse_data (renamed from struct rdt_cbm_parse_data). Fixes: 7604df6e ("x86/intel_rdt: Support flexible data to parsing callbacks") Fixes: 9ab9aa15 ("x86/intel_rdt: Ensure requested schemata respects mode") Signed-off-by: NXiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-2-git-send-email-fenghua.yu@intel.com
-
- 16 9月, 2018 2 次提交
-
-
由 Brijesh Singh 提交于
The recent removal of the memblock dependency from kvmclock caused a SEV guest regression because the wall_clock and hv_clock_boot variables are no longer mapped decrypted when SEV is active. Use the __bss_decrypted attribute to put the static wall_clock and hv_clock_boot in the .bss..decrypted section so that they are mapped decrypted during boot. In the preparatory stage of CPU hotplug, the per-cpu pvclock data pointer assigns either an element of the static array or dynamically allocated memory for the pvclock data pointer. The static array are now mapped decrypted but the dynamically allocated memory is not mapped decrypted. However, when SEV is active this memory range must be mapped decrypted. Add a function which is called after the page allocator is up, and allocate memory for the pvclock data pointers for the all possible cpus. Map this memory range as decrypted when SEV is active. Fixes: 368a540e ("x86/kvmclock: Remove memblock dependency") Suggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1536932759-12905-3-git-send-email-brijesh.singh@amd.com
-
由 Brijesh Singh 提交于
kvmclock defines few static variables which are shared with the hypervisor during the kvmclock initialization. When SEV is active, memory is encrypted with a guest-specific key, and if the guest OS wants to share the memory region with the hypervisor then it must clear the C-bit before sharing it. Currently, we use kernel_physical_mapping_init() to split large pages before clearing the C-bit on shared pages. But it fails when called from the kvmclock initialization (mainly because the memblock allocator is not ready that early during boot). Add a __bss_decrypted section attribute which can be used when defining such shared variable. The so-defined variables will be placed in the .bss..decrypted section. This section will be mapped with C=0 early during boot. The .bss..decrypted section has a big chunk of memory that may be unused when memory encryption is not active, free it when memory encryption is not active. Suggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Radim Krčmář<rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com
-
- 15 9月, 2018 1 次提交
-
-
由 Randy Dunlap 提交于
Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled: ../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function] static int proc_apm_show(struct seq_file *m, void *v) Fixes: 3f3942ac ("proc: introduce proc_create_single{,_data}") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Jiri Kosina <jikos@kernel.org> Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org
-
- 14 9月, 2018 1 次提交
-
-
由 Joerg Roedel 提交于
This reverts commit 1f40a46c. It turned out that this patch is not sufficient to enable PTI on 32 bit systems with legacy 2-level page-tables. In this paging mode the huge-page PTEs are in the top-level page-table directory, where also the mirroring to the user-space page-table happens. So every huge PTE exits twice, in the kernel and in the user page-table. That means that accessed/dirty bits need to be fetched from two PTEs in this mode to be safe, but this is not trivial to implement because it needs changes to generic code just for the sake of enabling PTI with 32-bit legacy paging. As all systems that need PTI should support PAE anyway, remove support for PTI when 32-bit legacy paging is used. Fixes: 7757d607 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: NMeelis Roos <mroos@linux.ee> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
-
- 13 9月, 2018 2 次提交
-
-
由 Guenter Roeck 提交于
Commit eeb89e2b ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()") moved loading the fixmap in efi_call_phys_epilog() after load_cr3() since it was assumed to be more logical. Turns out this is incorrect: In efi_call_phys_prolog(), the gdt with its physical address is loaded first, and when the %cr3 is reloaded in _epilog from initial_page_table to swapper_pg_dir again the gdt is no longer mapped. This results in a triple fault if an interrupt occurs after load_cr3() and before load_fixmap_gdt(0). Calling load_fixmap_gdt(0) first restores the execution order prior to commit eeb89e2b and fixes the problem. Fixes: eeb89e2b ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()") Signed-off-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: linux-efi@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/1536689892-21538-1-git-send-email-linux@roeck-us.net
-
由 Juergen Gross 提交于
Xen PV guests don't allow CPU0 hotplug, so disable it. Signed-off-by: NJuergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20180912174122.24282-1-jgross@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 9月, 2018 1 次提交
-
-
由 Boris Ostrovsky 提交于
For unprivileged Xen PV guests this is normal memory and ioremap will not be able to properly map it. While at it, since ioremap may return NULL, add a test for pointer's validity. Reported-by: NAndy Smith <andy@strugglers.net> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: xen-devel@lists.xenproject.org Cc: jgross@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180911195538.23289-1-boris.ostrovsky@oracle.com
-
- 10 9月, 2018 1 次提交
-
-
由 Jacek Tomaka 提交于
Problem: perf did not show branch predicted/mispredicted bit in brstack. Output of perf -F brstack for profile collected Before: 0x4fdbcd/0x4fdc03/-/-/-/0 0x45f4c1/0x4fdba0/-/-/-/0 0x45f544/0x45f4bb/-/-/-/0 0x45f555/0x45f53c/-/-/-/0 0x7f66901cc24b/0x45f555/-/-/-/0 0x7f66901cc22e/0x7f66901cc23d/-/-/-/0 0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0 0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0 After: 0x4fdbcd/0x4fdc03/P/-/-/0 0x45f4c1/0x4fdba0/P/-/-/0 0x45f544/0x45f4bb/P/-/-/0 0x45f555/0x45f53c/P/-/-/0 0x7f66901cc24b/0x45f555/P/-/-/0 0x7f66901cc22e/0x7f66901cc23d/P/-/-/0 0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0 0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0 Cause: As mentioned in Software Development Manual vol 3, 17.4.8.1, IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as its format. Despite that, registers containing FROM address of the branch, do have MISPREDICT bit but because of the format indicated in IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit. Solution: Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit. Signed-off-by: NJacek Tomaka <jacek.tomaka@poczta.fm> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 9月, 2018 4 次提交
-
-
由 Nadav Amit 提交于
When page-table entries are set, the compiler might optimize their assignment by using multiple instructions to set the PTE. This might turn into a security hazard if the user somehow manages to use the interim PTE. L1TF does not make our lives easier, making even an interim non-present PTE a security hazard. Using WRITE_ONCE() to set PTEs and friends should prevent this potential security hazard. I skimmed the differences in the binary with and without this patch. The differences are (obviously) greater when CONFIG_PARAVIRT=n as more code optimizations are possible. For better and worse, the impact on the binary with this patch is pretty small. Skimming the code did not cause anything to jump out as a security hazard, but it seems that at least move_soft_dirty_pte() caused set_pte_at() to use multiple writes. Signed-off-by: NNadav Amit <namit@vmware.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
-
由 Thomas Gleixner 提交于
activate_managed() returns EINVAL instead of -EINVAL in case of error. While this is unlikely to happen, the positive return value would cause further malfunction at the call site. Fixes: 2db1f959 ("x86/vector: Handle managed interrupts proper") Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
由 Wanpeng Li 提交于
Dan Carpenter reported that the untrusted data returns from kvm_register_read() results in the following static checker warning: arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi() error: buffer underflow 'map->phys_map' 's32min-s32max' KVM guest can easily trigger this by executing the following assembly sequence in Ring0: mov $10, %rax mov $0xFFFFFFFF, %rbx mov $0xFFFFFFFF, %rdx mov $0, %rsi vmcall As this will cause KVM to execute the following code-path: vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi() which will reach out-of-bounds access. This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id, ignoring destinations that are not present and delivering the rest. We also check whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> [Add second "if (min > map->max_apic_id)" to complete the fix. -Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Liran Alon 提交于
Consider the case L1 had a IRQ/NMI event until it executed VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME, L0 needs to evaluate if this pending event should cause an exit from L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept EXTERNAL_INTERRUPT). Usually this would be handled by L0 requesting a IRQ/NMI window by setting VMCS accordingly. However, this setting was done on VMCS01 and now VMCS02 is active instead. Thus, when L1 executes VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by requesting a KVM_REQ_EVENT. Note that above scenario exists when L1 KVM is about to enter L2 but requests an "immediate-exit". As in this case, L1 will disable-interrupts and then send a self-IPI before entering L2. Reviewed-by: NNikita Leshchenko <nikita.leshchenko@oracle.com> Co-developed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 07 9月, 2018 1 次提交
-
-
由 Marc Zyngier 提交于
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: fb1522e0 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
- 06 9月, 2018 2 次提交
-
-
由 Jann Horn 提交于
When the kernel.print-fatal-signals sysctl has been enabled, a simple userspace crash will cause the kernel to write a crash dump that contains, among other things, the kernel gsbase into dmesg. As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE in this case. This also moves the bitness-specific logic from show_regs() into process_{32,64}.c. Fixes: 45807a1d ("vdso: print fatal signals") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com
-
由 Chuanhua Lei 提交于
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then dividing it by HZ. Both tsc_khz and the temporary variable holding the multiplication result are of type unsigned long, so on 32bit the result is truncated to the lower 32bit. Use u64 as type for the temporary variable and cast tsc_khz to it before multiplying. [ tglx: Massaged changelog and removed pointless braces ] Fixes: cf7a63ef ("x86/tsc: Calibrate tsc only once") Signed-off-by: NChuanhua Lei <chuanhua.lei@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: yixin.zhu@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Tatashin <pasha.tatashin@microsoft.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
-
- 03 9月, 2018 1 次提交
-
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in arch/x86/include/asm/atomic.h that are caused by having a #define macro between the kernel-doc notation and the function name. Fixed by moving the #define macro to after the function implementation. Make the same change for atomic64_{32,64}.h for consistency even though there were no kernel-doc warnings found in these header files, but there would be if they were used in generation of documentation. Fixes these kernel-doc warnings: ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'i' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'v' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:96: warning: Excess function parameter 'v' description in 'arch_atomic_inc' ../arch/x86/include/asm/atomic.h:109: warning: Excess function parameter 'v' description in 'arch_atomic_dec' ../arch/x86/include/asm/atomic.h:124: warning: Excess function parameter 'v' description in 'arch_atomic_dec_and_test' ../arch/x86/include/asm/atomic.h:138: warning: Excess function parameter 'v' description in 'arch_atomic_inc_and_test' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'i' description in 'arch_atomic_add_negative' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'v' description in 'arch_atomic_add_negative' Fixes: 18cc1814 ("atomics/treewide: Make test ops optional") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/0a1e678d-c8c5-b32c-2640-ed4e94d399d2@infradead.org
-
- 02 9月, 2018 4 次提交
-
-
由 Filippo Sironi 提交于
Handle the case where microcode gets loaded on the BSP's hyperthread sibling first and the boot_cpu_data's microcode revision doesn't get updated because of early exit due to the siblings sharing a microcode engine. For that, simply write the updated revision on all CPUs unconditionally. Signed-off-by: NFilippo Sironi <sironi@amazon.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: prarit@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de
-
由 Prarit Bhargava 提交于
When preparing an MCE record for logging, boot_cpu_data.microcode is used to read out the microcode revision on the box. However, on systems where late microcode update has happened, the microcode revision output in a MCE log record is wrong because boot_cpu_data.microcode is not updated when the microcode gets updated. But, the microcode revision saved in boot_cpu_data's microcode member should be kept up-to-date, regardless, for consistency. Make it so. Fixes: fa94d0c6 ("x86/MCE: Save microcode revision in machine check records") Signed-off-by: NPrarit Bhargava <prarit@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: sironi@amazon.de Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com
-
由 Randy Dunlap 提交于
Fix the section mismatch warning in arch/x86/mm/pti.c: WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte() The function pti_clone_pgtable() references the function __init pti_user_pagetable_walk_pte(). This is often because pti_clone_pgtable lacks a __init annotation or the annotation of pti_user_pagetable_walk_pte is wrong. FATAL: modpost: Section mismatches detected. Fixes: 85900ea5 ("x86/pti: Map the vsyscall page if needed") Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
-
由 Samuel Neves 提交于
In the __getcpu function, lsl is using the wrong target and destination registers. Luckily, the compiler tends to choose %eax for both variables, so it has been working so far. Fixes: a582c540 ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: NSamuel Neves <sneves@dei.uc.pt> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
-
- 01 9月, 2018 1 次提交
-
-
由 LuckTony 提交于
The trick with flipping bit 63 to avoid loading the address of the 1:1 mapping of the poisoned page while the 1:1 map is updated used to work when unmapping the page. But it falls down horribly when attempting to directly set the page as uncacheable. The problem is that when the cache mode is changed to uncachable, the pages needs to be flushed from the cache first. But the decoy address is non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a #GP fault. Add code to change_page_attr_set_clr() to fix the address before calling flush. Fixes: 284ce401 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
-
- 31 8月, 2018 4 次提交
-
-
由 Joerg Roedel 提交于
When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap for the simple reason that this address is also mapped for user-space. The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT to call EFI runtime services and switch back to the kernel GDT when they return. But the switch-back uses the writable GDT, not the fixmap GDT. When that happened and and the CPU returns to user-space it switches to the user %cr3 and tries to restore user segment registers. This fails because the writable GDT is not mapped in the user page-table, and without a GDT the fault handlers also can't be launched. The result is a triple fault and reboot of the machine. Fix that by restoring the GDT back to the fixmap GDT which is also mapped in the user page-table. Fixes: 7757d607 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NGuenter Roeck <linux@roeck-us.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: hpa@zytor.com Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
-
由 Andy Lutomirski 提交于
A NMI can hit in the middle of context switching or in the middle of switch_mm_irqs_off(). In either case, CR3 might not match current->mm, which could cause copy_from_user_nmi() and friends to read the wrong memory. Fix it by adding a new nmi_uaccess_okay() helper and checking it in copy_from_user_nmi() and in __copy_from_user_nmi()'s callers. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NRik van Riel <riel@surriel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
-
由 Ben Hutchings 提交于
When bootstrapping an architecture, it's usual to generate the kernel's user-space headers (make headers_install) before building a compiler. Move the compiler check (for asm goto support) to the archprepare target so that it is only done when building code for the target. Fixes: e501ce95 ("x86: Force asm-goto") Reported-by: NHelmut Grohne <helmutg@debian.org> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
-
由 Jann Horn 提交于
show_opcodes() is used both for dumping kernel instructions and for dumping user instructions. If userspace causes #PF by jumping to a kernel address, show_opcodes() can be reached with regs->ip controlled by the user, pointing to kernel code. Make sure that userspace can't trick us into dumping kernel memory into dmesg. Fixes: 7cccf072 ("x86/dumpstack: Add a show_ip() function") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: security@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
-
- 30 8月, 2018 3 次提交
-
-
由 Sean Christopherson 提交于
Allowing x86_emulate_instruction() to be called directly has led to subtle bugs being introduced, e.g. not setting EMULTYPE_NO_REEXECUTE in the emulation type. While most of the blame lies on re-execute being opt-out, exporting x86_emulate_instruction() also exposes its cr2 parameter, which may have contributed to commit d391f120 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") using x86_emulate_instruction() instead of emulate_instruction() because "hey, I have a cr2!", which in turn introduced its EMULTYPE_NO_REEXECUTE bug. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Lack of the kvm_ prefix gives the impression that it's a VMX or SVM specific function, and there's no conflict that prevents adding the kvm_ prefix. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Commit a6f177ef ("KVM: Reenter guest after emulation failure if due to access to non-mmio address") added reexecute_instruction() to handle the scenario where two (or more) vCPUS race to write a shadowed page, i.e. reexecute_instruction() is intended to return true if and only if the instruction being emulated was accessing a shadowed page. As L0 is only explicitly shadowing L1 tables, an emulation failure of a nested VM instruction cannot be due to a race to write a shadowed page and so should never be re-executed. This fixes an issue where an "MMIO" emulation failure[1] in L2 is all but guaranteed to result in an infinite loop when TDP is enabled. Because "cr2" is actually an L2 GPA when TDP is enabled, calling kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA and cause reexecute_instruction() to immediately return true. The !mmio_info_in_cache() check in kvm_mmu_page_fault() doesn't catch this case because mmio_info_in_cache() returns false for a nested MMU (the MMIO caching currently handles L1 only, e.g. to cache nested guests' GPAs we'd have to manually flush the cache when switching between VMs and when L1 updated its page tables controlling the nested guest). Way back when, commit 68be0803 ("KVM: x86: never re-execute instruction with enabled tdp") changed reexecute_instruction() to always return false when using TDP under the assumption that KVM would only get into the emulator for MMIO. Commit 95b3cf69 ("KVM: x86: let reexecute_instruction work for tdp") effectively reverted that behavior in order to handle the scenario where emulation failed due to an access from L1 to the shadow page tables for L2, but it didn't account for the case where emulation failed in L2 with TDP enabled. All of the above logic also applies to retry_instruction(), added by commit 1cb3f3ae ("KVM: x86: retry non-page-table writing instructions"). An indefinite loop in retry_instruction() should be impossible as it protects against retrying the same instruction over and over, but it's still correct to not retry an L2 instruction in the first place. Fix the immediate issue by adding a check for a nested guest when determining whether or not to allow retry in kvm_mmu_page_fault(). In addition to fixing the immediate bug, add WARN_ON_ONCE in the retry functions since they are not designed to handle nested cases, i.e. they need to be modified even if there is some scenario in the future where we want to allow retrying a nested guest. [1] This issue was encountered after commit 3a2936de ("kvm: mmu: Don't expose private memslots to L2") changed the page fault path to return KVM_PFN_NOSLOT when translating an L2 access to a prive memslot. Returning KVM_PFN_NOSLOT is semantically correct when we want to hide a memslot from L2, i.e. there effectively is no defined memory region for L2, but it has the unfortunate side effect of making KVM think the GFN is a MMIO page, thus triggering emulation. The failure occurred with in-development code that deliberately exposed a private memslot to L2, which L2 accessed with an instruction that is not emulated by KVM. Fixes: 95b3cf69 ("KVM: x86: let reexecute_instruction work for tdp") Fixes: 1cb3f3ae ("KVM: x86: retry non-page-table writing instructions") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-