- 12 12月, 2020 1 次提交
-
-
由 Ashish Kalra 提交于
For SEV, all DMA to and from guest has to use shared (un-encrypted) pages. SEV uses SWIOTLB to make this happen without requiring changes to device drivers. However, depending on the workload being run, the default 64MB of it might not be enough and it may run out of buffers to use for DMA, resulting in I/O errors and/or performance degradation for high I/O workloads. Adjust the default size of SWIOTLB for SEV guests using a percentage of the total memory available to guest for the SWIOTLB buffers. Adds a new sev_setup_arch() function which is invoked from setup_arch() and it calls into a new swiotlb generic code function swiotlb_adjust_size() to do the SWIOTLB buffer adjustment. v5 fixed build errors and warnings as Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NAshish Kalra <ashish.kalra@amd.com> Co-developed-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 06 12月, 2020 3 次提交
-
-
由 Masami Hiramatsu 提交于
Since insn.prefixes.nbytes can be bigger than the size of insn.prefixes.bytes[] when a prefix is repeated, the proper check must be: insn.prefixes.bytes[i] != 0 and i < 4 instead of using insn.prefixes.nbytes. Use the new for_each_insn_prefix() macro which does it correctly. Debugged by Kees Cook <keescook@chromium.org>. [ bp: Massage commit message. ] Fixes: 25189d08 ("x86/sev-es: Add support for handling IOIO exceptions") Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/160697106089.3146288.2052422845039649176.stgit@devnote2
-
由 Masami Hiramatsu 提交于
Since insn.prefixes.nbytes can be bigger than the size of insn.prefixes.bytes[] when a prefix is repeated, the proper check must be insn.prefixes.bytes[i] != 0 and i < 4 instead of using insn.prefixes.nbytes. Use the new for_each_insn_prefix() macro which does it correctly. Debugged by Kees Cook <keescook@chromium.org>. [ bp: Massage commit message. ] Fixes: 32d0b953 ("x86/insn-eval: Add utility functions to get segment selector") Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2
-
由 Masami Hiramatsu 提交于
Since insn.prefixes.nbytes can be bigger than the size of insn.prefixes.bytes[] when a prefix is repeated, the proper check must be insn.prefixes.bytes[i] != 0 and i < 4 instead of using insn.prefixes.nbytes. Introduce a for_each_insn_prefix() macro for this purpose. Debugged by Kees Cook <keescook@chromium.org>. [ bp: Massage commit message, sync with the respective header in tools/ and drop "we". ] Fixes: 2b144498 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints") Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/160697103739.3146288.7437620795200799020.stgit@devnote2
-
- 04 12月, 2020 1 次提交
-
-
由 Mike Travis 提交于
Currently, UV4 is incorrectly identified as UV4A and UV4A as UV5. Hub chip starts with revision 1, fix it. [ bp: Massage commit message. ] Fixes: 647128f1 ("x86/platform/uv: Update UV MMRs for UV5") Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Acked-by: NDimitri Sivanich <dimitri.sivanich@hpe.com> Link: https://lkml.kernel.org/r/20201203152252.371199-1-mike.travis@hpe.com
-
- 03 12月, 2020 2 次提交
-
-
由 Stephane Eranian 提交于
The kernel cannot disambiguate when 2+ PEBS counters overflow at the same time. This is what the comment for this code suggests. However, I see the comparison is done with the unfiltered p->status which is a copy of IA32_PERF_GLOBAL_STATUS at the time of the sample. This register contains more than the PEBS counter overflow bits. It also includes many other bits which could also be set. Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201126110922.317681-2-namhyung@kernel.org
-
由 Namhyung Kim 提交于
The commit 3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler") introduced this. It seems x86_pmu_stop can be called recursively (like when it losts some samples) like below: x86_pmu_stop intel_pmu_disable_event (x86_pmu_disable) intel_pmu_pebs_disable intel_pmu_drain_pebs_nhm (x86_pmu_drain_pebs_buffer) x86_pmu_stop While commit 35d1ce6b ("perf/x86/intel/ds: Fix x86_pmu_stop warning for large PEBS") fixed it for the normal cases, there's another path to call x86_pmu_stop() recursively when a PEBS error was detected (like two or more counters overflowed at the same time). Like in the Kan's previous fix, we can skip the interrupt accounting for large PEBS, so check the iregs which is set for PMI only. Fixes: 3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler") Reported-by: NJohn Sperbeck <jsperbeck@google.com> Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201126110922.317681-1-namhyung@kernel.org
-
- 02 12月, 2020 1 次提交
-
-
由 Babu Moger 提交于
When the AMD QoS feature CDP (code and data prioritization) is enabled or disabled, the CDP bit in MSR 0000_0C81 is written on one of the CPUs in an L3 domain (core complex). That is not correct - the CDP bit needs to be updated on all the logical CPUs in the domain. This was not spelled out clearly in the spec earlier. The specification has been updated and the updated document, "AMD64 Technology Platform Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue Date: October 2020" is available now. Refer the section: Code and Data Prioritization. Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache data structure. The documentation can be obtained at: https://developer.amd.com/wp-content/resources/56375.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [ bp: Massage commit message. ] Fixes: 4d05bf71 ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: NBabu Moger <babu.moger@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NReinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/160675180380.15628.3309402017215002347.stgit@bmoger-ubuntu
-
- 01 12月, 2020 1 次提交
-
-
由 Nathan Chancellor 提交于
Currently, '--orphan-handling=warn' is spread out across four different architectures in their respective Makefiles, which makes it a little unruly to deal with in case it needs to be disabled for a specific linker version (in this case, ld.lld 10.0.1). To make it easier to control this, hoist this warning into Kconfig and the main Makefile so that disabling it is simpler, as the warning will only be enabled in a couple places (main Makefile and a couple of compressed boot folders that blow away LDFLAGS_vmlinx) and making it conditional is easier due to Kconfig syntax. One small additional benefit of this is saving a call to ld-option on incremental builds because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN. To keep the list of supported architectures the same, introduce CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to gain this automatically after all of the sections are specified and size asserted. A special thanks to Kees Cook for the help text on this config. Link: https://github.com/ClangBuiltLinux/linux/issues/1187Acked-by: NKees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Tested-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
-
- 28 11月, 2020 2 次提交
-
-
由 Gabriele Paoloni 提交于
Currently, if mce_end() fails, no_way_out - the variable denoting whether the machine can recover from this MCE - is determined by whether the worst severity that was found across the MCA banks associated with the current CPU, is of panic severity. However, at this point no_way_out could have been already set by mca_start() after looking at all severities of all CPUs that entered the MCE handler. If mce_end() fails, check first if no_way_out is already set and, if so, stick to it, otherwise use the local worst value. [ bp: Massage. ] Signed-off-by: NGabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.com
-
由 Vitaly Kuznetsov 提交于
Commit 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused the following WARNING on an Intel Ice Lake CPU: get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy: ------ spte 0xb80a0 level 5. ------ spte 0xfcd210107 level 4. ------ spte 0x1004c40107 level 3. ------ spte 0x1004c41107 level 2. ------ spte 0x1db00000000b83b6 level 1. WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm] ... Call Trace: ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm] vcpu_enter_guest+0xaa1/0x16a0 [kvm] ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel] ? skip_emulated_instruction+0xaa/0x150 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm] The guest triggering this crashes. Note, this happens with the traditional MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out, there was a subtle change in the above mentioned commit. Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level' which is returned by shadow_walk_init() and this equals to 'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to 'int root = vcpu->arch.mmu->root_level'. The difference between 'root_level' and 'shadow_root_level' on CPUs supporting 5-level page tables is that in some case we don't want to use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48' kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used, the corresponding SPTE will fail '__is_rsvd_bits_set()' check. Revert to using 'shadow_root_level'. Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 11月, 2020 2 次提交
-
-
由 Paolo Bonzini 提交于
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are a hodge-podge of conditions, hacked together to get something that more or less works. But what is actually needed is much simpler; in both cases the fundamental question is, do we have a place to stash an interrupt if userspace does KVM_INTERRUPT? In userspace irqchip mode, that is !vcpu->arch.interrupt.injected. Currently kvm_event_needs_reinjection(vcpu) covers it, but it is unnecessarily restrictive. In split irqchip mode it's a bit more complicated, we need to check kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK cycle and thus requires ExtINTs not to be masked) as well as !pending_userspace_extint(vcpu). However, there is no need to check kvm_event_needs_reinjection(vcpu), since split irqchip keeps pending ExtINT state separate from event injection state, and checking kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher priority than APIC interrupts. In fact the latter fixes a bug: when userspace requests an IRQ window vmexit, an interrupt in the local APIC can cause kvm_cpu_has_interrupt() to be true and thus kvm_vcpu_ready_for_interrupt_injection() to return false. When this happens, vcpu_run does not exit to userspace but the interrupt window vmexits keep occurring. The VM loops without any hope of making progress. Once we try to fix these with something like return kvm_arch_interrupt_allowed(vcpu) && - !kvm_cpu_has_interrupt(vcpu) && - !kvm_event_needs_reinjection(vcpu) && - kvm_cpu_accept_dm_intr(vcpu); + (!lapic_in_kernel(vcpu) + ? !vcpu->arch.interrupt.injected + : (kvm_apic_accept_pic_intr(vcpu) + && !pending_userspace_extint(v))); we realize two things. First, thanks to the previous patch the complex conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt window request in vcpu_enter_guest() bool req_int_win = dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); should be kept in sync with kvm_vcpu_ready_for_interrupt_injection(): it is unnecessary to ask the processor for an interrupt window if we would not be able to return to userspace. Therefore, kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu) ANDed with the existing check for masked ExtINT. It all makes sense: - we can accept an interrupt from userspace if there is a place to stash it (and, for irqchip split, ExtINTs are not masked). Interrupts from userspace _can_ be accepted even if right now EFLAGS.IF=0. - in order to tell userspace we will inject its interrupt ("IRQ window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both KVM and the vCPU need to be ready to accept the interrupt. ... and this is what the patch implements. Reported-by: NDavid Woodhouse <dwmw@amazon.co.uk> Analyzed-by: NDavid Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NNikos Tsironis <ntsironis@arrikto.com> Reviewed-by: NDavid Woodhouse <dwmw@amazon.co.uk> Tested-by: NDavid Woodhouse <dwmw@amazon.co.uk>
-
由 Paolo Bonzini 提交于
Centralize handling of interrupts from the userspace APIC in kvm_cpu_has_extint and kvm_cpu_get_extint, since userspace APIC interrupts are handled more or less the same as ExtINTs are with split irqchip. This removes duplicated code from kvm_cpu_has_injectable_intr and kvm_cpu_has_interrupt, and makes the code more similar between kvm_cpu_has_{extint,interrupt} on one side and kvm_cpu_get_{extint,interrupt} on the other. Cc: stable@vger.kernel.org Reviewed-by: NFilippo Sironi <sironi@amazon.de> Reviewed-by: NDavid Woodhouse <dwmw@amazon.co.uk> Tested-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 11月, 2020 1 次提交
-
-
由 Anand K Mistry 提交于
When spectre_v2_user={seccomp,prctl},ibpb is specified on the command line, IBPB is force-enabled and STIPB is conditionally-enabled (or not available). However, since 21998a35 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP} instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour. Because the issuing of IBPB relies on the switch_mm_*_ibpb static branches, the mitigations behave as expected. Since 1978b3a5 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") this discrepency caused the misreporting of IB speculation via prctl(). On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb, prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL | PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB speculation mode, even though the flag is ignored. Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not available. [ bp: Massage commit message. ] Fixes: 21998a35 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Fixes: 1978b3a5 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP") Signed-off-by: NAnand K Mistry <amistry@google.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid
-
- 25 11月, 2020 1 次提交
-
-
由 Lu Baolu 提交于
After commit 327d5b2f ("iommu/vt-d: Allow 32bit devices to uses DMA domain"), swiotlb could also be used for direct memory access if IOMMU is enabled but a device is configured to pass through the DMA translation. Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work if "iommu=pt" kernel parameter is used. Fixes: 327d5b2f ("iommu/vt-d: Allow 32bit devices to uses DMA domain") Reported-and-tested-by: NAdrian Huang <ahuang12@lenovo.com> Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201125014124.4070776-1-baolu.lu@linux.intel.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210237Signed-off-by: NWill Deacon <will@kernel.org>
-
- 24 11月, 2020 3 次提交
-
-
由 Peter Zijlstra 提交于
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: NSven Schnelle <svens@linux.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Tested-by: NMark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
-
由 Xiaochen Shen 提交于
On resource group creation via a mkdir an extra kernfs_node reference is obtained by kernfs_get() to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. Currently the extra kernfs_node reference count is only dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup structure is removed in a few other locations that lack the matching reference drop. In call paths of rmdir and umount, when a control group is removed, kernfs_remove() is called to remove the whole kernfs nodes tree of the control group (including the kernfs nodes trees of all child monitoring groups), and then rdtgroup structure is freed by kfree(). The rdtgroup structures of all child monitoring groups under the control group are freed by kfree() in free_all_child_rdtgrp(). Before calling kfree() to free the rdtgroup structures, the kernfs node of the control group itself as well as the kernfs nodes of all child monitoring groups still take the extra references which will never be dropped to 0 and the kernfs nodes will never be freed. It leads to reference count leak and kernfs_node_cache memory leak. For example, reference count leak is observed in these two cases: (1) mount -t resctrl resctrl /sys/fs/resctrl mkdir /sys/fs/resctrl/c1 mkdir /sys/fs/resctrl/c1/mon_groups/m1 umount /sys/fs/resctrl (2) mkdir /sys/fs/resctrl/c1 mkdir /sys/fs/resctrl/c1/mon_groups/m1 rmdir /sys/fs/resctrl/c1 The same reference count leak issue also exists in the error exit paths of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon(). Fix this issue by following changes to make sure the extra kernfs_node reference on rdtgroup is dropped before freeing the rdtgroup structure. (1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up kernfs_put() and kfree(). (2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup structure is about to be freed by kfree(). (3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error exit paths of mkdir where an extra reference is taken by kernfs_get(). Fixes: f3cbeaca ("x86/intel_rdt/cqm: Add rmdir support") Fixes: e02737d5 ("x86/intel_rdt: Add tasks files") Fixes: 60cf5e10 ("x86/intel_rdt: Add mkdir to resctrl file system") Reported-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NXiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NReinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@intel.com
-
由 Xiaochen Shen 提交于
Willem reported growing of kernfs_node_cache entries in slabtop when repeatedly creating and removing resctrl subdirectories as well as when repeatedly mounting and unmounting the resctrl filesystem. On resource group (control as well as monitoring) creation via a mkdir an extra kernfs_node reference is obtained to ensure that the rdtgroup structure remains accessible for the rdtgroup_kn_unlock() calls where it is removed on deletion. The kernfs_node reference count is dropped by kernfs_put() in rdtgroup_kn_unlock(). With the above explaining the need for one kernfs_get()/kernfs_put() pair in resctrl there are more places where a kernfs_node reference is obtained without a corresponding release. The excessive amount of reference count on kernfs nodes will never be dropped to 0 and the kernfs nodes will never be freed in the call paths of rmdir and umount. It leads to reference count leak and kernfs_node_cache memory leak. Remove the superfluous kernfs_get() calls and expand the existing comments surrounding the remaining kernfs_get()/kernfs_put() pair that remains in use. Superfluous kernfs_get() calls are removed from two areas: (1) In call paths of mount and mkdir, when kernfs nodes for "info", "mon_groups" and "mon_data" directories and sub-directories are created, the reference count of newly created kernfs node is set to 1. But after kernfs_create_dir() returns, superfluous kernfs_get() are called to take an additional reference. (2) kernfs_get() calls in rmdir call paths. Fixes: 17eafd07 ("x86/intel_rdt: Split resource group removal in two") Fixes: 4af4a88e ("x86/intel_rdt/cqm: Add mount,umount support") Fixes: f3cbeaca ("x86/intel_rdt/cqm: Add rmdir support") Fixes: d89b7379 ("x86/intel_rdt/cqm: Add mon_data") Fixes: c7d9aac6 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Fixes: 5dc1d5c6 ("x86/intel_rdt: Simplify info and base file lists") Fixes: 60cf5e10 ("x86/intel_rdt: Add mkdir to resctrl file system") Fixes: 4e978d06 ("x86/intel_rdt: Add "info" files to resctrl file system") Reported-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NXiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NReinette Chatre <reinette.chatre@intel.com> Tested-by: NWillem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@intel.com
-
- 23 11月, 2020 1 次提交
-
-
由 Dan Williams 提交于
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: NThomas Gleixner <tglx@linutronix.de> Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NRandy Dunlap <rdunlap@infradead.org> Tested-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 11月, 2020 2 次提交
-
-
由 Zhenzhong Duan 提交于
"intel_iommu=off" command line is used to disable iommu but iommu is force enabled in a tboot system for security reason. However for better performance on high speed network device, a new option "intel_iommu=tboot_noforce" is introduced to disable the force on. By default kernel should panic if iommu init fail in tboot for security reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off". Fix the code setting force_on and move intel_iommu_tboot_noforce from tboot code to intel iommu code. Fixes: 7304e8f2 ("iommu/vt-d: Correctly disable Intel IOMMU force on") Signed-off-by: NZhenzhong Duan <zhenzhong.duan@gmail.com> Tested-by: NLukasz Hawrylko <lukasz.hawrylko@linux.intel.com> Acked-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.comSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Thomas Gleixner 提交于
sysrq-t ends up invoking show_opcodes() for each task which tries to access the user space code of other processes, which is obviously bogus. It either manages to dump where the foreign task's regs->ip points to in a valid mapping of the current task or triggers a pagefault and prints "Code: Bad RIP value.". Both is just wrong. Add a safeguard in copy_code() and check whether the @regs pointer matches currents pt_regs. If not, do not even try to access it. While at it, add commentary why using copy_from_user_nmi() is safe in copy_code() even if the function name suggests otherwise. Reported-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NOleg Nesterov <oleg@redhat.com> Tested-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de
-
- 17 11月, 2020 4 次提交
-
-
由 Sami Tolvanen 提交于
This change switches rapl to use PMU_FORMAT_ATTR, and fixes two other macros to use device_attribute instead of kobj_attribute to avoid callback type mismatches that trip indirect call checking with Clang's Control-Flow Integrity (CFI). Reported-by: NSedat Dilek <sedat.dilek@gmail.com> Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20201113183126.1239404-1-samitolvanen@google.com
-
由 Chen Yu 提交于
Currently, scan_microcode() leverages microcode_matches() to check if the microcode matches the CPU by comparing the family and model. However, the processor stepping and flags of the microcode signature should also be considered when saving a microcode patch for early update. Use find_matching_signature() in scan_microcode() and get rid of the now-unused microcode_matches() which is a good cleanup in itself. Complete the verification of the patch being saved for early loading in save_microcode_patch() directly. This needs to be done there too because save_mc_for_early() will call save_microcode_patch() too. The second reason why this needs to be done is because the loader still tries to support, at least hypothetically, mixed-steppings systems and thus adds all patches to the cache that belong to the same CPU model albeit with different steppings. For example: microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6 microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23 microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27 microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23 The patch which is being saved for early loading, however, can only be the one which fits the CPU this runs on so do the signature verification before saving. [ bp: Do signature verification in save_microcode_patch() and rewrite commit message. ] Fixes: ec400dde ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU") Signed-off-by: NChen Yu <yu.c.chen@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535 Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com
-
由 Chen Zhou 提交于
Fix to return a negative error code from the error handling case instead of 0 in function svm_create_vcpu(), as done elsewhere in this function. Fixes: f4c847a9 ("KVM: SVM: refactor msr permission bitmap allocation") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NChen Zhou <chenzhou10@huawei.com> Message-Id: <20201117025426.167824-1-chenzhou10@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ashish Kalra 提交于
Fix offset computation in __sev_dbg_decrypt() to include the source paddr before it is rounded down to be aligned to 16 bytes as required by SEV API. This fixes incorrect guest memory dumps observed when using qemu monitor. Signed-off-by: NAshish Kalra <ashish.kalra@amd.com> Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 11月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
In some cases where shadow paging is in use, the root page will be either mmu->pae_root or vcpu->arch.mmu->lm_root. Then it will not have an associated struct kvm_mmu_page, because it is allocated with alloc_page instead of kvm_mmu_alloc_page. Just return false quickly from is_tdp_mmu_root if the TDP MMU is not in use, which also includes the case where shadow paging is enabled. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 11月, 2020 4 次提交
-
-
由 Babu Moger 提交于
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask the memory encryption bit in reserved bits. Signed-off-by: NBabu Moger <babu.moger@amd.com> Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Babu Moger 提交于
SEV guests fail to boot on a system that supports the PCID feature. While emulating the RSM instruction, KVM reads the guest CR3 and calls kvm_set_cr3(). If the vCPU is in the long mode, kvm_set_cr3() does a sanity check for the CR3 value. In this case, it validates whether the value has any reserved bits set. The reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory encryption is enabled, the memory encryption bit is set in the CR3 value. The memory encryption bit may fall within the KVM reserved bit range, causing the KVM emulation failure. Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will cache the reserved bits in the CR3 value. This will be initialized to rsvd_bits(cpuid_maxphyaddr(vcpu), 63). If the architecture has any special bits(like AMD SEV encryption bit) that needs to be masked from the reserved bits, should be cleared in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler. Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3") Signed-off-by: NBabu Moger <babu.moger@amd.com> Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Edmondson 提交于
The instruction emulator ignores clflush instructions, yet fails to support clflushopt. Treat both similarly. Fixes: 13e457e0 ("KVM: x86: Emulator does not decode clflush well") Signed-off-by: NDavid Edmondson <david.edmondson@oracle.com> Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com> Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Mike Travis 提交于
A test shows that the output contains a space: # cat /proc/sgi_uv/archtype NSGI4 U/UVX Remove that embedded space by copying the "trimmed" buffer instead of the untrimmed input character list. Use sizeof to remove size dependency on copy out length. Increase output buffer size by one character just in case BIOS sends an 8 character string for archtype. Fixes: 1e61f5a9 ("Add and decode Arch Type in UVsystab") Signed-off-by: NMike Travis <mike.travis@hpe.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20201111010418.82133-1-mike.travis@hpe.com
-
- 11 11月, 2020 3 次提交
-
-
由 Jiri Slaby 提交于
Commit 39297dde ("x86/platform/uv: Remove UV BAU TLB Shootdown Handler") removed uv_flush_tlb_others. Its declaration was removed also from asm/uv/uv.h. But only for the CONFIG_X86_UV=y case. The inline definition (!X86_UV case) is still in place. So remove this implementation with everything what was added to support uv_flush_tlb_others: * include of asm/tlbflush.h * forward declarations of struct cpumask, mm_struct, and flush_tlb_info Signed-off-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMike Travis <mike.travis@hpe.com> Acked-by: NSteve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20201109093653.2042-1-jslaby@suse.cz
-
由 Arvind Sankar 提交于
Commit d9e9a641 ("x86/mm/pti: Allocate a separate user PGD") changed the PGD allocation to allocate PGD_ALLOCATION_ORDER pages, so in the error path it should be freed using free_pages() rather than free_page(). Commit 06ace26f ("x86/efi: Free efi_pgd with free_pages()") fixed one instance of this, but missed another. Move the freeing out-of-line to avoid code duplication and fix this bug. Fixes: d9e9a641 ("x86/mm/pti: Allocate a separate user PGD") Link: https://lore.kernel.org/r/20201110163919.1134431-1-nivedita@alum.mit.eduSigned-off-by: NArvind Sankar <nivedita@alum.mit.edu> Signed-off-by: NArd Biesheuvel <ardb@kernel.org>
-
由 Arnd Bergmann 提交于
gcc -Wextra points out a duplicate initialization of one array member: arch/x86/events/intel/uncore_snb.c:478:37: warning: initialized field overwritten [-Woverride-init] 478 | [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE, The only sensible explanation is that a duplicate 'READS' was used instead of the correct 'WRITES', so change it back. Fixes: 24633d90 ("perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201026215203.3893972-1-arnd@kernel.org
-
- 10 11月, 2020 5 次提交
-
-
由 Stephane Eranian 提交于
Starting with Arch Perfmon v5, the anythread filter on generic counters may be deprecated. The current kernel was exporting the any filter without checking. On Icelake, it means you could do cpu/event=0x3c,any/ even though the filter does not exist. This patch corrects the problem by relying on the CPUID 0xa leaf function to determine if anythread is supported or not as described in the Intel SDM Vol3b 18.2.5.1 AnyThread Deprecation section. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201028194247.3160610-1-eranian@google.com
-
由 Peter Zijlstra 提交于
Having pt_regs on-stack is unfortunate, it's 168 bytes. Since it isn't actually used, make it a static variable. This both gets if off the stack and ensures it gets 0 initialized, just in case someone does look at it. Reported-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.324273677@infradead.org
-
由 Peter Zijlstra 提交于
struct perf_sample_data lives on-stack, we should be careful about it's size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
-
由 Peter Zijlstra 提交于
intel_pmu_drain_pebs_*() is typically called from handle_pmi_common(), both have an on-stack struct perf_sample_data, which is *big*. Rewire things so that drain_pebs() can use the one handle_pmi_common() has. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.054099690@infradead.org
-
由 Peter Zijlstra 提交于
__perf_output_begin() has an on-stack struct perf_sample_data in the unlikely case it needs to generate a LOST record. However, every call to perf_output_begin() must already have a perf_sample_data on-stack. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
-
- 09 11月, 2020 1 次提交
-
-
由 Brian Masney 提交于
When booting a hyperthreaded system with the kernel parameter 'mitigations=auto,nosmt', the following warning occurs: WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60 ... Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 ... Call Trace: xen_uninit_lock_cpu+0x28/0x62 xen_hvm_cpu_die+0x21/0x30 takedown_cpu+0x9c/0xe0 ? trace_suspend_resume+0x60/0x60 cpuhp_invoke_callback+0x9a/0x530 _cpu_up+0x11a/0x130 cpu_up+0x7e/0xc0 bringup_nonboot_cpus+0x48/0x50 smp_init+0x26/0x79 kernel_init_freeable+0xea/0x229 ? rest_init+0xaa/0xaa kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The secondary CPUs are not activated with the nosmt mitigations and only the primary thread on each CPU core is used. In this situation, xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is not called, so the lock_kicker_irq is not initialized for the secondary CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the irq is not set to avoid the warning from above for each secondary CPU. Signed-off-by: NBrian Masney <bmasney@redhat.com> Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.comReviewed-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 08 11月, 2020 1 次提交
-
-
由 Pankaj Gupta 提交于
Windows2016 guest tries to enable LBR by setting the corresponding bits in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and spams the host kernel logs with error messages like: kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop" This patch fixes this by enabling error logging only with 'report_ignored_msrs=1'. Signed-off-by: NPankaj Gupta <pankaj.gupta@cloud.ionos.com> Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-