- 15 October 2021, 5 commits
-
-
Submitted by Tim Chen
There are x86 CPU architectures (e.g. Jacobsville) where the L2 cache is shared among a cluster of cores instead of being exclusive to one single core. To prevent oversubscription of the L2 cache, load should be balanced between such L2 clusters, especially for tasks with no shared data. On benchmarks such as the SPECrate mcf test, this change provides a performance boost, especially on medium-load systems on Jacobsville.

On a Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4 cores each, the benchmark numbers are as follows:

  Improvement over baseline kernel for mcf_r
  copies     run time     base rate
  1          -0.1%        -0.2%
  6          25.1%        25.1%
  12         18.8%        19.0%
  24          0.3%         0.3%

So this looks pretty good. In terms of the system's task distribution, some pretty bad clumping can be seen for the vanilla kernel without the L2 cluster domain in the 6- and 12-copies cases. With the extra domain for the cluster, the load does get evened out between the clusters.

Note this patch isn't a universal win, as spreading isn't necessarily a win, particularly for workloads which can benefit from packing.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210924085104.44806-4-21cnbao@gmail.com
-
Submitted by Barry Song
This patch adds a scheduler level for clusters and automatically enables load balancing among clusters. It will directly benefit many workloads that want more resources, such as memory bandwidth and caches.

Testing has been done widely in two different hardware configurations of Kunpeng920:

 24 cores in one NUMA node (6 clusters in each NUMA node);
 32 cores in one NUMA node (8 clusters in each NUMA node)

Workloads run on either one NUMA node or four NUMA nodes; thus, this can estimate the effect of cluster spreading w/ and w/o NUMA load balance.

* Stream benchmark:

 4threads stream (on 1NUMA * 24cores = 24cores)
                stream              stream
                w/o patch           w/ patch
 MB/sec copy    29929.64 (  0.00%)  32932.68 ( 10.03%)
 MB/sec scale   29861.10 (  0.00%)  32710.58 (  9.54%)
 MB/sec add     27034.42 (  0.00%)  32400.68 ( 19.85%)
 MB/sec triad   27225.26 (  0.00%)  31965.36 ( 17.41%)

 6threads stream (on 1NUMA * 24cores = 24cores)
                stream              stream
                w/o patch           w/ patch
 MB/sec copy    40330.24 (  0.00%)  42377.68 (  5.08%)
 MB/sec scale   40196.42 (  0.00%)  42197.90 (  4.98%)
 MB/sec add     37427.00 (  0.00%)  41960.78 ( 12.11%)
 MB/sec triad   37841.36 (  0.00%)  42513.64 ( 12.35%)

 12threads stream (on 1NUMA * 24cores = 24cores)
                stream              stream
                w/o patch           w/ patch
 MB/sec copy    52639.82 (  0.00%)  53818.04 (  2.24%)
 MB/sec scale   52350.30 (  0.00%)  53253.38 (  1.73%)
 MB/sec add     53607.68 (  0.00%)  55198.82 (  2.97%)
 MB/sec triad   54776.66 (  0.00%)  56360.40 (  2.89%)

Thus, it could help memory-bound workloads, especially under medium load. Similar improvement is also seen in lkp-pbzip2:

* lkp-pbzip2 benchmark

 2-96 threads (on 4NUMA * 24cores = 96cores)
                  lkp-pbzip2               lkp-pbzip2
                  w/o patch                w/ patch
 Hmean  tput-2    11062841.57 (  0.00%)   11341817.51 *  2.52%*
 Hmean  tput-5    26815503.70 (  0.00%)   27412872.65 *  2.23%*
 Hmean  tput-8    41873782.21 (  0.00%)   43326212.92 *  3.47%*
 Hmean  tput-12   61875980.48 (  0.00%)   64578337.51 *  4.37%*
 Hmean  tput-21  105814963.07 (  0.00%)  111381851.01 *  5.26%*
 Hmean  tput-30  150349470.98 (  0.00%)  156507070.73 *  4.10%*
 Hmean  tput-48  237195937.69 (  0.00%)  242353597.17 *  2.17%*
 Hmean  tput-79  360252509.37 (  0.00%)  362635169.23 *  0.66%*
 Hmean  tput-96  394571737.90 (  0.00%)  400952978.48 *  1.62%*

 2-24 threads (on 1NUMA * 24cores = 24cores)
                  lkp-pbzip2               lkp-pbzip2
                  w/o patch                w/ patch
 Hmean  tput-2    11071705.49 (  0.00%)   11296869.10 *  2.03%*
 Hmean  tput-4    20782165.19 (  0.00%)   21949232.15 *  5.62%*
 Hmean  tput-6    30489565.14 (  0.00%)   33023026.96 *  8.31%*
 Hmean  tput-8    40376495.80 (  0.00%)   42779286.27 *  5.95%*
 Hmean  tput-12   61264033.85 (  0.00%)   62995632.78 *  2.83%*
 Hmean  tput-18   86697139.39 (  0.00%)   86461545.74 ( -0.27%)
 Hmean  tput-24  104854637.04 (  0.00%)  104522649.46 * -0.32%*

In the case of 6 threads and 8 threads, we see the greatest performance improvement.
Similar improvement can be seen on lkp-pixz, though the improvement is smaller:

* lkp-pixz benchmark

 2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores)
                  lkp-pixz               lkp-pixz
                  w/o patch              w/ patch
 Hmean  tput-2    6486981.16 (  0.00%)   6561515.98 *  1.15%*
 Hmean  tput-4   11645766.38 (  0.00%)  11614628.43 ( -0.27%)
 Hmean  tput-6   15429943.96 (  0.00%)  15957350.76 *  3.42%*
 Hmean  tput-8   19974087.63 (  0.00%)  20413746.98 *  2.20%*
 Hmean  tput-12  28172068.18 (  0.00%)  28751997.06 *  2.06%*
 Hmean  tput-18  39413409.54 (  0.00%)  39896830.55 *  1.23%*
 Hmean  tput-24  49101815.85 (  0.00%)  49418141.47 *  0.64%*

* SPECrate benchmark

 4,8,16 copies mcf_r (on 1NUMA * 32cores = 32cores)
              Base                  Base
              Run Time              Rate
              -------               ---------
 4 Copies     w/o 580 (w/ 570)      w/o 11.1 (w/ 11.3)
 8 Copies     w/o 647 (w/ 605)      w/o 20.0 (w/ 21.4, +7%)
 16 Copies    w/o 844 (w/ 844)      w/o 30.6 (w/ 30.6)

 32 Copies (on 4NUMA * 32 cores = 128cores)
 [w/o patch]
                           Base      Base       Base
 Benchmarks       Copies   Run Time  Rate
 ---------------  -------  --------- ---------
 500.perlbench_r      32       584       87.2  *
 502.gcc_r            32       503       90.2  *
 505.mcf_r            32       745       69.4  *
 520.omnetpp_r        32      1031       40.7  *
 523.xalancbmk_r      32       597       56.6  *
 525.x264_r            1        --         CE
 531.deepsjeng_r      32       336      109    *
 541.leela_r          32       556       95.4  *
 548.exchange2_r      32       513      163    *
 557.xz_r             32       530       65.2  *
  Est. SPECrate2017_int_base              80.3

 [w/ patch]
                           Base      Base       Base
 Benchmarks       Copies   Run Time  Rate
 ---------------  -------  --------- ---------
 500.perlbench_r      32       580       87.8 (+0.688%)  *
 502.gcc_r            32       477       95.1 (+5.432%)  *
 505.mcf_r            32       644       80.3 (+13.574%) *
 520.omnetpp_r        32       942       44.6 (+9.58%)   *
 523.xalancbmk_r      32       560       60.4 (+6.714%)  *
 525.x264_r            1        --         CE
 531.deepsjeng_r      32       337      109   (+0.000%)  *
 541.leela_r          32       554       95.6 (+0.210%)  *
 548.exchange2_r      32       515      163   (+0.000%)  *
 557.xz_r             32       524       66.0 (+1.227%)  *
  Est. SPECrate2017_int_base              83.7 (+4.062%)

On the other hand, it is slightly helpful to CPU-bound tasks like kernbench:

* 24-96 threads kernbench (on 4NUMA * 24cores = 96cores)
                   kernbench             kernbench
                   w/o cluster           w/ cluster
 Min   user-24    12054.67 (   0.00%)   12024.19 (   0.25%)
 Min   syst-24     1751.51 (   0.00%)    1731.68 (   1.13%)
 Min   elsp-24      600.46 (   0.00%)     598.64 (   0.30%)
 Min   user-48    12361.93 (   0.00%)   12315.32 (   0.38%)
 Min   syst-48     1917.66 (   0.00%)    1892.73 (   1.30%)
 Min   elsp-48      333.96 (   0.00%)     332.57 (   0.42%)
 Min   user-96    12922.40 (   0.00%)   12921.17 (   0.01%)
 Min   syst-96     2143.94 (   0.00%)    2110.39 (   1.56%)
 Min   elsp-96      211.22 (   0.00%)     210.47 (   0.36%)
 Amean user-24    12063.99 (   0.00%)   12030.78 *   0.28%*
 Amean syst-24     1755.20 (   0.00%)    1735.53 *   1.12%*
 Amean elsp-24      601.60 (   0.00%)     600.19 (   0.23%)
 Amean user-48    12362.62 (   0.00%)   12315.56 *   0.38%*
 Amean syst-48     1921.59 (   0.00%)    1894.95 *   1.39%*
 Amean elsp-48      334.10 (   0.00%)     332.82 *   0.38%*
 Amean user-96    12925.27 (   0.00%)   12922.63 (   0.02%)
 Amean syst-96     2146.66 (   0.00%)    2122.20 *   1.14%*
 Amean elsp-96      211.96 (   0.00%)     211.79 (   0.08%)

Note this patch isn't a universal win; it might hurt workloads which can benefit from packing. Tasks which want to take advantage of the lower communication latency of one cluster won't necessarily be packed in one cluster while the kernel is not aware of clusters, but they have some chance to be randomly packed. This patch will make them more likely to be spread.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
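For reference, the new level ends up as one more entry in the scheduler's default topology table; a rough sketch (assuming the CONFIG_SCHED_CLUSTER option and the cpu_clustergroup_mask()/cpu_cluster_flags() helpers introduced by this series):

  static struct sched_domain_topology_level default_topology[] = {
  #ifdef CONFIG_SCHED_SMT
          { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
  #endif
  #ifdef CONFIG_SCHED_CLUSTER
          /* new: balance between clusters below the MC (LLC) level */
          { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
  #endif
  #ifdef CONFIG_SCHED_MC
          { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
  #endif
          { cpu_cpu_mask, SD_INIT_NAME(DIE) },
          { NULL, },
  };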
-
Submitted by Jonathan Cameron
Both ACPI and DT provide the ability to describe additional layers of topology between that of individual cores and higher-level constructs such as the level at which the last level cache is shared. In ACPI this can be represented in PPTT as a Processor Hierarchy Node Structure [1] that is the parent of the CPU cores and in turn has a parent Processor Hierarchy Node Structure representing a higher level of topology.

For example, Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 CPUs. All clusters share the L3 cache data, but each cluster has a local L3 tag. On the other hand, each cluster will share some internal system bus.

  [ASCII-art diagram omitted: one NUMA node containing 6 clusters of 4 CPUs each; every cluster has its own local L3 tag, and all clusters share the L3 data.]

That means spreading tasks among clusters will bring more bandwidth, while packing tasks within one cluster will lead to smaller cache synchronization latency. So both kernel and userspace will have a chance to leverage this topology to deploy tasks accordingly, achieving either smaller cache latency within one cluster or an even distribution of load among clusters for higher throughput.

This patch exposes cluster topology to both kernel and userspace. Libraries like hwloc will know clusters via cluster_cpus and related sysfs attributes. A PoC of HWLOC support is at [2].

Note this patch only handles the ACPI case. Special consideration is needed for SMT processors, where it is necessary to move 2 levels up the hierarchy from the leaf nodes (thus skipping the processor core level). Note that arm64 / ACPI does not provide any means of identifying a die level in the topology, but that may be unrelated to the cluster level.
[1] ACPI Specification 6.3 - section 5.2.29.1 Processor Hierarchy Node Structure (Type 0)
[2] https://github.com/hisilicon/hwloc/tree/linux-cluster

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210924085104.44806-2-21cnbao@gmail.com
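For userspace consumers, the new attributes can simply be read from sysfs; a minimal illustration (assuming the cluster_cpus attribute exported here lives under the usual per-CPU topology directory):

  #include <stdio.h>

  int main(void)
  {
          char buf[256];
          FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/cluster_cpus", "r");

          if (!f) {
                  perror("cluster_cpus");  /* kernel too old or attribute not exposed */
                  return 1;
          }
          if (fgets(buf, sizeof(buf), f))
                  printf("CPUs in cpu0's cluster (mask): %s", buf);
          fclose(f);
          return 0;
  }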
-
Submitted by Kees Cook
Having a stable wchan means the process must be blocked, and must stay blocked while the stack unwinding is performed.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
-
Submitted by Qi Zheng
Currently, the kernel CONFIG_UNWINDER_ORC option is enabled by default on x86, but the implementation of get_wchan() is still based on the frame pointer unwinder, so /proc/<pid>/wchan usually returns 0 regardless of whether the task <pid> is running.

Reimplement get_wchan() by calling stack_trace_save_tsk(), which works with both the ORC and frame pointer unwinders.

Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211008111626.271115116@infradead.org
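A sketch of the approach described above (not the verbatim patch):

  /* stack_trace_save_tsk() is declared in <linux/stacktrace.h> and is backed
   * by whichever unwinder the kernel was built with (ORC or frame pointer),
   * so the first saved entry gives the blocked task's wchan. */
  unsigned long get_wchan(struct task_struct *p)
  {
          unsigned long entry = 0;

          if (p == current || task_is_running(p))
                  return 0;
          stack_trace_save_tsk(p, &entry, 1, 0);
          return entry;
  }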
-
- 05 October 2021, 1 commit
-
-
Submitted by Ricardo Neri
When scheduling, it is better to prefer a separate physical core rather than the SMT sibling of a high-priority core. The existing formula to compute priorities takes this fact into consideration. There may exist, however, combinations of priorities (i.e., maximum frequencies) in which the priority of high-numbered SMT siblings of high-priority cores collides with the priority of low-numbered SMT siblings of low-priority cores.

Consider for instance an SMT2 system with CPUs [0, 1] with priority 60 and [2, 3] with priority 30 (CPUs in brackets are SMT siblings). In such a case, the resulting priorities would be [120, 60], [60, 30]. Thus, to ensure that CPU2 has higher priority than CPU1, divide the raw priority by the squared SMT iterator. The resulting priorities are [120, 30], [60, 15].

Originally-by: Len Brown <len.brown@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210911011819.12184-2-ricardo.neri-calderon@linux.intel.com
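As a standalone illustration of the arithmetic above (not kernel code; the formula shape prio * nr_siblings / divisor is inferred from the example numbers), dividing by the squared sibling index reproduces [120, 30] and [60, 15]:

  #include <stdio.h>

  int main(void)
  {
          int core_prio[] = { 60, 30 };  /* the two cores from the example */
          int nr_siblings = 2;

          for (int c = 0; c < 2; c++)
                  for (int i = 1; i <= nr_siblings; i++)
                          printf("prio %d, SMT sibling %d: old=%d new=%d\n",
                                 core_prio[c], i,
                                 core_prio[c] * nr_siblings / i,        /* old: [120,60] [60,30] */
                                 core_prio[c] * nr_siblings / (i * i)); /* new: [120,30] [60,15] */
          return 0;
  }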
-
- 04 October 2021, 1 commit
-
-
Submitted by Linus Torvalds
The recent change to make objtool aware of more symbol relocation types (commit 24ff6525: "objtool: Teach get_alt_entry() about more relocation types") also added another check, and resulted in this objtool warning when building kvm on x86:

  arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception

The reason seems to be that kvm_fastop_exception() is marked as a global symbol, which causes the relocation to be kept around for objtool. And at the same time, the kvm_fastop_exception definition (which is done as an inline asm statement) doesn't actually set the type of the global, which then makes objtool unhappy.

The minimal fix is to just not mark kvm_fastop_exception as being a global symbol. It's only used in that one compilation unit anyway, so it was always pointless. That's how all the other local exception table labels are done.

I'm not entirely happy about the kinds of games that the kvm code plays with doing its own exception handling, and the fact that it confused objtool is most definitely a symptom of the code being a bit too subtle and ad-hoc. But at least this trivial one-liner makes objtool no longer upset about what is going on.

Fixes: 24ff6525 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 October 2021, 2 commits
-
-
Submitted by Kan Liang
According to the latest event list, the event encoding 0xEF is only available on the first 4 counters. Add it into the event constraints table.

Fixes: 60176089 ("perf/x86/intel: Add Icelake support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com
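The addition amounts to roughly one more entry in the Icelake constraints table (a sketch; the counter mask 0xf restricts the event to counters 0-3):

  INTEL_EVENT_CONSTRAINT(0xef, 0xf),  /* event 0xEF: first 4 counters only */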
-
Submitted by Anand K Mistry
perf_init_event tries multiple init callbacks and does not reset the event state between tries. When x86_pmu_event_init runs, it unconditionally sets the destroy callback to hw_perf_event_destroy. On the next init attempt after x86_pmu_event_init, in perf_try_init_event, if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy callback will be run. However, if the next init didn't set the destroy callback, hw_perf_event_destroy will be run (since the callback wasn't reset). Looking at other pmu init functions, the common pattern is to only set the destroy callback on a successful init. Resetting the callback on failure tries to replicate that pattern. This was discovered after commit f11dd0d8 ("perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second) run of the perf tool after a reboot results in 0 samples being generated. The extra run of hw_perf_event_destroy results in active_events having an extra decrement on each perf run. The second run has active_events == 0 and every subsequent run has active_events < 0. When active_events == 0, the NMI handler will early-out and not record any samples. Signed-off-by: NAnand K Mistry <amistry@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid
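A sketch of the pattern described above, as it might look at the end of the x86 event-init path (not necessarily the exact hunk that was applied):

  err = __x86_pmu_event_init(event);   /* internal init step */
  if (err) {
          /* init failed: undo and clear the destroy callback set earlier in
           * this function, so a later init attempt cannot run it again */
          if (event->destroy)
                  event->destroy(event);
          event->destroy = NULL;
  }
  return err;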
-
- 30 September 2021, 2 commits
-
-
Submitted by Sean Christopherson
Check whether a CPUID entry's index is significant before checking for a matching index to hack-a-fix an undefined behavior bug due to consuming uninitialized data. RESET/INIT emulation uses kvm_cpuid() to retrieve CPUID.0x1, which does _not_ have a significant index, and fails to initialize the dummy variable that doubles as EBX/ECX/EDX output _and_ ECX, a.k.a. index, input.

Practically speaking, it's _extremely_ unlikely any compiler will yield code that causes problems, as the compiler would need to inline the kvm_cpuid() call to detect the uninitialized data, and intentionally hose the kernel, e.g. insert ud2, instead of simply ignoring the result of the index comparison.

Although the sketchy "dummy" pattern was introduced in SVM by commit 66f7b72e ("KVM: x86: Make register state after reset conform to specification"), it wasn't actually broken until commit 7ff6c035 ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the order of operations such that "index" was checked before the significant flag.

Avoid consuming uninitialized data by reverting to checking the flag before the index purely so that the fix can be easily backported; the offending RESET/INIT code has been refactored, moved, and consolidated from vendor code to common x86 since the bug was introduced. A future patch will directly address the bad RESET/INIT behavior.

The undefined behavior was detected by syzbot + KernelMemorySanitizer.

  BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68
  BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103
  BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
   cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline]
   kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline]
   kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
   kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885
   kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
   vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534
   vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788
   kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020

  Local variable ----dummy@kvm_vcpu_reset created at:
   kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812
   kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923

Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com
Reported-by: Alexander Potapenko <glider@google.com>
Fixes: 2a24be79 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping")
Fixes: 7ff6c035 ("KVM: x86: Remove stateful CPUID handling")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210929222426.1855730-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
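A sketch of the reordered check in the CPUID entry lookup (assuming the significant-index flag from the KVM uapi headers; the flag's spelling is historical):

  /* Only compare the index if this leaf declares it significant. */
  if (e->function == function &&
      (!(e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) || e->index == index))
          return e;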
-
Submitted by Zelin Deng
There are other modules, such as ptp_kvm, that might use the hv_clock_per_cpu variable, so move it into kvmclock.h and export the symbol to make it visible to other modules.

Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 28 September 2021, 1 commit
-
-
Submitted by Johan Almbladh
Fix the case where the dst register maps to %rax, as otherwise this produces an incorrect mapping with the implementation in 981f94c3 ("bpf: Add bitwise atomic instructions"), since %rax is clobbered given it is used as an operand of the cmpxchg. The issue is similar to b29dd96b ("bpf, x86: Fix BPF_FETCH atomic and/or/xor with r0 as src"), just that the case of the dst register was missed.

Before, dst=r0 (%rax) src=r2 (%rsi):
  [...]
  c5:   mov    %rax,%r10
  c8:   mov    0x0(%rax),%rax        <---+ (broken)
  cc:   mov    %rax,%r11                 |
  cf:   and    %rsi,%r11                 |
  d2:   lock cmpxchg %r11,0x0(%rax)  <---+
  d8:   jne    0x00000000000000c8
  da:   mov    %rax,%rsi
  dd:   mov    %r10,%rax
  [...]

After, dst=r0 (%rax) src=r2 (%rsi):
  [...]
  da:   mov    %rax,%r10
  dd:   mov    0x0(%r10),%rax        <---+ (fixed)
  e1:   mov    %rax,%r11                 |
  e4:   and    %rsi,%r11                 |
  e7:   lock cmpxchg %r11,0x0(%r10)  <---+
  ed:   jne    0x00000000000000dd
  ef:   mov    %rax,%rsi
  f2:   mov    %r10,%rax
  [...]

The remaining combinations were fine as-is though:

After, dst=r9 (%r15) src=r0 (%rax):
  [...]
  dc:   mov    %rax,%r10
  df:   mov    0x0(%r15),%rax
  e3:   mov    %rax,%r11
  e6:   and    %r10,%r11
  e9:   lock cmpxchg %r11,0x0(%r15)
  ef:   jne    0x00000000000000df
  f1:   mov    %rax,%r10              (unneeded, but
  f4:   mov    %r10,%rax               not a problem)
  [...]

After, dst=r9 (%r15) src=r4 (%rcx):
  [...]
  de:   mov    %rax,%r10
  e1:   mov    0x0(%r15),%rax
  e5:   mov    %rax,%r11
  e8:   and    %rcx,%r11
  eb:   lock cmpxchg %r11,0x0(%r15)
  f1:   jne    0x00000000000000e1
  f3:   mov    %rax,%rcx
  f6:   mov    %r10,%rax
  [...]

The case of dst == src register is rejected by the verifier and therefore not supported, but the x86 JIT also handles this case just fine.

After, dst=r0 (%rax) src=r0 (%rax):
  [...]
  eb:   mov    %rax,%r10
  ee:   mov    0x0(%r10),%rax
  f2:   mov    %rax,%r11
  f5:   and    %r10,%r11
  f8:   lock cmpxchg %r11,0x0(%r10)
  fe:   jne    0x00000000000000ee
  100:  mov    %rax,%r10
  103:  mov    %r10,%rax
  [...]

Fixes: 981f94c3 ("bpf: Add bitwise atomic instructions")
Reported-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
-
- 27 September 2021, 2 commits
-
-
Submitted by Zhenzhong Duan
When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry, clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i]. Modifying guest_uret_msrs directly is completely broken as 'i' does not point at the MSR_IA32_TSX_CTRL entry. In fact, it's guaranteed to be an out-of-bounds access, as 'i' is always set to kvm_nr_uret_msrs in a prior loop. By sheer dumb luck, the fallout is limited to "only" failing to preserve the host's TSX_CTRL_CPUID_CLEAR. The out-of-bounds access is benign as it's guaranteed to clear a bit in a guest MSR value, and those values are always zero at vCPU creation on both x86-64 and i386.

Cc: stable@vger.kernel.org
Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Randy Dunlap
This is a nuisance when CONFIG_WERROR is set, so drop the variable declaration since the code that used it was removed.

  ../arch/nios2/kernel/setup.c: In function 'setup_arch':
  ../arch/nios2/kernel/setup.c:152:13: warning: unused variable 'dram_start' [-Wunused-variable]
    152 |         int dram_start;

Fixes: 7f7bc20b ("nios2: Don't use _end for calculating min_low_pfn")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andreas Oetken <andreas.oetken@siemens.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
-
- 25 September 2021, 1 commit
-
-
Submitted by Geert Uytterhoeven
If X2TLB=y (CPU_SHX2=y or CPU_SHX3=y, e.g. migor_defconfig), pgd_t.pgd is "unsigned long long", causing:

  In file included from arch/sh/include/asm/pgtable.h:13,
                   from include/linux/pgtable.h:6,
                   from include/linux/mm.h:33,
                   from arch/sh/kernel/asm-offsets.c:14:
  arch/sh/include/asm/pgtable-3level.h: In function `pud_pgtable':
  arch/sh/include/asm/pgtable-3level.h:37:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     37 |         return (pmd_t *)pud_val(pud);
        |                ^

Fix this by adding an intermediate cast to "unsigned long", which is basically what the old code did before.

Link: https://lkml.kernel.org/r/2c2eef3c9a2f57e5609100a4864715ccf253d30f.1631713483.git.geert+renesas@glider.be
Fixes: 9cf6fa24 ("mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Daniel Palmer <daniel@thingy.jp>
Acked-by: Rob Landley <rob@landley.net>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Jacopo Mondi <jacopo+renesas@jmondi.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
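The described fix amounts to something like the following sketch of the sh pud_pgtable() helper:

  static inline pmd_t *pud_pgtable(pud_t pud)
  {
          /* pud_val() can be 64-bit with X2TLB; narrow to unsigned long
           * before forming the pointer, as the old code effectively did. */
          return (pmd_t *)(unsigned long)pud_val(pud);
  }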
-
- 24 September 2021, 13 commits
-
-
Submitted by Randy Dunlap
SERIAL_CORE_CONSOLE depends on TTY, so EARLY_PRINTK should also depend on TTY so that it does not select SERIAL_CORE_CONSOLE inadvertently.

  WARNING: unmet direct dependencies detected for SERIAL_CORE_CONSOLE
    Depends on [n]: TTY [=n] && HAS_IOMEM [=y]
    Selected by [y]:
    - EARLY_PRINTK [=y]

Fixes: e8bf5bc7 ("nios2: add early printk support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
-
Submitted by Christoph Hellwig
Add a m68k-only set_fc helper to set the SFC and DFC registers for the few places that need to override it for special MM operations, but disconnect that from the deprecated kernel-wide set_fs() API. Note that the SFC/DFC registers are context switched, so there is no need to disable preemption. Partially based on an earlier patch from Linus Torvalds <torvalds@linux-foundation.org>. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-7-hch@lst.deSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Christoph Hellwig
Allow non-faulting access to kernel addresses without overriding the address space. Implemented by passing the instruction name to the low-level assembly macros as an argument, and force the use of the normal move instructions for kernel access. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-6-hch@lst.deSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Christoph Hellwig
Add new helpers for doing the grunt work of the 8-byte {get,put}_user routines to allow for better reuse. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-5-hch@lst.deSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Christoph Hellwig
Simplify the handling a bit by using the common helper instead of referencing undefined symbols. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-4-hch@lst.deSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Christoph Hellwig
The 030 case in virt_to_phys_slow can't ever be reached, so remove it. Suggested-by: NMichael Schmitz <schmitzmic@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-3-hch@lst.deSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Christoph Hellwig
Document that access_ok is completely broken for coldfire and friends at the moment. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Link: https://lore.kernel.org/r/20210916070405.52750-2-hch@lst.deSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Al Viro
sigreturn has to deal with an unpleasant problem - exception stack frames have different sizes, depending upon the exception (and processor model, as well) and variable-sized part of exception frame may contain information needed for instruction restart. So when signal handler terminates and calls sigreturn to resume the execution at the place where we'd been when we caught the signal, it has to rearrange the frame at the bottom of kernel stack. Worse, it might need to open a gap in the kernel stack, shifting pt_regs towards lower addresses. Doing that from C is insane - we'd need to shift stack frames (return addresses, local variables, etc.) of C call chain, right under the nose of compiler and hope it won't fall apart horribly. What had been actually done is only slightly less insane - an inline asm in mangle_kernel_stack() moved the stuff around, then reset stack pointer and jumped to label in asm glue. However, we can avoid all that mess if the asm wrapper we have to use anyway would reserve some space on the stack between switch_stack and the C stack frame of do_{rt_,}sigreturn(). Then C part can simply memmove() pt_regs + switch_stack, memcpy() the variable part of exception frame into the opened gap - all of that without inline asm, buggering C call chain, magical jumps to asm labels, etc. Asm wrapper would need to know where the moved switch_stack has ended up - it might have been shifted into the gap we'd reserved before do_rt_sigreturn() call. That's where it needs to set the stack pointer to. So let the C part return just that and be done with that. While we are at it, the call of berr_040cleanup() we need to do when returning via 68040 bus error exception frame can be moved into C part as well. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NFinn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/YP2dTQPm1wGPWFgD@zeniv-ca.linux.org.ukSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Al Viro
We get there when sigreturn has performed obscene acts on kernel stack; in particular, the location of pt_regs has shifted. We are about to call syscall_trace(), which might stop for tracer. If that happens, we'd better have task_pt_regs() returning correct result... Fucked-up-by: NAl Viro <viro@zeniv.linux.org.uk> Fixes: bd6f56a7 ("m68k: Missing syscall_trace() on sigreturn") Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NFinn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/YP2dMWeV1LkHiOpr@zeniv-ca.linux.org.ukSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Al Viro
When we have several pending signals, have entered with the kernel with large exception frame *and* have already built at least one sigframe, regs->stkadj is going to be non-zero and regs->format/sr/pc are going to be junk - the real values are in shifted exception stack frame we'd built when putting together the first sigframe. If that happens, subsequent sigframes are going to be garbage. Not hard to fix - just need to find the "adjusted" frame first and look for format/vector/sr/pc in it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NMichael Schmitz <schmitzmic@gmail.com> Reviewed-by: NMichael Schmitz <schmitzmic@gmail.com> Tested-by: NFinn Thain <fthain@linux-m68k.org> Link: https://lore.kernel.org/r/YP2dBIAPTaVvHiZ6@zeniv-ca.linux.org.ukSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
Submitted by Numfor Mbiziwo-Tiapo
Don't perform unaligned loads in __get_next() and __peek_nbyte_next() as these are forms of undefined behavior: "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined." (from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) These problems were identified using the undefined behavior sanitizer (ubsan) with the tools version of the code and perf test. [ bp: Massage commit message. ] Signed-off-by: NNumfor Mbiziwo-Tiapo <nums@google.com> Signed-off-by: NIan Rogers <irogers@google.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20210923161843.751834-1-irogers@google.com
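A generic illustration of the fix pattern (not the decoder's actual macros): instead of dereferencing a possibly misaligned pointer, copy the bytes into a properly aligned local; compilers lower the memcpy to a single load on architectures where unaligned loads are fine.

  #include <string.h>

  /* Safe read of a value that may sit at an unaligned address. */
  static inline int read_unaligned_int(const unsigned char *p)
  {
          int v;

          memcpy(&v, p, sizeof(v));
          return v;
  }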
-
Submitted by Josh Poimboeuf
sm4_aesni_avx_crypt8() sets up the frame pointer (which includes pushing RBP) before doing a conditional sibling call to sm4_aesni_avx_crypt4(), which sets up an additional frame pointer. Things will not go well when sm4_aesni_avx_crypt4() pops only the innermost single frame pointer and then tries to return to the outermost frame pointer. Sibling calls need to occur with an empty stack frame. Do the conditional sibling call *before* setting up the stack pointer. This fixes the following warning: arch/x86/crypto/sm4-aesni-avx-asm_64.o: warning: objtool: sm4_aesni_avx_crypt8()+0x8: sibling call from callable instruction with modified stack frame Fixes: a7ee22ee ("crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation") Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NArnd Bergmann <arnd@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NTianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Jia He
This reverts commit 437b38c5.

The memory semantics added in commit 437b38c5 causes a SystemMemory Operation region, whose address range is not described in the EFI memory map, to be mapped as NormalNC memory on arm64 platforms (through acpi_os_map_memory() in acpi_ex_system_memory_space_handler()).

This triggers the following abort on an ARM64 Ampere eMAG machine, because presumably the physical address range backing the Opregion does not support NormalNC memory attributes driven on the bus.

  Internal error: synchronous external abort: 96000410 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0+ #462
  Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 0.14 02/22/2019
  pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [...snip...]
  Call trace:
   acpi_ex_system_memory_space_handler+0x26c/0x2c8
   acpi_ev_address_space_dispatch+0x228/0x2c4
   acpi_ex_access_region+0x114/0x268
   acpi_ex_field_datum_io+0x128/0x1b8
   acpi_ex_extract_from_field+0x14c/0x2ac
   acpi_ex_read_data_from_field+0x190/0x1b8
   acpi_ex_resolve_node_to_value+0x1ec/0x288
   acpi_ex_resolve_to_value+0x250/0x274
   acpi_ds_evaluate_name_path+0xac/0x124
   acpi_ds_exec_end_op+0x90/0x410
   acpi_ps_parse_loop+0x4ac/0x5d8
   acpi_ps_parse_aml+0xe0/0x2c8
   acpi_ps_execute_method+0x19c/0x1ac
   acpi_ns_evaluate+0x1f8/0x26c
   acpi_ns_init_one_device+0x104/0x140
   acpi_ns_walk_namespace+0x158/0x1d0
   acpi_ns_initialize_devices+0x194/0x218
   acpi_initialize_objects+0x48/0x50
   acpi_init+0xe0/0x498

If the Opregion address range is not present in the EFI memory map there is no way for us to determine the memory attributes to use to map it - defaulting to NormalNC does not work (and it is not correct on a memory region that may have read side-effects) and therefore commit 437b38c5 should be reverted, which means reverting back to the original behavior whereby address ranges that are mapped using acpi_os_map_memory() default to the safe Device-nGnRnE attributes on ARM64 if the mapped address range is not defined in the EFI memory map.

Fixes: 437b38c5 ("ACPI: Add memory semantics to acpi_os_map_memory()")
Signed-off-by: Jia He <justin.he@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 23 September 2021, 7 commits
-
-
Submitted by Lai Jiangshan
If a gpte is changed from non-present to present, the guest doesn't need to flush the tlb per the SDM. So the host must synchronize the sp before linking it; otherwise the guest might use a wrong mapping.

For example: the guest first changes a level-1 pagetable, and then links its parent to a new place where the original gpte is non-present. Finally the guest can access the remapped area without flushing the tlb. The guest's behavior should be allowed per the SDM, but the host kvm mmu gets it wrong.

Fixes: 4731d4c7 ("KVM: MMU: out of sync shadow core")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by dann frazier
A noted side-effect of commit 0c6c2d36 ("arm64: Generate cpucaps.h") is that cpucaps are now sorted, changing the enumeration order. This assumed no dependencies between cpucaps, which turned out not to be true in one case. UNMAP_KERNEL_AT_EL0 currently needs to be processed after WORKAROUND_CAVIUM_27456. ThunderX systems are incompatible with KPTI, so unmap_kernel_at_el0() bails if WORKAROUND_CAVIUM_27456 is set. But because of the sorting, WORKAROUND_CAVIUM_27456 will not yet have been considered when unmap_kernel_at_el0() checks for it, so the kernel tries to run w/ KPTI - and quickly falls over. Because all ThunderX implementations have homogeneous CPUs, we can remove this dependency by just checking the current CPU for the erratum. Fixes: 0c6c2d36 ("arm64: Generate cpucaps.h") Cc: <stable@vger.kernel.org> # 5.13.x Signed-off-by: Ndann frazier <dann.frazier@canonical.com> Suggested-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: NMark Brown <broonie@kernel.org> Acked-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210923145002.3394558-1-dann.frazier@canonical.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
Submitted by Lai Jiangshan
When kvm->tlbs_dirty > 0, some rmaps might have been deleted without flushing the tlb remotely after kvm_sync_page(). If @gfn was writable before and its rmaps were deleted in kvm_sync_page(), and if the tlb entry is still in a remote running VCPU, the @gfn is not safely protected.

To fix the problem, kvm_sync_page() now does the remote flush when needed.

Fixes: a4ee1ca4 ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky
These fields correspond to features that we don't yet expose to L2. While currently there are no CVE-worthy features in these fields, if AMD adds more features to them, that could allow guest escapes similar to CVE-2021-3653 and CVE-2021-3656.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210914154825.104886-6-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky
GP SVM errata workaround made the #GP handler always emulate the SVM instructions. However these instructions #GP in case the operand is not 4K aligned, but the workaround code didn't check this and we ended up emulating these instructions anyway. This is only an emulation accuracy check bug as there is no harm for KVM to read/write unaligned vmcb images. Fixes: 82a11e9c ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions") Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-4-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky
In svm_clear_vintr we try to restore the virtual interrupt injection that might be pending, but we fail to restore the interrupt vector. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-2-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
Submitted by Kees Cook
When building under GCC 4.9 and 5.5:

  arch/x86/include/asm/special_insns.h: Assembler messages:
  arch/x86/include/asm/special_insns.h:286: Error: operand size mismatch for `setz'

Change the type to "bool" for condition code arguments, as documented.

Fixes: 7f5933f8 ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction")
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210910223332.3224851-1-keescook@chromium.org
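For illustration (not the kernel's enqcmds() wrapper itself), a minimal standalone example of why the variable receiving a condition code must be byte-sized:

  #include <stdbool.h>

  /* setz writes a single byte, so the C variable receiving the flag must be
   * bool (or another byte-sized type); a wider type is what produces
   * "operand size mismatch for `setz'" on older assemblers. */
  static inline bool and_is_zero(unsigned long a, unsigned long b)
  {
          bool zf;

          asm("test %2, %1; setz %0" : "=qm" (zf) : "r" (a), "r" (b) : "cc");
          return zf;
  }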
-
- 22 September 2021, 5 commits
-
-
Submitted by Fares Mehanna
Intel PMU MSRs are in msrs_to_save_all[], so add the AMD PMU MSRs as well, to have consistent behavior between Intel and AMD when using KVM_GET_MSRS, KVM_SET_MSRS or KVM_GET_MSR_INDEX_LIST. We have to add both the legacy and the new MSRs to handle guests running without X86_FEATURE_PERFCTR_CORE.

Signed-off-by: Fares Mehanna <faresx@amazon.de>
Message-Id: <20210915133951.22389-1-faresx@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky
If L1 had invalid state on VM entry (can happen on SMM transactions when we enter from real mode, straight to nested guest), then after we load 'host' state from VMCS12, the state has to become valid again, but since we load the segment registers with __vmx_set_segment we weren't always updating emulation_required. Update emulation_required explicitly at end of load_vmcs12_host_state. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky
It is possible that when non-root mode is entered via a special entry (!from_vmentry), that is, from SMM or from loading the nested state, the L2 state is invalid with regard to non-unrestricted guest mode, but becomes valid later (for example when RSM emulation restores segment registers from SMRAM).

Thus delay the check to VM entry, where we will check this and fail.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky
Since no actual VM entry happened, the VM exit information is stale. To avoid this, synthesize an invalid VM guest state VM exit. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-6-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky
Use return statements instead of nested if, and fix error path to free all the maps that were allocated. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-2-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-