1. 17 February 2021 (2 commits)
  2. 11 February 2021 (1 commit)
  3. 09 February 2021 (2 commits)
  4. 08 February 2021 (1 commit)
    • R
      cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there · d11a1d08
      Committed by Rafael J. Wysocki
      If the maximum performance level taken for computing the
      arch_max_freq_ratio value used in the x86 scale-invariance code is
      higher than the one corresponding to the cpuinfo.max_freq value
      coming from the acpi_cpufreq driver, the scale-invariant utilization
      falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly
      faster, which causes the schedutil governor to select a frequency
      below cpuinfo.max_freq.  That frequency corresponds to a frequency
      table entry below the maximum performance level necessary to get to
      the "boost" range of CPU frequencies which prevents "boost"
      frequencies from being used in some workloads.
      
      While this issue is related to scale-invariance, it may be amplified
      by commit db865272 ("cpufreq: Avoid configuring old governors as
      default with intel_pstate") from the 5.10 development cycle which
      made it extremely easy to default to schedutil even if the preferred
      driver is acpi_cpufreq as long as intel_pstate is built too, because
      the mere presence of the latter effectively removes the ondemand
      governor from the defaults.  Distro kernels are likely to include
      both intel_pstate and acpi_cpufreq on x86, so their users who cannot
      use intel_pstate or choose to use acpi_cpufreq may easily be
      affected by this issue.
      
      If CPPC is available, it can be used to address this issue by
      extending the frequency tables created by acpi_cpufreq to cover the
      entire available frequency range (including "boost" frequencies) for
      each CPU, but if CPPC is not there, acpi_cpufreq has no idea what
      the maximum "boost" frequency is and the frequency tables created by
      it cannot be extended in a meaningful way, so in that case make it
      ask the arch scale-invariance code to use the "nominal" performance
      level for CPU utilization scaling in order to avoid the issue at hand.
      
      Fixes: db865272 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      d11a1d08
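      A minimal illustration of the arithmetic above, with made-up performance
      levels: when the level used for arch_max_freq_ratio is higher than the
      level behind cpuinfo.max_freq, the scale-invariant utilization tops out
      below 100% even with the CPU running flat out.

        #include <stdio.h>

        int main(void)
        {
                unsigned int max_perf   = 36; /* level used for arch_max_freq_ratio */
                unsigned int table_max  = 28; /* level behind cpuinfo.max_freq      */
                unsigned int running_at = table_max; /* CPU at cpuinfo.max_freq     */

                /* utilization is scaled against max_perf, not table_max */
                printf("scaled utilization: %u%%\n", running_at * 100 / max_perf);
                return 0;
        }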
  5. 06 February 2021 (4 commits)
  6. 05 February 2021 (3 commits)
    • D
      x86/apic: Add extra serialization for non-serializing MSRs · 25a068b8
      Committed by Dave Hansen
      Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain
      MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for
      MFENCE; LFENCE.
      
      Short summary: we have special MSRs that have weaker ordering than all
      the rest. Add fencing consistent with current SDM recommendations.
      
      This is not known to cause any issues in practice, only in theory.
      
      Longer story below:
      
      The reason the kernel uses a different semantic is that the SDM changed
      (roughly in late 2017). The SDM changed because folks at Intel were
      auditing all of the recommended fences in the SDM and realized that the
      x2apic fences were insufficient.
      
      Why was the plain MFENCE judged insufficient?
      
      WRMSR itself is normally a serializing instruction. No fences are needed
      because the instruction itself serializes everything.
      
      But, there are explicit exceptions for this serializing behavior written
      into the WRMSR instruction documentation for two classes of MSRs:
      IA32_TSC_DEADLINE and the X2APIC MSRs.
      
      Back to x2apic: WRMSR is *not* serializing in this specific case.
      But why is MFENCE insufficient? MFENCE makes writes visible, but
      only affects load/store instructions. WRMSR is unfortunately not a
      load/store instruction and is unaffected by MFENCE. This means that a
      non-serializing WRMSR could be reordered by the CPU to execute before
      the writes made visible by the MFENCE have even occurred in the first
      place.
      
      This means that an x2apic IPI could theoretically be triggered before
      there is any (visible) data to process.
      
      Does this affect anything in practice? I honestly don't know. It seems
      quite possible that by the time an interrupt gets to consume the (not
      yet) MFENCE'd data, it has become visible, mostly by accident.
      
      To be safe, add the SDM-recommended fences for all x2apic WRMSRs.
      
      This also leaves open the question of the _other_ weakly-ordered WRMSR:
      MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as
      the x2APIC MSRs, it seems substantially less likely to be a problem in
      practice. While writes to the in-memory Local Vector Table (LVT) might
      theoretically be reordered with respect to a weakly-ordered WRMSR like
      TSC_DEADLINE, the SDM has this to say:
      
        In x2APIC mode, the WRMSR instruction is used to write to the LVT
        entry. The processor ensures the ordering of this write and any
        subsequent WRMSR to the deadline; no fencing is required.
      
      But, that might still leave xAPIC exposed. The safest thing to do for
      now is to add the extra, recommended LFENCE.
      
       [ bp: Massage commit message, fix typos, drop accidentally added
         newline to tools/arch/x86/include/asm/barrier.h. ]
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com
      25a068b8
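      A sketch of the fence pairing the SDM asks for; the helper name and
      where it gets called from are assumptions here, not necessarily what the
      patch uses verbatim.

        /*
         * Sketch: MFENCE makes prior stores globally visible, LFENCE keeps
         * the non-serializing WRMSR from executing before that has happened
         * (SDM 10.12.3).  Called before any x2APIC MSR write.
         */
        static inline void weak_wrmsr_fence(void)
        {
                asm volatile("mfence; lfence" : : : "memory");
        }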
    • M
      Revert "x86/setup: don't remove E820_TYPE_RAM for pfn 0" · 5c279c4c
      Committed by Mike Rapoport
      This reverts commit bde9cfa3.
      
      Changing the first memory page type from E820_TYPE_RESERVED to
      E820_TYPE_RAM makes it a part of "System RAM" resource rather than a
      reserved resource, and this in turn causes devmem_is_allowed() to treat
      it as an area that can be accessed, but it is filled with zeroes instead
      of the actual data as previously.
      
      The change in /dev/mem output causes lilo to fail, as was reported on the
      Slackware users forum, and other legacy applications will probably
      experience similar problems.
      
      Link: https://www.linuxquestions.org/questions/slackware-14/slackware-current-lilo-vesa-warnings-after-recent-updates-4175689617/#post6214439
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c279c4c
    • S
      KVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset · 031b91a5
      Committed by Sean Christopherson
      Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU
      reset.  The reserved bits check needs to be done even if userspace never
      configures the guest's CPUID model.
      
      Cc: stable@vger.kernel.org
      Fixes: 0107973a ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210204000117.3303214-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      031b91a5
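      A sketch of the initialization the fix implies; the exact reset path it
      lands in is assumed here.

        /*
         * Sketch: set the invalid-GPA mask at vCPU reset so the long-mode CR3
         * reserved-bits check works even if userspace never sets CPUID.
         */
        static void sketch_reset_cr3_rsvd_mask(struct kvm_vcpu *vcpu)
        {
                vcpu->arch.cr3_lm_rsvd_bits = rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
        }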
  7. 04 February 2021 (1 commit)
  8. 03 February 2021 (3 commits)
    • P
      KVM: x86: cleanup CR3 reserved bits checks · c1c35cf7
      Committed by Paolo Bonzini
      If not in long mode, the low bits of CR3 are reserved but not enforced to
      be zero, so remove those checks.  If in long mode, however, the MBZ bits
      extend down to the highest physical address bit of the guest, excluding
      the encryption bit.
      
      Make the checks consistent with the above, and match them between
      nested_vmcb_checks and KVM_SET_SREGS.
      
      Cc: stable@vger.kernel.org
      Fixes: 761e4169 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
      Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c1c35cf7
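      A sketch of a CR3 check consistent with the description above; the
      helper names are assumptions (kvm_vcpu_is_illegal_gpa() is assumed to
      already ignore the encryption bit).

        /* Sketch: enforce MBZ bits only in long mode; low bits are ignored. */
        static int sketch_check_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
        {
                if (is_long_mode(vcpu) && kvm_vcpu_is_illegal_gpa(vcpu, cr3))
                        return 1;       /* reserved bits set: fail the check */
                return 0;
        }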
    • S
      KVM: SVM: Treat SVM as unsupported when running as an SEV guest · ccd85d90
      Committed by Sean Christopherson
      Don't let KVM load when running as an SEV guest, regardless of what
      CPUID says.  Memory is encrypted with a key that is not accessible to
      the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll
      see garbage when reading the VMCB.
      
      Technically, KVM could decrypt all memory that needs to be accessible to
      L0 and use shadow paging so that L0 does not need to shadow NPT, but
      exposing such information to L0 largely defeats the purpose of running as
      an SEV guest.  This can always be revisited if someone comes up with a
      use case for running VMs inside SEV guests.
      
      Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM
      is doomed even if the SEV guest is debuggable and the hypervisor is willing
      to decrypt the VMCB.  This may or may not be fixed on CPUs that have the
      SVME_ADDR_CHK fix.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210202212017.2486595-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ccd85d90
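      A sketch of the early bail-out described above; the predicate used to
      detect "running as an SEV guest" is an assumption.

        /*
         * Sketch: refuse to enable SVM when this kernel is itself an SEV
         * guest; the host cannot read our encrypted VMCBs anyway.
         */
        static bool sketch_has_svm(void)
        {
                if (sev_active())       /* assumed predicate */
                        return false;
                return boot_cpu_has(X86_FEATURE_SVM);
        }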
    • S
      KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode · 943dea8a
      Committed by Sean Christopherson
      Set the emulator context to PROT64 if SYSENTER transitions from 32-bit
      userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at
      the end of x86_emulate_insn() will incorrectly truncate the new RIP.
      
      Note, this bug is mostly limited to running an Intel virtual CPU model on
      an AMD physical CPU, as other combinations of virtual and physical CPUs
      do not trigger full emulation.  On Intel CPUs, SYSENTER in compatibility
      mode is legal, and unconditionally transitions to 64-bit mode.  On AMD
      CPUs, SYSENTER is illegal in compatibility mode and #UDs.  If the vCPU is
      AMD, KVM injects a #UD on SYSENTER in compat mode.  If the pCPU is Intel,
      SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring
      guest TLB shenanigans).
      
      Fixes: fede8076 ("KVM: x86: handle wrap around 32-bit address space")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jonny Barker <jonny@jonnybarker.com>
      [sean: wrote changelog]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210202165546.2390296-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      943dea8a
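      A sketch of the emulator-side fix as described: when EFER.LMA is set,
      SYSENTER lands in 64-bit mode, so the context mode must be widened
      before the final RIP write. Placement inside the SYSENTER emulation
      (after EFER has been read) is assumed.

        /* Sketch: keep x86_emulate_insn() from truncating the new RIP. */
        if (efer & EFER_LMA)
                ctxt->mode = X86EMUL_MODE_PROT64;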
  9. 02 February 2021 (4 commits)
  10. 01 February 2021 (1 commit)
  11. 30 January 2021 (1 commit)
  12. 29 January 2021 (1 commit)
    • P
      Fix unsynchronized access to sev members through svm_register_enc_region · 19a23da5
      Committed by Peter Gonda
      Grab kvm->lock before pinning memory when registering an encrypted
      region; sev_pin_memory() relies on kvm->lock being held to ensure
      correctness when checking and updating the number of pinned pages.
      
      Add a lockdep assertion to help prevent future regressions.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Fixes: 1e80fdc0 ("KVM: SVM: Pin guest memory when SEV is active")
      Signed-off-by: Peter Gonda <pgonda@google.com>
      
      V2
       - Fix up patch description
       - Correct file paths svm.c -> sev.c
       - Add unlock of kvm->lock on sev_pin_memory error
      
      V1
       - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/
      
      Message-Id: <20210127161524.2832400-1-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      19a23da5
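      A sketch of the locking shape described above; the region bookkeeping
      and field names are abbreviated/assumed, and sev_pin_memory() is assumed
      to carry the new lockdep_assert_held(&kvm->lock).

        static int sketch_register_enc_region(struct kvm *kvm,
                                              struct kvm_enc_region *range,
                                              struct enc_region *region)
        {
                int ret = 0;

                mutex_lock(&kvm->lock);         /* held across the pin */
                region->pages = sev_pin_memory(kvm, range->addr, range->size,
                                               &region->npages, 1);
                if (IS_ERR(region->pages)) {
                        ret = PTR_ERR(region->pages);
                        goto out_unlock;        /* unlock on pin failure too */
                }
                list_add_tail(&region->list,
                              &to_kvm_svm(kvm)->sev_info.regions_list);
        out_unlock:
                mutex_unlock(&kvm->lock);
                return ret;
        }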
  13. 28 January 2021 (1 commit)
    • M
      KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl · 181f4948
      Committed by Michael Roth
      Recent commit 255cbecf modified struct kvm_vcpu_arch to make
      'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries
      rather than embedding the array in the struct. KVM_SET_CPUID and
      KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed.
      
      As a result, KVM_GET_CPUID2 currently returns random fields from struct
      kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix
      this by treating 'cpuid_entries' as a pointer when copying its
      contents to the userspace buffer.
      
      Fixes: 255cbecf ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically")
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20210128024451.1816770-1-michael.roth@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      181f4948
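      A sketch of the corrected copy, reflecting that cpuid_entries is now a
      pointer rather than an embedded array; here 'entries' stands for the
      userspace destination pointer, and the surrounding nent checks are
      elided.

        /* Sketch: copy what cpuid_entries points to, not the field itself. */
        if (copy_to_user(entries, vcpu->arch.cpuid_entries,
                         vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2)))
                return -EFAULT;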
  14. 27 January 2021 (1 commit)
    • J
      x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled · 2e924936
      Committed by Juergen Gross
      When booting a kernel built with CONFIG_AMD_MEM_ENCRYPT enabled as a Xen
      pv guest, a warning is issued for each processor:
      
      [    5.964347] ------------[ cut here ]------------
      [    5.968314] WARNING: CPU: 0 PID: 1 at /home/gross/linux/head/arch/x86/xen/enlighten_pv.c:660 get_trap_addr+0x59/0x90
      [    5.972321] Modules linked in:
      [    5.976313] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.11.0-rc5-default #75
      [    5.980313] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A05 12/05/2013
      [    5.984313] RIP: e030:get_trap_addr+0x59/0x90
      [    5.988313] Code: 42 10 83 f0 01 85 f6 74 04 84 c0 75 1d b8 01 00 00 00 c3 48 3d 00 80 83 82 72 08 48 3d 20 81 83 82 72 0c b8 01 00 00 00 eb db <0f> 0b 31 c0 c3 48 2d 00 80 83 82 48 ba 72 1c c7 71 1c c7 71 1c 48
      [    5.992313] RSP: e02b:ffffc90040033d38 EFLAGS: 00010202
      [    5.996313] RAX: 0000000000000001 RBX: ffffffff82a141d0 RCX: ffffffff8222ec38
      [    6.000312] RDX: ffffffff8222ec38 RSI: 0000000000000005 RDI: ffffc90040033d40
      [    6.004313] RBP: ffff8881003984a0 R08: 0000000000000007 R09: ffff888100398000
      [    6.008312] R10: 0000000000000007 R11: ffffc90040246000 R12: ffff8884082182a8
      [    6.012313] R13: 0000000000000100 R14: 000000000000001d R15: ffff8881003982d0
      [    6.016316] FS:  0000000000000000(0000) GS:ffff888408200000(0000) knlGS:0000000000000000
      [    6.020313] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    6.024313] CR2: ffffc900020ef000 CR3: 000000000220a000 CR4: 0000000000050660
      [    6.028314] Call Trace:
      [    6.032313]  cvt_gate_to_trap.part.7+0x3f/0x90
      [    6.036313]  ? asm_exc_double_fault+0x30/0x30
      [    6.040313]  xen_convert_trap_info+0x87/0xd0
      [    6.044313]  xen_pv_cpu_up+0x17a/0x450
      [    6.048313]  bringup_cpu+0x2b/0xc0
      [    6.052313]  ? cpus_read_trylock+0x50/0x50
      [    6.056313]  cpuhp_invoke_callback+0x80/0x4c0
      [    6.060313]  _cpu_up+0xa7/0x140
      [    6.064313]  cpu_up+0x98/0xd0
      [    6.068313]  bringup_nonboot_cpus+0x4f/0x60
      [    6.072313]  smp_init+0x26/0x79
      [    6.076313]  kernel_init_freeable+0x103/0x258
      [    6.080313]  ? rest_init+0xd0/0xd0
      [    6.084313]  kernel_init+0xa/0x110
      [    6.088313]  ret_from_fork+0x1f/0x30
      [    6.092313] ---[ end trace be9ecf17dceeb4f3 ]---
      
      The reason is that there is no Xen pv trap entry for X86_TRAP_VC.
      
      Fix that by adding a generic trap handler for unknown traps and wiring all
      unknown bare-metal handlers to this generic handler, which will just
      crash the system should such a trap ever happen.
      
      Fixes: 0786138c ("x86/sev-es: Add a Runtime #VC Exception Handler")
      Cc: <stable@vger.kernel.org> # v5.10
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      2e924936
  15. 26 January 2021 (9 commits)
    • P
      KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX · 9a78e158
      Committed by Paolo Bonzini
      VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
      which may need to be loaded outside guest mode.  Therefore we cannot
      WARN in that case.
      
      However, that part of nested_get_vmcs12_pages is _not_ needed at
      vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
      so that both vmentry and migration (and in the latter case, independent
      of is_guest_mode) do the parts that are needed.
      
      Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
      Cc: <stable@vger.kernel.org> # 5.10.x
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9a78e158
    • S
      KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written" · aed89418
      Committed by Sean Christopherson
      Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
      to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty.  Per
      commit de3cd117 ("KVM: x86: Omit caching logic for always-available
      GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.
      
      This reverts commit 1c04d8c9.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210122235049.3107620-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      aed89418
    • S
      KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest · 25009140
      Committed by Sean Christopherson
      Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB; the
      GPRs' dirty bits are set from time zero and never cleared, i.e. will
      always be seen as dirty.  The obvious alternative would be to clear
      the dirty bits when appropriate, but removing the dirty checks is
      desirable as it allows reverting GPR dirty+available tracking, which
      adds overhead to all flavors of x86 VMs.
      
      Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
      by the GHCB spec, which allows the hypervisor (or guest) to provide
      unnecessary info; it's the guest's responsibility to consume only what
      it needs (the hypervisor is untrusted after all).
      
        The guest and hypervisor can supply additional state if desired but
        must not rely on that additional state being provided.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210122235049.3107620-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      25009140
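      A sketch of the simplified sync; the register set and the function this
      lives in are assumptions, the point is simply that no dirty checks guard
      the ghcb_set_*() calls.

        static void sketch_sync_gprs_to_ghcb(struct vcpu_svm *svm)
        {
                struct ghcb *ghcb = svm->ghcb;
                struct kvm_vcpu *vcpu = &svm->vcpu;

                /* Always written; the guest consumes only what it needs. */
                ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
                ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
                ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
                ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
        }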
    • M
      KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration · d51e1d3f
      Committed by Maxim Levitsky
      Even when we are outside the nested guest, some vmcs02 fields
      may not be in sync with vmcs12.  This is intentional, even across
      nested VM-exit, because the sync can be delayed until the nested
      hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
      rarely accessed fields.
      
      However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
      be able to restore it.  To fix that, call copy_vmcs02_to_vmcs12_rare()
      before the vmcs12 contents are copied to userspace.
      
      Fixes: 7952d769 ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d51e1d3f
    • L
      kvm: tracing: Fix unmatched kvm_entry and kvm_exit events · d95df951
      Committed by Lorenzo Brescia
      On VMX, if we exit and then re-enter immediately without leaving
      the vmx_vcpu_run() function, the kvm_entry event is not logged.
      That means we will see one (or more) kvm_exit, without its (their)
      corresponding kvm_entry, as shown here:
      
       CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
       CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
       CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE
      
      It also seems possible for a kvm_entry event to be logged, but then
      we leave vmx_vcpu_run() right away (if vmx->emulation_required is
      true). In this case, we will have a spurious kvm_entry event in the
      trace.
      
      Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
      (where trace_kvm_exit() already is).
      
      A trace obtained with this patch applied looks like this:
      
       CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
       CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT
      
      Of course, not calling trace_kvm_entry() in common x86 code any
      longer means that we need to adjust the SVM side of things too.
      Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
      Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
      Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d95df951
    • J
      KVM: x86: get smi pending status correctly · 1f7becf1
      Committed by Jay Zhou
      The injection process of smi has two steps:
      
          Qemu                        KVM
      Step1:
          cpu->interrupt_request &= \
              ~CPU_INTERRUPT_SMI;
          kvm_vcpu_ioctl(cpu, KVM_SMI)
      
                                      call kvm_vcpu_ioctl_smi() and
                                      kvm_make_request(KVM_REQ_SMI, vcpu);
      
      Step2:
          kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
      
                                      call process_smi() if
                                      kvm_check_request(KVM_REQ_SMI, vcpu) is
                                      true, mark vcpu->arch.smi_pending = true;
      
      vcpu->arch.smi_pending will be set to true in step 2.  Unfortunately, if
      the vCPU is paused between step 1 and step 2, kvm_run->immediate_exit
      will be set and the vCPU has to exit to Qemu immediately during step 2,
      before vcpu->arch.smi_pending is marked true.
      During VM migration, Qemu gets the SMI pending status from KVM using the
      KVM_GET_VCPU_EVENTS ioctl during the downtime, so the SMI pending status
      is lost.
      Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
      Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com>
      Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1f7becf1
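      A sketch of the fix this description implies: fold a still-queued
      KVM_REQ_SMI into smi_pending before events are reported to userspace.
      The placement and surrounding handling are assumptions.

        /* Sketch: run in KVM_GET_VCPU_EVENTS before filling the event state. */
        static void sketch_flush_pending_smi(struct kvm_vcpu *vcpu)
        {
                if (kvm_check_request(KVM_REQ_SMI, vcpu))
                        process_smi(vcpu);      /* sets vcpu->arch.smi_pending */
        }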
    • L
      KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[] · 98dd2f10
      Committed by Like Xu
      The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
      0x0300 in the intel_perfmon_event_map[]. Correct its usage.
      
      Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      98dd2f10
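      For reference, 0x0300 decodes to event select 0x00 with unit mask 0x03;
      a sketch of the corrected table entry follows, with the struct layout
      assumed to be {eventsel, unit_mask, event_type}.

        static const struct {
                u8 eventsel;
                u8 unit_mask;
                unsigned int event_type;
        } sketch_arch_events[] = {
                /* fixed counter 2: pseudo-encoding 0x0300 */
                { 0x00, 0x03, PERF_COUNT_HW_REF_CPU_CYCLES },
        };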
    • L
      KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh() · e61ab2a3
      Committed by Like Xu
      We know the vPMU will not work properly when (1) the guest bit_width(s)
      of the [gp|fixed] counters are greater than the host ones, or (2) the
      guest-requested architectural events exceed the range supported by the
      host, so we can set up a smaller left shift value and refresh the guest
      cpuid entry, thus fixing the following UBSAN shift-out-of-bounds warning:
      
      shift exponent 197 is too large for 64-bit type 'long long unsigned int'
      
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
       intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
       kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
       kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
       kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
       kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e61ab2a3
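      A sketch of the clamping idea: never let the guest-visible CPUID 0xA
      values exceed what the host PMU reports, so the later shifts stay within
      a 64-bit type. The host capability query is existing perf API; wiring it
      into the refresh path this way is an assumption.

        static void sketch_clamp_guest_pmu(union cpuid10_eax *eax,
                                           union cpuid10_edx *edx)
        {
                struct x86_pmu_capability host;

                perf_get_x86_pmu_capability(&host);

                /* cap bit widths and the architectural-event mask length */
                eax->split.bit_width = min_t(int, eax->split.bit_width,
                                             host.bit_width_gp);
                eax->split.mask_length = min_t(int, eax->split.mask_length,
                                               host.events_mask_len);
                edx->split.bit_width_fixed = min_t(int, edx->split.bit_width_fixed,
                                                   host.bit_width_fixed);
        }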
    • S
      KVM: x86: Add more protection against undefined behavior in rsvd_bits() · eb79cd00
      Committed by Sean Christopherson
      Add compile-time asserts in rsvd_bits() to guard against KVM passing in
      garbage hardcoded values, and cap the upper bound at '63' for dynamic
      values to prevent generating a mask that would overflow a u64.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210113204515.3473079-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eb79cd00
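      A sketch of guards along the lines described, keeping the usual mask
      math and adding the compile-time asserts plus a 63-bit cap for dynamic
      upper bounds:

        static __always_inline u64 rsvd_bits(int s, int e)
        {
                /* reject nonsensical constant ranges at build time */
                BUILD_BUG_ON(__builtin_constant_p(e) &&
                             __builtin_constant_p(s) && e < s);

                if (__builtin_constant_p(e))
                        BUILD_BUG_ON(e > 63);
                else
                        e &= 63;        /* cap a dynamic upper bound */

                if (e < s)
                        return 0;

                return ((2ULL << (e - s)) - 1) << s;
        }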
  16. 25 January 2021 (1 commit)
    • M
      x86/setup: don't remove E820_TYPE_RAM for pfn 0 · bde9cfa3
      Committed by Mike Rapoport
      Patch series "mm: fix initialization of struct page for holes in memory layout", v3.
      
      Commit 73a6e474 ("mm: memmap_init: iterate over memblock regions
      rather that check each PFN") exposed several issues with the memory map
      initialization and these patches fix those issues.
      
      Initially there were crashes during compaction that Qian Cai reported
      back in April [1].  It seemed back then that the problem was fixed, but
      a few weeks ago Andrea Arcangeli hit the same bug [2] and there was an
      additional discussion at [3].
      
      [1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
      [2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
      [3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org
      
      This patch (of 2):
      
      The first 4Kb of memory is a BIOS-owned area and, to avoid allocating it
      for the kernel, it was not listed in the e820 tables as memory.  As a
      result, pfn 0 was never recognised by the generic memory management and
      it is not part of either node 0 or ZONE_DMA.
      
      If set_pfnblock_flags_mask() were ever called for the pageblock
      corresponding to the first 2Mbytes of memory, having pfn 0 outside of
      ZONE_DMA would trigger
      
      	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
      
      Along with reserving the first 4Kb in the e820 tables, the first several
      pages are reserved with memblock in several places during setup_arch().  These
      reservations are enough to ensure the kernel does not touch the BIOS area
      and it is not necessary to remove E820_TYPE_RAM for pfn 0.
      
      Remove the update of the e820 table that changes the type of pfn 0, and
      move the comment describing why it was done to trim_low_memory_range(),
      which reserves the beginning of the memory.
      
      Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bde9cfa3
  17. 22 January 2021 (1 commit)
  18. 21 January 2021 (2 commits)
  19. 20 January 2021 (1 commit)