1. 16 4月, 2019 10 次提交
    • S
      KVM: x86: Load SMRAM in a single shot when leaving SMM · ed19321f
      Sean Christopherson 提交于
      RSM emulation is currently broken on VMX when the interrupted guest has
      CR4.VMXE=1.  Rather than dance around the issue of HF_SMM_MASK being set
      when loading SMSTATE into architectural state, ideally RSM emulation
      itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
      architectural state.
      
      Ostensibly, the only motivation for having HF_SMM_MASK set throughout
      the loading of state from the SMRAM save state area is so that the
      memory accesses from GET_SMSTATE() are tagged with role.smm.  Load
      all of the SMRAM save state area from guest memory at the beginning of
      RSM emulation, and load state from the buffer instead of reading guest
      memory one-by-one.
      
      This paves the way for clearing HF_SMM_MASK prior to loading state,
      and also aligns RSM with the enter_smm() behavior, which fills a
      buffer and writes SMRAM save state in a single go.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ed19321f
    • L
      KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU · e51bfdb6
      Liran Alon 提交于
      Issue was discovered when running kvm-unit-tests on KVM running as L1 on
      top of Hyper-V.
      
      When vmx_instruction_intercept unit-test attempts to run RDPMC to test
      RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC
      handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU.
      Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC.
      
      The reason vmx_instruction_intercept unit-test attempts to run RDPMC
      even though Hyper-V doesn't support PMU is because L1 expose to L2
      support for RDPMC-exiting. Which is reasonable to assume that is
      supported only in case CPU supports PMU to being with.
      
      Above issue can easily be simulated by modifying
      vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
      "-cpu host,+vmx,-pmu" and run unit-test.
      
      To handle issue, change KVM to expose RDPMC-exiting only when guest
      supports PMU.
      Reported-by: NSaar Amar <saaramar@microsoft.com>
      Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e51bfdb6
    • L
      KVM: x86: Raise #GP when guest vCPU do not support PMU · 672ff6cf
      Liran Alon 提交于
      Before this change, reading a VMware pseduo PMC will succeed even when
      PMU is not supported by guest. This can easily be seen by running
      kvm-unit-test vmware_backdoors with "-cpu host,-pmu" option.
      Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      672ff6cf
    • W
      x86/kvm: move kvm_load/put_guest_xcr0 into atomic context · 1811d979
      WANG Chao 提交于
      guest xcr0 could leak into host when MCE happens in guest mode. Because
      do_machine_check() could schedule out at a few places.
      
      For example:
      
      kvm_load_guest_xcr0
      ...
      kvm_x86_ops->run(vcpu) {
        vmx_vcpu_run
          vmx_complete_atomic_exit
            kvm_machine_check
              do_machine_check
                do_memory_failure
                  memory_failure
                    lock_page
      
      In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
      out, host cpu has guest xcr0 loaded (0xff).
      
      In __switch_to {
           switch_fpu_finish
             copy_kernel_to_fpregs
               XRSTORS
      
      If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
      generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
      and tries to reinitialize fpu by restoring init fpu state. Same story as
      last #GP, except we get DOUBLE FAULT this time.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NWANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1811d979
    • V
      KVM: x86: svm: make sure NMI is injected after nmi_singlestep · 99c22179
      Vitaly Kuznetsov 提交于
      I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P,
      the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing
      shows that we're sometimes able to deliver a few but never all.
      
      When we're trying to inject an NMI we may fail to do so immediately for
      various reasons, however, we still need to inject it so enable_nmi_window()
      arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
      for pending NMIs before entering the guest and unless there's a different
      event to process, the NMI will never get delivered.
      
      Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
      pending NMIs are checked and possibly injected.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      99c22179
    • S
      svm/avic: Fix invalidate logical APIC id entry · e44e3eac
      Suthikulpanit, Suravee 提交于
      Only clear the valid bit when invalidate logical APIC id entry.
      The current logic clear the valid bit, but also set the rest of
      the bits (including reserved bits) to 1.
      
      Fixes: 98d90582 ('svm: Fix AVIC DFR and LDR handling')
      Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e44e3eac
    • S
      Revert "svm: Fix AVIC incomplete IPI emulation" · 4a58038b
      Suthikulpanit, Suravee 提交于
      This reverts commit bb218fbc.
      
      As Oren Twaig pointed out the old discussion:
      
        https://patchwork.kernel.org/patch/8292231/
      
      that the change coud potentially cause an extra IPI to be sent to
      the destination vcpu because the AVIC hardware already set the IRR bit
      before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
      Since writting to ICR and ICR2 will also set the IRR. If something triggers
      the destination vcpu to get scheduled before the emulation finishes, then
      this could result in an additional IPI.
      
      Also, the issue mentioned in the commit bb218fbc was misdiagnosed.
      
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: NOren Twaig <oren@scalemp.com>
      Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4a58038b
    • B
      kvm: mmu: Fix overflow on kvm mmu page limit calculation · bc8a3d89
      Ben Gardon 提交于
      KVM bases its memory usage limits on the total number of guest pages
      across all memslots. However, those limits, and the calculations to
      produce them, use 32 bit unsigned integers. This can result in overflow
      if a VM has more guest pages that can be represented by a u32. As a
      result of this overflow, KVM can use a low limit on the number of MMU
      pages it will allocate. This makes KVM unable to map all of guest memory
      at once, prompting spurious faults.
      
      Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
      	introduced no new failures.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bc8a3d89
    • P
      KVM: nVMX: always use early vmcs check when EPT is disabled · 2b27924b
      Paolo Bonzini 提交于
      The remaining failures of vmx.flat when EPT is disabled are caused by
      incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
      that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
      with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
      from the vmcs01.
      
      For simplicity let's just always use hardware VMCS checks when EPT is
      disabled.  This way, nested_vmx_restore_host_state is not reached at
      all (or at least shouldn't be reached).
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2b27924b
    • P
      KVM: nVMX: allow tests to use bad virtual-APIC page address · 69090810
      Paolo Bonzini 提交于
      As mentioned in the comment, there are some special cases where we can simply
      clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
      Handle them so that we can remove some XFAILs from vmx.flat.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      69090810
  2. 15 4月, 2019 1 次提交
    • S
      KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes · cfd32acf
      Sean Christopherson 提交于
      A recently introduced helper for handling zap vs. remote flush
      incorrectly bails early, effectively leaking defunct shadow pages.
      Manifests as a slab BUG when exiting KVM due to the shadow pages
      being alive when their associated cache is destroyed.
      
      ==========================================================================
      BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ...
      --------------------------------------------------------------------------
      Disabling lock debugging due to kernel taint
      INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ...
      CPU: 6 PID: 4315 Comm: rmmod Tainted: G    B             5.1.0-rc2+ #19
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      Call Trace:
       dump_stack+0x46/0x5b
       slab_err+0xad/0xd0
       ? on_each_cpu_mask+0x3c/0x50
       ? ksm_migrate_page+0x60/0x60
       ? on_each_cpu_cond_mask+0x7c/0xa0
       ? __kmalloc+0x1ca/0x1e0
       __kmem_cache_shutdown+0x13a/0x310
       shutdown_cache+0xf/0x130
       kmem_cache_destroy+0x1d5/0x200
       kvm_mmu_module_exit+0xa/0x30 [kvm]
       kvm_arch_exit+0x45/0x60 [kvm]
       kvm_exit+0x6f/0x80 [kvm]
       vmx_exit+0x1a/0x50 [kvm_intel]
       __x64_sys_delete_module+0x153/0x1f0
       ? exit_to_usermode_loop+0x88/0xc0
       do_syscall_64+0x4f/0x100
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: a2113634 ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cfd32acf
  3. 10 4月, 2019 3 次提交
  4. 09 4月, 2019 23 次提交
  5. 08 4月, 2019 3 次提交
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · fd008d1a
      Linus Torvalds 提交于
      Pull crypto fix from Herbert Xu:
       "This fixes a bug in the implementation of xcbc and cmac in caam"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: caam - fix copy of next buffer for xcbc and cmac
      fd008d1a
    • M
      net: vrf: Fix ping failed when vrf mtu is set to 0 · 5055376a
      Miaohe Lin 提交于
      When the mtu of a vrf device is set to 0, it would cause ping
      failed. So I think we should limit vrf mtu in a reasonable range
      to solve this problem. I set dev->min_mtu to IPV6_MIN_MTU, so it
      will works for both ipv4 and ipv6. And if dev->max_mtu still be 0
      can be confusing, so I set dev->max_mtu to ETH_MAX_MTU.
      
      Here is the reproduce step:
      
      1.Config vrf interface and set mtu to 0:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP mode DEFAULT group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
      
      2.Ping peer:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
          inet 10.0.0.1/16 scope global enp4s0
             valid_lft forever preferred_lft forever
      connect: Network is unreachable
      
      3.Set mtu to default value, ping works:
      PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
      64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.88 ms
      
      Fixes: ad49bc63 ("net: vrf: remove MTU limits for vrf device")
      Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5055376a
    • Q
      slab: fix a crash by reading /proc/slab_allocators · fcf88917
      Qian Cai 提交于
      The commit 510ded33 ("slab: implement slab_root_caches list")
      changes the name of the list node within "struct kmem_cache" from "list"
      to "root_caches_node", but leaks_show() still use the "list" which
      causes a crash when reading /proc/slab_allocators.
      
      You need to have CONFIG_SLAB=y and CONFIG_MEMCG=y to see the problem,
      because without MEMCG all slab caches are root caches, and the "list"
      node happens to be the right one.
      
      Fixes: 510ded33 ("slab: implement slab_root_caches list")
      Signed-off-by: NQian Cai <cai@lca.pw>
      Reviewed-by: NTobin C. Harding <tobin@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fcf88917