1. 18 December 2019 (2 commits)
  2. 13 December 2019 (6 commits)
    • KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332) · 5119ffd4
      Committed by Paolo Bonzini
      commit 433f4ba1904100da65a311033f17a9bf586b287e upstream.
      
      The bounds check was present in KVM_GET_SUPPORTED_CPUID but not
      KVM_GET_EMULATED_CPUID.
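
      A sketch of the missing guard, mirroring the check the
      KVM_GET_SUPPORTED_CPUID path already performs (the helper name
      follows recent upstream; treat this as illustrative, not the
      verbatim diff):

        static int __do_cpuid_func_emulated(struct kvm_cpuid_entry2 *entry,
                                            u32 func, int *nent, int maxnent)
        {
                if (*nent >= maxnent)   /* previously missing bounds check */
                        return -E2BIG;

                /* ... fill in the emulated leaf and bump *nent as before ... */
                return 0;
        }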
      
      Reported-by: syzbot+e3f4897236c4eeb8af4f@syzkaller.appspotmail.com
      Fixes: 84cffe49 ("kvm: Emulate MOVBE", 2013-10-29)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: Grab KVM's srcu lock when setting nested state · 5cbc7ff5
      Committed by Sean Christopherson
      commit ad5996d9a0e8019c3ae5151e687939369acfe044 upstream.
      
      Acquire kvm->srcu for the duration of ->set_nested_state() to fix a bug
      where nVMX dereferences ->memslots without holding ->srcu or ->slots_lock.
      
      The other half of nested migration, ->get_nested_state(), does not need
      to acquire ->srcu as it is purely a dump of internal KVM (and CPU)
      state to userspace.
      
      Detected as an RCU lockdep splat that is 100% reproducible by running
      KVM's state_test selftest with CONFIG_PROVE_LOCKING=y.  Note that the
      failing function, kvm_is_visible_gfn(), is only checking the validity of
      a gfn; it is not actually accessing guest memory (which is more or less
      unsupported during vmx_set_nested_state() due to incorrect MMU state),
      i.e. vmx_set_nested_state() itself isn't fundamentally broken.  In any
      case, setting nested state isn't a fast path so there's no reason to go
      out of our way to avoid taking ->srcu.
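
      A minimal sketch of the locking pattern applied in the vcpu ioctl
      path (variable names assumed from the upstream handler; illustrative
      rather than the verbatim diff):

        idx = srcu_read_lock(&vcpu->kvm->srcu);
        r = kvm_x86_ops->set_nested_state(vcpu, user_kvm_nested_state,
                                          &kvm_state);
        srcu_read_unlock(&vcpu->kvm->srcu, idx);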
      
        =============================
        WARNING: suspicious RCU usage
        5.4.0-rc7+ #94 Not tainted
        -----------------------------
        include/linux/kvm_host.h:626 suspicious rcu_dereference_check() usage!
      
                     other info that might help us debug this:
      
        rcu_scheduler_active = 2, debug_locks = 1
        1 lock held by evmcs_test/10939:
         #0: ffff88826ffcb800 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x630 [kvm]
      
        stack backtrace:
        CPU: 1 PID: 10939 Comm: evmcs_test Not tainted 5.4.0-rc7+ #94
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         dump_stack+0x68/0x9b
         kvm_is_visible_gfn+0x179/0x180 [kvm]
         mmu_check_root+0x11/0x30 [kvm]
         fast_cr3_switch+0x40/0x120 [kvm]
         kvm_mmu_new_cr3+0x34/0x60 [kvm]
         nested_vmx_load_cr3+0xbd/0x1f0 [kvm_intel]
         nested_vmx_enter_non_root_mode+0xab8/0x1d60 [kvm_intel]
         vmx_set_nested_state+0x256/0x340 [kvm_intel]
         kvm_arch_vcpu_ioctl+0x491/0x11a0 [kvm]
         kvm_vcpu_ioctl+0xde/0x630 [kvm]
         do_vfs_ioctl+0xa2/0x6c0
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x54/0x200
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7f59a2b95f47
      
      Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES · 6a10f818
      Committed by Paolo Bonzini
      commit cbbaa2727aa3ae9e0a844803da7cef7fd3b94f2b upstream.
      
      KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
      to the guests.  It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
      !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
      hidden (it actually was), yet the value says that TSX is not vulnerable
      to microarchitectural data sampling.  Fix both.
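
      A sketch of the resulting logic in kvm_get_arch_capabilities(),
      reconstructed from the description above (the exact bit handling is
      an assumption, not the verbatim diff):

        /* KVM does not implement MSR_IA32_TSX_CTRL, so never advertise it. */
        data &= ~ARCH_CAP_TSX_CTRL_MSR;

        /*
         * If RTM is hidden, hide TAA_NO as well rather than exposing the
         * confusing !TSX_CTRL_MSR && !RTM && TAA_NO combination.
         */
        if (!boot_cpu_has(X86_FEATURE_RTM))
                data &= ~ARCH_CAP_TAA_NO;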
      
      Cc: stable@vger.kernel.org
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: do not modify masked bits of shared MSRs · 5efbd9a9
      Committed by Paolo Bonzini
      commit de1fca5d6e0105c9d33924e1247e2f386efc3ece upstream.
      
      "Shared MSRs" are guest MSRs that are written to the host MSRs but
      keep their value until the next return to userspace.  They support
      a mask, so that some bits keep the host value, but this mask is
      only used to skip an unnecessary MSR write and the value written
      to the MSR is always the guest MSR.
      
      Fix this and, while at it, do not update smsr->values[slot].curr if
      the wrmsr fails for whatever reason.  A failure should only happen
      due to reserved bits; if curr were updated anyway it would no longer
      match the real MSR contents, even though the user-return notifier
      would still restore the host value correctly.  Leaving curr untouched
      is tidier, and in rare cases it actually avoids spurious WRMSRs on
      return to userspace.
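
      A sketch of the corrected write path, modeled on kvm_set_shared_msr()
      (illustrative rather than the verbatim diff):

        value = (value & mask) | (smsr->values[slot].host & ~mask);
        if (value == smsr->values[slot].curr)
                return 0;
        if (wrmsrl_safe(shared_msrs_global.msrs[slot], value))
                return 1;       /* leave ->curr untouched on failure */
        smsr->values[slot].curr = value;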
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect · 2e5c738a
      Committed by Kai-Heng Feng
      commit 7e8ce0e2b036dbc6617184317983aea4f2c52099 upstream.
      
      The AMD FCH USB XHCI Controller advertises support for generating PME#
      while in D0.  When in D0, it does signal PME# for USB 3.0 connect events,
      but not for USB 2.0 or USB 1.1 connect events, which means the controller
      doesn't wake correctly for those events.
      
        00:10.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller [1022:7914] (rev 20) (prog-if 30 [XHCI])
              Subsystem: Dell FCH USB XHCI Controller [1028:087e]
              Capabilities: [50] Power Management version 3
                      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
      
      Clear PCI_PM_CAP_PME_D0 in dev->pme_support to indicate the device will not
      assert PME# from D0 so we don't rely on it.
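
      The quirk boils down to a PCI fixup along these lines (the function
      name here is illustrative; the mechanics follow the description
      above):

        static void fixup_amd_fch_xhci_pme(struct pci_dev *dev)
        {
                /* PME# from D0 misses USB 2.0/1.1 connect events. */
                dev->pme_support &= ~(PCI_PM_CAP_PME_D0 >> PCI_PM_CAP_PME_SHIFT);
                pci_info(dev, "PME# does not work under D0, disabling it\n");
        }
        DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x7914,
                                fixup_amd_fch_xhci_pme);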
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203673
      Link: https://lore.kernel.org/r/20190902145252.32111-1-kai.heng.feng@canonical.com
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all() · 37d080a4
      Committed by Joerg Roedel
      commit 9a62d20027da3164a22244d9f022c0c987261687 upstream.
      
      The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
      ranges: before such vmap ranges are reused we make sure that they are
      unmapped from every task's page tables.
      
      This is really easy on pagetable setups where the kernel page tables
      are shared between all tasks - this is the case on 32-bit kernels
      with SHARED_KERNEL_PMD = 1.
      
      But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
      over the pgd_list and clearing all pmd entries in the pgds that
      are cleared in the init_mm.pgd, which is the reference pagetable
      that the vmalloc() code uses.
      
      In that context the current practice of vmalloc_sync_all() iterating
      up to FIXADDR_TOP is buggy:
      
              for (address = VMALLOC_START & PMD_MASK;
                   address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
                   address += PMD_SIZE) {
                      struct page *page;
      
      Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc
      address ranges:
      
      	VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIXADDR
      
      This is mostly harmless for the FIXADDR and CPU_ENTRY_AREA ranges,
      which don't clear their pmds, but it's lethal for the LDT range,
      which relies on having different mappings in different processes;
      'synchronizing' them in the vmalloc sense corrupts those
      pagetable entries (by clearing them).
      
      This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
      off and makes this the dominant mapping mode on 32-bit.
      
      To make the LDT work again, vmalloc_sync_all() must only iterate over
      the volatile parts of the kernel address range that are identical
      between all processes.
      
      So the correct check in vmalloc_sync_all() is "address < VMALLOC_END"
      to make sure the VMALLOC areas are synchronized and the LDT
      mapping is not falsely overwritten.
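
      With that change, the loop quoted above becomes (only the upper
      bound changes):

              for (address = VMALLOC_START & PMD_MASK;
                   address >= TASK_SIZE_MAX && address < VMALLOC_END;
                   address += PMD_SIZE) {
                      struct page *page;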
      
      The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
      but this is not really a problem since their PMDs get established
      during bootup and never change.
      
      This change fixes the ldt_gdt selftest in my setup.
      
      [ mingo: Fixed up the changelog to explain the logic and modified the
               copying to only happen up until VMALLOC_END. ]
      Reported-by: Borislav Petkov <bp@suse.de>
      Tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Fixes: 7757d607 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
      Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 05 December 2019 (6 commits)
  4. 01 December 2019 (9 commits)
  5. 24 November 2019 (13 commits)
  6. 21 November 2019 (4 commits)