1. 19 Oct 2014 · 1 commit
    • x86,kvm,vmx: Preserve CR4 across VM entry · d974baa3
      Committed by Andy Lutomirski
      CR4 isn't constant; at least the TSD and PCE bits can vary.
      
      TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
      like it's correct.
      
      This adds a branch and a read from cr4 to each vm entry.  Because it is
      extremely likely that consecutive entries into the same vcpu will have
      the same host cr4 value, this fixes up the vmcs instead of restoring cr4
      after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
      reducing the overhead in the common case to just two memory reads and a
      branch.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
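
The approach described in this commit message (fix up the VMCS only when the host CR4 has changed since the last entry) can be sketched in userspace roughly as follows. This is only an illustration: `fake_cr4`, `fake_vmcs_host_cr4`, `struct vcpu_host_state_sketch`, and `prepare_vm_entry_sketch()` are hypothetical stand-ins, not the names used in the actual patch.

```c
#include <stdint.h>

#define CR4_TSD (1ULL << 2)   /* time stamp disable */
#define CR4_PCE (1ULL << 8)   /* performance counter enable */

/* Hypothetical stand-ins for the CR4 register and the VMCS host-CR4 field. */
static uint64_t fake_cr4;
static uint64_t fake_vmcs_host_cr4;
static unsigned int vmcs_writes;

static uint64_t read_host_cr4_sketch(void)
{
    return fake_cr4;
}

static void write_vmcs_host_cr4_sketch(uint64_t value)
{
    fake_vmcs_host_cr4 = value;
    vmcs_writes++;
}

/* Per-vcpu record of the value last written to the VMCS (assumed layout). */
struct vcpu_host_state_sketch {
    uint64_t vmcs_host_cr4;
};

/* One read of CR4 and one branch per entry; a VMCS write only on change. */
static void prepare_vm_entry_sketch(struct vcpu_host_state_sketch *hs)
{
    uint64_t cr4 = read_host_cr4_sketch();

    if (cr4 != hs->vmcs_host_cr4) {
        write_vmcs_host_cr4_sketch(cr4);
        hs->vmcs_host_cr4 = cr4;
    }
}
```

In this sketch, consecutive entries with an unchanged CR4 cost only the compare; toggling a bit such as `CR4_TSD` between entries triggers exactly one extra VMCS write.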
  2. 03 Oct 2014 · 1 commit
    • kvm: do not handle APIC access page if in-kernel irqchip is not in use · f439ed27
      Committed by Paolo Bonzini
      This fixes the following OOPS:
      
         loaded kvm module (v3.17-rc1-168-gcec26bc3)
         BUG: unable to handle kernel paging request at fffffffffffffffe
         IP: [<ffffffff81168449>] put_page+0x9/0x30
         PGD 1e15067 PUD 1e17067 PMD 0
         Oops: 0000 [#1] PREEMPT SMP
          [<ffffffffa063271d>] ? kvm_vcpu_reload_apic_access_page+0x5d/0x70 [kvm]
          [<ffffffffa013b6db>] vmx_vcpu_reset+0x21b/0x470 [kvm_intel]
          [<ffffffffa0658816>] ? kvm_pmu_reset+0x76/0xb0 [kvm]
          [<ffffffffa064032a>] kvm_vcpu_reset+0x15a/0x1b0 [kvm]
          [<ffffffffa06403ac>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
          [<ffffffffa062e540>] kvm_vm_ioctl+0x200/0x780 [kvm]
          [<ffffffff81212170>] do_vfs_ioctl+0x2d0/0x4b0
          [<ffffffff8108bd99>] ? __mmdrop+0x69/0xb0
          [<ffffffff812123d1>] SyS_ioctl+0x81/0xa0
          [<ffffffff8112a6f6>] ? __audit_syscall_exit+0x1f6/0x2a0
          [<ffffffff817229e9>] system_call_fastpath+0x16/0x1b
         Code: c6 78 ce a3 81 4c 89 e7 e8 d9 80 ff ff 0f 0b 4c 89 e7 e8 8f f6 ff ff e9 fa fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 1e 8b 47 1c 85 c0 74 27 f0
         RIP  [<ffffffff81193045>] put_page+0x5/0x50
      
      when not using the in-kernel irqchip ("-machine kernel_irqchip=off"
      with QEMU).  The fix is to make the same check in
      kvm_vcpu_reload_apic_access_page that we already have
      in vmx.c's vm_need_virtualize_apic_accesses().
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
      Fixes: 4256f43f
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 24 Sep 2014 · 17 commits
  4. 17 Sep 2014 · 2 commits
    • kvm: Make init_rmode_identity_map() return 0 on success. · f51770ed
      Committed by Tang Chen
      In init_rmode_identity_map(), there are two variables indicating the return
      value, r and ret, and it returns 0 on error, 1 on success. The function
      is only called by vmx_create_vcpu(), and ret is redundant.
      
      This patch removes the redundant variable, and makes init_rmode_identity_map()
      return 0 on success, -errno on failure.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
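
The return convention this commit adopts (0 on success, -errno on failure, with a single result variable in the caller) might look like the following sketch. The function names and the `page_available` parameter are hypothetical and stand in for the real allocation logic, which is not shown in this log.

```c
#include <errno.h>

/* Hypothetical helper: returns 0 on success, a negative errno on failure. */
static int init_identity_map_sketch(int page_available)
{
    if (!page_available)
        return -EFAULT;
    return 0;
}

/* Hypothetical caller: one result variable, errors propagated unchanged. */
static int create_vcpu_sketch(int page_available)
{
    int r = init_identity_map_sketch(page_available);

    if (r)
        return r;
    return 0;
}
```

With this convention the caller can test `if (r)` directly and forward the error code, instead of translating a 0/1 flag into an errno.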
    • kvm: Remove ept_identity_pagetable from struct kvm_arch. · a255d479
      Committed by Tang Chen
      kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
      it is never used to refer to the page at all.
      
      In vcpu initialization, it indicates two things:
      1. whether the ept page is allocated
      2. whether a memory slot for the identity page is initialized
      
      Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
      identity pagetable is initialized. So we can remove ept_identity_pagetable.
      
      NOTE: In the original code, the ept identity pagetable page is pinned in memory.
            As a result, it cannot be migrated/hot-removed. After this patch, since
            kvm_arch->ept_identity_pagetable is removed, the ept identity pagetable page
            is no longer pinned in memory, so it can be migrated/hot-removed.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 16 Sep 2014 · 1 commit
  6. 11 Sep 2014 · 2 commits
  7. 08 Sep 2014 · 1 commit
    • percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Committed by Tejun Heo
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
  8. 05 Sep 2014 · 2 commits
  9. 03 Sep 2014 · 6 commits
    • KVM: nSVM: propagate the NPF EXITINFO to the guest · 5e352519
      Committed by Paolo Bonzini
      This is similar to what the EPT code does with the exit qualification.
      This allows the guest to see a valid value for bits 33:32.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD · a0c0feb5
      Committed by Paolo Bonzini
      Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
      page table entries.  Intel ignores it; AMD ignores it in PDEs, but reserves it
      in PDPEs and PML4Es.  The SVM test is relying on this behavior, so enforce it.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: mmio: cleanup kvm_set_mmio_spte_mask · d1431483
      Committed by Tiejun Chen
      Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask()
      for slightly better code.
      Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
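
The definition of rsvd_bits() is not shown in this log. As a rough illustration of what a helper of that kind does, the sketch below builds a mask covering an inclusive bit range; `bit_range_mask_sketch()` is a hypothetical name and its body is an assumption, not a copy of the kernel helper.

```c
#include <stdint.h>

/*
 * Hypothetical mask helper: returns a value with bits first..last set
 * (inclusive). Assumes 0 <= first <= last <= 63 and that the range is
 * narrower than 64 bits, so the shift below cannot overflow.
 */
static uint64_t bit_range_mask_sketch(int first, int last)
{
    return ((1ULL << (last - first + 1)) - 1) << first;
}
```

For example, `bit_range_mask_sketch(8, 8)` selects only bit 8, the bit that the previous commit in this list treats as reserved in non-leaf PDPEs and PML4Es on AMD.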
    • kvm: x86: fix stale mmio cache bug · 56f17dd3
      Committed by David Matlack
      The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
      up to userspace:
      
      (1) Guest accesses gpa X without a memory slot. The gfn is cached in
      struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
      the SPTE write-execute-noread so that future accesses cause
      EPT_MISCONFIGs.
      
      (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
      covering the page just accessed.
      
      (3) Guest attempts to read or write to gpa X again. On Intel, this
      generates an EPT_MISCONFIG. The memory slot generation number that
      was incremented in (2) would normally take care of this but we fast
      path mmio faults through quickly_check_mmio_pf(), which only checks
      the per-vcpu mmio cache. Since we hit the cache, KVM passes a
      KVM_EXIT_MMIO up to userspace.
      
      This patch fixes the issue by using the memslot generation number
      to validate the mmio cache.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: David Matlack <dmatlack@google.com>
      [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Tested-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
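
The fix described above, validating the per-vcpu mmio cache against the memslot generation, can be sketched as follows. The struct layout and function names are hypothetical simplifications; the real cache lives in struct kvm_vcpu_arch and holds more state than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-vcpu mmio cache entry. */
struct mmio_cache_sketch {
    uint64_t gfn;   /* guest frame number that had no memory slot */
    uint64_t gen;   /* memslot generation when the entry was cached */
    bool valid;
};

/* Step (1): remember the gfn together with the generation seen at that time. */
static void mmio_cache_store_sketch(struct mmio_cache_sketch *c,
                                    uint64_t gfn, uint64_t cur_gen)
{
    c->gfn = gfn;
    c->gen = cur_gen;
    c->valid = true;
}

/* Step (3): a hit counts only if the memslots have not changed since. */
static bool mmio_cache_hit_sketch(const struct mmio_cache_sketch *c,
                                  uint64_t gfn, uint64_t cur_gen)
{
    return c->valid && c->gfn == gfn && c->gen == cur_gen;
}
```

Once userspace adds a memory slot in step (2) and the generation advances, the stale entry no longer matches, so the fast path cannot report a spurious KVM_EXIT_MMIO from it.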
    • kvm: fix potentially corrupt mmio cache · ee3d1570
      Committed by David Matlack
      vcpu exits and memslot mutations can run concurrently as long as the
      vcpu does not acquire the slots mutex. Thus it is theoretically possible
      for memslots to change underneath a vcpu that is handling an exit.
      
      If we increment the memslot generation number again after
      synchronize_srcu_expedited(), vcpus can safely cache memslot generation
      without maintaining a single rcu_dereference through an entire vm exit.
      And much of the x86/kvm code does not maintain a single rcu_dereference
      of the current memslots during each exit.
      
      We can prevent the following case:
      
         vcpu (CPU 0)                             | thread (CPU 1)
      --------------------------------------------+--------------------------
      1  vm exit                                  |
      2  srcu_read_unlock(&kvm->srcu)             |
      3  decide to cache something based on       |
           old memslots                           |
      4                                           | change memslots
                                                  | (increments generation)
      5                                           | synchronize_srcu(&kvm->srcu);
      6  retrieve generation # from new memslots  |
      7  tag cache with new memslot generation    |
      8  srcu_read_unlock(&kvm->srcu)             |
      ...                                         |
         <action based on cache occurs even       |
          though the caching decision was based   |
          on the old memslots>                    |
      ...                                         |
         <action *continues* to occur until next  |
          memslot generation change, which may    |
          be never>                               |
                                                  |
      
      By incrementing the generation after synchronizing with kvm->srcu readers,
      we ensure that the generation retrieved in (6) will become invalid soon
      after (8).
      
      Keeping the existing increment is not strictly necessary, but we
      do keep it and just move it for consistency from update_memslots to
      install_new_memslots.  It invalidates old cached MMIOs immediately,
      instead of having to wait for the end of synchronize_srcu_expedited,
      which makes the code more clearly correct in case CPU 1 is preempted
      right after synchronize_srcu() returns.
      
      To avoid halving the generation space in SPTEs, always presume that the
      low bit of the generation is zero when reconstructing a generation number
      out of an SPTE.  This effectively disables MMIO caching in SPTEs during
      the call to synchronize_srcu_expedited.  Using the low bit this way is
      somewhat like a seqcount---where the protected thing is a cache, and
      instead of retrying we can simply punt if we observe the low bit to be 1.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
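
The seqcount-like use of the low generation bit described in this commit message can be sketched as below. All names are hypothetical, and the packing shown (dropping the low bit by shifting) is a simplification; the real code spreads generation bits across an SPTE in a way not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the memslots structure. */
struct memslots_sketch {
    uint64_t generation;
};

/* First increment, done when the new memslots are installed: now odd. */
static void memslots_begin_update_sketch(struct memslots_sketch *s)
{
    s->generation++;
}

/* Second increment, done after synchronizing with readers: even again. */
static void memslots_end_update_sketch(struct memslots_sketch *s)
{
    s->generation++;
}

/* Store the generation without its low bit so no SPTE space is wasted. */
static uint64_t spte_pack_gen_sketch(uint64_t gen)
{
    return gen >> 1;
}

/* Reconstruct the generation, presuming the low bit is zero. */
static uint64_t spte_unpack_gen_sketch(uint64_t packed)
{
    return packed << 1;
}

/* A cached SPTE is usable only if its generation matches the current one. */
static bool spte_gen_is_current_sketch(uint64_t packed, uint64_t cur_gen)
{
    return spte_unpack_gen_sketch(packed) == cur_gen;
}
```

Because a reconstructed generation is always even, nothing cached while the generation is odd (that is, during an update) can ever match, which is the "punt instead of retry" behavior the message describes.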
    • KVM: do not bias the generation number in kvm_current_mmio_generation · 00f034a1
      Committed by Paolo Bonzini
      The next patch will give a meaning (a la seqcount) to the low bit of the
      generation number.  Ensure that it matches between kvm->memslots->generation
      and kvm_current_mmio_generation().
      
      Cc: stable@vger.kernel.org
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 30 Aug 2014 · 1 commit
  11. 29 Aug 2014 · 6 commits