1. 02 Aug, 2010 24 commits
  2. 01 Aug, 2010 16 commits
    • J
      KVM: Remove unnecessary divide operations · 82855413
      Joerg Roedel committed
      This patch converts unnecessary divide and modulo operations
      in the KVM large-page code into logical operations.
      This allows gfn_t to be converted to u64 without breaking
      32-bit builds.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      82855413
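The conversion described above can be sketched as follows. On 32-bit kernel builds, a plain 64-bit `/` or `%` typically needs a compiler helper routine that the kernel does not link, whereas shifts and masks need no helper when the divisor is a power of two. The names `PAGES_PER_HPAGE_SHIFT`, `hpage_index()` and `hpage_offset()` are hypothetical and chosen only for illustration; they are not taken from the patch.

```c
#include <stdint.h>

typedef uint64_t gfn_t;   /* guest frame number, widened to 64 bits */

/* Hypothetical constant: 512 small pages per large page (2 MiB / 4 KiB). */
#define PAGES_PER_HPAGE_SHIFT 9
#define PAGES_PER_HPAGE       (1ULL << PAGES_PER_HPAGE_SHIFT)

/* Same result as gfn / PAGES_PER_HPAGE, but without a 64-bit divide. */
static inline gfn_t hpage_index(gfn_t gfn)
{
    return gfn >> PAGES_PER_HPAGE_SHIFT;
}

/* Same result as gfn % PAGES_PER_HPAGE, but without a 64-bit modulo. */
static inline gfn_t hpage_offset(gfn_t gfn)
{
    return gfn & (PAGES_PER_HPAGE - 1);
}
```

The equivalence only holds because the divisor is a power of two; an arbitrary divisor would still need a real division.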
    • S
      KVM: Fix IOMMU memslot reference warning · 95c87e2b
      Sheng Yang committed
      This patch fixes the following warning.
      
      ===================================================
      [ INFO: suspicious rcu_dereference_check() usage. ]
      ---------------------------------------------------
      include/linux/kvm_host.h:259 invoked rcu_dereference_check() without
      protection!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      no locks held by qemu-system-x86/29679.
      
      stack backtrace:
      Pid: 29679, comm: qemu-system-x86 Not tainted 2.6.35-rc3+ #200
      Call Trace:
       [<ffffffff810a224e>] lockdep_rcu_dereference+0xa8/0xb1
       [<ffffffffa018a06f>] kvm_iommu_unmap_memslots+0xc9/0xde [kvm]
       [<ffffffffa018a0c4>] kvm_iommu_unmap_guest+0x40/0x4e [kvm]
       [<ffffffffa018f772>] kvm_arch_destroy_vm+0x1a/0x186 [kvm]
       [<ffffffffa01800d0>] kvm_put_kvm+0x110/0x167 [kvm]
       [<ffffffffa0180ecc>] kvm_vcpu_release+0x18/0x1c [kvm]
       [<ffffffff81156f5d>] fput+0x22a/0x3a0
       [<ffffffff81152288>] filp_close+0xb4/0xcd
       [<ffffffff8106599f>] put_files_struct+0x1b7/0x36b
       [<ffffffff81065830>] ? put_files_struct+0x48/0x36b
       [<ffffffff8131ee59>] ? do_raw_spin_unlock+0x118/0x160
       [<ffffffff81065bc0>] exit_files+0x6d/0x75
       [<ffffffff81068348>] do_exit+0x47d/0xc60
       [<ffffffff8177e7b5>] ? _raw_spin_unlock_irq+0x30/0x36
       [<ffffffff81068bfa>] do_group_exit+0xcf/0x134
       [<ffffffff81080790>] get_signal_to_deliver+0x732/0x81d
       [<ffffffff81095996>] ? cpu_clock+0x4e/0x60
       [<ffffffff81002082>] do_notify_resume+0x117/0xc43
       [<ffffffff810a2fa3>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff81080d79>] ? sys_rt_sigtimedwait+0x2b5/0x3bf
       [<ffffffff8177d9f2>] ? trace_hardirqs_off_thunk+0x3a/0x3c
       [<ffffffff81003221>] ? sysret_signal+0x5/0x3d
       [<ffffffff8100343b>] int_signal+0x12/0x17
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      95c87e2b
    • A
      KVM: PPC: Make use of hash based Shadow MMU · fef093be
      Alexander Graf committed
      We just introduced generic functions to handle shadow pages on PPC.
      This patch makes the respective backends use them, getting
      rid of a lot of duplicate code along the way.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      fef093be
    • A
      KVM: PPC: Add generic hpte management functions · 7741909b
      Alexander Graf committed
      Currently the shadow paging code keeps an array of entries it knows about.
      Whenever the guest invalidates an entry, we loop through that array,
      trying to invalidate matching parts.
      
      While this is a really simple implementation, it is probably the most
      inefficient one possible. So instead, let's keep an array of lists around
      that are indexed by a hash. This way each PTE can be added by 4 list_add,
      removed by 4 list_del invocations and the search only needs to loop through
      entries that share the same hash.
      
      This patch implements said lookup and exports generic functions that both
      the 32-bit and 64-bit backends can use.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      7741909b
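A minimal userspace sketch of the idea, assuming a single hash table keyed by page number (the commit mentions four list operations per PTE, which suggests several tables; only one is shown here). All type and function names below are hypothetical and are not the kernel's.

```c
#include <stdint.h>
#include <stddef.h>

#define HPTE_HASH_BITS 4
#define HPTE_HASH_SIZE (1u << HPTE_HASH_BITS)

struct hpte_entry {
    uint64_t eaddr;            /* guest effective address the entry maps */
    int valid;                 /* set while the entry is in the cache */
    struct hpte_entry *next;   /* chain of entries sharing one bucket */
};

struct hpte_cache {
    struct hpte_entry *bucket[HPTE_HASH_SIZE];
};

/* Illustrative hash: low bits of the 4 KiB page number. */
static unsigned hpte_hash(uint64_t eaddr)
{
    return (unsigned)((eaddr >> 12) & (HPTE_HASH_SIZE - 1));
}

static void hpte_cache_add(struct hpte_cache *c, struct hpte_entry *e)
{
    unsigned h = hpte_hash(e->eaddr);

    e->valid = 1;
    e->next = c->bucket[h];
    c->bucket[h] = e;
}

/* Invalidate all entries on the page containing eaddr.
 * Only one bucket is walked, not the whole cache. */
static int hpte_cache_invalidate_page(struct hpte_cache *c, uint64_t eaddr)
{
    struct hpte_entry **pp = &c->bucket[hpte_hash(eaddr)];
    int n = 0;

    while (*pp) {
        struct hpte_entry *e = *pp;

        if ((e->eaddr >> 12) == (eaddr >> 12)) {
            *pp = e->next;     /* unlink from the bucket chain */
            e->next = NULL;
            e->valid = 0;
            n++;
        } else {
            pp = &e->next;
        }
    }
    return n;
}
```

Entries that hash to the same bucket but belong to a different page are skipped, so the cost of an invalidation scales with the bucket length rather than the total number of shadow entries.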
    • X
      KVM: MMU: cleanup FNAME(fetch)() functions · 84754cd8
      Xiao Guangrong committed
      Clean up this function, since we have already obtained the direct sp's access.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      84754cd8
    • X
      KVM: MMU: fix direct sp's access corrupted · 9e7b0e7f
      Xiao Guangrong committed
      If the mapping is writable but the dirty flag is not set, we find
      the read-only direct sp and set up the mapping. If a write #PF then
      occurs, we mark this mapping writable in the read-only direct sp, so
      other, genuinely read-only mappings can write through it without a #PF.
      
      This may break the guest's COW.
      
      Fix this by re-installing the mapping when a write #PF occurs.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      9e7b0e7f
    • X
      KVM: MMU: fix conflict access permissions in direct sp · 5fd5387c
      Xiao Guangrong committed
      In non-direct mapping, we mark an sp as 'direct' when mapping a
      guest large page, but its access is encoded from the upper
      page-structure entries only and does not include the last mapping,
      which causes an access conflict.
      
      For example, take this mapping:
              [W]
            / PDE1 -> |---|
        P[W]          |   | LPA
            \ PDE2 -> |---|
              [R]
      
      P has two children, PDE1 and PDE2, and both map the same large
      page (LPA). P's access is WR, PDE1's access is WR, and PDE2's
      access is RO (considering only read-write permissions here).
      
      When the guest accesses PDE1, we create a direct sp for LPA. The sp's
      access comes from P and is W, so we mark the ptes in this sp as W.
      
      Then the guest accesses PDE2. We find LPA's shadow page, which is the
      same one used for PDE1, and mark its ptes as RO.
      
      So if the guest accesses PDE1 again, a spurious #PF occurs.
      
      Fix this by encoding the last mapping's access into the direct shadow page.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      5fd5387c
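The fix can be illustrated with a small sketch, assuming that the shadow page lookup distinguishes pages by their access bits, so that two different access values yield two different direct shadow pages. The `ACC_*` values and `direct_sp_access()` are hypothetical names for illustration, not the kernel's code.

```c
#define ACC_READ  0x1u
#define ACC_WRITE 0x2u

/* upper_access: permissions accumulated from the upper page-structure
 * entries (P in the example above).
 * last_access: permissions of the final entry that maps the large page
 * (PDE1 or PDE2 in the example above).
 * The direct sp's access is the intersection of both. */
static unsigned direct_sp_access(unsigned upper_access, unsigned last_access)
{
    return upper_access & last_access;
}
```

With P = read/write, the path through PDE1 gives read/write and the path through PDE2 gives read-only, so under the stated assumption the two paths no longer share one shadow page.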
    • X
      KVM: MMU: fix writable sync sp mapping · 36a2e677
      Xiao Guangrong committed
      When we sync many unsync sps at one time (in mmu_sync_children()),
      we may map an spte writable. This is dangerous if the gfn mapped by
      one unsync sp is another unsync page's gfn.
      
      For example:
      
      SP1.pte[0] = P
      SP2.gfn's pfn = P
      [SP1.pte[0] = SP2.gfn's pfn]
      
      First, we write-protect SP1 and SP2, but SP1 and SP2 are still
      unsync sps.
      
      Then we sync SP1 first. It detects that SP1.pte[0].gfn has only one
      unsync sp, namely SP2, so it maps it writable. But we plan to sync SP2
      soon, and at this point SP2->unsync is not reliable, since we later
      sync SP2 while SP2->gfn is already writable.
      
      So the final result is: SP2 is a sync page but SP2.gfn is writable.
      
      This bug corrupts the guest's page table. Fix it by marking the mapping
      read-only if the mapped gfn has shadow pages.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      36a2e677
    • S
      KVM: VMX: Execute WBINVD to keep data consistency with assigned devices · f5f48ee1
      Sheng Yang committed
      Some guest device drivers may use "Non-Snoop" I/O and explicitly issue
      WBINVD or CLFLUSH on a RAM region. Since migration may occur before the
      WBINVD or CLFLUSH, we need to maintain data consistency either by:
      1: flushing the cache (wbinvd) when the guest is scheduled out, if there
      is no wbinvd exit, or
      2: executing wbinvd on all dirty physical CPUs when a guest wbinvd
      causes an exit.
      Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      f5f48ee1
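Option 2 above can be sketched as a per-vcpu mask of "dirty" physical CPUs. This is a userspace illustration under stated assumptions: at most 64 CPUs, and `record_flush()` standing in for actually executing WBINVD on a given CPU. None of these names come from the patch.

```c
#include <stdint.h>

struct vcpu_wbinvd {
    uint64_t dirty_mask;   /* bit n set: vcpu ran on physical CPU n since the last flush */
};

static unsigned flush_count[64];   /* how often each CPU was "flushed" */

/* Stand-in for issuing WBINVD on the given physical CPU. */
static void record_flush(unsigned cpu)
{
    flush_count[cpu]++;
}

/* Called when the vcpu is scheduled onto a physical CPU (cpu < 64). */
static void note_sched_in(struct vcpu_wbinvd *v, unsigned cpu)
{
    v->dirty_mask |= 1ULL << cpu;
}

/* Called when the guest's WBINVD exits to the host: flush every CPU
 * whose cache may hold guest data, then start tracking afresh.
 * Returns the number of CPUs flushed. */
static int handle_wbinvd_exit(struct vcpu_wbinvd *v, void (*flush_cpu)(unsigned))
{
    int n = 0;

    for (unsigned cpu = 0; cpu < 64; cpu++) {
        if (v->dirty_mask & (1ULL << cpu)) {
            flush_cpu(cpu);
            n++;
        }
    }
    v->dirty_mask = 0;
    return n;
}
```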
    • A
      KVM: Document KVM specific review items · cf3e3d3e
      Avi Kivity committed
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      cf3e3d3e
    • A
      KVM: Simplify vcpu_enter_guest() mmu reload logic slightly · 3e007509
      Avi Kivity committed
      No need to reload the mmu in between two different vcpu->requests checks.
      
      kvm_mmu_reload() may trigger KVM_REQ_TRIPLE_FAULT, but that will be caught
      during atomic guest entry later.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      3e007509
    • C
      KVM: Search the LAPIC's for one that will accept a PIC interrupt · 529df65e
      Chris Lalancette committed
      Older versions of 32-bit linux have a "Checking 'hlt' instruction"
      test where they repeatedly call the 'hlt' instruction, and then
      expect a timer interrupt to kick the CPU out of halt.  This happens
      before any LAPIC or IOAPIC setup happens, which means that all of
      the APIC's are in virtual wire mode at this point.  Unfortunately,
      the current implementation of virtual wire mode is hardcoded to
      only kick the BSP, so if a crash+kexec occurs on a different
      vcpu, it will never get kicked.
      
      This patch makes pic_unlock() do the equivalent of
      kvm_irq_delivery_to_apic() for the IOAPIC code.  That is, it runs
      through all of the vcpus looking for one that is in virtual wire
      mode.  In the normal case where LAPICs and IOAPICs are configured,
      this won't be used at all.  In the bootstrap phase of a modern
      OS, before the LAPICs and IOAPICs are configured, this will have
      exactly the same behavior as today; VCPU0 is always looked at
      first, so it will always get out of the loop after the first
      iteration.  This will only go through the loop more than once
      during a kexec/kdump, in which case it will only do it a few times
      until the kexec'ed kernel programs the LAPIC and IOAPIC.
      Signed-off-by: Chris Lalancette <clalance@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      529df65e
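The search described above amounts to a first-match loop over the vcpus. The sketch below is a simplified userspace model: `struct vcpu` and its `accepts_pic_intr` flag are hypothetical stand-ins for whatever check the kernel applies to decide that a LAPIC is in virtual wire mode.

```c
#include <stddef.h>

struct vcpu {
    int id;
    int accepts_pic_intr;   /* hypothetical flag: LAPIC is in virtual wire mode */
};

/* Return the first vcpu that will accept a PIC interrupt, or NULL if none will.
 * VCPU0 is examined first, so during a normal boot the loop ends immediately. */
static struct vcpu *find_pic_target(struct vcpu *vcpus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (vcpus[i].accepts_pic_intr)
            return &vcpus[i];
    }
    return NULL;
}
```

In the kexec case from the commit message, VCPU0 may no longer accept the interrupt, and the loop continues until it reaches the vcpu that the new kernel is booting on.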
    • T
      KVM: ia64: cleanup kvm_ia64_sync_dirty_log() · 979586e0
      Takuya Yoshikawa committed
      kvm_ia64_sync_dirty_log() is a helper function for kvm_vm_ioctl_get_dirty_log()
      that copies ia64's arch-specific dirty bitmap to the generic one in the memslot.
      Doing sanity checks in this function is therefore unnatural. We move these checks
      out of it and change the prototype accordingly.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      979586e0
    • T
      KVM: ia64: fix dirty_log_lock spin_lock section not to include get_dirty_log() · 4482b06c
      Takuya Yoshikawa committed
      kvm_get_dirty_log() calls copy_to_user(), which may sleep, so we need to
      narrow the dirty_log_lock spin_lock section so that it does not include this call.
      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      4482b06c
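The general pattern, holding the lock only while touching shared state and doing the slow copy afterwards, can be sketched in userspace as follows. This is an analogy rather than the ia64 code: a C11 `atomic_flag` stands in for the kernel spinlock, `copy_out()` stands in for `copy_to_user()`, and the snapshot-and-clear step is an assumption made for the example.

```c
#include <stdatomic.h>
#include <string.h>

#define BITMAP_WORDS 4

struct dirty_log {
    atomic_flag lock;                          /* stand-in for the spinlock */
    unsigned long arch_bitmap[BITMAP_WORDS];   /* shared, updated elsewhere */
};

static void log_lock(struct dirty_log *l)
{
    while (atomic_flag_test_and_set(&l->lock))
        ;   /* spin until the flag was previously clear */
}

static void log_unlock(struct dirty_log *l)
{
    atomic_flag_clear(&l->lock);
}

/* Stand-in for copy_to_user(): potentially slow, so it runs unlocked. */
static void copy_out(unsigned long *user_buf, const unsigned long *src)
{
    memcpy(user_buf, src, BITMAP_WORDS * sizeof(*src));
}

static void get_dirty_log(struct dirty_log *l, unsigned long *user_buf)
{
    unsigned long snapshot[BITMAP_WORDS];

    /* Critical section: only the copy of shared state and its reset. */
    log_lock(l);
    memcpy(snapshot, l->arch_bitmap, sizeof(snapshot));
    memset(l->arch_bitmap, 0, sizeof(l->arch_bitmap));
    log_unlock(l);

    /* The slow copy happens after the lock has been released. */
    copy_out(user_buf, snapshot);
}
```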
    • A
      KVM: PPC: Make BAT only guest segments work · 4d29bdbf
      Alexander Graf committed
      When a guest sets its SR entry to invalid, we may still find a
      corresponding entry in a BAT. So we need to make sure we're not
      faulting on invalid SR entries, but instead just claim them to be
      BAT resolved.
      
      This resolves breakage experienced when using libogc based guests.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      4d29bdbf
    • A
      KVM: PPC: Use kernel hash function · 3b249157
      Alexander Graf committed
      The Linux kernel already provides a hash function. Let's reuse that
      instead of reinventing the wheel!
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      3b249157
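For readers unfamiliar with this kind of hash, the following is a generic multiplicative hash written for userspace. It is a sketch of the technique only: the function name and the constant (2^64 divided by the golden ratio) are assumptions for illustration and are not claimed to match the kernel's implementation in `<linux/hash.h>`.

```c
#include <stdint.h>

/* Illustrative multiplier: 2^64 / golden ratio, a common choice for
 * multiplicative hashing. Not necessarily the constant the kernel uses. */
#define SKETCH_GOLDEN_RATIO_64 0x9E3779B97F4A7C15ULL

/* Map a 64-bit value to a bucket index of `bits` bits (1 <= bits <= 32).
 * The multiplication mixes the input; the top bits of the product are kept. */
static inline uint32_t hash64_sketch(uint64_t val, unsigned bits)
{
    return (uint32_t)((val * SKETCH_GOLDEN_RATIO_64) >> (64 - bits));
}
```

Reusing a shared, well-tested helper of this kind avoids maintaining a hand-rolled hash in each backend, which is the point the commit message makes.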