  1. 08 Apr, 2012: 7 commits
  2. 03 Apr, 2012: 5 commits
  3. 02 Apr, 2012: 1 commit
  4. 29 Mar, 2012: 1 commit
  5. 20 Mar, 2012: 1 commit
  6. 08 Mar, 2012: 2 commits
  7. 05 Mar, 2012: 23 commits
    • KVM: PPC: Add HPT preallocator · d2a1b483
      Alexander Graf committed
      We're currently allocating 16MB of linear memory on demand when creating
      a guest. That does work sometimes, but finding 16MB of linear memory
      available in the system at runtime is definitely not a given.
      
      So let's add another command line option similar to the RMA preallocator,
      that we can use to keep a pool of page tables around. Now, when a guest
      gets created it has a pretty low chance of receiving an OOM.

      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
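
The preallocation idea can be sketched in user-space C: reserve a fixed number of 16MB chunks up front, while large allocations are still likely to succeed, and hand them out to guests later. This is a minimal sketch, not the kernel code; `struct linear_pool`, `pool_init()`, `pool_alloc()`, `pool_free()` and `POOL_SLOTS` are hypothetical names, and plain `malloc()` stands in for the boot-time allocator.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 4              /* hypothetical fixed count; the commit makes it a command line option */
#define CHUNK_SIZE (16UL << 20)   /* 16MB, the HPT size mentioned above */

/* Hypothetical pool of preallocated, equally sized chunks. */
struct linear_pool {
    void *chunk[POOL_SLOTS];
    int in_use[POOL_SLOTS];
};

/* Reserve every chunk up front ("at boot"). Returns 0 on success and
 * -1 on failure, releasing whatever had been reserved so far. */
static int pool_init(struct linear_pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        p->in_use[i] = 0;
        p->chunk[i] = malloc(CHUNK_SIZE);
        if (!p->chunk[i]) {
            while (i--) {
                free(p->chunk[i]);
                p->chunk[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Hand out a free preallocated chunk, or NULL when the pool is exhausted
 * (a caller could then fall back to on-demand allocation). */
static void *pool_alloc(struct linear_pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->chunk[i];
        }
    }
    return NULL;
}

/* Return a chunk to the pool when its guest is destroyed. */
static void pool_free(struct linear_pool *p, void *chunk)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (p->chunk[i] == chunk)
            p->in_use[i] = 0;
}
```

Because the chunks are reserved before memory becomes fragmented, creating a guest later only has to find a free slot rather than 16MB of contiguous memory.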
    • KVM: PPC: Initialize linears with zeros · b7f5d011
      Alexander Graf committed
      RMAs and HPT preallocated spaces should be zeroed, so we don't accidentally
      leak information from previous VM executions.

      Signed-off-by: Alexander Graf <agraf@suse.de>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PPC: Convert RMA allocation into generic code · b4e70611
      Alexander Graf committed
      We have code to allocate big chunks of linear memory on bootup for later use.
      This code is currently used for RMA allocation, but can be useful beyond that.
      
      Make it generic so we can reuse it for other stuff later.

      Signed-off-by: Alexander Graf <agraf@suse.de>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PPC: E500: Fail init when not on e500v2 · 9cf7c0e4
      Alexander Graf committed
      When enabling the current KVM code on e500mc, I get the following oops:
      
          Oops: Exception in kernel mode, sig: 4 [#1]
          SMP NR_CPUS=8 P2041 RDB
          Modules linked in:
          NIP: c067df4c LR: c067df44 CTR: 00000000
          REGS: ee055ed0 TRAP: 0700   Not tainted  (3.2.0-10391-g36c5afe)
          MSR: 00029002 <CE,EE,ME>  CR: 24042022  XER: 00000000
          TASK = ee0429b0[1] 'swapper/0' THREAD: ee054000 CPU: 2
          GPR00: c067df44 ee055f80 ee0429b0 00000000 00000058 0000003f ee211600 60c6b864
          GPR08: 7cc903a6 0000002c 00000000 00000001 44042082 2d180088 00000000 00000000
          GPR16: c0000a00 00000014 3fffffff 03fe9000 00000015 7ff3be68 c06e0000 00000000
          GPR24: 00000000 00000000 00001720 c067df1c c06e0000 00000000 ee054000 c06ab51c
          NIP [c067df4c] kvmppc_e500_init+0x30/0xf8
          LR [c067df44] kvmppc_e500_init+0x28/0xf8
          Call Trace:
          [ee055f80] [c067df44] kvmppc_e500_init+0x28/0xf8 (unreliable)
          [ee055fb0] [c0001d30] do_one_initcall+0x50/0x1f0
          [ee055fe0] [c06721dc] kernel_init+0xa4/0x14c
          [ee055ff0] [c000e910] kernel_thread+0x4c/0x68
          Instruction dump:
          9421ffd0 7c0802a6 93410018 9361001c 90010034 93810020 93a10024 93c10028
          93e1002c 4bfffe7d 2c030000 408200a4 <7c1082a6> 90010008 7c1182a6 9001000c
          ---[ end trace b8ef4903fcbf9dd3 ]---
      
      Since it doesn't make sense to run the init function on any non-supported
      platform, we can just call our "is this platform supported?" function and
      bail out of init() if it's not.

      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Move gfn_to_memslot() to kvm_host.h · 9d4cba7f
      Paul Mackerras committed
      This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
      kvm_host.h to reduce the code duplication caused by the need for
      non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
      gfn_to_memslot() in real mode.
      
      Rather than putting gfn_to_memslot() itself in a header, which would
      lead to increased code size, this puts __gfn_to_memslot() in a header.
      Then, the non-modular uses of gfn_to_memslot() are changed to call
      __gfn_to_memslot() instead.  This way there is only one place in the
      source code that needs to be changed should the gfn_to_memslot()
      implementation need to be modified.
      
      On powerpc, the Book3S HV style of KVM has code that is called from
      real mode which needs to call gfn_to_memslot() and thus needs this.
      (Module code is allocated in the vmalloc region, which can't be
      accessed in real mode.)
      
      With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
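
The kind of lookup that moves into the header can be sketched as a small `static inline` function that only walks an in-memory array and calls nothing else, which is what makes it usable from non-modular, real-mode code. The structures and the name `find_slot()` are simplified, hypothetical stand-ins for the kernel's memslot types, and a plain linear scan is assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Simplified memslot: a contiguous run of guest frame numbers. */
struct memslot {
    gfn_t base_gfn;
    unsigned long npages;
};

struct memslots {
    struct memslot slot[8];
    int nr;
};

/* Return the slot covering gfn, or NULL. Being static inline and
 * self-contained, it can live in a header and be compiled into both
 * the module and the built-in real-mode code. */
static inline struct memslot *find_slot(struct memslots *s, gfn_t gfn)
{
    for (int i = 0; i < s->nr; i++) {
        struct memslot *m = &s->slot[i];
        if (gfn >= m->base_gfn && gfn < m->base_gfn + m->npages)
            return m;
    }
    return NULL;
}
```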
    • KVM: PPC: Rename MMIO register identifiers · b3c5d3c2
      Alexander Graf committed
      We need the KVM_REG namespace for generic register settings now, so
      let's rename the existing users to something different, enabling
      us to reuse the namespace for more visible interfaces.
      
      While at it, also move these private constants to a private header.

      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code · 31f3438e
      Paul Mackerras committed
      This moves the get/set_one_reg implementation down from powerpc.c into
      booke.c, book3s_pr.c and book3s_hv.c.  This avoids #ifdefs in C code,
      but more importantly, it fixes a bug on Book3s HV where we were
      accessing beyond the end of the kvm_vcpu struct (via the to_book3s()
      macro) and corrupting memory, causing random crashes and file corruption.
      
      On Book3s HV we only accept setting the HIOR to zero, since the guest
      runs in supervisor mode and its vectors are never offset from zero.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      [agraf update to apply on top of changed ONE_REG patches]
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PPC: Add support for explicit HIOR setting · 1022fc3d
      Alexander Graf committed
      Until now, we always set HIOR based on the PVR, but this is just wrong.
      Instead, we should be setting HIOR explicitly, so user space can decide
      what the initial HIOR value is - just like on real hardware.
      
      We keep the old PVR based way around for backwards compatibility, but
      once user space uses the SET_ONE_REG based method, we drop the PVR logic.

      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
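
The backwards-compatibility rule (derive HIOR from the PVR only until user space sets it explicitly) can be sketched as a flag on the vCPU. The structure, function names, `LEGACY_HIOR` value and the PVR test are hypothetical illustrations; the check uses the PPC970 PVR version value 0x0039 purely as an example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative legacy value; the PVR rule below is hypothetical. */
#define LEGACY_HIOR 0xfff00000ULL

struct vcpu_hior {
    uint64_t hior;
    bool hior_explicit;   /* set once user space writes HIOR via SET_ONE_REG */
};

/* Hypothetical legacy rule: guess HIOR from the processor version. */
static bool pvr_wants_legacy_hior(uint32_t pvr)
{
    return (pvr >> 16) == 0x0039;   /* e.g. a PPC970 PVR */
}

/* Setting the PVR only derives HIOR while user space has not set it. */
static void vcpu_set_pvr(struct vcpu_hior *v, uint32_t pvr)
{
    if (!v->hior_explicit)
        v->hior = pvr_wants_legacy_hior(pvr) ? LEGACY_HIOR : 0;
}

/* An explicit value wins from now on; the PVR logic is dropped. */
static void vcpu_set_hior(struct vcpu_hior *v, uint64_t hior)
{
    v->hior = hior;
    v->hior_explicit = true;
}
```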
    • KVM: PPC: Add generic single register ioctls · e24ed81f
      Alexander Graf committed
      Right now we transfer a static struct every time we want to get or set
      registers. Unfortunately, over time we realize that there are more of
      these than we thought of before and the extensibility and flexibility of
      transferring a full struct every time is limited.
      
      So this is a new approach to the problem. With these new ioctls, we can
      get and set a single register that is identified by an ID. This allows for
      very precise and limited transmittal of data. When we later realize that
      it's a better idea to shove over multiple registers at once, we can reuse
      most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS
      interface.

      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
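
At its core, the single-register approach is a dispatch on a register ID. The ONE_REG argument (`struct kvm_one_reg`) carries a 64-bit `id` and a 64-bit `addr` pointing at the value; everything else below (the register IDs, `struct vcpu_regs`, `get_one_reg()`/`set_one_reg()`) is a hypothetical sketch, and unknown IDs are assumed to fail with `-EINVAL`.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the ONE_REG argument: register ID plus value address. */
struct one_reg {
    uint64_t id;
    uint64_t addr;
};

/* Hypothetical register IDs and vCPU state. */
#define REG_ID_HIOR 0x1
#define REG_ID_DEC  0x2

struct vcpu_regs {
    uint64_t hior;
    uint64_t dec;
};

/* Map an ID to its storage; NULL for IDs this vCPU does not know. */
static uint64_t *reg_slot(struct vcpu_regs *r, uint64_t id)
{
    switch (id) {
    case REG_ID_HIOR: return &r->hior;
    case REG_ID_DEC:  return &r->dec;
    default:          return NULL;
    }
}

static int get_one_reg(struct vcpu_regs *r, const struct one_reg *req)
{
    uint64_t *slot = reg_slot(r, req->id);
    if (!slot)
        return -EINVAL;
    /* A kernel implementation would use copy_to_user() here. */
    memcpy((void *)(uintptr_t)req->addr, slot, sizeof(*slot));
    return 0;
}

static int set_one_reg(struct vcpu_regs *r, const struct one_reg *req)
{
    uint64_t *slot = reg_slot(r, req->id);
    if (!slot)
        return -EINVAL;
    /* A kernel implementation would use copy_from_user() here. */
    memcpy(slot, (const void *)(uintptr_t)req->addr, sizeof(*slot));
    return 0;
}
```

Adding a register later means adding one ID and one `case`, without changing any structure layout shared with user space.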
    • KVM: PPC: Use the vcpu kmem_cache when allocating new VCPUs · 6b75e6bf
      Sasha Levin committed
      Currently the code kzalloc()s new VCPUs instead of using the kmem_cache
      which is created when KVM is initialized.
      
      Modify it to allocate VCPUs from that kmem_cache.

      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PPC: booke: Add booke206 TLB trace · d37b1a03
      Liu Yu committed
      The existing kvm_stlb_write/kvm_gtlb_write were a poor match for
      the e500/book3e MMU -- mas1 was passed as "tid", mas2 was limited
      to "unsigned int" which will be a problem on 64-bit, mas3/7 got
      split up rather than treated as a single 64-bit word, etc.

      Signed-off-by: Liu Yu <yu.liu@freescale.com>
      [scottwood@freescale.com: made mas2 64-bit, and added mas8 init]
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
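
Treating MAS7 and MAS3 as one 64-bit word means concatenating them, with MAS7 supplying the upper 32 bits, while MAS2 is kept as a full 64-bit value. The helper names and the trace-record layout below are illustrative assumptions, not the actual tracepoint definition.

```c
#include <stdint.h>

/* MAS7 supplies the upper and MAS3 the lower 32 bits of one 64-bit word. */
static inline uint64_t mas7_3_join(uint32_t mas7, uint32_t mas3)
{
    return ((uint64_t)mas7 << 32) | mas3;
}

static inline uint32_t mas7_of(uint64_t mas7_3) { return (uint32_t)(mas7_3 >> 32); }
static inline uint32_t mas3_of(uint64_t mas7_3) { return (uint32_t)mas7_3; }

/* Hypothetical trace record: mas1 recorded as itself rather than as a
 * "tid", mas2 wide enough for 64-bit effective addresses. */
struct booke206_tlb_rec {
    uint32_t mas0;
    uint32_t mas1;
    uint64_t mas2;
    uint64_t mas7_3;
    uint32_t mas8;
};
```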
    • KVM: PPC: Book3s HV: Implement get_dirty_log using hardware changed bit · 82ed3616
      Paul Mackerras committed
      This changes the implementation of kvm_vm_ioctl_get_dirty_log() for
      Book3s HV guests to use the hardware C (changed) bits in the guest
      hashed page table.  Since this makes the implementation quite different
      from the Book3s PR case, this moves the existing implementation from
      book3s.c to book3s_pr.c and creates a new implementation in book3s_hv.c.
      That implementation calls kvmppc_hv_get_dirty_log() to do the actual
      work by calling kvm_test_clear_dirty on each page.  It iterates over
      the HPTEs, clearing the C bit if set, and returns 1 if any C bit was
      set (including the saved C bit in the rmap entry).

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
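
The per-page test-and-clear step and the bitmap it feeds can be sketched as follows. `struct page_map`, `test_clear_dirty()` (modelled loosely on the description of `kvm_test_clear_dirty`), `get_dirty_log()` and the bit position `HPTE_C` are hypothetical; locking and TLB invalidation are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HPTE_C 0x1u   /* hypothetical position of the "changed" bit */
#define BITS_PER_UL (8 * sizeof(unsigned long))

/* Simplified per-page state: the HPTEs that currently map the page, plus
 * the C bit saved in the rmap entry from HPTEs removed earlier. */
struct page_map {
    uint32_t *hptes[4];
    size_t nr;
    bool saved_c;
};

/* Clear C wherever it is set; return 1 if the page was dirtied.
 * Locking and TLB invalidation are omitted from this sketch. */
static int test_clear_dirty(struct page_map *pm)
{
    int dirty = pm->saved_c;
    pm->saved_c = false;
    for (size_t i = 0; i < pm->nr; i++) {
        if (*pm->hptes[i] & HPTE_C) {
            *pm->hptes[i] &= ~HPTE_C;
            dirty = 1;
        }
    }
    return dirty;
}

/* Fill a dirty bitmap with one bit per page of the memslot. */
static void get_dirty_log(struct page_map *pages, size_t npages,
                          unsigned long *bitmap)
{
    for (size_t i = 0; i < npages; i++)
        if (test_clear_dirty(&pages[i]))
            bitmap[i / BITS_PER_UL] |= 1UL << (i % BITS_PER_UL);
}
```

Clearing C as it is read means the next call reports only pages written since this one, which is the usual dirty-log contract.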
    • KVM: PPC: Book3S HV: Use the hardware referenced bit for kvm_age_hva · 55514893
      Paul Mackerras committed
      This uses the host view of the hardware R (referenced) bit to speed
      up kvm_age_hva() and kvm_test_age_hva().  Instead of removing all
      the relevant HPTEs in kvm_age_hva(), we now just reset their R bits
      if set.  Also, kvm_test_age_hva() now scans the relevant HPTEs to
      see if any of them have R set.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
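
The two operations differ only in whether they reset R. A sketch, with hypothetical names (`struct page_hptes`, `age_page()`, `test_age_page()`) and a hypothetical bit position `HPTE_R`:

```c
#include <stddef.h>
#include <stdint.h>

#define HPTE_R 0x2u   /* hypothetical position of the "referenced" bit */

/* HPTEs currently mapping one host page (simplified). */
struct page_hptes {
    uint32_t *hptes[4];
    size_t nr;
};

/* kvm_age_hva()-style: clear R instead of removing the HPTEs, and
 * report whether the page had been referenced since the last call. */
static int age_page(struct page_hptes *p)
{
    int young = 0;
    for (size_t i = 0; i < p->nr; i++) {
        if (*p->hptes[i] & HPTE_R) {
            *p->hptes[i] &= ~HPTE_R;
            young = 1;
        }
    }
    return young;
}

/* kvm_test_age_hva()-style: the same scan, but read-only. */
static int test_age_page(const struct page_hptes *p)
{
    for (size_t i = 0; i < p->nr; i++)
        if (*p->hptes[i] & HPTE_R)
            return 1;
    return 0;
}
```

Since the HPTEs stay in place, the guest keeps its mappings and does not take a fault just because the host sampled the page's age.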
    • KVM: PPC: Book3s HV: Maintain separate guest and host views of R and C bits · bad3b507
      Paul Mackerras committed
      This allows both the guest and the host to use the referenced (R) and
      changed (C) bits in the guest hashed page table.  The guest has a view
      of R and C that is maintained in the guest_rpte field of the revmap
      entry for the HPTE, and the host has a view that is maintained in the
      rmap entry for the associated gfn.
      
      Both views are updated from the guest HPT.  If a bit (R or C) is zero
      in either view, it will be initially set to zero in the HPTE (or HPTEs),
      until set to 1 by hardware.  When an HPTE is removed for any reason,
      the R and C bits from the HPTE are ORed into both views.  We have to
      be careful to read the R and C bits from the HPTE after invalidating
      it, but before unlocking it, in case of any late updates by the hardware.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
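
The two rules above (start a bit at 0 in the HPTE if either view has it at 0; OR the hardware bits into both views on removal) can be sketched as bitwise AND and OR. `struct rc_views`, the helper names and the `BIT_R`/`BIT_C` encodings are hypothetical.

```c
#include <stdint.h>

#define BIT_R 0x2u   /* hypothetical encodings of R and C */
#define BIT_C 0x1u

/* The two views kept for one mapping (simplified). */
struct rc_views {
    uint32_t guest_rc;   /* guest view: kept with the revmap entry for the HPTE */
    uint32_t host_rc;    /* host view: kept in the rmap entry for the gfn */
};

/* A bit may start at 1 in the HPTE only if it is 1 in both views; if
 * either view has it at 0, it starts at 0 so the hardware sets it again
 * on the next access. */
static uint32_t initial_hpte_rc(const struct rc_views *v)
{
    return v->guest_rc & v->host_rc;
}

/* On removal (after invalidating the HPTE, before unlocking it), fold
 * the hardware's R and C bits into both views. */
static void fold_rc_on_remove(struct rc_views *v, uint32_t hpte_rc)
{
    v->guest_rc |= hpte_rc;
    v->host_rc |= hpte_rc;
}
```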
    • KVM: PPC: Book3S HV: Keep HPTE locked when invalidating · a92bce95
      Paul Mackerras committed
      This reworks the implementations of the H_REMOVE and H_BULK_REMOVE
      hcalls to make sure that we keep the HPTE locked and in the reverse-
      mapping chain until we have finished invalidating it.  Previously
      we would remove it from the chain and unlock it before invalidating
      it, leaving a tiny window when the guest could access the page even
      though we believe we have removed it from the guest (e.g.,
      kvm_unmap_hva() has been called for the page and has found no HPTEs
      in the chain).  In addition, we'll need this for future patches where
      we will need to read the R and C bits in the HPTE after invalidating
      it.
      
      Doing this required restructuring kvmppc_h_bulk_remove() substantially.
      Since we want to batch up the tlbies, we now need to keep several
      HPTEs locked simultaneously.  In order to avoid possible deadlocks,
      we don't spin on the HPTE bitlock for any except the first HPTE in
      a batch.  If we can't acquire the HPTE bitlock for the second or
      subsequent HPTE, we terminate the batch at that point, do the tlbies
      that we have accumulated so far, unlock those HPTEs, and then start
      a new batch to do the remaining invalidations.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
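
The batching rule (spin only for the first HPTE of a batch, use a non-blocking attempt for the rest, and close the batch early when that attempt fails) can be sketched with C11 atomics standing in for the HPTE bitlock. `struct hpte`, `bulk_remove()`, `BATCH_MAX` and `flush_batch()` (a stand-in for issuing the accumulated tlbies) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 4   /* hypothetical batch size */

struct hpte {
    atomic_int lock;   /* stands in for the HPTE lock bit */
    int valid;
};

static bool hpte_trylock(struct hpte *h)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&h->lock, &expected, 1);
}

static void hpte_unlock(struct hpte *h) { atomic_store(&h->lock, 0); }

/* Hypothetical stand-in for issuing the batched tlbie instructions. */
static int flushes;
static void flush_batch(struct hpte **batch, size_t n)
{
    (void)batch;
    (void)n;
    flushes++;
}

static void bulk_remove(struct hpte **list, size_t n)
{
    size_t i = 0;
    while (i < n) {
        struct hpte *batch[BATCH_MAX];
        size_t nb = 0;

        /* First entry of a batch: no other locks held, so spinning is safe. */
        while (!hpte_trylock(list[i]))
            ;
        batch[nb++] = list[i++];

        /* Later entries: never spin; a failed trylock ends the batch. */
        while (i < n && nb < BATCH_MAX && hpte_trylock(list[i]))
            batch[nb++] = list[i++];

        for (size_t k = 0; k < nb; k++)
            batch[k]->valid = 0;      /* invalidate while still locked */
        flush_batch(batch, nb);       /* then the tlbies */
        for (size_t k = 0; k < nb; k++)
            hpte_unlock(batch[k]);    /* unlock only afterwards */
    }
}
```

Never waiting on a second lock while holding others is what rules out two CPUs deadlocking on each other's batches.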
    • KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS · b5434032
      Matt Evans committed
      PPC KVM lacks these two capabilities, and as such a userland system must assume
      a max of 4 VCPUs (following api.txt).  With these, a userland can determine
      a more realistic limit.

      Signed-off-by: Matt Evans <matt@ozlabs.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
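
On the user-space side, the two values come from `ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS)` and the same call with `KVM_CAP_MAX_VCPUS`, where a return of 0 means the capability is absent. The pure function below applies the fallback rules as api.txt describes them; `struct vcpu_limits` and `vcpu_limits_from_caps()` are hypothetical names, and the ioctl calls are left out so the sketch runs without `/dev/kvm`.

```c
/* Limits derived from two KVM_CHECK_EXTENSION results (0 = absent). */
struct vcpu_limits {
    int recommended;   /* from KVM_CAP_NR_VCPUS */
    int maximum;       /* from KVM_CAP_MAX_VCPUS */
};

static struct vcpu_limits vcpu_limits_from_caps(int nr_vcpus_cap,
                                                int max_vcpus_cap)
{
    struct vcpu_limits l;
    /* Without KVM_CAP_NR_VCPUS, assume 4 VCPUs. */
    l.recommended = nr_vcpus_cap > 0 ? nr_vcpus_cap : 4;
    /* Without KVM_CAP_MAX_VCPUS, fall back to the recommended value. */
    l.maximum = max_vcpus_cap > 0 ? max_vcpus_cap : l.recommended;
    return l;
}
```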
    • KVM: PPC: Fix vcpu_create dereference before validity check. · 03cdab53
      Matt Evans committed
      Fix use of the vcpu struct before checking that it's actually valid.

      Signed-off-by: Matt Evans <matt@ozlabs.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PPC: Allow for read-only pages backing a Book3S HV guest · 4cf302bc
      Paul Mackerras committed
      With this, if a guest does an H_ENTER with a read/write HPTE on a page
      which is currently read-only, we make the actual HPTE inserted be a
      read-only version of the HPTE.  We now intercept protection faults as
      well as HPTE not found faults, and for a protection fault we work out
      whether it should be reflected to the guest (e.g. because the guest HPTE
      didn't allow write access to usermode) or handled by switching to
      kernel context and calling kvmppc_book3s_hv_page_fault, which will then
      request write access to the page and update the actual HPTE.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
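
The two decisions described above (downgrade the inserted HPTE when the backing page is read-only, and on a write protection fault decide whether the guest or the host owns it) can be sketched with a single writable flag. This is a simplification: the real HPTE PP bits also distinguish user and supervisor access, and all names here are hypothetical.

```c
#include <stdbool.h>

/* Simplified permission model: one flag stands in for the PP bits. */
struct guest_hpte {
    bool writable;
};

enum fault_action { REFLECT_TO_GUEST, HANDLE_IN_HOST };

/* H_ENTER: the HPTE actually inserted never grants write access to a
 * page that the host currently has read-only. */
static bool inserted_writable(struct guest_hpte g, bool host_page_writable)
{
    return g.writable && host_page_writable;
}

/* Write protection fault: if the guest's own HPTE forbids the write, the
 * guest must see the fault; otherwise it only happened because the HPTE
 * was downgraded, so the host requests write access and upgrades it. */
static enum fault_action on_write_fault(struct guest_hpte g)
{
    return g.writable ? HANDLE_IN_HOST : REFLECT_TO_GUEST;
}
```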
    • KVM: PPC: Implement MMU notifiers for Book3S HV guests · 342d3db7
      Paul Mackerras committed
      This adds the infrastructure to enable us to page out pages underneath
      a Book3S HV guest, on processors that support virtualized partition
      memory, that is, POWER7.  Instead of pinning all the guest's pages,
      we now look in the host userspace Linux page tables to find the
      mapping for a given guest page.  Then, if the userspace Linux PTE
      gets invalidated, kvm_unmap_hva() gets called for that address, and
      we replace all the guest HPTEs that refer to that page with absent
      HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
      set, which will cause an HDSI when the guest tries to access them.
      Finally, the page fault handler is extended to reinstantiate the
      guest HPTE when the guest tries to access a page which has been paged
      out.
      
      Since we can't intercept the guest DSI and ISI interrupts on PPC970,
      we still have to pin all the guest pages on PPC970.  We have a new flag,
      kvm->arch.using_mmu_notifiers, that indicates whether we can page
      guest pages out.  If it is not set, the MMU notifier callbacks do
      nothing and everything operates as before.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
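
The unmap and re-instantiate steps can be sketched as flag changes on HPTEs. For simplicity this scans a flat table rather than following a per-page reverse-mapping chain; `struct hpte_m`, `unmap_pfn()`, `reinstantiate()` and the bit values chosen for `HPTE_V_VALID` and `HPTE_V_ABSENT` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions; HPTE_V_ABSENT is a software-use bit. */
#define HPTE_V_VALID  0x1ULL
#define HPTE_V_ABSENT 0x2ULL

struct hpte_m {
    uint64_t v;      /* valid/absent flags */
    uint64_t pfn;    /* host page backing this guest HPTE */
};

/* kvm_unmap_hva()-style: turn every valid HPTE that maps pfn into an
 * absent one, so the guest's next access traps to the hypervisor. */
static size_t unmap_pfn(struct hpte_m *hpt, size_t n, uint64_t pfn,
                        bool using_mmu_notifiers)
{
    size_t count = 0;
    if (!using_mmu_notifiers)
        return 0;   /* pages are pinned (e.g. PPC970): nothing to do */
    for (size_t i = 0; i < n; i++) {
        if ((hpt[i].v & HPTE_V_VALID) && hpt[i].pfn == pfn) {
            hpt[i].v = (hpt[i].v & ~HPTE_V_VALID) | HPTE_V_ABSENT;
            count++;
        }
    }
    return count;
}

/* Page-fault side: re-instantiate an absent HPTE once the page is back. */
static void reinstantiate(struct hpte_m *h, uint64_t new_pfn)
{
    h->pfn = new_pfn;
    h->v = (h->v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
}
```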
    • KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Paul Mackerras committed
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
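
The two rules for absent HPTEs (a page outside every memslot gets an absent HPTE, and only HPTEs that were valid in hardware need a tlbie on removal) can be sketched as two small predicates. `struct slot_range`, `hpte_flags_for()`, `needs_tlbie()` and the flag values are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions; HPTE_V_ABSENT is a software-use bit. */
#define HPTE_V_VALID  0x1ULL
#define HPTE_V_ABSENT 0x2ULL

struct slot_range {
    uint64_t base_gfn;
    uint64_t npages;
};

/* H_ENTER-style choice: a gfn outside every memslot is an MMIO
 * emulation page and gets an absent HPTE instead of a valid one. */
static uint64_t hpte_flags_for(const struct slot_range *slots, size_t n,
                               uint64_t gfn)
{
    for (size_t i = 0; i < n; i++)
        if (gfn >= slots[i].base_gfn &&
            gfn < slots[i].base_gfn + slots[i].npages)
            return HPTE_V_VALID;
    return HPTE_V_ABSENT;
}

/* H_REMOVE-style: only an HPTE that was valid in hardware can be cached
 * in the TLB, so only that one needs a tlbie. */
static bool needs_tlbie(uint64_t v)
{
    return (v & HPTE_V_VALID) != 0;
}
```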
    • KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn · 06ce2c63
      Paul Mackerras committed
      This expands the reverse mapping array to contain two links for each
      HPTE which are used to link together HPTEs that correspond to the
      same guest logical page.  Each circular list of HPTEs is pointed to
      by the rmap array entry for the guest logical page, pointed to by
      the relevant memslot.  Links are 32-bit HPT entry indexes rather than
      full 64-bit pointers, to save space.  We use 3 of the remaining 32
      bits in the rmap array entries as a lock bit, a referenced bit and
      a present bit (the present bit is needed since HPTE index 0 is valid).
      The bit lock for the rmap chain nests inside the HPTE lock bit.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
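
A circular doubly-linked list threaded through an array by 32-bit indexes, with the list head and a present bit packed into a 64-bit rmap word, can be sketched as below. The bit positions, `struct revmap_entry` layout and the `rmap_add()`/`rmap_remove()` helpers are hypothetical; the lock and referenced bits are defined but not exercised, and callers are assumed to hold the relevant locks.

```c
#include <stdint.h>

/* Per-HPTE reverse-map links: 32-bit HPT indexes, not pointers. */
struct revmap_entry {
    uint32_t forw, back;
};

/* Per-page rmap word: low 32 bits = index of one HPTE in the chain.
 * Hypothetical flag positions in the upper half. */
#define RMAP_LOCK    (1ULL << 63)   /* chain lock (not exercised here) */
#define RMAP_REF     (1ULL << 62)   /* referenced (not exercised here) */
#define RMAP_PRESENT (1ULL << 32)   /* needed because index 0 is valid */
#define RMAP_INDEX   0xffffffffULL

/* Link HPTE index i into the chain for the page whose rmap word is *rmap. */
static void rmap_add(struct revmap_entry *rev, uint64_t *rmap, uint32_t i)
{
    if (*rmap & RMAP_PRESENT) {
        uint32_t head = (uint32_t)(*rmap & RMAP_INDEX);
        uint32_t tail = rev[head].back;
        rev[i].forw = head;
        rev[i].back = tail;
        rev[tail].forw = i;
        rev[head].back = i;
    } else {
        rev[i].forw = rev[i].back = i;   /* single-element circular list */
        *rmap = (*rmap & ~RMAP_INDEX) | RMAP_PRESENT | i;
    }
}

/* Unlink HPTE index i; clear the present bit when the chain empties. */
static void rmap_remove(struct revmap_entry *rev, uint64_t *rmap, uint32_t i)
{
    uint32_t next = rev[i].forw, prev = rev[i].back;
    if (next == i) {
        *rmap &= ~(RMAP_PRESENT | RMAP_INDEX);
        return;
    }
    rev[prev].forw = next;
    rev[next].back = prev;
    if ((uint32_t)(*rmap & RMAP_INDEX) == i)
        *rmap = (*rmap & ~RMAP_INDEX) | next;
}
```

Using indexes halves the link size compared with 64-bit pointers, and the present bit distinguishes "empty" from "chain starts at HPTE 0".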
    • KVM: PPC: Allow I/O mappings in memory slots · 9d0ef5ea
      Paul Mackerras committed
      This provides for the case where userspace maps an I/O device into the
      address range of a memory slot using a VM_PFNMAP mapping.  In that
      case, we work out the pfn from vma->vm_pgoff, and record the cache
      enable bits from vma->vm_page_prot in two low-order bits in the
      slot_phys array entries.  Then, in kvmppc_h_enter() we check that the
      cache bits in the HPTE that the guest wants to insert match the cache
      bits in the slot_phys array entry.  However, we do allow the guest to
      create what it thinks is a non-cacheable or write-through mapping to
      memory that is actually cacheable, so that we can use normal system
      memory as part of an emulated device later on.  In that case the actual
      HPTE we insert is a cacheable HPTE.

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
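
The cache-bit check can be sketched as follows: the two low-order bits of a page-aligned slot_phys entry hold the backing cache mode, an exact match is accepted, a non-cacheable or write-through request on ordinary memory is accepted but inserted as cacheable, and anything else is refused. The `enum cache_mode` encoding and the helper names are hypothetical.

```c
#define SLOT_CACHE_MASK 0x3UL   /* two low-order bits of a slot_phys entry */

/* Hypothetical encoding of the cache-enable bits. */
enum cache_mode {
    CACHE_NORMAL = 0,        /* cacheable */
    CACHE_WRITETHROUGH = 1,
    CACHE_INHIBITED = 2
};

/* Physical addresses are page aligned, so the low bits are free. */
static unsigned long slot_phys_entry(unsigned long phys, enum cache_mode m)
{
    return (phys & ~SLOT_CACHE_MASK) | (unsigned long)m;
}

/* Cache mode for the HPTE actually inserted, or -1 to reject H_ENTER. */
static int inserted_cache_mode(unsigned long entry, enum cache_mode wanted)
{
    enum cache_mode backing = (enum cache_mode)(entry & SLOT_CACHE_MASK);

    if (wanted == backing)
        return backing;
    /* Write-through/non-cacheable request on ordinary memory: allowed,
     * but the real HPTE stays cacheable. */
    if (backing == CACHE_NORMAL)
        return CACHE_NORMAL;
    return -1;   /* any other mismatch is refused */
}
```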
    • KVM: PPC: Allow use of small pages to back Book3S HV guests · da9d1d7f
      Paul Mackerras committed
      This relaxes the requirement that the guest memory be provided as
      16MB huge pages, allowing it to be provided as normal memory, i.e.
      in pages of PAGE_SIZE bytes (4k or 64k).  To allow this, we index
      the kvm->arch.slot_phys[] arrays with a small page index, even if
      huge pages are being used, and use the low-order 5 bits of each
      entry to store the order of the enclosing page with respect to
      normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE).

      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
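
The encoding works out as follows: a 16MB page backed by 64k base pages has order log2(16M / 64k) = 8, and with 4k base pages the order is 12, both of which fit in 5 bits. A sketch of the encode/decode step and of finding the first small-page index of the enclosing page; all helper names are hypothetical.

```c
#define SLOT_ORDER_MASK 0x1fUL   /* low-order 5 bits hold the order */

/* log2(enclosing_page_size / page_size); both sizes powers of two. */
static unsigned int page_order(unsigned long enclosing, unsigned long page_size)
{
    unsigned int order = 0;
    while ((page_size << order) < enclosing)
        order++;
    return order;
}

static unsigned long slot_entry(unsigned long phys, unsigned int order)
{
    return (phys & ~SLOT_ORDER_MASK) | (order & SLOT_ORDER_MASK);
}

static unsigned int slot_entry_order(unsigned long entry)
{
    return (unsigned int)(entry & SLOT_ORDER_MASK);
}

/* slot_phys[] is indexed per small page; the first small-page index of
 * the enclosing (possibly huge) page follows from the stored order. */
static unsigned long enclosing_start_index(unsigned long idx, unsigned int order)
{
    return idx & ~((1UL << order) - 1);
}
```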