1. 12 7月, 2011 21 次提交
    • P
      KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Paul Mackerras 提交于
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de56a948
    • P
      KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu · 3c42bf8a
      Paul Mackerras 提交于
      There are several fields in struct kvmppc_book3s_shadow_vcpu that
      temporarily store bits of host state while a guest is running,
      rather than anything relating to the particular guest or vcpu.
      This splits them out into a new kvmppc_host_state structure and
      modifies the definitions in asm-offsets.c to suit.
      
      On 32-bit, we have a kvmppc_host_state structure inside the
      kvmppc_book3s_shadow_vcpu since the assembly code needs to be able
      to get to them both with one pointer.  On 64-bit they are separate
      fields in the PACA.  This means that on 64-bit we don't need to
      copy the kvmppc_host_state in and out on vcpu load/unload, and
      in future will mean that the book3s_hv code doesn't need a
      shadow_vcpu struct in the PACA at all.  That does mean that we
      have to be careful not to rely on any values persisting in the
      hstate field of the paca across any point where we could block
      or get preempted.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3c42bf8a
    • P
      KVM: PPC: Move guest enter/exit down into subarch-specific code · df6909e5
      Paul Mackerras 提交于
      Instead of doing the kvm_guest_enter/exit() and local_irq_dis/enable()
      calls in powerpc.c, this moves them down into the subarch-specific
      book3s_pr.c and booke.c.  This eliminates an extra local_irq_enable()
      call in book3s_pr.c, and will be needed for when we do SMT4 guest
      support in the book3s hypervisor mode code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      df6909e5
    • P
      KVM: PPC: Pass init/destroy vm and prepare/commit memory region ops down · f9e0554d
      Paul Mackerras 提交于
      This arranges for the top-level arch/powerpc/kvm/powerpc.c file to
      pass down some of the calls it gets to the lower-level subarchitecture
      specific code.  The lower-level implementations (in booke.c and book3s.c)
      are no-ops.  The coming book3s_hv.c will need this.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f9e0554d
    • P
      KVM: PPC: Deliver program interrupts right away instead of queueing them · 3cf658b6
      Paul Mackerras 提交于
      Doing so means that we don't have to save the flags anywhere and gets
      rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.
      
      Doing so is OK because a program interrupt won't be generated at the
      same time as any other synchronous interrupt.  If a program interrupt
      and an asynchronous interrupt (external or decrementer) are generated
      at the same time, the program interrupt will be delivered, which is
      correct because it has a higher priority, and then the asynchronous
      interrupt will be masked.
      
      We don't ever generate system reset or machine check interrupts to the
      guest, but if we did, then we would need to make sure they got delivered
      rather than the program interrupt.  The current code would be wrong in
      this situation anyway since it would deliver the program interrupt as
      well as the reset/machine check interrupt.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3cf658b6
    • P
      powerpc, KVM: Rework KVM checks in first-level interrupt handlers · b01c8b54
      Paul Mackerras 提交于
      Instead of branching out-of-line with the DO_KVM macro to check if we
      are in a KVM guest at the time of an interrupt, this moves the KVM
      check inline in the first-level interrupt handlers.  This speeds up
      the non-KVM case and makes sure that none of the interrupt handlers
      are missing the check.
      
      Because the first-level interrupt handlers are now larger, some things
      had to be move out of line in exceptions-64s.S.
      
      This all necessitated some minor changes to the interrupt entry code
      in KVM.  This also streamlines the book3s_32 KVM test.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b01c8b54
    • P
      KVM: PPC: Split out code from book3s.c into book3s_pr.c · f05ed4d5
      Paul Mackerras 提交于
      In preparation for adding code to enable KVM to use hypervisor mode
      on 64-bit Book 3S processors, this splits book3s.c into two files,
      book3s.c and book3s_pr.c, where book3s_pr.c contains the code that is
      specific to running the guest in problem state (user mode) and book3s.c
      contains code which should apply to all Book 3S processors.
      
      In doing this, we abstract some details, namely the interrupt offset,
      updating the interrupt pending flag, and detecting if the guest is
      in a critical section.  These are all things that will be different
      when we use hypervisor mode.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f05ed4d5
    • P
      KVM: PPC: Move fields between struct kvm_vcpu_arch and kvmppc_vcpu_book3s · c4befc58
      Paul Mackerras 提交于
      This moves the slb field, which represents the state of the emulated
      SLB, from the kvmppc_vcpu_book3s struct to the kvm_vcpu_arch, and the
      hpte_hash_[v]pte[_long] fields from kvm_vcpu_arch to kvmppc_vcpu_book3s.
      This is in accord with the principle that the kvm_vcpu_arch struct
      represents the state of the emulated CPU, and the kvmppc_vcpu_book3s
      struct holds the auxiliary data structures used in the emulation.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c4befc58
    • P
      KVM: PPC: Fix machine checks on 32-bit Book3S · 149dbdb1
      Paul Mackerras 提交于
      Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through
      function exports") resulted in vcpu->arch.trampoline_lowmem and
      vcpu->arch.trampoline_enter ending up with kernel virtual addresses
      rather than physical addresses.  This is OK on 64-bit Book3S machines,
      which ignore the top 4 bits of the effective address in real mode,
      but on 32-bit Book3S machines, accessing these addresses in real mode
      causes machine check interrupts, as the hardware uses the whole
      effective address as the physical address in real mode.
      
      This fixes the problem by using __pa() to convert these addresses
      to physical addresses.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      149dbdb1
    • S
      KVM: PPC: e500: Don't search over the entire TLB0. · 1aee47a0
      Scott Wood 提交于
      Only look in the 4 entries that could possibly contain the
      entry we're looking for.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1aee47a0
    • L
      KVM: PPC: e500: Add shadow PID support · dd9ebf1f
      Liu Yu 提交于
      Dynamically assign host PIDs to guest PIDs, splitting each guest PID into
      multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS].  Use
      both PID0 and PID1 so that the shadow PIDs for the right mode can be
      selected, that correspond both to guest TID = zero and guest TID = guest
      PID.
      
      This allows us to significantly reduce the frequency of needing to
      invalidate the entire TLB.  When the guest mode or PID changes, we just
      update the host PID0/PID1.  And since the allocation of shadow PIDs is
      global, multiple guests can share the TLB without conflict.
      
      Note that KVM does not yet support the guest setting PID1 or PID2 to
      a value other than zero.  This will need to be fixed for nested KVM
      to work.  Until then, we enforce the requirement for guest PID1/PID2
      to stay zero by failing the emulation if the guest tries to set them
      to something else.
      Signed-off-by: NLiu Yu <yu.liu@freescale.com>
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      dd9ebf1f
    • L
      KVM: PPC: e500: Stop keeping shadow TLB · 08b7fa92
      Liu Yu 提交于
      Instead of a fully separate set of TLB entries, keep just the
      pfn and dirty status.
      Signed-off-by: NLiu Yu <yu.liu@freescale.com>
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      08b7fa92
    • S
      KVM: PPC: e500: enable magic page · a4cd8b23
      Scott Wood 提交于
      This is a shared page used for paravirtualization.  It is always present
      in the guest kernel's effective address space at the address indicated
      by the hypercall that enables it.
      
      The physical address specified by the hypercall is not used, as
      e500 does not have real mode.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a4cd8b23
    • S
      KVM: PPC: e500: Support large page mappings of PFNMAP vmas. · 9973d54e
      Scott Wood 提交于
      This allows large pages to be used on guest mappings backed by things like
      /dev/mem, resulting in a significant speedup when guest memory
      is mapped this way (it's useful for directly-assigned MMIO, too).
      
      This is not a substitute for hugetlbfs integration, but is useful for
      configurations where devices are directly assigned on chips without an
      IOMMU -- in these cases, we need guest physical and true physical to
      match, and be contiguous, so static reservation and mapping via /dev/mem
      is the most straightforward way to set things up.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9973d54e
    • S
      KVM: PPC: e500: Eliminate shadow_pages[], and use pfns instead. · 59c1f4e3
      Scott Wood 提交于
      This is in line with what other architectures do, and will allow us to
      map things other than ordinary, unreserved kernel pages -- such as
      dedicated devices, or large contiguous reserved regions.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      59c1f4e3
    • S
      KVM: PPC: e500: don't use MAS0 as intermediate storage. · 0ef30995
      Scott Wood 提交于
      This avoids races.  It also means that we use the shadow TLB way,
      rather than the hardware hint -- if this is a problem, we could do
      a tlbsx before inserting a TLB0 entry.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      0ef30995
    • S
      KVM: PPC: e500: Disable preloading TLB1 in tlb_load(). · 6fc4d1eb
      Scott Wood 提交于
      Since TLB1 loading doesn't check the shadow TLB before allocating another
      entry, you can get duplicates.
      
      Once shadow PIDs are enabled in a later patch, we won't need to
      invalidate the TLB on every switch, so this optimization won't be
      needed anyway.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6fc4d1eb
    • S
      KVM: PPC: e500: Save/restore SPE state · 4cd35f67
      Scott Wood 提交于
      This is done lazily.  The SPE save will be done only if the guest has
      used SPE since the last preemption or heavyweight exit.  Restore will be
      done only on demand, when enabling MSR_SPE in the shadow MSR, in response
      to an SPE fault or mtmsr emulation.
      
      For SPEFSCR, Linux already switches it on context switch (non-lazily), so
      the only remaining bit is to save it between qemu and the guest.
      Signed-off-by: NLiu Yu <yu.liu@freescale.com>
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4cd35f67
    • S
      KVM: PPC: booke: use shadow_msr · ecee273f
      Scott Wood 提交于
      Keep the guest MSR and the guest-mode true MSR separate, rather than
      modifying the guest MSR on each guest entry to produce a true MSR.
      
      Any bits which should be modified based on guest MSR must be explicitly
      propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in
      kvmppc_set_msr().
      
      While we're modifying the guest entry code, reorder a few instructions
      to bury some load latencies.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ecee273f
    • A
      KVM: PPC: Resolve real-mode handlers through function exports · a22a2dac
      Alexander Graf 提交于
      Up until now, Book3S KVM had variables stored in the kernel that a kernel module
      or the kvm code in the kernel could read from to figure out where some real mode
      helper functions are located.
      
      This is all unnecessary. The high bits of the EA get ignore in real mode, so we
      can just use the pointer as is. Also, it's a lot easier on relocations when we
      use the normal way of resolving the address to a function, instead of jumping
      through hoops.
      
      This patch fixes compilation with CONFIG_RELOCATABLE=y.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a22a2dac
    • S
      KVM: PPC: fix partial application of "exit timing in ticks" · 24294b9a
      Stuart Yoder 提交于
      When http://www.spinics.net/lists/kvm-ppc/msg02664.html
      was applied to produce commit b51e7aa7ed6d8d134d02df78300ab0f91cfff4d2,
      the removal of the conversion in add_exit_timing was left out.
      Signed-off-by: NStuart Yoder <stuart.yoder@freescale.com>
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      24294b9a
  2. 22 5月, 2011 5 次提交
  3. 20 5月, 2011 2 次提交
  4. 11 5月, 2011 1 次提交
    • B
      KVM: PPC: Fix issue clearing exit timing counters · 09000adb
      Bharat Bhushan 提交于
      Following dump is observed on host when clearing the exit timing counters
      
      [root@p1021mds kvm]# echo -n 'c' > vm1200_vcpu0_timing
      INFO: task echo:1276 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      echo          D 0ff5bf94     0  1276   1190 0x00000000
      Call Trace:
      [c2157e40] [c0007908] __switch_to+0x9c/0xc4
      [c2157e50] [c040293c] schedule+0x1b4/0x3bc
      [c2157e90] [c04032dc] __mutex_lock_slowpath+0x74/0xc0
      [c2157ec0] [c00369e4] kvmppc_init_timing_stats+0x20/0xb8
      [c2157ed0] [c0036b00] kvmppc_exit_timing_write+0x84/0x98
      [c2157ef0] [c00b9f90] vfs_write+0xc0/0x16c
      [c2157f10] [c00ba284] sys_write+0x4c/0x90
      [c2157f40] [c000e320] ret_from_syscall+0x0/0x3c
      
              The vcpu->mutex is used by kvm_ioctl_* (KVM_RUN etc) and same was
      used when clearing the stats (in kvmppc_init_timing_stats()). What happens
      is that when the guest is idle then it held the vcpu->mutx. While the
      exiting timing process waits for guest to release the vcpu->mutex and
      a hang state is reached.
      
              Now using seprate lock for exit timing stats.
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Acked-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      09000adb
  5. 20 4月, 2011 3 次提交
  6. 18 3月, 2011 1 次提交
  7. 12 1月, 2011 2 次提交
  8. 06 11月, 2010 4 次提交
  9. 24 10月, 2010 1 次提交
    • A
      KVM: PPC: Fix compile error in e500_tlb.c · 344941be
      Alexander Graf 提交于
      The e500_tlb.c file didn't compile for me due to the following error:
      
      arch/powerpc/kvm/e500_tlb.c: In function ‘kvmppc_e500_shadow_map’:
      arch/powerpc/kvm/e500_tlb.c:300: error: format ‘%lx’ expects type ‘long unsigned int’, but argument 2 has type ‘gfn_t’
      
      So let's explicitly cast the argument to make printk happy.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      344941be