1. 27 January 2014, 2 commits
    • KVM: PPC: Book3S PR: Cope with doorbell interrupts · 40688909
      Authored by Paul Mackerras
      When the PR host is running on a POWER8 machine in POWER8 mode, it
      will use doorbell interrupts for IPIs.  If one of them arrives while
      we are in the guest, we pop out of the guest with trap number 0xA00,
      which isn't handled by kvmppc_handle_exit_pr, leading to the following
      BUG_ON:
      
      [  331.436215] exit_nr=0xa00 | pc=0x1d2c | msr=0x800000000000d032
      [  331.437522] ------------[ cut here ]------------
      [  331.438296] kernel BUG at arch/powerpc/kvm/book3s_pr.c:982!
      [  331.439063] Oops: Exception in kernel mode, sig: 5 [#2]
      [  331.439819] SMP NR_CPUS=1024 NUMA pSeries
      [  331.440552] Modules linked in: tun nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw virtio_net kvm binfmt_misc ibmvscsi scsi_transport_srp scsi_tgt virtio_blk
      [  331.447614] CPU: 11 PID: 1296 Comm: qemu-system-ppc Tainted: G      D      3.11.7-200.2.fc19.ppc64p7 #1
      [  331.448920] task: c0000003bdc8c000 ti: c0000003bd32c000 task.ti: c0000003bd32c000
      [  331.450088] NIP: d0000000025d6b9c LR: d0000000025d6b98 CTR: c0000000004cfdd0
      [  331.451042] REGS: c0000003bd32f420 TRAP: 0700   Tainted: G      D       (3.11.7-200.2.fc19.ppc64p7)
      [  331.452331] MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 28004824  XER: 20000000
      [  331.454616] SOFTE: 1
      [  331.455106] CFAR: c000000000848bb8
      [  331.455726]
      GPR00: d0000000025d6b98 c0000003bd32f6a0 d0000000026017b8 0000000000000032
      GPR04: c0000000018627f8 c000000001873208 320d0a3030303030 3030303030643033
      GPR08: c000000000c490a8 0000000000000000 0000000000000000 0000000000000002
      GPR12: 0000000028004822 c00000000fdc6300 0000000000000000 00000100076ec310
      GPR16: 000000002ae343b8 00003ffffd397398 0000000000000000 0000000000000000
      GPR20: 00000100076f16f4 00000100076ebe60 0000000000000008 ffffffffffffffff
      GPR24: 0000000000000000 0000008001041e60 0000000000000000 0000008001040ce8
      GPR28: c0000003a2d80000 0000000000000a00 0000000000000001 c0000003a2681810
      [  331.466504] NIP [d0000000025d6b9c] .kvmppc_handle_exit_pr+0x75c/0xa80 [kvm]
      [  331.466999] LR [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm]
      [  331.467517] Call Trace:
      [  331.467909] [c0000003bd32f6a0] [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm] (unreliable)
      [  331.468553] [c0000003bd32f750] [d0000000025d98f0] kvm_start_lightweight+0xb4/0xc4 [kvm]
      [  331.469189] [c0000003bd32f920] [d0000000025d7648] .kvmppc_vcpu_run_pr+0xd8/0x270 [kvm]
      [  331.469838] [c0000003bd32f9c0] [d0000000025cf748] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
      [  331.470790] [c0000003bd32fa50] [d0000000025cc19c] .kvm_arch_vcpu_ioctl_run+0x5c/0x1b0 [kvm]
      [  331.471401] [c0000003bd32fae0] [d0000000025c4888] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
      [  331.472026] [c0000003bd32fc90] [c00000000026192c] .do_vfs_ioctl+0x4dc/0x7a0
      [  331.472561] [c0000003bd32fd80] [c000000000261cc4] .SyS_ioctl+0xd4/0xf0
      [  331.473095] [c0000003bd32fe30] [c000000000009ed8] syscall_exit+0x0/0x98
      [  331.473633] Instruction dump:
      [  331.473766] 4bfff9b4 2b9d0800 419efc18 60000000 60420000 3d220000 e8bf11a0 e8df12a8
      [  331.474733] 7fa4eb78 e8698660 48015165 e8410028 <0fe00000> 813f00e4 3ba00000 39290001
      [  331.475386] ---[ end trace 49fc47d994c1f8f2 ]---
      [  331.479817]
      
      This fixes the problem by making kvmppc_handle_exit_pr() recognize the
      interrupt.  We also need to jump to the doorbell interrupt handler in
      book3s_segment.S to handle the interrupt on the way out of the guest.
      Having done that, there's nothing further to be done in
      kvmppc_handle_exit_pr().
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      40688909
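      A minimal sketch of the resulting exit-handler case, assuming the 0xa00 trap is
      exposed to kvmppc_handle_exit_pr() as BOOK3S_INTERRUPT_DOORBELL; since the
      doorbell is taken by the host handler on the way out of the guest, nothing is
      left to do here except resume:

          /* Sketch: in kvmppc_handle_exit_pr(), arch/powerpc/kvm/book3s_pr.c */
          case BOOK3S_INTERRUPT_DOORBELL:
                  /* The IPI was already serviced when book3s_segment.S jumped to the
                   * host doorbell handler; just go back into the guest. */
                  r = RESUME_GUEST;
                  break;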
    • kvm/ppc: IRQ disabling cleanup · 6c85f52b
      Authored by Scott Wood
      Simplify the handling of lazy EE by going directly from fully-enabled
      to hard-disabled.  This replaces the lazy_irq_pending() check
      (including its misplaced kvm_guest_exit() call).
      
      As suggested by Tiejun Chen, move the interrupt disabling into
      kvmppc_prepare_to_enter() rather than have each caller do it.  Also
      move the IRQ enabling on heavyweight exit into
      kvmppc_prepare_to_enter().
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      6c85f52b
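      A hedged sketch of the resulting shape of kvmppc_prepare_to_enter(); the helper
      kvmppc_core_check_pending() is a hypothetical stand-in for the per-core checks,
      and the rest uses the standard powerpc lazy-EE primitives:

          /* Sketch, not the exact patch: disable interrupts here, once, for all callers. */
          int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
          {
                  int r = 1;                      /* 1 = go ahead and enter the guest */

                  hard_irq_disable();             /* fully enabled -> hard disabled, no lazy state */

                  if (signal_pending(current) || kvmppc_core_check_pending(vcpu))
                          r = 0;                  /* heavyweight exit back to the host */

                  if (r <= 0)
                          local_irq_enable();     /* IRQ enabling on heavyweight exit lives here now */
                  return r;
          }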
  2. 09 January 2014, 4 commits
    • KVM: PPC: Load/save FP/VMX/VSX state directly to/from vcpu struct · 99dae3ba
      Authored by Paul Mackerras
      Now that we have the vcpu floating-point and vector state stored in
      the same type of struct as the main kernel uses, we can load that
      state directly from the vcpu struct instead of having extra copies
      to/from the thread_struct.  Similarly, when the guest state needs to
      be saved, we can have it saved directly to the vcpu struct by
      setting the current->thread.fp_save_area and current->thread.vr_save_area
      pointers.  That also means that we don't need to back up and restore
      userspace's FP/vector state.  This all makes the code simpler and
      faster.
      
      Note that it's not necessary to save or modify current->thread.fpexc_mode,
      since nothing in KVM uses or is affected by its value.  Nor is it
      necessary to touch used_vr or used_vsr.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      99dae3ba
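      A hedged sketch of the idea (the vcpu->arch.fp field name is inferred from the
      companion commits below): point the thread's save-area pointer at the vcpu
      state so the kernel's normal save path writes guest state straight into the
      vcpu struct, and load guest state with load_fp_state()/load_vr_state():

          /* Sketch, not the exact patch. */
          static void sketch_load_guest_fp(struct kvm_vcpu *vcpu)
          {
                  load_fp_state(&vcpu->arch.fp);                  /* struct thread_fp_state */
                  current->thread.fp_save_area = &vcpu->arch.fp;  /* giveup_fpu() now saves here */
          }

          static void sketch_done_with_guest_fp(void)
          {
                  current->thread.fp_save_area = NULL;            /* back to the thread_struct */
          }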
    • KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures · efff1912
      Authored by Paul Mackerras
      This uses struct thread_fp_state and struct thread_vr_state to store
      the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
      This makes transferring the state to/from the thread_struct simpler
      and allows us to unify the get/set_one_reg implementations for the
      VSX registers.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      efff1912
    • KVM: PPC: Use load_fp/vr_state rather than load_up_fpu/altivec · 09548fda
      Authored by Paul Mackerras
      The load_up_fpu and load_up_altivec functions were never intended to
      be called from C, and do things like modifying the MSR value in their
      callers' stack frames, which are assumed to be interrupt frames.  In
      addition, on 32-bit Book S they require the MMU to be off.
      
      This makes KVM use the new load_fp_state() and load_vr_state() functions
      instead of load_up_fpu/altivec.  This means we can remove the assembler
      glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E,
      where load_up_fpu was called directly from C.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      09548fda
    • KVM: PPC: Add devname:kvm aliases for modules · 398a76c6
      Authored by Alexander Graf
      Systems that support automatic loading of kernel modules through
      device aliases should try and automatically load kvm when /dev/kvm
      gets opened.
      
      Add code to support that magic for all PPC kvm targets, even the
      ones that don't support modules yet.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      398a76c6
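      The "magic" is a pair of module aliases on the misc device; a minimal sketch of
      what each PPC KVM target carries (standard kernel macros, written from memory):

          #include <linux/module.h>
          #include <linux/miscdevice.h>
          #include <linux/kvm_host.h>

          MODULE_ALIAS_MISCDEV(KVM_MINOR);   /* misc-device alias (char-major-10-<KVM_MINOR>) */
          MODULE_ALIAS("devname:kvm");       /* lets devtmpfs/udev create /dev/kvm and autoload */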
  3. 09 December 2013, 1 commit
    • KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy · 40fdd8c8
      Authored by Alexander Graf
      As soon as we get back to our "highmem" handler in virtual address
      space we may get preempted. Today the reason we can get preempted is
      that we replay interrupts and all the lazy logic thinks we have
      interrupts enabled.
      
      However, it's not hard to make the code interruptible and that way
      we can enable and handle interrupts even earlier.
      
      This fixes random guest crashes that happened with CONFIG_PREEMPT=y
      for me.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      40fdd8c8
  4. 18 October 2013, 2 commits
  5. 17 October 2013, 12 commits
    • kvm: powerpc: book3s: Support building HV and PR KVM as module · 2ba9f0d8
      Authored by Aneesh Kumar K.V
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: squash in compile fix]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      2ba9f0d8
    • kvm: powerpc: book3s: pr: move PR related tracepoints to a separate header · 72c12535
      Authored by Aneesh Kumar K.V
      This patch moves the PR-related tracepoints to a separate header. This
      prepares for converting PR KVM into a kernel module, which is done in
      later patches.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      72c12535
    • kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops · 699cc876
      Authored by Aneesh Kumar K.V
      This helps us identify whether we are running with hypervisor-mode KVM
      enabled. The change is needed so that we can have both HV and PR KVM
      enabled in the same kernel.
      
      If both HV and PR KVM are included, interrupts come in to the HV version
      of the kvmppc_interrupt code, which then jumps to the PR handler,
      renamed to kvmppc_interrupt_pr, if the guest is a PR guest.
      
      Allowing both PR and HV in the same kernel required some changes to
      kvm_dev_ioctl_check_extension(), since the values returned now can't
      be selected with #ifdefs as much as previously. We look at is_hv_enabled
      to return the right value when checking for capabilities. For capabilities that
      are only provided by HV KVM, we return the HV value only if
      is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
      we return the PR value only if is_hv_enabled is false.
      
      NOTE: in a later patch we replace is_hv_enabled with a static inline
      function that compares kvm_ppc_ops.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      699cc876
    • kvm: powerpc: Add kvmppc_ops callback · 3a167bea
      Authored by Aneesh Kumar K.V
      This patch adds a new callback structure, kvmppc_ops. This will help us
      enable both HV and PR KVM together in the same kernel. The actual change
      to enable them together is done in a later patch in the series.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: squash in booke changes]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      3a167bea
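      A hedged sketch of the shape of such an ops structure: a table of per-flavour
      (HV or PR) callbacks registered at init time, so both implementations can live
      in one kernel. The member list here is illustrative, not the full set from the
      patch:

          struct kvmppc_ops {
                  bool is_hv_enabled;     /* added by commit 699cc876, listed above */
                  int  (*get_one_reg)(struct kvm_vcpu *vcpu, u64 id,
                                      union kvmppc_one_reg *val);
                  int  (*set_one_reg)(struct kvm_vcpu *vcpu, u64 id,
                                      union kvmppc_one_reg *val);
                  struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned int id);
                  void (*vcpu_free)(struct kvm_vcpu *vcpu);
                  int  (*vcpu_run)(struct kvm_run *run, struct kvm_vcpu *vcpu);
                  int  (*init_vm)(struct kvm *kvm);
                  void (*destroy_vm)(struct kvm *kvm);
          };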
    • KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers · 491d6ecc
      Authored by Paul Mackerras
      Currently, whenever any of the MMU notifier callbacks get called, we
      invalidate all the shadow PTEs.  This is inefficient because it means
      that we typically then get a lot of DSIs and ISIs in the guest to fault
      the shadow PTEs back in.  We do this even if the address range being
      notified doesn't correspond to guest memory.
      
      This commit adds code to scan the memslot array to find out what range(s)
      of guest physical addresses corresponds to the host virtual address range
      being affected.  For each such range we flush only the shadow PTEs
      for the range, on all cpus.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      491d6ecc
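      A hedged sketch of the hva-to-gfn scan; the per-range flush helper is a
      hypothetical name, and the memslot walk uses the generic KVM structures of
      that era:

          /* Sketch: translate a host-virtual range into gfn ranges and flush only those. */
          static void sketch_unmap_hva_range(struct kvm *kvm, unsigned long start,
                                             unsigned long end)
          {
                  struct kvm_memory_slot *memslot;

                  kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
                          unsigned long hva_start, hva_end;
                          gfn_t gfn, gfn_end;

                          hva_start = max(start, memslot->userspace_addr);
                          hva_end   = min(end, memslot->userspace_addr +
                                               (memslot->npages << PAGE_SHIFT));
                          if (hva_start >= hva_end)
                                  continue;       /* this slot is untouched by the range */

                          gfn     = memslot->base_gfn +
                                    ((hva_start - memslot->userspace_addr) >> PAGE_SHIFT);
                          gfn_end = memslot->base_gfn +
                                    ((hva_end - memslot->userspace_addr) >> PAGE_SHIFT);

                          sketch_flush_shadow_ptes(kvm, gfn, gfn_end);    /* hypothetical */
                  }
          }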
    • KVM: PPC: Book3S PR: Better handling of host-side read-only pages · 93b159b4
      Authored by Paul Mackerras
      Currently we request write access to all pages that get mapped into the
      guest, even if the guest is only loading from the page.  This reduces
      the effectiveness of KSM because it means that we unshare every page we
      access.  Also, we always set the changed (C) bit in the guest HPTE if
      it allows writing, even for a guest load.
      
      This fixes both these problems.  We pass an 'iswrite' flag to the
      mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
      the access is a load or a store.  The mmu.xlate() functions now only
      set C for stores.  kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
      instead of gfn_to_pfn() so that it can indicate whether we need write
      access to the page, and get back a 'writable' flag to indicate whether
      the page is writable or not.  If that 'writable' flag is clear, we then
      make the host HPTE read-only even if the guest HPTE allowed writing.
      
      This means that we can get a protection fault when the guest writes to a
      page that it has mapped read-write but which is read-only on the host
      side (perhaps due to KSM having merged the page).  Thus we now call
      kvmppc_handle_pagefault() for protection faults as well as HPTE not found
      faults.  In kvmppc_handle_pagefault(), if the access was allowed by the
      guest HPTE and we thus need to install a new host HPTE, we then need to
      remove the old host HPTE if there is one.  This is done with a new
      function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
      find and remove the old host HPTE.
      
      Since the memslot-related functions require the KVM SRCU read lock to
      be held, this adds srcu_read_lock/unlock pairs around the calls to
      kvmppc_handle_pagefault().
      
      Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore
      guest HPTEs that don't permit access, and to return -EPERM for accesses
      that are not permitted by the page protections.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      93b159b4
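      A hedged sketch of the page-lookup side (gfn_to_pfn_prot() is the generic KVM
      helper; everything around it is illustrative):

          /* Sketch: request write access only for guest stores, and learn whether the
           * host mapping is actually writable. */
          static long sketch_map_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, bool iswrite)
          {
                  bool writable;
                  unsigned long pfn;

                  pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, iswrite, &writable);
                  if (is_error_pfn(pfn))
                          return -EFAULT;

                  if (!writable) {
                          /* Host page is read-only (e.g. KSM-merged): install a read-only
                           * host HPTE even if the guest HPTE allows writing; a later guest
                           * store faults and kvmppc_handle_pagefault() sorts it out. */
                  }
                  return (long)pfn;
          }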
    • KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache · 3ff95502
      Authored by Paul Mackerras
      This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache
      rather than having them embedded in the kvmppc_vcpu_book3s struct,
      which is allocated with vzalloc.  The reason is to reduce the
      differences between PR and HV KVM in order to make it easier to have
      them coexist in one kernel binary.
      
      With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s
      struct.  The pointer to the kvmppc_book3s_shadow_vcpu struct has moved
      from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct, and is only
      present for 32-bit, since it is only used for 32-bit.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: squash in compile fix from Aneesh]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      3ff95502
    • KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe · 9308ab8e
      Authored by Paul Mackerras
      This adds a per-VM mutex to provide mutual exclusion between vcpus
      for accesses to and updates of the guest hashed page table (HPT).
      This also makes the code use single-byte writes to the HPT entry
      when updating of the reference (R) and change (C) bits.  The reason
      for doing this, rather than writing back the whole HPTE, is that on
      non-PAPR virtual machines, the guest OS might be writing to the HPTE
      concurrently, and writing back the whole HPTE might conflict with
      that.  Also, real hardware does single-byte writes to update R and C.
      
      The new mutex is taken in kvmppc_mmu_book3s_64_xlate() when reading
      the HPT and updating R and/or C, and in the PAPR HPT update hcalls
      (H_ENTER, H_REMOVE, etc.).  Having the mutex means that we don't need
      to use a hypervisor lock bit in the HPT update hcalls, and we don't
      need to be careful about the order in which the bytes of the HPTE are
      updated by those hcalls.
      
      The other change here is to make emulated TLB invalidations (tlbie)
      effective across all vcpus.  To do this we call kvmppc_mmu_pte_vflush
      for all vcpus in kvmppc_ppc_book3s_64_tlbie().
      
      For 32-bit, this makes the setting of the accessed and dirty bits use
      single-byte writes, and makes tlbie invalidate shadow HPTEs for all
      vcpus.
      
      With this, PR KVM can successfully run SMP guests.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      9308ab8e
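      A hedged sketch of the locking and of a byte-granular C-bit update (the mutex
      field name and the exact byte offset are assumptions; the point is storing a
      single byte rather than rewriting the whole 16-byte HPTE under the guest's
      feet):

          /* Sketch, not the exact patch. */
          mutex_lock(&vcpu->kvm->arch.hpt_mutex);         /* serialise vcpus on the HPT */

          /* The HPTE is two big-endian 64-bit words; the change (C) bit sits in the
           * low-order byte of the second word, so update just that byte. */
          {
                  u8 *rc = (u8 *)&hptep[1];
                  rc[7] |= HPTE_R_C;                      /* R is handled the same way in its own byte */
          }

          mutex_unlock(&vcpu->kvm->arch.hpt_mutex);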
    • KVM: PPC: Book3S PR: Use 64k host pages where possible · c9029c34
      Authored by Paul Mackerras
      Currently, PR KVM uses 4k pages for the host-side mappings of guest
      memory, regardless of the host page size.  When the host page size is
      64kB, we might as well use 64k host page mappings for guest mappings
      of 64kB and larger pages and for guest real-mode mappings.  However,
      the magic page has to remain a 4k page.
      
      To implement this, we first add another flag bit to the guest VSID
      values we use, to indicate that this segment is one where host pages
      should be mapped using 64k pages.  For segments with this bit set
      we set the bits in the shadow SLB entry to indicate a 64k base page
      size.  When faulting in host HPTEs for this segment, we make them
      64k HPTEs instead of 4k.  We record the pagesize in struct hpte_cache
      for use when invalidating the HPTE.
      
      For now we restrict the segment containing the magic page (if any) to
      4k pages.  It should be possible to lift this restriction in future
      by ensuring that the magic 4k page is appropriately positioned within
      a host 64k page.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      c9029c34
    • KVM: PPC: Book3S PR: Allow guest to use 64k pages · a4a0f252
      Authored by Paul Mackerras
      This adds the code to interpret 64k HPTEs in the guest hashed page
      table (HPT), 64k SLB entries, and to tell the guest about 64k pages
      in kvm_vm_ioctl_get_smmu_info().  Guest 64k pages are still shadowed
      by 4k pages.
      
      This also adds another hash table to the four we have already in
      book3s_mmu_hpte.c to allow us to find all the PTEs that we have
      instantiated that match a given 64k guest page.
      
      The tlbie instruction changed starting with POWER6 to use a bit in
      the RB operand to indicate large page invalidations, and to use other
      RB bits to indicate the base and actual page sizes and the segment
      size.  64k pages came in slightly earlier, with POWER5++.
      We use one bit in vcpu->arch.hflags to indicate that the emulated
      cpu supports 64k pages, and another to indicate that it has the new
      tlbie definition.
      
      The KVM_PPC_GET_SMMU_INFO ioctl presents a bit of a problem, because
      the MMU capabilities depend on which CPU model we're emulating, but it
      is a VM ioctl not a VCPU ioctl and therefore doesn't get passed a VCPU
      fd.  In addition, commonly-used userspace (QEMU) calls it before
      setting the PVR for any VCPU.  Therefore, as a best effort we look at
      the first vcpu in the VM and return 64k pages or not depending on its
      capabilities.  We also make the PVR default to the host PVR on recent
      CPUs that support 1TB segments (and therefore multiple page sizes as
      well) so that KVM_PPC_GET_SMMU_INFO will include 64k page and 1TB
      segment support on those CPUs.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      a4a0f252
    • KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu · a2d56020
      Authored by Paul Mackerras
      Currently PR-style KVM keeps the volatile guest register values
      (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
      the main kvm_vcpu struct.  For 64-bit, the shadow_vcpu exists in two
      places, a kmalloc'd struct and in the PACA, and it gets copied back
      and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
      can't rely on being able to access the kmalloc'd struct.
      
      This changes the code to copy the volatile values into the shadow_vcpu
      as one of the last things done before entering the guest.  Similarly
      the values are copied back out of the shadow_vcpu to the kvm_vcpu
      immediately after exiting the guest.  We arrange for interrupts to be
      still disabled at this point so that we can't get preempted on 64-bit
      and end up copying values from the wrong PACA.
      
      This means that the accessor functions in kvm_book3s.h for these
      registers are greatly simplified, and are the same for PR and HV KVM.
      In places where accesses to shadow_vcpu fields are now replaced by
      accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs.
      Finally, on 64-bit, we don't need the kmalloc'd struct at all any more.
      
      With this, the time to read the PVR one million times in a loop went
      from 567.7ms to 575.5ms (averages of 6 values), an increase of about
      1.4% for this worst-case test for guest entries and exits.  The
      standard deviation of the measurements is about 11ms, so the
      difference is only marginally significant statistically.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      a2d56020
    • KVM: PPC: Book3S PR: Fix compilation without CONFIG_ALTIVEC · f2481771
      Authored by Paul Mackerras
      Commit 9d1ffdd8 ("KVM: PPC: Book3S PR: Don't corrupt guest state
      when kernel uses VMX") added a call to kvmppc_load_up_altivec() that
      isn't guarded by CONFIG_ALTIVEC, causing a link failure when building
      a kernel without CONFIG_ALTIVEC set.  This adds an #ifdef to fix this.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      f2481771
  6. 11 October 2013, 1 commit
    • powerpc: Put FP/VSX and VR state into structures · de79f7b9
      Authored by Paul Mackerras
      This creates new 'thread_fp_state' and 'thread_vr_state' structures
      to store FP/VSX state (including FPSCR) and Altivec/VSX state
      (including VSCR), and uses them in the thread_struct.  In the
      thread_fp_state, the FPRs and VSRs are represented as u64 rather
      than double, since we rarely perform floating-point computations
      on the values, and this will enable the structures to be used
      in KVM code as well.  Similarly FPSCR is now a u64 rather than
      a structure of two 32-bit values.
      
      This takes the offsets out of the macros such as SAVE_32FPRS,
      REST_32FPRS, etc.  This enables the same macros to be used for normal
      and transactional state, enabling us to delete the transactional
      versions of the macros.   This also removes the unused do_load_up_fpu
      and do_load_up_altivec, which were in fact buggy since they didn't
      create large enough stack frames to account for the fact that
      load_up_fpu and load_up_altivec are not designed to be called from C
      and assume that their caller's stack frame is an interrupt frame.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      de79f7b9
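      A hedged sketch of the two structures (member names as I recall them from the
      powerpc headers of that series; alignment attributes omitted):

          struct thread_fp_state {
                  u64     fpr[32][TS_FPRWIDTH];   /* FPRs as u64; width doubles when VSX is built in */
                  u64     fpscr;                  /* FPSCR as a plain u64, no longer two 32-bit halves */
          };

          struct thread_vr_state {
                  vector128       vr[32];         /* Altivec/VMX registers */
                  vector128       vscr;           /* VSCR kept in a full vector slot */
          };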
  7. 28 August 2013, 3 commits
    • KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls · 8b23de29
      Authored by Paul Mackerras
      It turns out that if we exit the guest due to an hcall instruction (sc 1),
      and the loading of the instruction in the guest exit path fails for any
      reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
      instruction after the hcall instruction rather than the hcall itself.
      This in turn means that the instruction doesn't get recognized as an
      hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
      as a sc instruction.  That usually results in the guest kernel getting
      a return code of 38 (ENOSYS) from an hcall, which often triggers a
      BUG_ON() or other failure.
      
      This fixes the problem by adding a new variant of kvmppc_get_last_inst()
      called kvmppc_get_last_sc(), which fetches the instruction if necessary
      from pc - 4 rather than pc.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      8b23de29
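      A hedged sketch of the new accessor (the cached-instruction field and the
      kvmppc_ld() call are written from memory; treat the details as illustrative):

          /* Sketch: like kvmppc_get_last_inst(), but fetch from pc - 4, so an 'sc 1'
           * exit reads the system-call instruction itself rather than its successor. */
          static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
          {
                  ulong pc = kvmppc_get_pc(vcpu) - 4;

                  if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
                          kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);

                  return vcpu->arch.last_inst;
          }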
    • KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX · 9d1ffdd8
      Authored by Paul Mackerras
      Currently the code assumes that once we load up guest FP/VSX or VMX
      state into the CPU, it stays valid in the CPU registers until we
      explicitly flush it to the thread_struct.  However, on POWER7,
      copy_page() and memcpy() can use VMX.  These functions do flush the
      VMX state to the thread_struct before using VMX instructions, but if
      this happens while we have guest state in the VMX registers, and we
      then re-enter the guest, we don't reload the VMX state from the
      thread_struct, leading to guest corruption.  This has been observed
      to cause guest processes to segfault.
      
      To fix this, we check before re-entering the guest that all of the
      bits corresponding to facilities owned by the guest, as expressed
      in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr.
      Any bits that have been cleared correspond to facilities that have
      been used by kernel code and thus flushed to the thread_struct, so
      for them we reload the state from the thread_struct.
      
      We also need to check current->thread.regs->msr before calling
      giveup_fpu() or giveup_altivec(), since if the relevant bit is
      clear, the state has already been flushed to the thread_struct and
      to flush it again would corrupt it.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      9d1ffdd8
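      A hedged sketch of the pre-entry check described above (the helper name and the
      reload calls follow my reading of the patch; treat them as assumptions):

          /* Sketch: facilities the guest owns but whose MSR bits the kernel has since
           * cleared were flushed to the thread_struct; reload them before re-entry. */
          static void kvmppc_handle_lost_ext(struct kvm_vcpu *vcpu)
          {
                  ulong lost_ext = vcpu->arch.guest_owned_ext & ~current->thread.regs->msr;

                  if (!lost_ext)
                          return;

                  if (lost_ext & MSR_FP)
                          kvmppc_load_up_fpu();           /* FP/VSX back from the thread_struct */
          #ifdef CONFIG_ALTIVEC
                  if (lost_ext & MSR_VEC)
                          kvmppc_load_up_altivec();       /* VMX back from the thread_struct */
          #endif
                  current->thread.regs->msr |= lost_ext;
          }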
    • KVM: PPC: Book3S PR: return appropriate error when allocation fails · 7c7b406e
      Authored by Thadeu Lima de Souza Cascardo
      err was overwritten by a previous function call, and checked to be 0. If
      the following page allocation fails, 0 is going to be returned instead
      of -ENOMEM.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      7c7b406e
  8. 09 August 2013, 1 commit
  9. 11 July 2013, 1 commit
  10. 30 June 2013, 1 commit
    • KVM: PPC: Book3S PR: Allow guest to use 1TB segments · 0f296829
      Authored by Paul Mackerras
      With this, the guest can use 1TB segments as well as 256MB segments.
      Since we now have the situation where a single emulated guest segment
      could correspond to multiple shadow segments (as the shadow segments
      are still 256MB segments), this adds a new kvmppc_mmu_flush_segment()
      to scan for all shadow segments that need to be removed.
      
      This restructures the guest HPT (hashed page table) lookup code to
      use the correct hashing and matching functions for HPTEs within a
      1TB segment.  We use the standard hpt_hash() function instead of
      open-coding the hash calculation, and we use HPTE_V_COMPARE() with
      an AVPN value that has the B (segment size) field included.  The
      calculation of avpn is done a little earlier since it doesn't change
      in the loop starting at the do_second label.
      
      The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that
      it returns a 256MB VSID even if the guest SLB entry is a 1TB entry.
      This is because the users of this function are creating 256MB SLB
      entries.  We set a new VSID_1T flag so that entries created from 1T
      segments don't collide with entries from 256MB segments.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      0f296829
  11. 27 April 2013, 3 commits
  12. 18 April 2013, 1 commit
  13. 18 March 2013, 1 commit
  14. 05 March 2013, 1 commit
  15. 15 February 2013, 1 commit
    • powerpc/kvm/book3s_pr: Fix compilation on 32-bit machines · deb26c27
      Authored by Paul Mackerras
      Commit a413f474 ("powerpc: Disable relocation on exceptions whenever
      PR KVM is active") added calls to pSeries_disable_reloc_on_exc() and
      pSeries_enable_reloc_on_exc() to book3s_pr.c, and added declarations
      of those functions to <asm/hvcall.h>, but didn't add an include of
      <asm/hvcall.h> to book3s_pr.c.  64-bit kernels seem to get hvcall.h
      included via some other path, but 32-bit kernels fail to compile with:
      
      arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_init_vm’:
      arch/powerpc/kvm/book3s_pr.c:1300:4: error: implicit declaration of function ‘pSeries_disable_reloc_on_exc’ [-Werror=implicit-function-declaration]
      arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_destroy_vm’:
      arch/powerpc/kvm/book3s_pr.c:1316:4: error: implicit declaration of function ‘pSeries_enable_reloc_on_exc’ [-Werror=implicit-function-declaration]
      cc1: all warnings being treated as errors
      make[2]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1
      make[1]: *** [arch/powerpc/kvm] Error 2
      make: *** [sub-make] Error 2
      
      This fixes it by adding an include of hvcall.h.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      deb26c27
  16. 10 January 2013, 2 commits
    • KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1 · 50c7bb80
      Authored by Alexander Graf
      When running on top of pHyp, the hypercall instruction "sc 1" goes
      straight into pHyp without trapping in supervisor mode.
      
      So if we want to support PAPR guests in this configuration we need to
      add a second way of accessing PAPR hypercalls, preferably with the
      exact same semantics except for the instruction.
      
      So let's overlay an officially reserved instruction and emulate PAPR
      hypercalls whenever we hit that one.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      50c7bb80
    • powerpc: Disable relocation on exceptions whenever PR KVM is active · a413f474
      Authored by Ian Munsie
      For PR KVM we allow userspace to map 0xc000000000000000. Because
      transitioning from userspace to the guest kernel may use the relocated
      exception vectors we have to disable relocation on exceptions whenever
      PR KVM is active as we cannot trust that address.
      
      This issue does not apply to HV KVM, since changing from a guest to the
      hypervisor will never use the relocated exception vectors.
      
      Currently the hypervisor interface only allows us to toggle relocation
      on exceptions on a partition wide scope, so we need to globally disable
      relocation on exceptions when the first PR KVM instance is started and
      only re-enable them when all PR KVM instances have been destroyed.
      
      It's a bit heavy-handed, but until the hypervisor gives us a lightweight
      way to toggle relocation on exceptions on a single thread it's the only
      real option.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      a413f474
  17. 06 December 2012, 2 commits
    • KVM: PPC: Book3S PR: MSR_DE doesn't exist on Book 3S · 3a2e7b0d
      Authored by Paul Mackerras
      The mask of MSR bits that get transferred from the guest MSR to the
      shadow MSR included MSR_DE.  In fact that bit only exists on Book 3E
      processors, and it is assigned the same bit used for MSR_BE on Book 3S
      processors.  Since we already had MSR_BE in the mask, this just removes
      MSR_DE.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      3a2e7b0d
    • KVM: PPC: Book3S PR: Fix VSX handling · 28c483b6
      Authored by Paul Mackerras
      This fixes various issues in how we were handling the VSX registers
      that exist on POWER7 machines.  First, we were running off the end
      of the current->thread.fpr[] array.  Ultimately this was because the
      vcpu->arch.vsr[] array is sized to be able to store both the FP
      registers and the extra VSX registers (i.e. 64 entries), but PR KVM
      only uses it for the extra VSX registers (i.e. 32 entries).
      
      Secondly, calling load_up_vsx() from C code is a really bad idea,
      because it jumps to fast_exception_return at the end, rather than
      returning with a blr instruction.  This was causing it to jump off
      to a random location with random register contents, since it was using
      the largely uninitialized stack frame created by kvmppc_load_up_vsx.
      
      In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx,
      since giveup_fpu and load_up_fpu handle the extra VSX registers as well
      as the standard FP registers on machines with VSX.  Also, since VSX
      instructions can access the VMX registers and the FP registers as well
      as the extra VSX registers, we have to load up the FP and VMX registers
      before we can turn on the MSR_VSX bit for the guest.  Conversely, if
      we save away any of the VSX or FP registers, we have to turn off MSR_VSX
      for the guest.
      
      To handle all this, it is more convenient for a single call to
      kvmppc_giveup_ext() to handle all the state saving that needs to be done,
      so we make it take a set of MSR bits rather than just one, and the switch
      statement becomes a series of if statements.  Similarly kvmppc_handle_ext
      needs to be able to load up more than one set of registers.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      28c483b6
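      A hedged sketch of the reworked save path, taking a mask of MSR bits instead of
      a single facility (structure per the commit message; field and helper details
      are assumptions):

          /* Sketch: flush every guest-owned facility named in msr with one call. */
          void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
          {
                  msr &= vcpu->arch.guest_owned_ext;      /* only facilities we actually loaded */
                  if (!msr)
                          return;

                  if (msr & MSR_VSX)
                          msr |= MSR_FP | MSR_VEC;        /* VSX spans the FP and VMX register files */

                  if (msr & MSR_FP)
                          giveup_fpu(current);            /* also writes back the extra VSX regs */
          #ifdef CONFIG_ALTIVEC
                  if (msr & MSR_VEC)
                          giveup_altivec(current);
          #endif
                  /* Saving any FP or VSX state means the guest loses MSR_VSX too. */
                  vcpu->arch.guest_owned_ext &= ~(msr | MSR_VSX);
          }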
  18. 06 October 2012, 1 commit
    • KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface · a8bd19ef
      Authored by Paul Mackerras
      This enables userspace to get and set all the guest floating-point
      state using the KVM_[GS]ET_ONE_REG ioctls.  The floating-point state
      includes all of the traditional floating-point registers and the
      FPSCR (floating point status/control register), all the VMX/Altivec
      vector registers and the VSCR (vector status/control register), and
      on POWER7, the vector-scalar registers (note that each FP register
      is the high-order half of the corresponding VSR).
      
      Most of these are implemented in common Book 3S code, except for VSX
      on POWER7.  Because HV and PR differ in how they store the FP and VSX
      registers on POWER7, the code for these cases is not common.  On POWER7,
      the FP registers are the upper halves of the VSX registers vsr0 - vsr31.
      PR KVM stores vsr0 - vsr31 in two halves, with the upper halves in the
      arch.fpr[] array and the lower halves in the arch.vsr[] array, whereas
      HV KVM on POWER7 stores the whole VSX register in arch.vsr[].
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix whitespace, vsx compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      a8bd19ef