1. 17 10月, 2013 7 次提交
    • P
      KVM: PPC: Book3S PR: Allow guest to use 64k pages · a4a0f252
      Paul Mackerras 提交于
      This adds the code to interpret 64k HPTEs in the guest hashed page
      table (HPT), 64k SLB entries, and to tell the guest about 64k pages
      in kvm_vm_ioctl_get_smmu_info().  Guest 64k pages are still shadowed
      by 4k pages.
      
      This also adds another hash table to the four we have already in
      book3s_mmu_hpte.c to allow us to find all the PTEs that we have
      instantiated that match a given 64k guest page.
      
      The tlbie instruction changed starting with POWER6 to use a bit in
      the RB operand to indicate large page invalidations, and to use other
      RB bits to indicate the base and actual page sizes and the segment
      size.  64k pages came in slightly earlier, with POWER5++.
      We use one bit in vcpu->arch.hflags to indicate that the emulated
      cpu supports 64k pages, and another to indicate that it has the new
      tlbie definition.
      
      The KVM_PPC_GET_SMMU_INFO ioctl presents a bit of a problem, because
      the MMU capabilities depend on which CPU model we're emulating, but it
      is a VM ioctl not a VCPU ioctl and therefore doesn't get passed a VCPU
      fd.  In addition, commonly-used userspace (QEMU) calls it before
      setting the PVR for any VCPU.  Therefore, as a best effort we look at
      the first vcpu in the VM and return 64k pages or not depending on its
      capabilities.  We also make the PVR default to the host PVR on recent
      CPUs that support 1TB segments (and therefore multiple page sizes as
      well) so that KVM_PPC_GET_SMMU_INFO will include 64k page and 1TB
      segment support on those CPUs.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a4a0f252
    • P
      KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu · a2d56020
      Paul Mackerras 提交于
      Currently PR-style KVM keeps the volatile guest register values
      (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
      the main kvm_vcpu struct.  For 64-bit, the shadow_vcpu exists in two
      places, a kmalloc'd struct and in the PACA, and it gets copied back
      and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
      can't rely on being able to access the kmalloc'd struct.
      
      This changes the code to copy the volatile values into the shadow_vcpu
      as one of the last things done before entering the guest.  Similarly
      the values are copied back out of the shadow_vcpu to the kvm_vcpu
      immediately after exiting the guest.  We arrange for interrupts to be
      still disabled at this point so that we can't get preempted on 64-bit
      and end up copying values from the wrong PACA.
      
      This means that the accessor functions in kvm_book3s.h for these
      registers are greatly simplified, and are same between PR and HV KVM.
      In places where accesses to shadow_vcpu fields are now replaced by
      accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs.
      Finally, on 64-bit, we don't need the kmalloc'd struct at all any more.
      
      With this, the time to read the PVR one million times in a loop went
      from 567.7ms to 575.5ms (averages of 6 values), an increase of about
      1.4% for this worse-case test for guest entries and exits.  The
      standard deviation of the measurements is about 11ms, so the
      difference is only marginally significant statistically.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a2d56020
    • P
      KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1
      Paul Mackerras 提交于
      This enables us to use the Processor Compatibility Register (PCR) on
      POWER7 to put the processor into architecture 2.05 compatibility mode
      when running a guest.  In this mode the new instructions and registers
      that were introduced on POWER7 are disabled in user mode.  This
      includes all the VSX facilities plus several other instructions such
      as ldbrx, stdbrx, popcntw, popcntd, etc.
      
      To select this mode, we have a new register accessible through the
      set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
      this to zero gives the full set of capabilities of the processor.
      Setting it to one of the "logical" PVR values defined in PAPR puts
      the vcpu into the compatibility mode for the corresponding
      architecture level.  The supported values are:
      
      0x0f000002	Architecture 2.05 (POWER6)
      0x0f000003	Architecture 2.06 (POWER7)
      0x0f100003	Architecture 2.06+ (POWER7+)
      
      Since the PCR is per-core, the architecture compatibility level and
      the corresponding PCR value are stored in the struct kvmppc_vcore, and
      are therefore shared between all vcpus in a virtual core.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in fix to add missing break statements and documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      388cc6e1
    • P
      KVM: PPC: Book3S HV: Add support for guest Program Priority Register · 4b8473c9
      Paul Mackerras 提交于
      POWER7 and later IBM server processors have a register called the
      Program Priority Register (PPR), which controls the priority of
      each hardware CPU SMT thread, and affects how fast it runs compared
      to other SMT threads.  This priority can be controlled by writing to
      the PPR or by use of a set of instructions of the form or rN,rN,rN
      which are otherwise no-ops but have been defined to set the priority
      to particular levels.
      
      This adds code to context switch the PPR when entering and exiting
      guests and to make the PPR value accessible through the SET/GET_ONE_REG
      interface.  When entering the guest, we set the PPR as late as
      possible, because if we are setting a low thread priority it will
      make the code run slowly from that point on.  Similarly, the
      first-level interrupt handlers save the PPR value in the PACA very
      early on, and set the thread priority to the medium level, so that
      the interrupt handling code runs at a reasonable speed.
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4b8473c9
    • P
      KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras 提交于
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0144e2a
    • P
      KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras 提交于
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b0f4dc
    • P
      KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers · 14941789
      Paul Mackerras 提交于
      Currently we are not saving and restoring the SIAR and SDAR registers in
      the PMU (performance monitor unit) on guest entry and exit.  The result
      is that performance monitoring tools in the guest could get false
      information about where a program was executing and what data it was
      accessing at the time of a performance monitor interrupt.  This fixes
      it by saving and restoring these registers along with the other PMU
      registers on guest entry/exit.
      
      This also provides a way for userspace to access these values for a
      vcpu via the one_reg interface.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      14941789
  2. 14 10月, 2013 1 次提交
  3. 05 9月, 2013 1 次提交
  4. 28 8月, 2013 1 次提交
    • P
      KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls · 8b23de29
      Paul Mackerras 提交于
      It turns out that if we exit the guest due to a hcall instruction (sc 1),
      and the loading of the instruction in the guest exit path fails for any
      reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
      instruction after the hcall instruction rather than the hcall itself.
      This in turn means that the instruction doesn't get recognized as an
      hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
      as a sc instruction.  That usually results in the guest kernel getting
      a return code of 38 (ENOSYS) from an hcall, which often triggers a
      BUG_ON() or other failure.
      
      This fixes the problem by adding a new variant of kvmppc_get_last_inst()
      called kvmppc_get_last_sc(), which fetches the instruction if necessary
      from pc - 4 rather than pc.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8b23de29
  5. 27 8月, 2013 3 次提交
  6. 24 8月, 2013 2 次提交
  7. 21 8月, 2013 3 次提交
    • S
      of: move of_get_cpu_node implementation to DT core library · 183912d3
      Sudeep KarkadaNagesha 提交于
      This patch moves the generalized implementation of of_get_cpu_node from
      PowerPC to DT core library, thereby adding support for retrieving cpu
      node for a given logical cpu index on any architecture.
      
      The CPU subsystem can now use this function to assign of_node in the
      cpu device while registering CPUs.
      
      It is recommended to use these helper function only in pre-SMP/early
      initialisation stages to retrieve CPU device node pointers in logical
      ordering. Once the cpu devices are registered, it can be retrieved easily
      from cpu device of_node which avoids unnecessary parsing and matching.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Acked-by: NRob Herring <rob.herring@calxeda.com>
      Signed-off-by: NSudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
      183912d3
    • S
      powerpc: Convert some mftb/mftbu into mfspr · beb2dc0a
      Scott Wood 提交于
      Some CPUs (such as e500v1/v2) don't implement mftb and will take a
      trap.  mfspr should work on everything that has a timebase, and is the
      preferred instruction according to ISA v2.06.
      
      Currently we get away with mftb on 85xx because the assembler converts
      it to mfspr due to -Wa,-me500.  However, that flag has other effects
      that are undesireable for certain targets (e.g.  lwsync is converted to
      sync), and is hostile to multiplatform kernels.  Thus we would like to
      stop setting it for all e500-family builds.
      
      mftb/mftbu instances which are in 85xx code or common code are
      converted.  Instances which will never run on 85xx are left alone.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      beb2dc0a
    • S
      powerpc/fsl-booke: Work around erratum A-006958 · d52459ca
      Scott Wood 提交于
      Erratum A-006598 says that 64-bit mftb is not atomic -- it's subject
      to a similar race condition as doing mftbu/mftbl on 32-bit.  The lower
      half of timebase is updated before the upper half; thus, we can share
      the workaround for a similar bug on Cell.  This workaround involves
      looping if the lower half of timebase is zero, thus avoiding the need
      for a scratch register (other than CR0).  This workaround must be
      avoided when the timebase is frozen, such as during the timebase sync
      code.
      
      This deals with kernel and vdso accesses, but other userspace accesses
      will of course need to be fixed elsewhere.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      d52459ca
  8. 14 8月, 2013 22 次提交