1. 07 4月, 2014 1 次提交
    • V
      cpufreq: powernv: cpufreq driver for powernv platform · b3d627a5
      Vaidyanathan Srinivasan 提交于
      Backend driver to dynamically set voltage and frequency on
      IBM POWER non-virtualized platforms.  Power management SPRs
      are used to set the required PState.
      
      This driver works in conjunction with cpufreq governors
      like 'ondemand' to provide a demand based frequency and
      voltage setting on IBM POWER non-virtualized platforms.
      
      PState table is obtained from OPAL v3 firmware through device
      tree.
      
      powernv_cpufreq back-end driver would parse the relevant device-tree
      nodes and initialise the cpufreq subsystem on powernv platform.
      
      The code was originally written by svaidy@linux.vnet.ibm.com. Over
      time it was modified to accomodate bug-fixes as well as updates to the
      the cpu-freq core. Relevant portions of the change logs corresponding
      to those modifications are noted below:
      
       * The policy->cpus needs to be populated in a hotplug-invariant
         manner instead of using cpu_sibling_mask() which varies with
         cpu-hotplug. This is because the cpufreq core code copies this
         content into policy->related_cpus mask which should not vary on
         cpu-hotplug. [Authored by srivatsa.bhat@linux.vnet.ibm.com]
      
       * Create a helper routine that can return the cpu-frequency for the
         corresponding pstate_id. Also, cache the values of the pstate_max,
         pstate_min and pstate_nominal and nr_pstates in a static structure
         so that they can be reused in the future to perform any
         validations. [Authored by ego@linux.vnet.ibm.com]
      
       * Create a driver attribute named cpuinfo_nominal_freq which creates
         a sysfs read-only file named cpuinfo_nominal_freq. Export the
         frequency corresponding to the nominal_pstate through this
         interface.
      
           Nominal frequency is the highest non-turbo frequency for the
         platform.  This is generally used for setting governor policies
         from user space for optimal energy efficiency. [Authored by
         ego@linux.vnet.ibm.com]
      
       * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
         return the current operating frequency. Export this via the sysfs
         interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
         powernv_cpufreq_get(). [Authored by ego@linux.vnet.ibm.com]
      
      [Change log updated by ego@linux.vnet.ibm.com]
      Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      b3d627a5
  2. 29 3月, 2014 1 次提交
  3. 24 3月, 2014 2 次提交
  4. 20 3月, 2014 2 次提交
    • S
      powerpc/booke64: Use SPRG7 for VDSO · 9d378dfa
      Scott Wood 提交于
      Previously SPRG3 was marked for use by both VDSO and critical
      interrupts (though critical interrupts were not fully implemented).
      
      In commit 8b64a9df ("powerpc/booke64:
      Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
      made an attempt to resolve this conflict by restoring the VDSO value
      early in the critical interrupt, but this has some issues:
      
       - It's incompatible with EXCEPTION_COMMON which restores r13 from the
         by-then-overwritten scratch (this cost me some debugging time).
       - It forces critical exceptions to be a special case handled
         differently from even machine check and debug level exceptions.
       - It didn't occur to me that it was possible to make this work at all
         (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
         I made (most of) this patch. :-)
      
      It might be worth investigating using a load rather than SPRG on return
      from all exceptions (except TLB misses where the scratch never leaves
      the SPRG) -- it could save a few cycles.  Until then, let's stick with
      SPRG for all exceptions.
      
      Since we cannot use SPRG4-7 for scratch without corrupting the state of
      a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
      critical interrupts exist on book3s, SPRG3 is still used for VDSO
      there.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: kvm-ppc@vger.kernel.org
      9d378dfa
    • W
  5. 27 1月, 2014 5 次提交
    • P
      KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52
      Paul Mackerras 提交于
      The DABRX (DABR extension) register on POWER7 processors provides finer
      control over which accesses cause a data breakpoint interrupt.  It
      contains 3 bits which indicate whether to enable accesses in user,
      kernel and hypervisor modes respectively to cause data breakpoint
      interrupts, plus one bit that enables both real mode and virtual mode
      accesses to cause interrupts.  Currently, KVM sets DABRX to allow
      both kernel and user accesses to cause interrupts while in the guest.
      
      This adds support for the guest to specify other values for DABRX.
      PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
      and DABRX with one call.  This adds a real-mode implementation of
      H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
      implementation.  To support this, we add a per-vcpu field to store the
      DABRX value plus code to get and set it via the ONE_REG interface.
      
      For Linux guests to use this new hcall, userspace needs to add
      "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
      property in the device tree.  If userspace does this and then migrates
      the guest to a host where the kernel doesn't include this patch, then
      userspace will need to implement H_SET_XDABR by writing the specified
      DABR value to the DABR using the ONE_REG interface.  In that case, the
      old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
      still work correctly, at least for Linux guests, since Linux guests
      cope with getting data breakpoint interrupts in modes that weren't
      requested by just ignoring the interrupt, and Linux guests never set
      DABRX_BTI.
      
      The other thing this does is to make H_SET_DABR and H_SET_XDABR work
      on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
      know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
      guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
      For them, this adds the logic to convert DABR/X values into DAWR/X values
      on POWER8.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8563bf52
    • P
      KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 · e0622bd9
      Paul Mackerras 提交于
      POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
      registers to count when in the guest.  Set this bit.
      
      POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
      which is used to enable relocation-on interrupts.  Allow userspace to
      set this field.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e0622bd9
    • P
      KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs · aa31e843
      Paul Mackerras 提交于
      * SRR1 wake reason field for system reset interrupt on wakeup from nap
        is now a 4-bit field on P8, compared to 3 bits on P7.
      
      * Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
        will wake us up.
      
      * Waking up from nap because of a guest doorbell interrupt is not a
        reason to exit the guest.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      aa31e843
    • P
      KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 · 5557ae0e
      Paul Mackerras 提交于
      This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
      compatibility modes on a POWER8 processor.  (Note that transactional
      memory is disabled for usermode if either or both of the PCR_TM_DIS
      and PCR_ARCH_206 bits are set.)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5557ae0e
    • M
      KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e
      Michael Neuling 提交于
      This adds fields to the struct kvm_vcpu_arch to store the new
      guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
      functions to allow userspace to access this state, and adds code to
      the guest entry and exit to context-switch these SPRs between host
      and guest.
      
      Note that DPDES (Directed Privileged Doorbell Exception State) is
      shared between threads on a core; hence we store it in struct
      kvmppc_vcore and have the master thread save and restore it.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b005255e
  6. 08 1月, 2014 1 次提交
  7. 23 11月, 2013 1 次提交
  8. 17 10月, 2013 3 次提交
    • P
      KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1
      Paul Mackerras 提交于
      This enables us to use the Processor Compatibility Register (PCR) on
      POWER7 to put the processor into architecture 2.05 compatibility mode
      when running a guest.  In this mode the new instructions and registers
      that were introduced on POWER7 are disabled in user mode.  This
      includes all the VSX facilities plus several other instructions such
      as ldbrx, stdbrx, popcntw, popcntd, etc.
      
      To select this mode, we have a new register accessible through the
      set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
      this to zero gives the full set of capabilities of the processor.
      Setting it to one of the "logical" PVR values defined in PAPR puts
      the vcpu into the compatibility mode for the corresponding
      architecture level.  The supported values are:
      
      0x0f000002	Architecture 2.05 (POWER6)
      0x0f000003	Architecture 2.06 (POWER7)
      0x0f100003	Architecture 2.06+ (POWER7+)
      
      Since the PCR is per-core, the architecture compatibility level and
      the corresponding PCR value are stored in the struct kvmppc_vcore, and
      are therefore shared between all vcpus in a virtual core.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in fix to add missing break statements and documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      388cc6e1
    • P
      KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras 提交于
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0144e2a
    • P
      KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras 提交于
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b0f4dc
  9. 11 10月, 2013 1 次提交
  10. 05 9月, 2013 1 次提交
  11. 21 8月, 2013 2 次提交
    • S
      powerpc: Convert some mftb/mftbu into mfspr · beb2dc0a
      Scott Wood 提交于
      Some CPUs (such as e500v1/v2) don't implement mftb and will take a
      trap.  mfspr should work on everything that has a timebase, and is the
      preferred instruction according to ISA v2.06.
      
      Currently we get away with mftb on 85xx because the assembler converts
      it to mfspr due to -Wa,-me500.  However, that flag has other effects
      that are undesireable for certain targets (e.g.  lwsync is converted to
      sync), and is hostile to multiplatform kernels.  Thus we would like to
      stop setting it for all e500-family builds.
      
      mftb/mftbu instances which are in 85xx code or common code are
      converted.  Instances which will never run on 85xx are left alone.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      beb2dc0a
    • S
      powerpc/fsl-booke: Work around erratum A-006958 · d52459ca
      Scott Wood 提交于
      Erratum A-006598 says that 64-bit mftb is not atomic -- it's subject
      to a similar race condition as doing mftbu/mftbl on 32-bit.  The lower
      half of timebase is updated before the upper half; thus, we can share
      the workaround for a similar bug on Cell.  This workaround involves
      looping if the lower half of timebase is zero, thus avoiding the need
      for a scratch register (other than CR0).  This workaround must be
      avoided when the timebase is frozen, such as during the timebase sync
      code.
      
      This deals with kernel and vdso accesses, but other userspace accesses
      will of course need to be fixed elsewhere.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      d52459ca
  12. 14 8月, 2013 1 次提交
  13. 09 8月, 2013 1 次提交
  14. 24 7月, 2013 1 次提交
  15. 01 7月, 2013 2 次提交
  16. 01 6月, 2013 3 次提交
  17. 02 5月, 2013 3 次提交
  18. 27 4月, 2013 1 次提交
    • P
      KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts · 4619ac88
      Paul Mackerras 提交于
      This streamlines our handling of external interrupts that come in
      while we're in the guest.  First, when waking up a hardware thread
      that was napping, we split off the "napping due to H_CEDE" case
      earlier, and use the code that handles an external interrupt (0x500)
      in the guest to handle that too.  Secondly, the code that handles
      those external interrupts now checks if any other thread is exiting
      to the host before bouncing an external interrupt to the guest, and
      also checks that there is actually an external interrupt pending for
      the guest before setting the LPCR MER bit (mediated external request).
      
      This also makes sure that we clear the "ceded" flag when we handle a
      wakeup from cede in real mode, and fixes a potential infinite loop
      in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
      flag set but MSR[EE] off.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4619ac88
  19. 26 4月, 2013 2 次提交
  20. 18 4月, 2013 1 次提交
  21. 05 3月, 2013 1 次提交
  22. 15 2月, 2013 3 次提交
  23. 13 2月, 2013 1 次提交