1. 11 Apr, 2017 1 commit
    • powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Committed by Gautham R. Shenoy
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can end up being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from stop can contain the paca pointer
      of its sibling.
      
      This patch implements a context recovery framework within threads of a
      core, by provisioning space in paca_struct for saving every sibling
      thread's paca pointer. Basically, we should be able to arrive at the
      right paca pointer from any of the threads' existing paca pointers.
      
      At bootup, during powernv idle-init, we save the paca address of every
      CPU in each of its siblings' paca_struct in the slot corresponding to
      this CPU's index in the core.
      
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
      
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting the NAPSTATELOST in paca.
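
The recovery scheme described above can be sketched in plain C. This is a minimal illustration, not the kernel code: `struct paca_sketch`, `sibling_pacas`, `init_sibling_pacas()` and `recover_paca()` are hypothetical names, and a 4-thread core is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define THREADS_PER_CORE 4  /* assumption: 4 hardware threads per core */

/* Hypothetical, trimmed-down stand-in for the kernel's paca_struct. */
struct paca_sketch {
    int cpu;                                              /* logical CPU number */
    struct paca_sketch *sibling_pacas[THREADS_PER_CORE];  /* slot i -> paca of thread i */
};

/* Boot-time step: every paca in the core records the address of every
 * sibling paca, in the slot matching that sibling's index in the core. */
void init_sibling_pacas(struct paca_sketch *core_pacas[THREADS_PER_CORE])
{
    for (int owner = 0; owner < THREADS_PER_CORE; owner++)
        for (int idx = 0; idx < THREADS_PER_CORE; idx++)
            core_pacas[owner]->sibling_pacas[idx] = core_pacas[idx];
}

/* Wakeup step: whichever paca the thread ended up holding, indexing with
 * its own thread index (the TIR value) yields the correct paca. */
struct paca_sketch *recover_paca(struct paca_sketch *possibly_wrong,
                                 unsigned int tir)
{
    assert(tir < THREADS_PER_CORE);
    return possibly_wrong->sibling_pacas[tir];
}
```

Because every paca in the core carries the full table, the lookup works no matter which sibling's pointer landed in HSPRG0.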
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 01 Apr, 2017 1 commit
  3. 15 Feb, 2017 2 commits
  4. 06 Feb, 2017 2 commits
  5. 31 Jan, 2017 1 commit
  6. 24 Jan, 2017 1 commit
    • powerpc: Revert the initial stack protector support · f2574030
      Committed by Michael Ellerman
      Unfortunately the stack protector support we merged recently only works
      on some toolchains. If the toolchain is built without glibc support
      everything works fine, but if glibc is built then it leads to a panic
      at boot.
      
      The solution is not rc5 material, so revert the support for now. This
      reverts commits:
      
      6533b7c1 ("powerpc: Initial stack protector (-fstack-protector) support")
      902e06eb ("powerpc/32: Change the stack protector canary value per task")
      
      Fixes: 6533b7c1 ("powerpc: Initial stack protector (-fstack-protector) support")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 14 Jan, 2017 1 commit
  8. 24 Nov, 2016 2 commits
    • KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 · 7c5b06ca
      Committed by Paul Mackerras
      POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
      and tlbiel (local tlbie) instructions.  Both instructions get a
      set of new parameters (RIC, PRS and R) which appear as bits in the
      instruction word.  The tlbiel instruction now has a second register
      operand, which contains a PID and/or LPID value if needed, and
      should otherwise contain 0.
      
      This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
      as well as older processors.  Since we only handle HPT guests so
      far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
      word as on previous processors, so we don't need to conditionally
      execute different instructions depending on the processor.
      
      The local flush on first entry to a guest in book3s_hv_rmhandlers.S
      is a loop which depends on the number of TLB sets.  Rather than
      using feature sections to set the number of iterations based on
      which CPU we're on, we now work out this number at VM creation time
      and store it in the kvm_arch struct.  That will make it possible to
      get the number from the device tree in future, which will help with
      compatibility with future processors.
      
      Since mmu_partition_table_set_entry() does a global flush of the
      whole LPID, we don't need to do the TLB flush on first entry to the
      guest on each processor.  Therefore we don't set all bits in the
      tlb_need_flush bitmap on VM startup on POWER9.
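
The idea of fixing the flush-loop iteration count at VM creation time can be sketched as follows. All names here (`enum cpu_gen`, `struct kvm_arch_sketch`, `vm_create()`, `flush_local_tlb()`) are hypothetical, and the per-generation set counts are illustrative assumptions rather than values taken from this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU generations; the set counts below are assumptions. */
enum cpu_gen { GEN_POWER7, GEN_POWER8, GEN_POWER9 };

struct kvm_arch_sketch {
    unsigned int tlb_sets;   /* decided once, when the VM is created */
};

/* VM creation: pick the number of TLB sets for this host CPU. */
void vm_create(struct kvm_arch_sketch *arch, enum cpu_gen gen)
{
    assert(arch != NULL);
    switch (gen) {
    case GEN_POWER7: arch->tlb_sets = 128; break;  /* assumed value */
    case GEN_POWER8: arch->tlb_sets = 512; break;  /* assumed value */
    default:         arch->tlb_sets = 256; break;  /* assumed value */
    }
}

/* Stand-in for the tlbiel loop on first guest entry: one invalidation
 * per set, with the loop bound read from the per-VM structure.
 * Returns the number of sets visited. */
unsigned int flush_local_tlb(const struct kvm_arch_sketch *arch,
                             void (*invalidate_set)(unsigned int set))
{
    unsigned int n;

    for (n = 0; n < arch->tlb_sets; n++)
        if (invalidate_set)
            invalidate_set(n);
    return n;
}
```

Keeping the count in a per-VM field means the loop itself never has to know which processor it runs on.
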
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs · e9cf1e08
      Committed by Paul Mackerras
      This adds code to handle two new guest-accessible special-purpose
      registers on POWER9: TIDR (thread ID register) and PSSCR (processor
      stop status and control register).  They are context-switched
      between host and guest, and the guest values can be read and set
      via the one_reg interface.
      
      The PSSCR contains some fields which are guest-accessible and some
      which are only accessible in hypervisor mode.  We only allow the
      guest-accessible fields to be read or set by userspace.
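
Restricting userspace to the guest-accessible fields amounts to a masked read and a masked update. The sketch below assumes a made-up mask value, `PSSCR_GUEST_VIS_SKETCH`; the real mask is defined by the kernel and is not given in this message.

```c
#include <stdint.h>

/* Hypothetical mask of guest-accessible PSSCR bits (illustrative only). */
#define PSSCR_GUEST_VIS_SKETCH 0xf0000000000003ffULL

/* Read path: userspace only sees the guest-accessible fields. */
uint64_t psscr_get_for_user(uint64_t current_val)
{
    return current_val & PSSCR_GUEST_VIS_SKETCH;
}

/* Write path: hypervisor-only bits keep their current value, and only
 * the guest-accessible bits are taken from the requested value. */
uint64_t psscr_set_from_user(uint64_t current_val, uint64_t requested)
{
    return (current_val & ~PSSCR_GUEST_VIS_SKETCH) |
           (requested & PSSCR_GUEST_VIS_SKETCH);
}
```
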
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  9. 23 Nov, 2016 1 commit
    • powerpc/32: Change the stack protector canary value per task · 902e06eb
      Committed by Christophe Leroy
      Partially copied from commit df0698be ("ARM: stack protector:
      change the canary value per task")
      
      A new random value for the canary is stored in the task struct whenever
      a new task is forked.  This is meant to allow for different canary values
      per task.  On powerpc, GCC expects the canary value to be found in a global
      variable called __stack_chk_guard.  So this variable has to be updated
      with the value stored in the task struct whenever a task switch occurs.
      
      Because the variable GCC expects is global, this cannot work on SMP
      unfortunately.  So, on SMP, the same initial canary value is kept
      throughout, making this feature a bit less effective although it is still
      useful.
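
The per-task canary handling can be modelled in a few lines of C. This is a sketch under assumptions: `stack_chk_guard_sketch` stands in for the global that GCC reads (the real symbol is `__stack_chk_guard`), and `struct task_sketch`, `fork_task()` and `switch_to_task()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task struct field holding the canary. */
struct task_sketch {
    unsigned long stack_canary;
};

/* Stand-in for the single global variable the compiler checks against. */
unsigned long stack_chk_guard_sketch;

/* Fork: each new task gets its own random canary value. */
void fork_task(struct task_sketch *t, unsigned long random_value)
{
    t->stack_canary = random_value;
}

/* Task switch: on UP the global follows the incoming task. On SMP a
 * single global cannot follow several running tasks, so it is left
 * at its initial value. */
void switch_to_task(const struct task_sketch *next, bool smp)
{
    if (!smp)
        stack_chk_guard_sketch = next->stack_canary;
}
```
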
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 21 Nov, 2016 1 commit
    • KVM: PPC: Book3S HV: Save/restore XER in checkpointed register state · 0d808df0
      Committed by Paul Mackerras
      When switching from/to a guest that has a transaction in progress,
      we need to save/restore the checkpointed register state.  Although
      XER is part of the CPU state that gets checkpointed, the code that
      does this saving and restoring doesn't save/restore XER.
      
      This fixes it by saving and restoring the XER.  To allow userspace
      to read/write the checkpointed XER value, we also add a new ONE_REG
      specifier.
      
      The visible effect of this bug is that the guest may see its XER
      value being corrupted when it uses transactions.
      
      Fixes: e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support")
      Fixes: 0a8eccef ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  11. 04 Oct, 2016 1 commit
  12. 27 Sep, 2016 1 commit
    • KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread · 88b02cf9
      Committed by Paul Mackerras
      POWER8 has one virtual timebase (VTB) register per subcore, not one
      per CPU thread.  The HV KVM code currently treats VTB as a per-thread
      register, which can lead to spurious soft lockup messages from guests
      which use the VTB as the time source for the soft lockup detector.
      (CPUs before POWER8 did not have the VTB register.)
      
      For HV KVM, this fixes the problem by making only the primary thread
      in each virtual core save and restore the VTB value.  With this,
      the VTB state becomes part of the kvmppc_vcore structure.  This
      also means that "piggybacking" of multiple virtual cores onto one
      subcore is not possible on POWER8, because then the virtual cores
      would share a single VTB register.
      
      PR KVM emulates a VTB register, which is per-vcpu because PR KVM
      has no notion of CPU threads or SMT.  For PR KVM we move the VTB
      state into the kvmppc_vcpu_book3s struct.
      
      Cc: stable@vger.kernel.org # v3.14+
      Reported-by: Thomas Huth <thuth@redhat.com>
      Tested-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  13. 09 Jul, 2016 3 commits
    • powerpc/8xx: Force VIRT_IMMR_BASE to be a positive number · 9f595fd8
      Committed by Scott Wood
      The asm-offsets mechanism generates signed numbers, even if the
      input value is explicitly unsigned.  This causes a problem with
      older binutils (e.g. 2.23), which sign-extend a negative number
      when @h is applied.  Thus, this instruction:
      
      	cmpli   cr0, r11, VIRT_IMMR_BASE@h
      
      resulted in this:
      
      Error: operand out of range (0xfffffff0 is not between 0x00000000 and
      0x0000ffff)
      
      By casting to a larger type, we can force the output to be expressed
      as a positive number.
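
The arithmetic behind the error can be reproduced in C. The sketch assumes an address of 0xfff00000, which is a guess consistent with the 0xfffffff0 in the error message and not a value stated in this commit; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The constant as emitted when it is treated as a signed 32-bit number. */
long long emitted_as_signed32(uint32_t v)
{
    return v >= 0x80000000u ? (long long)v - 4294967296LL : (long long)v;
}

/* The constant as emitted after casting to a larger unsigned type. */
long long emitted_widened(uint32_t v)
{
    return (long long)(uint64_t)v;
}

/* High part taken with sign extension and no masking: floor(v / 65536). */
long long high_part_sign_extended(long long v)
{
    return v >= 0 ? v / 65536 : -((-v + 65535) / 65536);
}

/* Range check applied to a 16-bit unsigned immediate operand. */
bool fits_u16(long long v)
{
    return v >= 0 && v <= 0xffff;
}
```

With the assumed address, the signed form yields a high part of -16 (0xfffffff0 as a 32-bit pattern), which fails the range check, while the widened form yields 0xfff0, which passes.
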
      Signed-off-by: Scott Wood <oss@buserror.net>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    • powerpc/8xx: Fix vaddr for IMMR early remap · f86ef74e
      Committed by Christophe Leroy
      Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
      648K rodata, 508K init, 290K bss, 6644K reserved)
      Kernel virtual memory layout:
        * 0xfffdf000..0xfffff000  : fixmap
        * 0xfde00000..0xfe000000  : consistent mem
        * 0xfddf6000..0xfde00000  : early ioremap
        * 0xc9000000..0xfddf6000  : vmalloc & ioremap
      SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
      
      Today, IMMR is mapped 1:1 at startup.
      
      Mapping IMMR 1:1 is just wrong because it may overlap with another
      area. On most mpc8xx boards it is OK as IMMR is set to 0xff000000,
      but for instance on the EP88xC board, IMMR is at 0xfa200000, which
      overlaps with the VM ioremap area.
      
      This patch fixes the virtual address for remapping IMMR with the fixmap
      regardless of the value of IMMR.
      
      The size of the IMMR area is 256 kbytes (CPM at offset 0, security engine
      at offset 128k), so a 512k page is enough.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc32: provide VIRT_CPU_ACCOUNTING · c223c903
      Committed by Christophe Leroy
      This patch provides VIRT_CPU_ACCOUNTING to the PPC32 architecture.
      PPC32 doesn't have the PACA structure, so we use the task_info
      structure to store the accounting data.
      
      In order to reuse the PPC64 functions on PPC32, all u64 data has
      been replaced by 'unsigned long' so that it is u32 on PPC32 and
      u64 on PPC64.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
  14. 16 Jun, 2016 1 commit
    • powerpc/asm: Remove unused symbols in asm-offsets.c · aac6a91f
      Committed by Rashmica Gupta
      THREAD_DSCR:
        Added in efcac658 "powerpc: Per process DSCR + some fixes (try#4)"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_DSCR_INHERIT:
        Added in 71433285 "powerpc: Restore correct DSCR in context switch"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_TAR:
        Added in 2468dcf6 "powerpc: Add support for context switching the TAR register"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_BESCR, THREAD_EBBHR and THREAD_EBBRR:
        Added in 9353374b "powerpc: Context switch the new EBB SPRs"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_SIAR, THREAD_SDAR, THREAD_SIER, THREAD_MMCR0, and THREAD_MMCR2:
        Added in 59affcd3 "powerpc: Context switch more PMU related SPRs"
        Last usage removed in b11ae951 "powerpc: Partial revert of "Context switch more PMU related SPRs""
      
      PACA_LOCK_TOKEN:
        Added in 9e368f29 "KVM: PPC: book3s_hv: Add support for PPC970-family processors"
        Last usage removed in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970 processors"
      
      HCALL_STAT_SIZE, HCALL_STAT_CALLS, HCALL_STAT_TB and HCALL_STAT_PURR:
        Added in 57852a85 "[POWERPC] powerpc: Instrument Hypervisor Calls"
        Last usage removed in c8cd093a "powerpc: tracing: Add hypervisor call tracepoints"
      
      VCPU_EPLC:
        Added in d30f6e48 "KVM: PPC: booke: category E.HV (GS-mode) support"
        Never used.
      
      CPU_DOWN_FLUSH:
        Added in e7affb1d "powerpc/cache: add cache flush operation for various e500"
        Never used.
      
      CFG_STAMP_XSEC:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Last usage removed in 0e469db8 "powerpc: Rework VDSO gettimeofday to prevent time going backwards"
      
      KVM_LPCR:
        Added in aa04b4cc "KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests"
        Last usage removed in a0144e2a "KVM: PPC: Book3S HV: Store LPCR value for each virtual core"
      
      GPR15, GPR16, GPR17, GPR18, GPR19, GPR20, GPR21, GPR22, GPR23, GPR24,
      GPR25, GPR26, GPR27, GPR28, GPR29, GPR30 and GPR31:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Never used.
      
      VCPU_SHADOW_FSCR:
        Added in 616dff86 "KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR"
        Never used.
      
      VCPU_SHADOW_SRR1:
        Added in a2d56020 "KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu"
        Never used.
      
      KVM_SPLIT_SIZE:
        Added in b4deba5c "KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8"
        Never used.
      
      VCPU_VCPUID:
        Added in de56a948 "KVM: PPC: Add support for Book3S processors in hypervisor mode"
        Last usage removed in 1b400ba0 "KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations"
      
      _MQ:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Never used.
      
      AUDITCONTEXT:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Last usage removed in 401d1f02 "[PATCH] syscall entry/exit revamp"
      
      CLONE_VM:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Currently unused.
      
      CLONE_UNTRACED:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Currently unused.
      Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
      [mpe: Munge change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 01 May, 2016 1 commit
    • powerpc/mm: Make page table size a variable · dd1842a2
      Committed by Aneesh Kumar K.V
      Radix and hash MMU models support different page table sizes. Make
      the #defines a variable so that existing code can work with variable
      sizes.
      
      Slice related code is only used by hash, so use hash constants there. We
      will replicate some of the boundary conditions with respect to TASK_SIZE
      using radix values too. Right now we do the boundary condition check using
      hash constants.
      
      Swapper pgdir size is initialized in asm code. We select the max pgd
      size to keep it simple. For now we select hash pgdir. When adding radix
      we will switch that to radix pgdir which is 64K.
      
      BUILD_BUG_ON check which is removed is already done in hugepage_init()
      using MAYBE_BUILD_BUG_ON().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 14 Apr, 2016 1 commit
    • powerpc/livepatch: Add live patching support on ppc64le · 85baa095
      Committed by Michael Ellerman
      Add the kconfig logic & assembly support for handling live patched
      functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
      depends on the new -mprofile-kernel ftrace ABI, which is only supported
      currently on ppc64le.
      
      Live patching is handled by a special ftrace handler. This means it runs
      from ftrace_caller(). The live patch handler modifies the NIP so as to
      redirect the return from ftrace_caller() to the new patched function.
      
      However there is one particularly tricky case we need to handle.
      
      If a function A calls another function B, and it is known at link time
      that they share the same TOC, then A will not save or restore its TOC,
      and will call the local entry point of B.
      
      When we live patch B, we replace it with a new function C, which may
      not have the same TOC as A. At live patch time it's too late to modify A
      to do the TOC save/restore, so the live patching code must interpose
      itself between A and C, and do the TOC save/restore that A omitted.
      
      An additional complication is that the livepatch code cannot create a
      stack frame in order to save the TOC. That is because if C takes > 8
      arguments, or is varargs, A will have written the arguments for C in
      A's stack frame.
      
      To solve this, we introduce a "livepatch stack" which grows upward from
      the base of the regular stack, and is used to store the TOC & LR when
      calling a live patched function.
      
      When the patched function returns, we retrieve the real LR & TOC from
      the livepatch stack, restore them, and pop the livepatch "stack frame".
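
The "livepatch stack" can be pictured as a second, upward-growing stack sharing the thread's stack area. The following is a minimal model with hypothetical names (`struct thread_stack_sketch`, `livepatch_push()`, `livepatch_pop()`) and an arbitrary tiny size; the real implementation lives in assembly.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 64  /* hypothetical, tiny stack area for illustration */

struct thread_stack_sketch {
    unsigned long words[STACK_WORDS]; /* regular stack grows down from the top */
    size_t livepatch_sp;              /* next free slot, grows up from the base */
};

/* Before calling the patched function: save the caller's TOC and the
 * real LR on the livepatch stack instead of in a new stack frame. */
void livepatch_push(struct thread_stack_sketch *s,
                    unsigned long toc, unsigned long lr)
{
    assert(s->livepatch_sp + 2 <= STACK_WORDS);
    s->words[s->livepatch_sp++] = toc;
    s->words[s->livepatch_sp++] = lr;
}

/* After the patched function returns: retrieve LR and TOC and pop
 * the livepatch "stack frame". */
void livepatch_pop(struct thread_stack_sketch *s,
                   unsigned long *toc, unsigned long *lr)
{
    assert(s->livepatch_sp >= 2);
    *lr = s->words[--s->livepatch_sp];
    *toc = s->words[--s->livepatch_sp];
}
```

Because nothing is pushed on the regular stack, any arguments that A wrote into its own frame for C stay where C expects them.
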
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Torsten Duwe <duwe@suse.de>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
  17. 05 Mar, 2016 1 commit
  18. 02 Mar, 2016 1 commit
    • powerpc: Restore FPU/VEC/VSX if previously used · 70fe3d98
      Committed by Cyril Bur
      Currently the FPU, VEC and VSX facilities are lazily loaded. This is not
      a problem unless a process is using these facilities.
      
      Modern versions of GCC are very good at automatically vectorising code,
      new and modernised workloads make use of floating point and vector
      facilities, even the kernel makes use of vectorised memcpy.
      
      All this combined greatly increases the cost of a syscall since the
      kernel uses the facilities sometimes even in syscall fast-path making it
      increasingly common for a thread to take an *_unavailable exception soon
      after a syscall, not to mention potentially taking all three.
      
      The obvious overcompensation to this problem is to simply always load
      all the facilities on every exit to userspace. Loading up all FPU, VEC
      and VSX registers every time can be expensive and if a workload does
      avoid using them, it should not be forced to incur this penalty.
      
      An 8-bit counter is used to detect if the registers have been used in the
      past and the registers are always loaded until the value wraps back
      to zero.
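
The wrapping counter can be modelled as below. This is a sketch, assuming the counter is set on first use and bumped on every eager restore; `struct thread_sketch`, `fp_unavailable()` and `restore_on_exit()` are hypothetical names, not the kernel's functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state: one 8-bit "recently used" counter. */
struct thread_sketch {
    uint8_t load_fp;
};

/* First use after a quiet period: the facility-unavailable exception
 * loads the registers and starts the counter. */
void fp_unavailable(struct thread_sketch *t)
{
    t->load_fp = 1;
}

/* Exit to userspace: restore eagerly while the counter is non-zero.
 * Returns true if the registers were loaded on this exit. */
bool restore_on_exit(struct thread_sketch *t)
{
    if (t->load_fp == 0)
        return false;   /* back to lazy loading */
    t->load_fp++;       /* uint8_t arithmetic: 255 + 1 wraps to 0 */
    return true;
}
```

In this model a thread that touched the FPU once gets eager restores for a bounded number of exits, after which it must fault again to prove it still needs them.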
      
      Several versions of the assembly in entry_64.S were tested:
      
        1. Always calling C.
        2. Performing a common case check and then calling C.
        3. A complex check in asm.
      
      After some benchmarking it was determined that avoiding C in the common
      case is a performance benefit (option 2). The full check in asm (option
      3) greatly complicated that codepath for a negligible performance gain
      and the trade-off was deemed not worth it.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Move load_vec in the struct to fill an existing hole, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      
      fixup
  19. 27 Dec, 2015 1 commit
  20. 19 Dec, 2015 1 commit
  21. 22 Aug, 2015 2 commits
    • KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8 · b4deba5c
      Committed by Paul Mackerras
      This builds on the ability to run more than one vcore on a physical
      core by using the micro-threading (split-core) modes of the POWER8
      chip.  Previously, only vcores from the same VM could be run together,
      and (on POWER8) only if they had just one thread per core.  With the
      ability to split the core on guest entry and unsplit it on guest exit,
      we can run up to 8 vcpu threads from up to 4 different VMs, and we can
      run multiple vcores with 2 or 4 vcpus per vcore.
      
      Dynamic micro-threading is only available if the static configuration
      of the cores is whole-core mode (unsplit), and only on POWER8.
      
      To manage this, we introduce a new kvm_split_mode struct which is
      shared across all of the subcores in the core, with a pointer in the
      paca on each thread.  In addition we extend the core_info struct to
      have information on each subcore.  When deciding whether to add a
      vcore to the set already on the core, we now have two possibilities:
      (a) piggyback the vcore onto an existing subcore, or (b) start a new
      subcore.
      
      Currently, when any vcpu needs to exit the guest and switch to host
      virtual mode, we interrupt all the threads in all subcores and switch
      the core back to whole-core mode.  It may be possible in future to
      allow some of the subcores to keep executing in the guest while
      subcore 0 switches to the host, but that is not implemented in this
      patch.
      
      This adds a module parameter called dynamic_mt_modes which controls
      which micro-threading (split-core) modes the code will consider, as a
      bitmap.  In other words, if it is 0, no micro-threading mode is
      considered; if it is 2, only 2-way micro-threading is considered; if
      it is 4, only 4-way, and if it is 6, both 2-way and 4-way
      micro-threading mode will be considered.  The default is 6.
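
The bitmap semantics of dynamic_mt_modes can be expressed as a single test. The helper name `mt_mode_considered()` is hypothetical; only the parameter values 0, 2, 4 and 6 come from the description above.

```c
#include <stdbool.h>

/* Returns true if n-way micro-threading (n is 2 or 4) is enabled in the
 * dynamic_mt_modes bitmap: bit value 2 enables 2-way, bit value 4
 * enables 4-way. */
bool mt_mode_considered(unsigned int dynamic_mt_modes, unsigned int n_way)
{
    return (n_way == 2 || n_way == 4) && (dynamic_mt_modes & n_way) != 0;
}
```

With the default of 6, both `mt_mode_considered(6, 2)` and `mt_mode_considered(6, 4)` hold.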
      
      With this, we now have secondary threads which are the primary thread
      for their subcore and therefore need to do the MMU switch.  These
      threads will need to be started even if they have no vcpu to run, so
      we use the vcore pointer in the PACA rather than the vcpu pointer to
      trigger them.
      
      It is now possible for thread 0 to find that an exit has been
      requested before it gets to switch the subcore state to the guest.  In
      that case we haven't added the guest's timebase offset to the
      timebase, so we need to be careful not to subtract the offset in the
      guest exit path.  In fact we just skip the whole path that switches
      back to host context, since we haven't switched to the guest context.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Make use of unused threads when running guests · ec257165
      Committed by Paul Mackerras
      When running a virtual core of a guest that is configured with fewer
      threads per core than the physical cores have, the extra physical
      threads are currently unused.  This makes it possible to use them to
      run one or more other virtual cores from the same guest when certain
      conditions are met.  This applies on POWER7, and on POWER8 to guests
      with one thread per virtual core.  (It doesn't apply to POWER8 guests
      with multiple threads per vcore because they require a 1-1 virtual to
      physical thread mapping in order to be able to use msgsndp and the
      TIR.)
      
      The idea is that we maintain a list of preempted vcores for each
      physical cpu (i.e. each core, since the host runs single-threaded).
      Then, when a vcore is about to run, it checks to see if there are
      any vcores on the list for its physical cpu that could be
      piggybacked onto this vcore's execution.  If so, those additional
      vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
      threads are started as well as the original vcore, which is called
      the master vcore.
      
      After the vcores have exited the guest, the extra ones are put back
      onto the preempted list if any of their VCPUs are still runnable and
      not idle.
      
      This means that vcpu->arch.ptid is no longer necessarily the same as
      the physical thread that the vcpu runs on.  In order to make it easier
      for code that wants to send an IPI to know which CPU to target, we
      now store that in a new field in struct vcpu_arch, called thread_cpu.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Tested-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  22. 18 Aug, 2015 1 commit
  23. 07 Jun, 2015 1 commit
  24. 21 Apr, 2015 5 commits
    • KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61
      Committed by Paul Mackerras
      This uses msgsnd where possible for signalling other threads within
      the same core on POWER8 systems, rather than IPIs through the XICS
      interrupt controller.  This includes waking secondary threads to run
      the guest, the interrupts generated by the virtual XICS, and the
      interrupts to bring the other threads out of the guest when exiting.
      
      Aggregated statistics from debugfs across vcpus for a guest with 32
      vcpus, 8 threads/vcore, running on a POWER8, show this before the
      change:
      
       rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
        rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
        rm_intr:     1660.0ns (12 - 553050, 3600051 samples)
      
      and this after the change:
      
       rm_entry:     3060.1ns (212 - 65138, 953873 samples)
        rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
        rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)
      
      for a test of booting Fedora 20 big-endian to the login prompt.
      
      The time taken for a H_PROD hcall (which is handled in the host
      kernel) went down from about 35 microseconds to about 16 microseconds
      with this change.
      
      The noinline added to kvmppc_run_core turned out to be necessary for
      good performance, at least with gcc 4.9.2 as packaged with Fedora 21
      and a little-endian POWER8 host.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da
      Committed by Paul Mackerras
      Currently, the entry_exit_count field in the kvmppc_vcore struct
      contains two 8-bit counts, one of the threads that have started entering
      the guest, and one of the threads that have started exiting the guest.
      This changes it to an entry_exit_map field which contains two bitmaps
      of 8 bits each.  The advantage of doing this is that it gives us a
      bitmap of which threads need to be signalled when exiting the guest.
      That means that we no longer need to use the trick of setting the
      HDEC to 0 to pull the other threads out of the guest, which led in
      some cases to a spurious HDEC interrupt on the next guest entry.
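
A two-bitmap field of this kind could be manipulated as sketched below. The bit layout (entry bitmap in the low 8 bits, exit bitmap in the next 8) and the rule for choosing threads to signal are assumptions made for illustration, and all function names are hypothetical.

```c
/* Assumed layout: bits 0-7 = threads that have started entering the
 * guest, bits 8-15 = threads that have started exiting the guest. */
unsigned int entry_map(unsigned int m) { return m & 0xffu; }
unsigned int exit_map(unsigned int m)  { return (m >> 8) & 0xffu; }

unsigned int mark_entered(unsigned int m, unsigned int thread)
{
    return m | (1u << thread);
}

unsigned int mark_exiting(unsigned int m, unsigned int thread)
{
    return m | (1u << (thread + 8));
}

/* Threads that entered the guest, have not started exiting, and are
 * not the thread doing the signalling. */
unsigned int threads_to_signal(unsigned int m, unsigned int self)
{
    return entry_map(m) & ~exit_map(m) & ~(1u << self) & 0xffu;
}
```

A pair of plain counts cannot answer "which threads", which is what makes the bitmap form useful for targeted signalling.
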
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd
      Committed by Paul Mackerras
      We can tell when a secondary thread has finished running a guest by
      the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
      is no real need for the nap_count field in the kvmppc_vcore struct.
      This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
      pointers of the secondary threads rather than polling vc->nap_count.
      Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
      this also means that we can tell which secondary threads have got
      stuck and thus print a more informative error message.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Minor cleanups · 1f09c3ed
      Committed by Paul Mackerras
      * Remove unused kvmppc_vcore::n_busy field.
      * Remove setting of RMOR, since it was only used on PPC970 and the
        PPC970 KVM support has been removed.
      * Don't use r1 or r2 in setting the runlatch since they are
        conventionally reserved for other things; use r0 instead.
      * Streamline the code a little and remove the ext_interrupt_to_host
        label.
      * Add some comments about register usage.
      * hcall_try_real_mode doesn't need to be global, and can't be
        called from C code anyway.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Accumulate timing information for real-mode code · b6c295df
      Committed by Paul Mackerras
      This reads the timebase at various points in the real-mode guest
      entry/exit code and uses that to accumulate total, minimum and
      maximum time spent in those parts of the code.  Currently these
      times are accumulated per vcpu in 5 parts of the code:
      
      * rm_entry - time taken from the start of kvmppc_hv_entry() until
        just before entering the guest.
      * rm_intr - time from when we take a hypervisor interrupt in the
        guest until we either re-enter the guest or decide to exit to the
        host.  This includes time spent handling hcalls in real mode.
      * rm_exit - time from when we decide to exit the guest until the
        return from kvmppc_hv_entry().
      * guest - time spent in the guest
      * cede - time spent napping in real mode due to an H_CEDE hcall
        while other threads in the same vcore are active.
      
      These times are exposed in debugfs in a directory per vcpu that
      contains a file called "timings".  This file contains one line for
      each of the 5 timings above, with the name followed by a colon and
      4 numbers, which are the count (number of times the code has been
      executed), the total time, the minimum time, and the maximum time,
      all in nanoseconds.
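
The per-phase accumulation and the line format of the "timings" file can be sketched as follows. `struct timing_accum`, `timing_add()` and `timing_format()` are hypothetical names, the accumulator is assumed to start zeroed, and the exact spacing of the output line is a guess.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One accumulator per timed phase (rm_entry, rm_intr, ...). */
struct timing_accum {
    uint64_t count;  /* number of times the code was executed */
    uint64_t total;  /* sum of all durations */
    uint64_t min;    /* shortest duration seen */
    uint64_t max;    /* longest duration seen */
};

/* Fold one measured duration into the accumulator (must start zeroed). */
void timing_add(struct timing_accum *a, uint64_t delta)
{
    if (a->count == 0 || delta < a->min)
        a->min = delta;
    if (a->count == 0 || delta > a->max)
        a->max = delta;
    a->total += delta;
    a->count++;
}

/* Render one line: name, colon, then count, total, min and max. */
int timing_format(char *buf, size_t len, const char *name,
                  const struct timing_accum *a)
{
    return snprintf(buf, len,
                    "%s: %" PRIu64 " %" PRIu64 " %" PRIu64 " %" PRIu64,
                    name, a->count, a->total, a->min, a->max);
}
```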
      
      The overhead of the extra code amounts to about 30ns for an hcall that
      is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
      production environments may not wish to incur this overhead, the new
      code is conditional on a new config symbol,
      CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  25. 29 Dec, 2014 1 commit
  26. 17 Dec, 2014 2 commits
    • KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61
      Committed by Paul Mackerras
      There are two ways in which a guest instruction can be obtained from
      the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
      exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
      instruction), the offending instruction is in the HEIR register
      (Hypervisor Emulation Instruction Register).  If the exit was caused
      by a load or store to an emulated MMIO device, we load the instruction
      from the guest by turning data relocation on and loading the instruction
      with an lwz instruction.
      
      Unfortunately, in the case where the guest has opposite endianness to
      the host, these two methods give results of different endianness, but
      both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
      using guest endianness, whereas the lwz will load the instruction using
      host endianness.  The rest of the code that uses vcpu->arch.last_inst
      assumes it was loaded using host endianness.
      
      To fix this, we define a new vcpu field to store the HEIR value.  Then,
      in kvmppc_handle_exit_hv(), we transfer the value from this new field to
      vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
      differ.
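
      The conditional byte-swap described above can be sketched in plain C
      as follows. This is not the kernel code: the function names and the
      two boolean endianness flags are hypothetical stand-ins for however
      the real code determines guest and host endianness.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap_bytes32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/*
 * Convert a saved HEIR value (loaded with guest endianness) into the
 * host-endian form that users of last_inst expect. A swap is needed
 * only when guest and host endianness differ.
 */
static uint32_t heir_to_last_inst(uint32_t heir, bool guest_is_le,
                                  bool host_is_le)
{
    return (guest_is_le != host_is_le) ? swap_bytes32(heir) : heir;
}
```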
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      4a157d61
    • P
      KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras committed
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      c17b98cf
  27. 15 Dec, 2014 2 commits
    • S
      powernv/powerpc: Add winkle support for offline cpus · 77b54e9f
      Shreyas B. Prabhu committed
      Winkle is a deep idle state supported in POWER8 chips. A core enters
      winkle when all the threads of the core enter winkle. In this state,
      power supply to the entire chiplet, i.e. the core, private L2 and
      private L3, is turned off. As a result, it gives higher power savings
      than sleep.
      
      But entering winkle results in a total hypervisor state loss. Hence the
      hypervisor context has to be preserved before entering winkle and
      restored upon wake up.
      
      Power-on Reset Engine (PORE) is a dedicated engine which is responsible
      for powering on the chiplet during wake up. It can be programmed to
      restore the contents of a few specific registers. This patch
      uses PORE to restore register state wherever possible and uses the
      stack to save and restore the rest of the necessary registers.
      
      Hypervisor state to be restored falls into three categories:
      per-core state, per-subcore state and per-thread state. To manage this,
      extend the infrastructure introduced for sleep. Mainly we add a paca
      variable subcore_sibling_mask. Using this and the core_idle_state we can
      distinguish the first thread in a core and in a subcore.
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      77b54e9f
    • S
      powernv/cpuidle: Redesign idle states management · 7cba160a
      Shreyas B. Prabhu committed
      Deep idle states like sleep and winkle are per-core idle states. A core
      enters these states only when all the threads enter either the
      particular idle state or a deeper one. Tasks like the fastsleep
      hardware bug workaround and hypervisor core state save have to be
      done only by the last thread of the core entering a deep idle state.
      Similarly, tasks like timebase resync and hypervisor core register
      restore have to be done only by the first thread waking up from these
      states.
      
      The current idle state management does not have a way to distinguish the
      first/last thread of the core waking/entering idle states. Tasks like
      timebase resync are done for all the threads. This is not only
      suboptimal, but can also cause functionality issues when subcores and
      KVM are involved.
      
      This patch adds the necessary infrastructure to track idle states of
      threads in a per-core structure. It uses this info to perform tasks like
      fastsleep workaround and timebase resync only once per core.
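
      The idea of per-core tracking can be sketched in portable C11 as
      shown below. This is only an illustration under stated assumptions,
      not the kernel implementation: the struct, its awake_mask field and
      both function names are hypothetical, and the sketch uses C11
      atomics on a bitmask with one bit per thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-core record: bit N is set while thread N is awake. */
struct core_idle_sketch {
    _Atomic uint32_t awake_mask;
};

/*
 * Mark a thread as entering a deep idle state.
 * Returns true if it is the last thread of the core to do so, i.e. the
 * one that should run once-per-core entry tasks such as a core state save.
 */
static bool thread_enter_idle(struct core_idle_sketch *core, unsigned thread)
{
    uint32_t bit = UINT32_C(1) << thread;
    uint32_t old = atomic_fetch_and(&core->awake_mask, ~bit);

    return (old & ~bit) == 0;
}

/*
 * Mark a thread as awake again.
 * Returns true if it is the first thread of the core to wake, i.e. the
 * one that should run once-per-core wakeup tasks such as timebase resync.
 */
static bool thread_wake(struct core_idle_sketch *core, unsigned thread)
{
    uint32_t bit = UINT32_C(1) << thread;
    uint32_t old = atomic_fetch_or(&core->awake_mask, bit);

    return old == 0;
}
```

      A real implementation would also need to keep sibling threads from
      proceeding until the first waker has finished the per-core restore;
      the sketch leaves that synchronization out.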
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: linux-pm@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7cba160a
  28. 02 Dec, 2014 1 commit