1. 21 6月, 2017 1 次提交
    • A
      KVM: PPC: Book3S HV: Add new capability to control MCE behaviour · 134764ed
      Aravinda Prasad 提交于
      This introduces a new KVM capability to control how KVM behaves
      on machine check exception (MCE) in HV KVM guests.
      
      If this capability has not been enabled, KVM redirects machine check
      exceptions to guest's 0x200 vector, if the address in error belongs to
      the guest. With this capability enabled, KVM will cause a guest exit
      with the exit reason indicating an NMI.
      
      The new capability is required to avoid problems if a new kernel/KVM
      is used with an old QEMU, running a guest that doesn't issue
      "ibm,nmi-register".  As old QEMU does not understand the NMI exit
      type, it treats it as a fatal error.  However, the guest could have
      handled the machine check error if the exception was delivered to
      guest's 0x200 interrupt vector instead of NMI exit in case of old
      QEMU.
      
      [paulus@ozlabs.org - Reworded the commit message to be clearer,
       enable only on HV KVM.]
      Signed-off-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      134764ed
  2. 19 6月, 2017 2 次提交
    • P
      KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9 · 57900694
      Paul Mackerras 提交于
      On POWER9, we no longer have the restriction that we had on POWER8
      where all threads in a core have to be in the same partition, so
      the CPU threads are now independent.  However, we still want to be
      able to run guests with a virtual SMT topology, if only to allow
      migration of guests from POWER8 systems to POWER9.
      
      A guest that has a virtual SMT mode greater than 1 will expect to
      be able to use the doorbell facility; it will expect the msgsndp
      and msgclrp instructions to work appropriately and to be able to read
      sensible values from the TIR (thread identification register) and
      DPDES (directed privileged doorbell exception status) special-purpose
      registers.  However, since each CPU thread is a separate sub-processor
      in POWER9, these instructions and registers can only be used within
      a single CPU thread.
      
      In order for these instructions to appear to act correctly according
      to the guest's virtual SMT mode, we have to trap and emulate them.
      We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
      register.  The emulation is triggered by the hypervisor facility
      unavailable interrupt that occurs when the guest uses them.
      
      To cause a doorbell interrupt to occur within the guest, we set the
      DPDES register to 1.  If the guest has interrupts enabled, the CPU
      will generate a doorbell interrupt and clear the DPDES register in
      hardware.  The DPDES hardware register for the guest is saved in the
      vcpu->arch.vcore->dpdes field.  Since this gets written by the guest
      exit code, other VCPUs wishing to cause a doorbell interrupt don't
      write that field directly, but instead set a vcpu->arch.doorbell_request
      flag.  This is consumed and set to 0 by the guest entry code, which
      then sets DPDES to 1.
      
      Emulating reads of the DPDES register is somewhat involved, because
      it requires reading the doorbell pending interrupt status of all of the
      VCPU threads in the virtual core, and if any of those VCPUs are
      running, their doorbell status is only up-to-date in the hardware
      DPDES registers of the CPUs where they are running.  In order to get
      a reasonable approximation of the current doorbell status, we send
      those CPUs an IPI, causing an exit from the guest which will update
      the vcpu->arch.vcore->dpdes field.  We then use that value in
      constructing the emulated DPDES register value.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      57900694
    • P
      KVM: PPC: Book3S HV: Context-switch HFSCR between host and guest on POWER9 · 769377f7
      Paul Mackerras 提交于
      This adds code to allow us to use a different value for the HFSCR
      (Hypervisor Facilities Status and Control Register) when running the
      guest from that which applies in the host.  The reason for doing this
      is to allow us to trap the msgsndp instruction and related operations
      in future so that they can be virtualized.  We also save the value of
      HFSCR when a hypervisor facility unavailable interrupt occurs, because
      the high byte of HFSCR indicates which facility the guest attempted to
      access.
      
      We save and restore the host value on guest entry/exit because some
      bits of it affect host userspace execution.
      
      We only do all this on POWER9, not on POWER8, because we are not
      intending to virtualize any of the facilities controlled by HFSCR on
      POWER8.  In particular, the HFSCR bit that controls execution of
      msgsndp and related operations does not exist on POWER8.  The HFSCR
      doesn't exist at all on POWER7.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      769377f7
  3. 28 4月, 2017 3 次提交
    • N
      powerpc/64s: Dedicated system reset interrupt stack · b1ee8a3d
      Nicholas Piggin 提交于
      The system reset interrupt is used for crash/debug situations, so it is
      desirable to have as little impact on the normal state of the system as
      possible.
      
      Currently it uses the current kernel stack to process the exception.
      This stores into the stack which may be involved with the crash. The
      stack pointer may be corrupted, or it may have overflowed.
      
      Avoid or minimise these problems by creating a dedicated NMI stack for
      the system reset interrupt to use.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b1ee8a3d
    • N
      powerpc/64s: Disallow system reset vs system reset reentrancy · c4f3b52c
      Nicholas Piggin 提交于
      In preparation for using a dedicated stack for system reset interrupts,
      prevent a nested system reset from recovering, in order to simplify
      code that is called in crash/debug path. This allows a system reset
      interrupt to just use the base stack pointer.
      
      Keep an in_nmi nesting counter similarly to the in_mce counter. Consider
      the interrrupt non-recoverable if it is taken inside another system
      reset.
      
      Interrupt nesting could be allowed similarly to MCE, but system reset
      is a special case that's not for normal operation, so simplicity wins
      until there is requirement for nested system reset interrupts.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c4f3b52c
    • N
      powerpc/64s: Fix system reset vs general interrupt reentrancy · a3d96f70
      Nicholas Piggin 提交于
      The system reset interrupt can occur when MSR_EE=0, and it currently
      uses the PACA_EXGEN save area.
      
      Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0
      when the save area is still in use. A system reset interrupt in this
      window can lead to undetected corruption when the save area gets
      overwritten.
      
      This patch introduces PACA_EXNMI save area for system reset exceptions,
      which closes this corruption window. It's also helpful to retain the
      EXGEN state for debugging situations, even if not considering the
      recoverability aspect.
      
      This patch also moves the PACA_EXMC area down to a less frequently used
      part of the paca with the new save area.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a3d96f70
  4. 27 4月, 2017 1 次提交
  5. 12 4月, 2017 1 次提交
    • M
      powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages · 03dfee6d
      Michael Ellerman 提交于
      Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB"),
      we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
      makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
      calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
      
      The end result is that the PGD (Page Global Directory, ie top level page table)
      of the kernel (aka. swapper_pg_dir), is too small.
      
      This generally doesn't lead to a crash, as we don't use the full range in normal
      operation. However if we try to dump the kernel pagetables we can trigger a
      crash because we walk off the end of the pgd into other memory and eventually
      try to dereference something bogus:
      
        $ cat /sys/kernel/debug/kernel_pagetables
        Unable to handle kernel paging request for data at address 0xe8fece0000000000
        Faulting instruction address: 0xc000000000072314
        cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
            pc: c000000000072314: ptdump_show+0x164/0x430
            lr: c000000000072550: ptdump_show+0x3a0/0x430
           dar: e802cf0000000000
        seq_read+0xf8/0x560
        full_proxy_read+0x84/0xc0
        __vfs_read+0x6c/0x1d0
        vfs_read+0xbc/0x1b0
        SyS_read+0x6c/0x110
        system_call+0x38/0xfc
      
      The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
      the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
      the calculation into asm-offsets.c where we can do it easily using
      max().
      
      Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      03dfee6d
  6. 11 4月, 2017 1 次提交
    • G
      powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Gautham R. Shenoy 提交于
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from can contain the paca pointer of
      its sibling.
      
      This patch implements a context recovery framework within threads of a
      core, by provisioning space in paca_struct for saving every sibling
      threads's paca pointers. Basically, we should be able to arrive at the
      right paca pointer from any of the thread's existing paca pointer.
      
      At bootup, during powernv idle-init, we save the paca address of every
      CPU in each one its siblings paca_struct in the slot corresponding to
      this CPU's index in the core.
      
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
      
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting the NAPSTATELOST in paca.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      17ed4c8f
  7. 01 4月, 2017 1 次提交
  8. 15 2月, 2017 2 次提交
  9. 06 2月, 2017 2 次提交
  10. 31 1月, 2017 1 次提交
  11. 24 1月, 2017 1 次提交
    • M
      powerpc: Revert the initial stack protector support · f2574030
      Michael Ellerman 提交于
      Unfortunately the stack protector support we merged recently only works
      on some toolchains. If the toolchain is built without glibc support
      everything works fine, but if glibc is built then it leads to a panic
      at boot.
      
      The solution is not rc5 material, so revert the support for now. This
      reverts commits:
      
      6533b7c1 ("powerpc: Initial stack protector (-fstack-protector) support")
      902e06eb ("powerpc/32: Change the stack protector canary value per task")
      
      Fixes: 6533b7c1 ("powerpc: Initial stack protector (-fstack-protector) support")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f2574030
  12. 14 1月, 2017 1 次提交
  13. 24 11月, 2016 2 次提交
    • P
      KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 · 7c5b06ca
      Paul Mackerras 提交于
      POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
      and tlbiel (local tlbie) instructions.  Both instructions get a
      set of new parameters (RIC, PRS and R) which appear as bits in the
      instruction word.  The tlbiel instruction now has a second register
      operand, which contains a PID and/or LPID value if needed, and
      should otherwise contain 0.
      
      This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
      as well as older processors.  Since we only handle HPT guests so
      far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
      word as on previous processors, so we don't need to conditionally
      execute different instructions depending on the processor.
      
      The local flush on first entry to a guest in book3s_hv_rmhandlers.S
      is a loop which depends on the number of TLB sets.  Rather than
      using feature sections to set the number of iterations based on
      which CPU we're on, we now work out this number at VM creation time
      and store it in the kvm_arch struct.  That will make it possible to
      get the number from the device tree in future, which will help with
      compatibility with future processors.
      
      Since mmu_partition_table_set_entry() does a global flush of the
      whole LPID, we don't need to do the TLB flush on first entry to the
      guest on each processor.  Therefore we don't set all bits in the
      tlb_need_flush bitmap on VM startup on POWER9.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7c5b06ca
    • P
      KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs · e9cf1e08
      Paul Mackerras 提交于
      This adds code to handle two new guest-accessible special-purpose
      registers on POWER9: TIDR (thread ID register) and PSSCR (processor
      stop status and control register).  They are context-switched
      between host and guest, and the guest values can be read and set
      via the one_reg interface.
      
      The PSSCR contains some fields which are guest-accessible and some
      which are only accessible in hypervisor mode.  We only allow the
      guest-accessible fields to be read or set by userspace.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e9cf1e08
  14. 23 11月, 2016 1 次提交
    • C
      powerpc/32: Change the stack protector canary value per task · 902e06eb
      Christophe Leroy 提交于
      Partially copied from commit df0698be ("ARM: stack protector:
      change the canary value per task")
      
      A new random value for the canary is stored in the task struct whenever
      a new task is forked.  This is meant to allow for different canary values
      per task.  On powerpc, GCC expects the canary value to be found in a global
      variable called __stack_chk_guard.  So this variable has to be updated
      with the value stored in the task struct whenever a task switch occurs.
      
      Because the variable GCC expects is global, this cannot work on SMP
      unfortunately.  So, on SMP, the same initial canary value is kept
      throughout, making this feature a bit less effective although it is still
      useful.
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      902e06eb
  15. 21 11月, 2016 1 次提交
    • P
      KVM: PPC: Book3S HV: Save/restore XER in checkpointed register state · 0d808df0
      Paul Mackerras 提交于
      When switching from/to a guest that has a transaction in progress,
      we need to save/restore the checkpointed register state.  Although
      XER is part of the CPU state that gets checkpointed, the code that
      does this saving and restoring doesn't save/restore XER.
      
      This fixes it by saving and restoring the XER.  To allow userspace
      to read/write the checkpointed XER value, we also add a new ONE_REG
      specifier.
      
      The visible effect of this bug is that the guest may see its XER
      value being corrupted when it uses transactions.
      
      Fixes: e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support")
      Fixes: 0a8eccef ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      0d808df0
  16. 04 10月, 2016 1 次提交
  17. 27 9月, 2016 1 次提交
    • P
      KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread · 88b02cf9
      Paul Mackerras 提交于
      POWER8 has one virtual timebase (VTB) register per subcore, not one
      per CPU thread.  The HV KVM code currently treats VTB as a per-thread
      register, which can lead to spurious soft lockup messages from guests
      which use the VTB as the time source for the soft lockup detector.
      (CPUs before POWER8 did not have the VTB register.)
      
      For HV KVM, this fixes the problem by making only the primary thread
      in each virtual core save and restore the VTB value.  With this,
      the VTB state becomes part of the kvmppc_vcore structure.  This
      also means that "piggybacking" of multiple virtual cores onto one
      subcore is not possible on POWER8, because then the virtual cores
      would share a single VTB register.
      
      PR KVM emulates a VTB register, which is per-vcpu because PR KVM
      has no notion of CPU threads or SMT.  For PR KVM we move the VTB
      state into the kvmppc_vcpu_book3s struct.
      
      Cc: stable@vger.kernel.org # v3.14+
      Reported-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      88b02cf9
  18. 09 7月, 2016 3 次提交
    • S
      powerpc/8xx: Force VIRT_IMMR_BASE to be a positive number · 9f595fd8
      Scott Wood 提交于
      The asm-offsets mechanism generates signed numbers, even if the
      input value is explicitly unsigned.  This causes a problem with
      older binutils (e.g. 2.23), which sign-extend a negative number
      when @h is applied.  Thus, this instruction:
      
      	cmpli   cr0, r11, VIRT_IMMR_BASE@h
      
      resulted in this:
      
      Error: operand out of range (0xfffffff0 is not between 0x00000000 and
      0x0000ffff)
      
      By casting to a larger type, we can force the output to be expressed
      as a positive number.
      Signed-off-by: NScott Wood <oss@buserror.net>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      9f595fd8
    • C
      powerpc/8xx: Fix vaddr for IMMR early remap · f86ef74e
      Christophe Leroy 提交于
      Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
      648K rodata, 508K init, 290K bss, 6644K reserved)
      Kernel virtual memory layout:
        * 0xfffdf000..0xfffff000  : fixmap
        * 0xfde00000..0xfe000000  : consistent mem
        * 0xfddf6000..0xfde00000  : early ioremap
        * 0xc9000000..0xfddf6000  : vmalloc & ioremap
      SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
      
      Today, IMMR is mapped 1:1 at startup
      
      Mapping IMMR 1:1 is just wrong because it may overlap with another
      area. On most mpc8xx boards it is OK as IMMR is set to 0xff000000
      but for instance on EP88xC board, IMMR is at 0xfa200000 which
      overlaps with VM ioremap area
      
      This patch fixes the virtual address for remapping IMMR with the fixmap
      regardless of the value of IMMR.
      
      The size of IMMR area is 256kbytes (CPM at offset 0, security engine
      at offset 128k) so a 512k page is enough
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NScott Wood <oss@buserror.net>
      f86ef74e
    • C
      powerpc32: provide VIRT_CPU_ACCOUNTING · c223c903
      Christophe Leroy 提交于
      This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
      PPC32 doesn't have the PACA structure, so we use the task_info
      structure to store the accounting data.
      
      In order to reuse on PPC32 the PPC64 functions, all u64 data has
      been replaced by 'unsigned long' so that it is u32 on PPC32 and
      u64 on PPC64
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NScott Wood <oss@buserror.net>
      c223c903
  19. 16 6月, 2016 1 次提交
    • R
      powerpc/asm: Remove unused symbols in asm-offsets.c · aac6a91f
      Rashmica Gupta 提交于
      THREAD_DSCR:
        Added in efcac658 "powerpc: Per process DSCR + some fixes (try#4)"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_DSCR_INHERIT:
        Added in 71433285 "powerpc: Restore correct DSCR in context switch"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_TAR:
        Added in 2468dcf6 "powerpc: Add support for context switching the TAR register"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_BESCR, THREAD_EBBHR and THREAD_EBBRR:
        Added in 9353374b "powerpc: Context switch the new EBB SPRs"
        Last usage removed in 152d523e "powerpc: Create context switch helpers save_sprs() and restore_sprs()"
      
      THREAD_SIAR, THREAD_SDAR, THREAD_SIER, THREAD_MMCR0, and THREAD_MMCR2:
        Added in 59affcd3 "powerpc: Context switch more PMU related SPRs"
        Last usage removed in b11ae951 "powerpc: Partial revert of "Context switch more PMU related SPRs""
      
      PACA_LOCK_TOKEN:
        Added in 9e368f29 "KVM: PPC: book3s_hv: Add support for PPC970-family processors"
        Last usage removed in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970 processors"
      
      HCALL_STAT_SIZE, HCALL_STAT_CALLS, HCALL_STAT_TB and HCALL_STAT_PURR:
        Added in 57852a85 "[POWERPC] powerpc: Instrument Hypervisor Calls"
        Last usage removed in c8cd093a "powerpc: tracing: Add hypervisor call tracepoints"
      
      VCPU_EPLC:
        Added in d30f6e48 "KVM: PPC: booke: category E.HV (GS-mode) support"
        Never used.
      
      CPU_DOWN_FLUSH:
        Added in e7affb1d "powerpc/cache: add cache flush operation for various e500"
        Never used.
      
      CFG_STAMP_XSEC:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Last usage removed in 0e469db8 "powerpc: Rework VDSO gettimeofday to prevent time going backwards"
      
      KVM_LPCR:
        Added in aa04b4cc "KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests"
        Last usage removed in a0144e2a "KVM: PPC: Book3S HV: Store LPCR value for each virtual core"
      
      GPR15, GPR16, GPR17, GPR18, GPR19, GPR20, GPR21, GPR22, GPR23, GPR24,
      GPR25, GPR26, GPR27, GPR28, GPR29, GPR30 and GPR31:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Never used.
      
      VCPU_SHADOW_FSCR:
        Added in 616dff86 "KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR"
        Never used.
      
      VCPU_SHADOW_SRR1:
        Added in a2d56020 "KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu"
        Never used.
      
      KVM_SPLIT_SIZE:
        Added in b4deba5c "KVM: PPC: Book3S HV: Implement dynamicmicro-threading on POWER8"
        Never used.
      
      VCPU_VCPUID:
        Added in de56a948 "KVM: PPC: Add support for Book3S processors in hypervisor mode"
        Last usage removed 1b400ba0 "KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations"
      
      _MQ:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Never used.
      
      AUDITCONTEXT:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Last usage removed in 401d1f02 "[PATCH] syscall entry/exit revamp"
      
      CLONE_VM:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Currently unused.
      
      CLONE_UNTRACED:
        Added in 14cf11af "powerpc: Merge enough to start building in arch/powerpc."
        Currently unused.
      Signed-off-by: NRashmica Gupta <rashmicy@gmail.com>
      [mpe: Munge change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      aac6a91f
  20. 01 5月, 2016 1 次提交
    • A
      powerpc/mm: Make page table size a variable · dd1842a2
      Aneesh Kumar K.V 提交于
      Radix and hash MMU models support different page table sizes. Make
      the #defines a variable so that existing code can work with variable
      sizes.
      
      Slice related code is only used by hash, so use hash constants there. We
      will replicate some of the boundary conditions with resepct to TASK_SIZE
      using radix values too. Right now we do boundary condition check using
      hash constants.
      
      Swapper pgdir size is initialized in asm code. We select the max pgd
      size to keep it simple. For now we select hash pgdir. When adding radix
      we will switch that to radix pgdir which is 64K.
      
      BUILD_BUG_ON check which is removed is already done in hugepage_init()
      using MAYBE_BUILD_BUG_ON().
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dd1842a2
  21. 14 4月, 2016 1 次提交
    • M
      powerpc/livepatch: Add live patching support on ppc64le · 85baa095
      Michael Ellerman 提交于
      Add the kconfig logic & assembly support for handling live patched
      functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
      depends on the new -mprofile-kernel ftrace ABI, which is only supported
      currently on ppc64le.
      
      Live patching is handled by a special ftrace handler. This means it runs
      from ftrace_caller(). The live patch handler modifies the NIP so as to
      redirect the return from ftrace_caller() to the new patched function.
      
      However there is one particularly tricky case we need to handle.
      
      If a function A calls another function B, and it is known at link time
      that they share the same TOC, then A will not save or restore its TOC,
      and will call the local entry point of B.
      
      When we live patch B, we replace it with a new function C, which may
      not have the same TOC as A. At live patch time it's too late to modify A
      to do the TOC save/restore, so the live patching code must interpose
      itself between A and C, and do the TOC save/restore that A omitted.
      
      An additionaly complication is that the livepatch code can not create a
      stack frame in order to save the TOC. That is because if C takes > 8
      arguments, or is varargs, A will have written the arguments for C in
      A's stack frame.
      
      To solve this, we introduce a "livepatch stack" which grows upward from
      the base of the regular stack, and is used to store the TOC & LR when
      calling a live patched function.
      
      When the patched function returns, we retrieve the real LR & TOC from
      the livepatch stack, restore them, and pop the livepatch "stack frame".
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NTorsten Duwe <duwe@suse.de>
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      85baa095
  22. 05 3月, 2016 1 次提交
  23. 02 3月, 2016 1 次提交
    • C
      powerpc: Restore FPU/VEC/VSX if previously used · 70fe3d98
      Cyril Bur 提交于
      Currently the FPU, VEC and VSX facilities are lazily loaded. This is not
      a problem unless a process is using these facilities.
      
      Modern versions of GCC are very good at automatically vectorising code,
      new and modernised workloads make use of floating point and vector
      facilities, even the kernel makes use of vectorised memcpy.
      
      All this combined greatly increases the cost of a syscall since the
      kernel uses the facilities sometimes even in syscall fast-path making it
      increasingly common for a thread to take an *_unavailable exception soon
      after a syscall, not to mention potentially taking all three.
      
      The obvious overcompensation to this problem is to simply always load
      all the facilities on every exit to userspace. Loading up all FPU, VEC
      and VSX registers every time can be expensive and if a workload does
      avoid using them, it should not be forced to incur this penalty.
      
      An 8bit counter is used to detect if the registers have been used in the
      past and the registers are always loaded until the value wraps to back
      to zero.
      
      Several versions of the assembly in entry_64.S were tested:
      
        1. Always calling C.
        2. Performing a common case check and then calling C.
        3. A complex check in asm.
      
      After some benchmarking it was determined that avoiding C in the common
      case is a performance benefit (option 2). The full check in asm (option
      3) greatly complicated that codepath for a negligible performance gain
      and the trade-off was deemed not worth it.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      [mpe: Move load_vec in the struct to fill an existing hole, reword change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      
      fixup
      70fe3d98
  24. 27 12月, 2015 1 次提交
  25. 19 12月, 2015 1 次提交
  26. 22 8月, 2015 2 次提交
    • P
      KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8 · b4deba5c
      Paul Mackerras 提交于
      This builds on the ability to run more than one vcore on a physical
      core by using the micro-threading (split-core) modes of the POWER8
      chip.  Previously, only vcores from the same VM could be run together,
      and (on POWER8) only if they had just one thread per core.  With the
      ability to split the core on guest entry and unsplit it on guest exit,
      we can run up to 8 vcpu threads from up to 4 different VMs, and we can
      run multiple vcores with 2 or 4 vcpus per vcore.
      
      Dynamic micro-threading is only available if the static configuration
      of the cores is whole-core mode (unsplit), and only on POWER8.
      
      To manage this, we introduce a new kvm_split_mode struct which is
      shared across all of the subcores in the core, with a pointer in the
      paca on each thread.  In addition we extend the core_info struct to
      have information on each subcore.  When deciding whether to add a
      vcore to the set already on the core, we now have two possibilities:
      (a) piggyback the vcore onto an existing subcore, or (b) start a new
      subcore.
      
      Currently, when any vcpu needs to exit the guest and switch to host
      virtual mode, we interrupt all the threads in all subcores and switch
      the core back to whole-core mode.  It may be possible in future to
      allow some of the subcores to keep executing in the guest while
      subcore 0 switches to the host, but that is not implemented in this
      patch.
      
      This adds a module parameter called dynamic_mt_modes which controls
      which micro-threading (split-core) modes the code will consider, as a
      bitmap.  In other words, if it is 0, no micro-threading mode is
      considered; if it is 2, only 2-way micro-threading is considered; if
      it is 4, only 4-way, and if it is 6, both 2-way and 4-way
      micro-threading mode will be considered.  The default is 6.
      
      With this, we now have secondary threads which are the primary thread
      for their subcore and therefore need to do the MMU switch.  These
      threads will need to be started even if they have no vcpu to run, so
      we use the vcore pointer in the PACA rather than the vcpu pointer to
      trigger them.
      
      It is now possible for thread 0 to find that an exit has been
      requested before it gets to switch the subcore state to the guest.  In
      that case we haven't added the guest's timebase offset to the
      timebase, so we need to be careful not to subtract the offset in the
      guest exit path.  In fact we just skip the whole path that switches
      back to host context, since we haven't switched to the guest context.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4deba5c
    • P
      KVM: PPC: Book3S HV: Make use of unused threads when running guests · ec257165
      Paul Mackerras 提交于
      When running a virtual core of a guest that is configured with fewer
      threads per core than the physical cores have, the extra physical
      threads are currently unused.  This makes it possible to use them to
      run one or more other virtual cores from the same guest when certain
      conditions are met.  This applies on POWER7, and on POWER8 to guests
      with one thread per virtual core.  (It doesn't apply to POWER8 guests
      with multiple threads per vcore because they require a 1-1 virtual to
      physical thread mapping in order to be able to use msgsndp and the
      TIR.)
      
      The idea is that we maintain a list of preempted vcores for each
      physical cpu (i.e. each core, since the host runs single-threaded).
      Then, when a vcore is about to run, it checks to see if there are
      any vcores on the list for its physical cpu that could be
      piggybacked onto this vcore's execution.  If so, those additional
      vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
      threads are started as well as the original vcore, which is called
      the master vcore.
      
      After the vcores have exited the guest, the extra ones are put back
      onto the preempted list if any of their VCPUs are still runnable and
      not idle.
      
      This means that vcpu->arch.ptid is no longer necessarily the same as
      the physical thread that the vcpu runs on.  In order to make it easier
      for code that wants to send an IPI to know which CPU to target, we
      now store that in a new field in struct vcpu_arch, called thread_cpu.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ec257165
  27. 18 8月, 2015 1 次提交
  28. 07 6月, 2015 1 次提交
  29. 21 4月, 2015 3 次提交
    • P
      KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61
      Paul Mackerras 提交于
      This uses msgsnd where possible for signalling other threads within
      the same core on POWER8 systems, rather than IPIs through the XICS
      interrupt controller.  This includes waking secondary threads to run
      the guest, the interrupts generated by the virtual XICS, and the
      interrupts to bring the other threads out of the guest when exiting.
      
      Aggregated statistics from debugfs across vcpus for a guest with 32
      vcpus, 8 threads/vcore, running on a POWER8, show this before the
      change:
      
       rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
        rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
        rm_intr:     1660.0ns (12 - 553050, 3600051 samples)
      
      and this after the change:
      
       rm_entry:     3060.1ns (212 - 65138, 953873 samples)
        rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
        rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)
      
      for a test of booting Fedora 20 big-endian to the login prompt.
      
      The time taken for a H_PROD hcall (which is handled in the host
      kernel) went down from about 35 microseconds to about 16 microseconds
      with this change.
      
      The noinline added to kvmppc_run_core turned out to be necessary for
      good performance, at least with gcc 4.9.2 as packaged with Fedora 21
      and a little-endian POWER8 host.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      66feed61
    • P
      KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da
      Paul Mackerras 提交于
      Currently, the entry_exit_count field in the kvmppc_vcore struct
      contains two 8-bit counts, one of the threads that have started entering
      the guest, and one of the threads that have started exiting the guest.
      This changes it to an entry_exit_map field which contains two bitmaps
      of 8 bits each.  The advantage of doing this is that it gives us a
      bitmap of which threads need to be signalled when exiting the guest.
      That means that we no longer need to use the trick of setting the
      HDEC to 0 to pull the other threads out of the guest, which led in
      some cases to a spurious HDEC interrupt on the next guest entry.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7d6c40da
    • P
      KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd
      Paul Mackerras 提交于
      We can tell when a secondary thread has finished running a guest by
      the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
      is no real need for the nap_count field in the kvmppc_vcore struct.
      This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
      pointers of the secondary threads rather than polling vc->nap_count.
      Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
      this also means that we can tell which secondary threads have got
      stuck and thus print a more informative error message.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5d5b99cd