  1. 21 Oct 2015, 1 commit
    • powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8" · 23316316
      Committed by Paul Mackerras
      This reverts commit 9678cdaa ("Use the POWER8 Micro Partition
      Prefetch Engine in KVM HV on POWER8") because the original commit had
      multiple, partly self-cancelling bugs that could cause occasional
      memory corruption.
      
      In fact the logmpp instruction was incorrectly using register r0 as the
      source of the buffer address and operation code, and depending on what
      was in r0, it would either do nothing or corrupt the 64k page pointed to
      by r0.
      
      The logmpp instruction encoding and the operation code definitions could
      be corrected, but even then there would be no clearly defined way to know
      when the hardware has finished writing to the buffer.
      
      The original commit attempted to work around this by aborting the
      write-out before starting the prefetch, but this is ineffective in the
      case where the virtual core is now executing on a different physical
      core from the one where the write-out was initiated.
      
      These problems, plus advice from the hardware designers not to use the
      facility (since the measured performance improvement from using it was
      actually mostly negative), mean that reverting the code is the best
      option.
      
      Fixes: 9678cdaa ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8")
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 22 Aug 2015, 1 commit
    • KVM: PPC: Fix warnings from sparse · 5358a963
      Committed by Thomas Huth
      When compiling the KVM code for POWER with "make C=1", sparse
      complains about functions missing proper prototypes and a 64-bit
      constant missing the ULL suffix. Let's fix this by making the
      functions static or by including the proper header with the
      prototypes, and by appending a ULL suffix to the constant
      PPC_MPPE_ADDRESS_MASK.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
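      A minimal sketch of the two kinds of fixes (the mask value and the helper
      function below are illustrative placeholders rather than the actual kernel
      code):

        /* 64-bit constant: spell the type out so sparse no longer warns that
         * the literal is too large for its implicit type. */
        #define PPC_MPPE_ADDRESS_MASK  0xffffffffc000ULL   /* value illustrative */

        /* Function with no prototype in any header: make it static (or include
         * the header that declares it) so sparse stops flagging it. */
        static unsigned long long mpp_buffer_base(unsigned long long addr)
        {
                return addr & PPC_MPPE_ADDRESS_MASK;
        }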
  3. 11 May 2015, 1 commit
    • powerpc: Add ICSWX instruction · edc424f8
      Committed by Dan Streetman
      Add the asm ICSWX and ICSWEPX opcodes.  Add definitions for the
      Coprocessor Request structures needed to use the icswx calls to
      coprocessors.  Add icswx() function to perform the ICSWX asm
      using the provided Coprocessor Command Word value and
      Coprocessor Request Block structure.
      
      This is required for communication with the NX-842 coprocessor on
      a PowerNV system.
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
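      As a hypothetical caller-side sketch (the CRB setup is omitted, the wrapper
      name is made up, and the header name is assumed; icswx() is the helper this
      commit describes, returning the condition-register result of the
      instruction):

        #include <asm/icswx.h>          /* header name assumed */

        /* Hand a prepared Coprocessor Request Block to the coprocessor selected
         * by the Coprocessor Command Word.  Interpreting the returned CR bits
         * (accepted / busy / rejected) is left to the calling driver, as the
         * nx-842 driver does. */
        static int send_to_coprocessor(struct coprocessor_request_block *crb,
                                       __be32 ccw)
        {
                return icswx(ccw, crb);
        }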
  4. 20 Mar 2015, 1 commit
    • powerpc/powernv: Fixes for hypervisor doorbell handling · 755563bc
      Committed by Paul Mackerras
      Since we can now use hypervisor doorbells for host IPIs, this makes
      sure we clear the host IPI flag when taking a doorbell interrupt, and
      clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
      already do for IPIs sent via the XICS interrupt controller).  Otherwise
      if there did happen to be a leftover pending doorbell interrupt for
      an offline CPU thread for any reason, it would prevent that thread from
      going into a power-saving mode; it would instead keep waking up because
      of the interrupt.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
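      A condensed sketch of the idea (illustrative, not the literal patch; the
      helper and constant names are the ones that era of the tree used):

        #include <asm/dbell.h>
        #include <asm/kvm_ppc.h>
        #include <asm/ppc-opcode.h>

        /* On a doorbell wakeup of an offline thread: record that the host IPI
         * has been handled, then discard the pending doorbell with msgclr so
         * the thread can go back into its power-saving state instead of being
         * woken again by the stale interrupt. */
        static void flush_stale_doorbell(int cpu)
        {
                unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);

                kvmppc_set_host_ipi(cpu, 0);
                asm volatile(PPC_MSGCLR(%0) : : "r" (msg));
        }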
  5. 21 Feb 2015, 1 commit
  6. 15 Dec 2014, 1 commit
    • powernv/powerpc: Add winkle support for offline cpus · 77b54e9f
      Committed by Shreyas B. Prabhu
      Winkle is a deep idle state supported in POWER8 chips. A core enters
      winkle when all the threads of the core enter winkle. In this state the
      power supply to the entire chiplet, i.e. the core and its private L2 and
      private L3 caches, is turned off. As a result it gives higher power
      savings compared to sleep.
      
      But entering winkle results in a total hypervisor state loss. Hence the
      hypervisor context has to be preserved before entering winkle and
      restored upon wake up.
      
      The Power-on Reset Engine (PORE) is a dedicated engine which is
      responsible for powering on the chiplet during wake up. It can be
      programmed to restore the contents of a few specific registers. This
      patch uses PORE to restore register state wherever possible and uses the
      stack to save and restore the rest of the necessary registers.
      
      With hypervisor state restore, things fall into three categories:
      per-core state, per-subcore state and per-thread state. To manage this,
      extend the infrastructure introduced for sleep. Mainly we add a paca
      variable, subcore_sibling_mask. Using this and the core_idle_state we can
      distinguish the first thread in a core and in a subcore.
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
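      A purely conceptual sketch of the subcore bookkeeping (the real logic lives
      in assembly on the idle entry/exit path and also involves a lock bit; the
      semantics here are simplified):

        #include <linux/types.h>

        /* Each hardware thread owns one bit, and the paca's subcore_sibling_mask
         * covers every thread of its subcore.  A waking thread that finds none
         * of its subcore siblings already awake knows it is the first one up and
         * must restore the per-subcore hypervisor state. */
        static bool first_thread_up_in_subcore(u32 awake_thread_bits,
                                               u8 subcore_sibling_mask)
        {
                return (awake_thread_bits & subcore_sibling_mask) == 0;
        }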
  7. 04 Nov 2014, 1 commit
  8. 30 Jul 2014, 1 commit
    • powerpc/e6500: Add support for hardware threads · e16c8765
      Committed by Andy Fleming
      The general idea is that each core will release all of its
      threads into the secondary thread startup code, which will
      eventually wait in the secondary core holding area for the
      appropriate bit in the PACA to be set. The kick_cpu function
      pointer will set that bit in the PACA, and thus "release"
      the core/thread to boot. We also need to do a few things that
      U-Boot normally does for CPUs (like enabling branch prediction).
      Signed-off-by: Andy Fleming <afleming@freescale.com>
      [scottwood@freescale.com: various changes, including only enabling
       threads if Linux wants to kick them]
      Signed-off-by: Scott Wood <scottwood@freescale.com>
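      A minimal sketch of the release step (modelled on the generic 64-bit
      kick_cpu path; the function name is made up):

        #include <asm/paca.h>
        #include <asm/barrier.h>

        /* The secondary thread spins in the holding area polling its PACA;
         * setting cpu_start lets it fall through into the normal secondary
         * startup path. */
        static int kick_thread_sketch(int nr)
        {
                paca[nr].cpu_start = 1;
                smp_mb();       /* make the store visible to the spinning thread */
                return 0;
        }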
  9. 28 Jul 2014, 1 commit
    • Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 · 9678cdaa
      Committed by Stewart Smith
      The POWER8 processor has a Micro Partition Prefetch Engine, which is
      a fancy way of saying "it has a way to store and load the contents of
      the L2, or the L2 plus the MRU way of the L3 cache". We initiate the
      storing of the log (the list of addresses) using the logmpp instruction
      and start the restore by writing to an SPR.
      
      The logmpp instruction takes its parameters in a single 64-bit register:
      - starting address of the table to store log of L2/L2+L3 cache contents
        - 32kb for L2
        - 128kb for L2+L3
        - Aligned relative to maximum size of the table (32kb or 128kb)
      - Log control (no-op, L2 only, L2 and L3, abort logout)
      
      We should abort any ongoing logging before initiating one.
      
      To initiate restore, we write to the MPPR SPR. The format of what to write
      to the SPR is similar to the logmpp instruction parameter:
      - starting address of the table to read from (same alignment requirements)
      - table size (no data, until end of table)
      - prefetch rate (from fastest possible to slower: about every 8, 16, 24 or
        32 cycles)
      
      The idea behind loading and storing the contents of L2/L3 cache is to
      reduce memory latency in a system that is frequently swapping vcores on
      a physical CPU.
      
      The best case scenario for doing this is when some vcores are doing very
      cache heavy workloads. The worst case is when they get close to zero cache
      hits, in which case we just generate needless memory operations.
      
      This implementation just does L2 store/load. In my benchmarks this proves
      to be useful.
      
      Benchmark 1:
       - 16 core POWER8
       - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
       - No split core/SMT
       - two guests running sysbench memory test.
         sysbench --test=memory --num-threads=8 run
       - one guest running apache bench (of default HTML page)
         ab -n 490000 -c 400 http://localhost/
      
      This benchmark aims to measure the performance of a real world application
      (apache) while the other guests are cache hot with their own workloads. The
      sysbench memory benchmark does pointer sized writes to a (small) memory
      buffer in a loop.
      
      In this benchmark with this patch I can see an improvement both in requests
      per second (~5%) and in mean and median response times (again, about 5%).
      The spread of minimum and maximum response times were largely unchanged.
      
      Benchmark 2:
       - Same VM config as benchmark 1
       - all three guests running the sysbench memory benchmark

      This benchmark aims to see whether there is a positive or negative effect on
      this cache heavy workload. Due to the nature of the benchmark (stores), we
      may not see a difference in raw performance, but rather, hopefully, an
      improvement in consistency of performance (when a vcore is switched in, it
      doesn't have to wait many times for cache lines to be pulled in).
      
      The results of this benchmark are improvements in consistency of performance
      rather than performance itself. With this patch, the few outliers in duration
      go away and we get more consistent performance in each guest.
      
      Benchmark 3:
       - same 3 guests and CPU configuration as benchmarks 1 and 2
       - two idle guests
       - 1 guest running the STREAM benchmark

      This scenario also saw a performance improvement with this patch. On the
      Copy and Scale workloads from STREAM, I got a 5-6% improvement with this
      patch. For Add and Triad, it was around 10% (or more).
      
      Benchmark 4:
       - same 3 guests as the previous benchmarks
       - two guests running the sysbench memory benchmark, a distinctly different
         cache heavy workload
       - one guest running the STREAM benchmark

      Similar improvements to benchmark 3.
      
      Benchmark 5:
       - 1 guest, 8 VCPUs, Ubuntu 14.04
       - Host configured with split core (SMT8, subcores-per-core=4)
       - STREAM benchmark

      In this benchmark, we see a 10-20% performance improvement across the board
      in the STREAM benchmark results with this patch.
      
      Based on preliminary investigation and microbenchmarks
      by Prerna Saxena <prerna@linux.vnet.ibm.com>
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
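      A condensed sketch of how the interface is driven (macro names are the ones
      this commit adds to ppc-opcode.h and reg.h; the helpers are illustrative,
      and issuing logmpp itself is not shown, since the encoding of that
      instruction is what the later revert at the top of this list removes):

        #include <asm/ppc-opcode.h>
        #include <asm/reg.h>

        /* Compose the single 64-bit logmpp operand: the (suitably aligned)
         * physical address of the 32 kB log buffer plus an L2-only log-control
         * code in the low bits. */
        static unsigned long mpp_log_operand(unsigned long buf_phys)
        {
                return (buf_phys & PPC_MPPE_ADDRESS_MASK) | PPC_MPPE_LOG_L2;
        }

        /* The restore side: write a similarly composed value to the MPPR SPR to
         * start prefetching the whole table back in. */
        static void mpp_start_restore(unsigned long buf_phys)
        {
                mtspr(SPRN_MPPR,
                      (buf_phys & PPC_MPPE_ADDRESS_MASK) | PPC_MPPE_WHOLE_TABLE);
        }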
  10. 31 Oct 2013, 2 commits
  11. 17 Oct 2013, 1 commit
  12. 11 Oct 2013, 1 commit
  13. 31 Jul 2013, 1 commit
  14. 06 May 2013, 1 commit
  15. 26 Apr 2013, 1 commit
    • powerpc/perf: Add new BHRB related instructions for POWER8 · 95213959
      Committed by Anshuman Khandual
      This patch adds new POWER8 instruction encodings for reading
      and clearing Branch History Rolling Buffer entries. The new
      instruction 'mfbhrbe' (move from branch history rolling buffer
      entry) is used to read BHRB buffer entries and the instruction
      'clrbhrb' (clear branch history rolling buffer) is used to
      clear the entire buffer. The instruction 'clrbhrb' has a
      straightforward encoding. The encoding format for reading the
      BHRB entries is 'mfbhrbe RT, BHRBE', which takes two arguments:
      the index of the BHRB buffer entry to read and a general purpose
      register to receive the value read from that entry.
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
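      A sketch of how the new encoding ends up being used (read_bhrb() is the
      small assembly wrapper built on the PPC_MFBHRBE encoding; the draining loop
      below is illustrative):

        #include <linux/types.h>

        u64 read_bhrb(int n);   /* asm wrapper: mfbhrbe with entry index n */

        /* Drain the Branch History Rolling Buffer, newest entry first; a zero
         * value means there are no further valid entries to read. */
        static int drain_bhrb(u64 *out, int max)
        {
                int i, count = 0;

                for (i = 0; i < max; i++) {
                        u64 val = read_bhrb(i);

                        if (!val)
                                break;
                        out[count++] = val;
                }
                return count;
        }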
  16. 15 Feb 2013, 1 commit
    • powerpc: Add new instructions for transactional memory · 14c39a4c
      Committed by Michael Neuling
      Here we define the new instructions we need for transactional memory in the
      kernel.  This is so we can support compiling with binutils that don't support
      the new transactional memory instructions.
      
      Transactional memory results in two sets of architected state (GPRs/VSRs
      etc).
      
      treclaim allows us to read the checkpointed state (from the tbegin) so that
      we can store it away on a context switch.  It does this by overwriting the
      existing architected state, so you have to save that away before you
      treclaim.  treclaim will also abort a transaction, so you can give it a
      register value which contains an abort reason.
      
      trecheckpoint allows us to inject into the checkpointed state as if it were at
      the tbegin.  It does this by copying the current architected state into the
      checkpointed state.
      Signed-off-by: Matt Evans <matt@ozlabs.org>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  17. 10 Jan 2013, 1 commit
  18. 18 Nov 2012, 1 commit
  19. 15 Nov 2012, 2 commits
  20. 17 Sep 2012, 1 commit
  21. 10 Jul 2012, 9 commits
  22. 05 Mar 2012, 1 commit
    • KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Committed by Paul Mackerras
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
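      For illustration, the check the HV code uses to treat both flavours as
      "present to the guest" looks roughly like this (HPTE_V_VALID and
      HPTE_V_ABSENT are the bit names used by the Book3S HV code; header names
      are as of that era of the tree):

        #include <linux/types.h>
        #include <asm/mmu-hash64.h>
        #include <asm/kvm_book3s_64.h>

        /* An entry counts as present to the guest if it is either truly valid in
         * hardware or marked absent (MMIO-emulated) in software; only the former
         * ever needs a tlbie when it is torn down. */
        static bool hpte_present_to_guest(unsigned long hpte_v)
        {
                return (hpte_v & (HPTE_V_VALID | HPTE_V_ABSENT)) != 0;
        }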
  23. 22 Jul 2011, 1 commit
    • net: filter: BPF 'JIT' compiler for PPC64 · 0ca87f05
      Committed by Matt Evans
      An implementation of a code generator for BPF programs to speed up packet
      filtering on PPC64, inspired by Eric Dumazet's x86-64 version.
      
      Filter code is generated as an ABI-compliant function in module_alloc()'d mem
      with stackframe & prologue/epilogue generated if required (simple filters don't
      need anything more than an li/blr).  The filter's local variables, M[], live in
      registers.  Supports all BPF opcodes, although "complicated" loads from negative
      packet offsets (e.g. SKF_LL_OFF) are not yet supported.
      
      There are a couple of further optimisations left for future work; many-pass
      assembly with branch-reach reduction and a register allocator to push M[]
      variables into volatile registers would improve the code quality further.
      
      This currently supports big-endian 64-bit PowerPC only (but is fairly simple
      to port to PPC32 or LE!).
      
      Enabled in the same way as x86-64:
      
      	echo 1 > /proc/sys/net/core/bpf_jit_enable
      
      Or, enabled with extra debug output:
      
      	echo 2 > /proc/sys/net/core/bpf_jit_enable
      Signed-off-by: Matt Evans <matt@ozlabs.org>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
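      For completeness, a small userspace sketch of the kind of program that gets
      JITed: attach a trivial classic-BPF filter to a socket (with bpf_jit_enable
      set, the kernel compiles it to native PPC64 instead of interpreting it):

        #include <linux/filter.h>
        #include <sys/socket.h>
        #include <stdio.h>

        int main(void)
        {
                /* Single instruction "ret #0xffff": accept every packet. */
                struct sock_filter insns[] = {
                        { 0x06, 0, 0, 0x0000ffff },     /* BPF_RET | BPF_K */
                };
                struct sock_fprog prog = {
                        .len = sizeof(insns) / sizeof(insns[0]),
                        .filter = insns,
                };
                int fd = socket(AF_INET, SOCK_DGRAM, 0);

                if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                                         &prog, sizeof(prog)) < 0)
                        perror("attach filter");
                return 0;
        }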
  24. 27 Apr 2011, 1 commit
    • powerpc: Per process DSCR + some fixes (try#4) · efcac658
      Committed by Alexey Kardashevskiy
      The DSCR (aka Data Stream Control Register) is supported on some
      server PowerPC chips and allows some control over the prefetching
      of data streams.
      
      This patch allows the value to be specified per thread by emulating
      the corresponding mfspr and mtspr instructions. Children of such
      threads inherit the value. Other threads use a default value that
      can be specified in sysfs - /sys/devices/system/cpu/dscr_default.
      
      If a thread starts with a non-default value from the sysfs entry,
      all of its children inherit this non-default value even if
      the sysfs value is changed later.
      Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
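      A user-level sketch of the per-thread interface (SPR 0x11 is the DSCR; on
      CPUs where it is privileged, the access traps and is emulated by this
      patch, and the value is inherited by the thread's children):

        /* Set the calling thread's Data Stream Control Register.  The value
         * passed in is machine specific; this wrapper is illustrative only. */
        static inline void set_thread_dscr(unsigned long val)
        {
                asm volatile("mtspr 0x11, %0" : : "r" (val));
        }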
  25. 20 Apr 2011, 2 commits
  26. 09 Dec 2010, 1 commit
  27. 22 Jun 2010, 1 commit
    • powerpc: Emulate most Book I instructions in emulate_step() · 0016a4cf
      Committed by Paul Mackerras
      This extends the emulate_step() function to handle a large proportion
      of the Book I instructions implemented on current 64-bit server
      processors.  The aim is to handle all the load and store instructions
      used in the kernel, plus all of the instructions that appear between
      l[wd]arx and st[wd]cx., so this handles the Altivec/VMX lvx and stvx
      and the VSX lxvd2x and stxvd2x instructions (implemented in POWER7).
      
      The new code can emulate user mode instructions, and checks the
      effective address for a load or store if the saved state is for
      user mode.  It doesn't handle little-endian mode at present.
      
      For floating-point, Altivec/VMX and VSX instructions, it checks
      that the saved MSR has the enable bit for the relevant facility
      set, and if so, assumes that the FP/VMX/VSX registers contain
      valid state, and does loads or stores directly to/from the
      FP/VMX/VSX registers, using assembly helpers in ldstfp.S.
      
      Instructions supported now include:
      * Loads and stores, including some but not all VMX and VSX instructions,
        and lmw/stmw
      * Atomic loads and stores (l[dw]arx, st[dw]cx.)
      * Arithmetic instructions (add, subtract, multiply, divide, etc.)
      * Compare instructions
      * Rotate and mask instructions
      * Shift instructions
      * Logical instructions (and, or, xor, etc.)
      * Condition register logical instructions
      * mtcrf, cntlz[wd], exts[bhw]
      * isync, sync, lwsync, ptesync, eieio
      * Cache operations (dcbf, dcbst, dcbt, dcbtst)
      
      The overflow-checking arithmetic instructions are not included, but
      they appear never to be used in C code.
      
      This uses decimal values for the minor opcodes in the switch statements
      because that is what appears in the Power ISA specification, thus it is
      easier to check that they are correct if they are in decimal.
      
      If this is used to single-step an instruction where a data breakpoint
      interrupt occurred, then there is the possibility that the instruction
      is a lwarx or ldarx.  In that case we have to be careful not to lose the
      reservation until we get to the matching st[wd]cx., or we'll never make
      forward progress.  One alternative is to try to arrange that we can
      return from interrupts and handle data breakpoint interrupts without
      losing the reservation, which means not using any spinlocks, mutexes,
      or atomic ops (including bitops).  That seems rather fragile.  The
      other alternative is to emulate the larx/stcx and all the instructions
      in between.  This is why this commit adds support for a wide range
      of integer instructions.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
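      For reference, the calling convention is small enough to show (this is how
      the single-step paths such as kprobes and xmon use it; the wrapper name is
      made up):

        #include <linux/types.h>
        #include <asm/ptrace.h>
        #include <asm/sstep.h>

        /* emulate_step() updates *regs (including regs->nip) itself and returns
         * a positive value when it fully emulated the instruction, so the caller
         * can skip the hardware single-step. */
        static bool try_emulate_one(struct pt_regs *regs, unsigned int instr)
        {
                return emulate_step(regs, instr) > 0;
        }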
  28. 17 Mar 2010, 1 commit
  29. 17 Feb 2010, 1 commit