1. 30 7月, 2014 1 次提交
    • A
      powerpc/e6500: Add support for hardware threads · e16c8765
      Andy Fleming 提交于
      The general idea is that each core will release all of its
      threads into the secondary thread startup code, which will
      eventually wait in the secondary core holding area, for the
      appropriate bit in the PACA to be set. The kick_cpu function
      pointer will set that bit in the PACA, and thus "release"
      the core/thread to boot. We also need to do a few things that
      U-Boot normally does for CPUs (like enable branch prediction).
      Signed-off-by: NAndy Fleming <afleming@freescale.com>
      [scottwood@freescale.com: various changes, including only enabling
       threads if Linux wants to kick them]
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      e16c8765
  2. 31 10月, 2013 2 次提交
  3. 17 10月, 2013 1 次提交
  4. 11 10月, 2013 1 次提交
  5. 31 7月, 2013 1 次提交
  6. 06 5月, 2013 1 次提交
  7. 26 4月, 2013 1 次提交
    • A
      powerpc/perf: Add new BHRB related instructions for POWER8 · 95213959
      Anshuman Khandual 提交于
      This patch adds new POWER8 instruction encoding for reading
      and clearing Branch History Rolling Buffer entries. The new
      instruction 'mfbhrbe' (move from branch history rolling buffer
      entry) is used to read BHRB buffer entries and instruction
      'clrbhrb' (clear branch history rolling buffer) is used to
      clear the entire buffer. The instruction 'clrbhrb' has straight
      forward encoding. But the instruction encoding format for
      reading the BHRB entries is like 'mfbhrbe RT, BHRBE' where it
      takes two arguments, i.e the index for the BHRB buffer entry to
      read and a general purpose register to put the value which was
      read from the buffer entry.
      Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      95213959
  8. 15 2月, 2013 1 次提交
    • M
      powerpc: Add new instructions for transactional memory · 14c39a4c
      Michael Neuling 提交于
      Here we define the new instructions we need for transactional memory in the
      kernel.  This is so we can support compiling with binutils that don't support
      the new transactional memory instructions.
      
      Transactional memory results in two sets of architected state (GPRs/VSRs
      etc).
      
      treclaim allows us to read the checkpointed state (from the tbegin) so that we
      can store it away on a context switch.  It does this by overwriting the exiting
      architected state, so you have to save that away before you treclaim.  treclaim
      will also abort a transaction, so you can give a register value which contains
      an abort reason.
      
      trecheckpoint allows us to inject into the checkpointed state as if it were at
      the tbegin.  It does this by copying the current architected state into the
      checkpointed state.
      Signed-off-by: NMatt Evans <matt@ozlabs.org>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      14c39a4c
  9. 10 1月, 2013 1 次提交
  10. 18 11月, 2012 1 次提交
  11. 15 11月, 2012 2 次提交
  12. 17 9月, 2012 1 次提交
  13. 10 7月, 2012 9 次提交
  14. 05 3月, 2012 1 次提交
    • P
      KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Paul Mackerras 提交于
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      697d3899
  15. 22 7月, 2011 1 次提交
    • M
      net: filter: BPF 'JIT' compiler for PPC64 · 0ca87f05
      Matt Evans 提交于
      An implementation of a code generator for BPF programs to speed up packet
      filtering on PPC64, inspired by Eric Dumazet's x86-64 version.
      
      Filter code is generated as an ABI-compliant function in module_alloc()'d mem
      with stackframe & prologue/epilogue generated if required (simple filters don't
      need anything more than an li/blr).  The filter's local variables, M[], live in
      registers.  Supports all BPF opcodes, although "complicated" loads from negative
      packet offsets (e.g. SKF_LL_OFF) are not yet supported.
      
      There are a couple of further optimisations left for future work; many-pass
      assembly with branch-reach reduction and a register allocator to push M[]
      variables into volatile registers would improve the code quality further.
      
      This currently supports big-endian 64-bit PowerPC only (but is fairly simple
      to port to PPC32 or LE!).
      
      Enabled in the same way as x86-64:
      
      	echo 1 > /proc/sys/net/core/bpf_jit_enable
      
      Or, enabled with extra debug output:
      
      	echo 2 > /proc/sys/net/core/bpf_jit_enable
      Signed-off-by: NMatt Evans <matt@ozlabs.org>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ca87f05
  16. 27 4月, 2011 1 次提交
    • A
      powerpc: Per process DSCR + some fixes (try#4) · efcac658
      Alexey Kardashevskiy 提交于
      The DSCR (aka Data Stream Control Register) is supported on some
      server PowerPC chips and allow some control over the prefetch
      of data streams.
      
      This patch allows the value to be specified per thread by emulating
      the corresponding mfspr and mtspr instructions. Children of such
      threads inherit the value. Other threads use a default value that
      can be specified in sysfs - /sys/devices/system/cpu/dscr_default.
      
      If a thread starts with non default value in the sysfs entry,
      all children threads inherit this non default value even if
      the sysfs value is changed later.
      Signed-off-by: NAlexey Kardashevskiy <aik@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      efcac658
  17. 20 4月, 2011 2 次提交
  18. 09 12月, 2010 1 次提交
  19. 22 6月, 2010 1 次提交
    • P
      powerpc: Emulate most Book I instructions in emulate_step() · 0016a4cf
      Paul Mackerras 提交于
      This extends the emulate_step() function to handle a large proportion
      of the Book I instructions implemented on current 64-bit server
      processors.  The aim is to handle all the load and store instructions
      used in the kernel, plus all of the instructions that appear between
      l[wd]arx and st[wd]cx., so this handles the Altivec/VMX lvx and stvx
      and the VSX lxv2dx and stxv2dx instructions (implemented in POWER7).
      
      The new code can emulate user mode instructions, and checks the
      effective address for a load or store if the saved state is for
      user mode.  It doesn't handle little-endian mode at present.
      
      For floating-point, Altivec/VMX and VSX instructions, it checks
      that the saved MSR has the enable bit for the relevant facility
      set, and if so, assumes that the FP/VMX/VSX registers contain
      valid state, and does loads or stores directly to/from the
      FP/VMX/VSX registers, using assembly helpers in ldstfp.S.
      
      Instructions supported now include:
      * Loads and stores, including some but not all VMX and VSX instructions,
        and lmw/stmw
      * Atomic loads and stores (l[dw]arx, st[dw]cx.)
      * Arithmetic instructions (add, subtract, multiply, divide, etc.)
      * Compare instructions
      * Rotate and mask instructions
      * Shift instructions
      * Logical instructions (and, or, xor, etc.)
      * Condition register logical instructions
      * mtcrf, cntlz[wd], exts[bhw]
      * isync, sync, lwsync, ptesync, eieio
      * Cache operations (dcbf, dcbst, dcbt, dcbtst)
      
      The overflow-checking arithmetic instructions are not included, but
      they appear not to be ever used in C code.
      
      This uses decimal values for the minor opcodes in the switch statements
      because that is what appears in the Power ISA specification, thus it is
      easier to check that they are correct if they are in decimal.
      
      If this is used to single-step an instruction where a data breakpoint
      interrupt occurred, then there is the possibility that the instruction
      is a lwarx or ldarx.  In that case we have to be careful not to lose the
      reservation until we get to the matching st[wd]cx., or we'll never make
      forward progress.  One alternative is to try to arrange that we can
      return from interrupts and handle data breakpoint interrupts without
      losing the reservation, which means not using any spinlocks, mutexes,
      or atomic ops (including bitops).  That seems rather fragile.  The
      other alternative is to emulate the larx/stcx and all the instructions
      in between.  This is why this commit adds support for a wide range
      of integer instructions.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      0016a4cf
  20. 17 3月, 2010 1 次提交
  21. 17 2月, 2010 2 次提交
    • A
      powerpc: Use lwarx/ldarx hint in bit locks · 864b9e6f
      Anton Blanchard 提交于
      This patch implements the lwarx/ldarx hint bit for bit locks.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      864b9e6f
    • A
      powerpc: Use lwarx hint in spinlocks · 4e14a4d1
      Anton Blanchard 提交于
      Recent versions of the PowerPC architecture added a hint bit to the larx
      instructions to differentiate between an atomic operation and a lock operation:
      
      > 0 Other programs might attempt to modify the word in storage addressed by EA
      > even if the subsequent Store Conditional succeeds.
      >
      > 1 Other programs will not attempt to modify the word in storage addressed by
      > EA until the program that has acquired the lock performs a subsequent store
      > releasing the lock.
      
      To avoid a binutils dependency this patch create macros for the extended lwarx
      format and uses it in the spinlock code. To test this change I used a simple
      test case that acquires and releases a global pthread mutex:
      
      	pthread_mutex_lock(&mutex);
      	pthread_mutex_unlock(&mutex);
      
      On a 32 core POWER6, running 32 test threads we spend almost all our time in
      the futex spinlock code:
      
          94.37%     perf  [kernel]                     [k] ._raw_spin_lock
                     |
                     |--99.95%-- ._raw_spin_lock
                     |          |
                     |          |--63.29%-- .futex_wake
                     |          |
                     |          |--36.64%-- .futex_wait_setup
      
      Which is a good test for this patch. The results (in lock/unlock operations per
      second) are:
      
      before: 1538203 ops/sec
      after:  2189219 ops/sec
      
      An improvement of 42%
      
      A 32 core POWER7 improves even more:
      
      before: 1279529 ops/sec
      after:  2282076 ops/sec
      
      An improvement of 78%
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4e14a4d1
  22. 20 8月, 2009 1 次提交
  23. 21 5月, 2009 3 次提交
  24. 23 4月, 2009 1 次提交
  25. 07 4月, 2009 2 次提交