  1. 15 Jul, 2016 (2 commits)
    • powerpc/powernv: Add XICS emulation APIs · 9fedd3f8
      Committed by Benjamin Herrenschmidt
      OPAL provides an emulated XICS interrupt controller to
      use as a fallback on newer processors that don't have a
      XICS. It's meant as a way to provide backward compatibility
      with future processors. Add the corresponding interfaces.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9fedd3f8
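      For orientation, here is a minimal sketch of how the emulated XICS is
      driven through OPAL calls. The call names (opal_int_get_xirr,
      opal_int_eoi, opal_int_set_cppr) come from this commit; the exact C
      signatures below are assumptions for illustration.

      #include <stdint.h>

      /* Assumed prototypes for the OPAL XICS emulation calls. */
      int64_t opal_int_get_xirr(uint32_t *out_xirr, int just_poll);
      int64_t opal_int_eoi(uint32_t xirr);
      int64_t opal_int_set_cppr(uint8_t cppr);

      /* Acknowledge, dispatch and EOI one interrupt via the emulated XICS. */
      static void handle_one_interrupt(void)
      {
      	uint32_t xirr;

      	if (opal_int_get_xirr(&xirr, 0) != 0)	/* 0 == OPAL_SUCCESS */
      		return;
      	if ((xirr & 0x00ffffff) == 0)		/* XISR 0: nothing pending */
      		return;
      	/* ... dispatch on the XISR (low 24 bits of xirr) ... */
      	opal_int_eoi(xirr);			/* signal end-of-interrupt */
      }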
    • powerpc/powernv: Add platform support for stop instruction · bcef83a0
      Committed by Shreyas B. Prabhu
      POWER ISA v3 defines a new idle processor core mechanism. In summary,
       a) a new instruction named stop is added. This instruction replaces
      	instructions like nap, sleep and rvwinkle.
       b) a new per-thread SPR named Processor Stop Status and Control Register
      	(PSSCR) is added, which controls the behavior of the stop instruction.
      
      PSSCR layout:
      ----------------------------------------------------------
      | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
      ----------------------------------------------------------
      0      4     41   42    43   44     48    54   56    60
      
      PSSCR key fields:
      	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
      	power-saving state the thread entered since the stop instruction was
      	last executed.

      	Bit 42 - Enable State Loss
      	0 - No state is lost irrespective of other fields
      	1 - Allows state loss

      	Bits 44:47 - Power-Saving Level Limit
      	This limits the power-saving level that can be entered.

      	Bits 60:63 - Requested Level
      	Used to specify which power-saving level must be entered on executing
      	the stop instruction.
      
      This patch adds support for the stop instruction and PSSCR handling.
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bcef83a0
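      A minimal sketch of mask definitions derived from the layout above
      (IBM bit numbering: bit 0 is the most significant bit of the 64-bit
      SPR). The names mirror the kernel's style; treat the snippet as an
      illustration, not the patch itself.

      #include <stdint.h>

      #define PSSCR_ESL	(1ULL << (63 - 42))	/* Enable State Loss */
      #define PSSCR_EC	(1ULL << (63 - 43))	/* Exit Criterion */
      #define PSSCR_RL_MASK	0xFULL		/* Requested Level, bits 60:63 */

      /* Compose a PSSCR value requesting power-saving level rl with
       * state loss permitted. */
      static inline uint64_t psscr_request(uint64_t rl)
      {
      	return PSSCR_ESL | PSSCR_EC | (rl & PSSCR_RL_MASK);
      }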
  2. 14 Jul, 2016 (4 commits)
  3. 08 Jul, 2016 (1 commit)
  4. 07 Jul, 2016 (1 commit)
  5. 05 Jul, 2016 (2 commits)
    • powerpc/timer: Large Decrementer support · 79901024
      Committed by Oliver O'Halloran
      POWER ISA v3 adds a large decrementer (LD) mode which increases the size
      of the decrementer register. The size of the enlarged decrementer
      register is between 32 and 64 bits, with the exact size being dependent
      on the implementation. When in LD mode, reads are sign extended to 64
      bits and a decrementer exception is raised when the high bit is set (i.e.
      the value goes below zero). Writes, however, are truncated to the physical
      register width, so some care needs to be taken to ensure that the high
      bit is not set when reloading the decrementer. This patch adds support
      for using LD mode inside the host kernel on processors that support it.
      
      When LD mode is supported, firmware will supply the ibm,dec-bits property
      for CPU nodes to allow the kernel to determine the maximum decrementer
      value. Enabling LD mode is a hypervisor privileged operation, so the kernel
      can only enable it itself when running in hypervisor mode. Guests that
      support LD mode can request it using the "ibm,client-architecture-support"
      firmware call (not implemented in this patch) or some other platform
      specific method. If this property is not supplied then the traditional
      decrementer width of 32 bits is assumed and LD mode will not be enabled.
      
      This patch was based on initial work by Jack Miller.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Acked-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      79901024
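      The reload care described above can be captured in a small sketch:
      derive the maximum programmable value from ibm,dec-bits and clamp
      every write, so truncation can never leave the high (sign) bit set.
      The names here are illustrative.

      #include <stdint.h>

      static uint64_t decrementer_max = 0x7fffffff;	/* default: 32-bit DEC */

      /* Called with the bit width from the ibm,dec-bits property. */
      static void dec_bits_init(unsigned int dec_bits)
      {
      	/* largest value whose top (sign) bit is clear at this width */
      	decrementer_max = (1ULL << (dec_bits - 1)) - 1;
      }

      static void set_dec_safe(uint64_t ticks)
      {
      	if (ticks > decrementer_max)
      		ticks = decrementer_max;   /* avoid an immediate exception */
      	/* mtspr(SPRN_DEC, ticks) on real hardware */
      }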
    • powerpc: Send SIGBUS on unaligned copy and paste · ae26b36f
      Committed by Chris Smart
      Calling the ISA 3.0 instructions copy, copy_first, paste and paste_last
      generates an alignment fault when copying or pasting unaligned
      data (the instructions operate on 128-byte blocks). We catch this and
      send SIGBUS to the userspace process that caused it.
      
      We do not emulate these because paste may contain additional metadata
      when pasting to a co-processor and paste_last is the synchronisation
      point for preceding copy/paste sequences.
      
      Thanks to Michael Neuling <mikey@neuling.org> for his help.
      Signed-off-by: Chris Smart <chris@distroguy.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ae26b36f
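      A hedged sketch of the decision this commit adds to the alignment
      interrupt path; the helper names are hypothetical, only the 128-byte
      block size is from the commit.

      #include <stdbool.h>
      #include <stdint.h>

      /* copy/paste move 128-byte blocks, so the effective address must be
       * 128-byte aligned. */
      static bool copy_paste_ea_is_aligned(uintptr_t ea)
      {
      	return (ea & 127) == 0;
      }

      /* In the kernel the flow is roughly (pseudo, not the actual code):
       *
       *	if (insn_is_copy_or_paste(instr) && !copy_paste_ea_is_aligned(ea))
       *		send SIGBUS to the faulting task instead of emulating;
       */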
  6. 30 Jun, 2016 (1 commit)
  7. 29 Jun, 2016 (2 commits)
  8. 24 Jun, 2016 (2 commits)
  9. 21 Jun, 2016 (7 commits)
  10. 17 Jun, 2016 (1 commit)
  11. 16 Jun, 2016 (2 commits)
  12. 14 Jun, 2016 (5 commits)
    • powerpc/spinlock: Fix spin_unlock_wait() · 6262db7c
      Committed by Boqun Feng
      There is an ordering issue with spin_unlock_wait() on powerpc, because
      the spin_lock primitive is an ACQUIRE, and an ACQUIRE only orders
      the load part of the operation with memory operations following it.
      Therefore the following event sequence can happen:
      
      CPU 1			CPU 2			CPU 3
      
      ==================	====================	==============
      						spin_unlock(&lock);
      			spin_lock(&lock):
      			  r1 = *lock; // r1 == 0;
      o = object;		o = READ_ONCE(object); // reordered here
      object = NULL;
      smp_mb();
      spin_unlock_wait(&lock);
      			  *lock = 1;
      smp_mb();
      o->dead = true;         < o = READ_ONCE(object); > // reordered upwards
      			if (o) // true
      				BUG_ON(o->dead); // true!!
      
      To fix this, we add a "nop" ll/sc loop in arch_spin_unlock_wait() on
      ppc. The "nop" ll/sc loop reads the lock value and writes it back
      atomically; in this way it synchronizes the view of the lock on CPU1
      with that on CPU2. Therefore in the scenario above, either CPU2 will
      fail to get the lock at first or CPU1 will see the lock acquired by
      CPU2; both cases eliminate this bug. This is a similar idea to what
      Will Deacon did for ARM64 in:
      
        d86b8da0 ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
      
      Furthermore, if the "nop" ll/sc observes that the lock is held, we
      don't actually need to do the "nop" ll/sc trick again; we can just do a
      normal load+check loop waiting for the lock to be released, because in
      that case spin_unlock_wait() is called while someone is holding the
      lock, and the store part of the "nop" ll/sc happens before the lock
      release by the current lock holder:
      
      	"nop" ll/sc -> spin_unlock()
      
      and the lock release happens before the next lock acquisition:
      
      	spin_unlock() -> spin_lock() <next holder>
      
      which means the "nop" ll/sc happens before the next lock acquisition:
      
      	"nop" ll/sc -> spin_unlock() -> spin_lock() <next holder>
      
      With a smp_mb() preceding spin_unlock_wait(), the store of object is
      guaranteed to be observed by the next lock holder:
      
      	STORE -> smp_mb() -> "nop" ll/sc
      	-> spin_unlock() -> spin_lock() <next holder>
      
      This patch therefore fixes the issue and also cleans up
      arch_spin_unlock_wait() a little by removing superfluous memory
      barriers in loops and consolidating the implementations for PPC32 and
      PPC64 into one.
      Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      [mpe: Inline the "nop" ll/sc loop and set EH=0, munge change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6262db7c
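      A sketch of the "nop" ll/sc idea, close to the inlined version the
      change log describes but simplified (no shared-processor yielding,
      PPC32/PPC64 differences elided); it only compiles on powerpc.

      typedef struct {
      	volatile unsigned int slock;
      } arch_spinlock_t;

      static inline void unlock_wait_sketch(arch_spinlock_t *lock)
      {
      	unsigned int tmp;

      	__asm__ __volatile__("sync" : : : "memory");	/* smp_mb() */

      	/* "nop" ll/sc: atomically load the lock word and store it back
      	 * unchanged, ordering our view of the lock against other CPUs. */
      	__asm__ __volatile__(
      "1:	lwarx	%0,0,%2\n"
      "	stwcx.	%0,0,%2\n"
      "	bne-	1b\n"
      	: "=&r" (tmp), "+m" (lock->slock)
      	: "r" (&lock->slock)
      	: "cr0", "xer");

      	/* If the lock was held, a plain load+check loop now suffices. */
      	while (lock->slock)
      		__asm__ __volatile__("" : : : "memory");  /* cpu_relax() */

      	__asm__ __volatile__("sync" : : : "memory");	/* smp_mb() */
      }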
    • powerpc: Define and use PPC64_ELF_ABI_v2/v1 · f55d9665
      Committed by Michael Ellerman
      We're approaching 20 locations where we need to check for ELF ABI v2.
      That's fine, except the logic is a bit awkward, because we have to check
      that _CALL_ELF is defined and then what its value is.
      
      So check it once in asm/types.h and define PPC64_ELF_ABI_v2 when ELF ABI
      v2 is detected.
      
      We also have a few places where what we're really trying to check is
      that we are using the 64-bit v1 ABI, i.e. function descriptors. So also
      add a #define for that, which simplifies several checks.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f55d9665
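      A sketch of the centralized check: _CALL_ELF is a compiler-provided
      macro (2 for the ELFv2 ABI), so it is tested once and simple feature
      macros are used everywhere else.

      /* asm/types.h (sketch) */
      #ifdef __powerpc64__
      #if defined(_CALL_ELF) && _CALL_ELF == 2
      #define PPC64_ELF_ABI_v2
      #else
      #define PPC64_ELF_ABI_v1	/* 64-bit v1 ABI: function descriptors */
      #endif
      #endif

      /* Call sites then reduce to: */
      #ifdef PPC64_ELF_ABI_v2
      /* ... ELFv2-specific handling ... */
      #endif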
    • powerpc: Various typo fixes · 027dfac6
      Committed by Michael Ellerman
      Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      027dfac6
    • powerpc: Remove assembly versions of strcpy, strcat, strlen and strcmp · 3ece1663
      Committed by Anton Blanchard
      A number of our assembly implementations of string functions do not
      align their hot loops. I was going to align them manually, but I
      realised that they are almost instruction-for-instruction
      identical to what gcc produces, with the advantage that gcc does
      align them.
      
      In light of that, let's just remove the assembly versions.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3ece1663
    • powerpc/mm/hash: Use the correct PPP mask when updating HPTE · 8550e2fa
      Committed by Aneesh Kumar K.V
      With commit e58e87ad "powerpc/mm: Update _PAGE_KERNEL_RO" we now
      use all three PPP bits. The top bit is now used for a PPP value
      of 0b110, which is mapped to kernel read only. When updating an
      HPTE entry, use the right mask so that we update the 63rd bit (the top
      'P' bit) too.
      
      Prior to e58e87ad we didn't support KERNEL_RO at all (it was ==
      KERNEL_RW), so this isn't a regression as such.
      
      Fixes: e58e87ad ("powerpc/mm: Update _PAGE_KERNEL_RO")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8550e2fa
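      A sketch of the mask change: the three PPP bits of the second HPTE
      doubleword are split, with pp0 at the top (63rd) bit and pp1:pp2 at
      the two low bits, so an update must clear all three. The constants
      follow the architected layout; the helper is illustrative.

      #include <stdint.h>

      #define HPTE_R_PP0	0x8000000000000000ULL	/* top 'P' bit */
      #define HPTE_R_PP	0x0000000000000003ULL	/* low two PP bits */

      static uint64_t hpte_set_pp(uint64_t hpte_r, uint64_t newpp)
      {
      	/* Clear all three PP bits, not just the low two, then set newpp. */
      	hpte_r &= ~(HPTE_R_PP0 | HPTE_R_PP);
      	return hpte_r | (newpp & (HPTE_R_PP0 | HPTE_R_PP));
      }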
  13. 10 Jun, 2016 (2 commits)
    • powerpc/mm/radix: Flush page walk cache when freeing page table · a145abf1
      Committed by Aneesh Kumar K.V
      Even though tlb_flush() does a flush that invalidates all caches,
      we can end up doing an RCU page table free before calling tlb_flush().
      That means we can have page walk cache entries even after we free the
      page table pages. This can result in a wrong page table walk.

      Avoid this by doing a pwc flush on every page table free. We can't batch
      the pwc flush, because the RCU callback function in which we free the
      page table pages doesn't have the mmu_gather information. Thus we
      have to do a pwc flush for every page table page freed.
      
      Note: I also removed the dummy tlb_flush_pgtable functions for
      32-bit hash.
      
      Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a145abf1
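      The rule the commit enforces, as a sketch: on radix, flush the page
      walk cache before each page table page is handed to the free path,
      since the RCU callback that finally frees it has no mmu_gather to
      batch against. The radix__flush_tlb_pwc name matches the kernel's
      radix helpers; the surrounding glue is illustrative.

      #include <stdbool.h>

      struct mmu_gather;			/* opaque here */
      extern bool radix_enabled(void);	/* assumed helper */
      extern void radix__flush_tlb_pwc(struct mmu_gather *tlb,
      				 unsigned long addr);

      /* Called on every page table page free, not batched. */
      static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
      				     unsigned long addr)
      {
      	if (radix_enabled())
      		radix__flush_tlb_pwc(tlb, addr);
      }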
    • powerpc/nohash: Fix build break with 64K pages · 8017ea35
      Committed by Michael Ellerman
      Commit 74701d59 "powerpc/mm: Rename function to indicate we are
      allocating fragments" renamed page_table_free() to pte_fragment_free().
      One occurrence was mistyped as pte_fragment_fre().
      
      This only breaks the nohash 64K page build, which is not the default or
      enabled in any defconfig.
      
      Fixes: 74701d59 ("powerpc/mm: Rename function to indicate we are allocating fragments")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8017ea35
  14. 31 May, 2016 (2 commits)
  15. 20 May, 2016 (1 commit)
    • arch: fix has_transparent_hugepage() · fd8cfd30
      Committed by Hugh Dickins
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h; remove its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default;
      add #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime; and remove the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd8cfd30
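      A sketch of the resulting pattern: a compile-time default in
      asm-generic, plus a same-name #define marking each arch that
      overrides it at runtime.

      /* asm-generic/pgtable.h default (sketch): */
      #ifndef has_transparent_hugepage
      #ifdef CONFIG_TRANSPARENT_HUGEPAGE
      #define has_transparent_hugepage() 1
      #else
      #define has_transparent_hugepage() 0
      #endif
      #endif

      /* An overriding arch defines a real function plus the marker:
       *	int has_transparent_hugepage(void);	// runtime check
       *	#define has_transparent_hugepage has_transparent_hugepage
       */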
  16. 13 May, 2016 (1 commit)
    • KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Committed by Christian Borntraeger
      Some wakeups should not be considered a successful poll. For example, on
      s390, I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable, letting all vCPUs poll all the time for
      transaction-like workloads, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups, i.e.
      whether they should be considered good or bad with regard to polling.
      
      For s390, the implementation will fence off halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken-up CPU by marking its poll as "good".
      This code will also mark several other wakeup reasons, like IPIs or
      expired timers, as "good". It will of course also mark some events as
      not successful. As KVM on z always runs as a second-level hypervisor,
      we prefer not to poll unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3491caf2
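      A hedged sketch of the shape of such a qualification hook; the
      identifier names are assumptions based on the description above, not
      necessarily the merged code.

      #include <stdbool.h>

      struct kvm_vcpu {
      	bool valid_wakeup;	/* set/cleared by arch wakeup code */
      };

      /* Arches without the opt-in would effectively always return true. */
      static inline bool vcpu_valid_wakeup(struct kvm_vcpu *vcpu)
      {
      	return vcpu->valid_wakeup;
      }

      /* In the halt-polling loop (sketch): only a qualified wakeup counts
       * as a successful poll and feeds the poll-window tuning. */
      static bool poll_succeeded(struct kvm_vcpu *vcpu, bool runnable)
      {
      	return runnable && vcpu_valid_wakeup(vcpu);
      }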
  17. 12 May, 2016 (1 commit)
    • kvm: introduce KVM_MAX_VCPU_ID · 0b1b1dfd
      Committed by Greg Kurz
      The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
      also the upper limit for vCPU ids. This is okay for all archs except PowerPC,
      which can have higher ids depending on the cpu/core/thread topology. In the
      worst case (single-threaded guest, host with 8 threads per core), it limits
      the maximum number of vCPUs to KVM_MAX_VCPUS / 8.

      This patch separates the vCPU numbering from the total number of vCPUs by
      introducing KVM_MAX_VCPU_ID, defined as one more than the maximal valid
      vCPU id.
      
      The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
      before passing them to KVM_CREATE_VCPU.
      
      This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
      Other archs continue to return KVM_MAX_VCPUS instead.
      Suggested-by: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0b1b1dfd
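      A sketch of the intended userspace usage: query KVM_CAP_MAX_VCPU_ID
      (falling back to KVM_CAP_MAX_VCPUS where the new capability is
      absent) and validate an id before KVM_CREATE_VCPU. This needs kernel
      headers recent enough to define the capability.

      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      static int max_vcpu_id(int vm_fd)
      {
      	int n = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPU_ID);

      	if (n <= 0)	/* capability absent: ids bounded by vCPU count */
      		n = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
      	return n;
      }

      static int create_vcpu(int vm_fd, int id)
      {
      	if (id >= max_vcpu_id(vm_fd))
      		return -1;	/* the kernel would reject this id */
      	return ioctl(vm_fd, KVM_CREATE_VCPU, id);	/* returns vcpu fd */
      }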
  18. 11 May, 2016 (3 commits)