1. 04 Mar, 2015 1 commit
  2. 04 Feb, 2015 1 commit
  3. 02 Feb, 2015 1 commit
    • cxl: Fix device_node reference counting · 6f963ec2
      Committed by Ryan Grimm
      When unbinding and rebinding the driver on a system with a card in PHB0, this
      error condition is reached after a few attempts:
      
      ERROR: Bad of_node_put() on /pciex@3fffe40000000
      CPU: 0 PID: 3040 Comm: bash Not tainted 3.18.0-rc3-12545-g3627ffe #152
      Call Trace:
      [c000000721acb5c0] [c00000000086ef94] .dump_stack+0x84/0xb0 (unreliable)
      [c000000721acb640] [c00000000073a0a8] .of_node_release+0xd8/0xe0
      [c000000721acb6d0] [c00000000044bc44] .kobject_release+0x74/0xe0
      [c000000721acb760] [c0000000007394fc] .of_node_put+0x1c/0x30
      [c000000721acb7d0] [c000000000545cd8] .cxl_probe+0x1a98/0x1d50
      [c000000721acb900] [c0000000004845a0] .local_pci_probe+0x40/0xc0
      [c000000721acb980] [c000000000484998] .pci_device_probe+0x128/0x170
      [c000000721acba30] [c00000000052400c] .driver_probe_device+0xac/0x2a0
      [c000000721acbad0] [c000000000522468] .bind_store+0x108/0x160
      [c000000721acbb70] [c000000000521448] .drv_attr_store+0x38/0x60
      [c000000721acbbe0] [c000000000293840] .sysfs_kf_write+0x60/0xa0
      [c000000721acbc50] [c000000000292500] .kernfs_fop_write+0x140/0x1d0
      [c000000721acbcf0] [c000000000208648] .vfs_write+0xd8/0x260
      [c000000721acbd90] [c000000000208b18] .SyS_write+0x58/0x100
      [c000000721acbe30] [c000000000009258] syscall_exit+0x0/0x98
      
      We are missing a call to of_node_get(): pnv_pci_to_phb_node() should
      call of_node_get(), otherwise np's reference count isn't incremented and
      the node might go away. Rename pnv_pci_to_phb_node() to
      pnv_pci_get_phb_node() so it's clear it calls of_node_get() (a sketch of
      the renamed helper follows this entry).
      Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6f963ec2
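      Below is a minimal sketch of what the renamed helper could look like,
      assuming the PHB device node hangs off the pci_controller (hose->dn); it
      illustrates the of_node_get() fix rather than quoting the patch.

          /* Sketch: return the PHB device node with a reference taken.
           * Callers must drop it again with of_node_put(). */
          struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev)
          {
              struct pci_controller *hose = pci_bus_to_host(dev->bus);

              return of_node_get(hose->dn);   /* the previously missing get */
          }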
  4. 31 Jan, 2015 1 commit
  5. 30 Jan, 2015 2 commits
  6. 28 Jan, 2015 4 commits
  7. 27 Jan, 2015 2 commits
    • powerpc/powernv: Skip registering log region when CONFIG_PRINTK=n · 6501ab5e
      Committed by Pranith Kumar
      When CONFIG_PRINTK=n, log_buf_addr_get() returns NULL and log_buf_len_get()
      returns 0. Check for these return values and skip registering the dump
      buffer (a sketch of the check follows this entry).
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6501ab5e
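      A minimal sketch of the guard, assuming the registration happens in an
      OPAL setup routine (the function name and dump-region registration below
      are illustrative, not the exact code touched by the patch):

          static void __init opal_register_log_region(void)
          {
              char *buf = log_buf_addr_get();
              u32 len = log_buf_len_get();

              /* With CONFIG_PRINTK=n there is no log buffer to register. */
              if (!buf || !len)
                  return;

              /* ... hand buf/len to the firmware dump region as before ... */
          }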
    • powerpc/pseries: Fix endian problems with LE migration · 3df76a9d
      Committed by Cyril Bur
      RTAS events require arguments be passed in big endian while hypercalls
      have their arguments passed in registers and the values should therefore
      be in CPU endian.
      
      The "ibm,suspend_me" 'RTAS' call makes a sequence of hypercalls to setup
      one true RTAS call. This means that "ibm,suspend_me" is handled
      specially in the ppc_rtas() syscall.
      
      The ppc_rtas() syscall has its arguments in big endian and can therefore
      pass them directly to the RTAS call. "ibm,suspend_me" is handled
      specially from within ppc_rtas() (by calling rtas_ibm_suspend_me()),
      which has left an endian bug on little endian systems because hypercalls
      expect CPU-endian values. The return value from rtas_ibm_suspend_me() is
      in CPU endian and was left unconverted, also a bug on little endian
      systems.
      
      rtas_ibm_suspend_me() does not actually make use of the rtas_args that
      it is passed. This patch removes the convoluted use of the rtas_args
      struct to pass params to rtas_ibm_suspend_me() in favour of passing what
      it needs as actual arguments. This patch also ensures the two callers of
      rtas_ibm_suspend_me() pass function parameters in CPU endian and, in the
      case of ppc_rtas(), converts the return value (a sketch of the
      conversion follows this entry).
      
      migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
      sysfs file which deals with everything in CPU endian, so this function
      only underwent cleanup.
      
      This patch has been tested with KVM, both LE and BE, and on PowerVM,
      both LE and BE. Under QEMU/KVM the migration happens without touching
      these code paths.
      
      For PowerVM there is no obvious regression on BE and the LE code path
      now provides the correct parameters to the hypervisor.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3df76a9d
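      A hedged sketch of the endian handling in ppc_rtas(), with illustrative
      variable names: the rtas_args fields are big endian, the helper wants
      CPU endian, and its CPU-endian result must be converted back before it
      is returned as an RTAS value.

          if (token == ibm_suspend_me_token) {
              /* The stream id arrives as two BE 32-bit words; build a
               * CPU-endian handle for the hypercall sequence. */
              u64 handle = ((u64) be32_to_cpu(args.args[0]) << 32)
                           | be32_to_cpu(args.args[1]);
              int rc = rtas_ibm_suspend_me(handle);

              args.rets[0] = cpu_to_be32(rc);   /* back to BE for RTAS */
          }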
  8. 23 Jan, 2015 7 commits
  9. 22 Jan, 2015 5 commits
    • cxl: Enable CAPP recovery · 1212aa1c
      Committed by Ryan Grimm
      Turning snoops on is the last step in CAPP recovery. Sapphire is expected to
      have reinitialized the PHB and done the previous recovery steps.
      
      Add a mode argument to the OPAL call to do this. The driver can also turn
      snoops off, although it does not currently do so (a hedged sketch of the
      driver side follows this entry).
      Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
      Acked-by: Ian Munsie <imunsie@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1212aa1c
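      A hedged sketch of the driver side; the helper name and the OPAL mode
      constant below are assumptions about the cxl/PowerNV interface, not
      quoted from the patch:

          int rc;

          /* Last step of CAPP recovery: ask firmware, via the PHB mode call
           * that now takes a mode argument, to turn snoops back on. */
          rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON);
          if (rc)
              dev_err(&dev->dev, "CAPP recovery: enabling snoops failed\n");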
    • powerpc/ps3: Write highmem info to repository · 5ae74630
      Committed by Geoff Levand
      Add calls to the ps3_mm_set_repository_highmem() routine when the ps3
      r1 highmem region is either created or destroyed.
      Signed-off-by: Geoff Levand <geoff@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5ae74630
    • powerpc/ps3: Add ps3_mm_set_repository_highmem · d4b18bd6
      Committed by Geoff Levand
      Add the new routine ps3_mm_set_repository_highmem() that saves highmem
      info to the LV1 hypervisor registry so that the info will be available to
      second-stage OSes loaded by petitboot/kexec. FreeBSD and some Linux
      derivatives use this feature.
      
      Also, move the existing ps3_mm_get_repository_highmem() routine up in
      the source file.
      
      This implementation of ps3_mm_set_repository_highmem() assumes the repository
      will have a single highmem region entry (at index 0).
      Signed-off-by: Geoff Levand <geoff@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d4b18bd6
    • powerpc/ps3: Add empty repository highmem routines · c02d3506
      Committed by Geoff Levand
      To avoid the need for preprocessor conditionals in C source files, add a
      set of empty inline repository highmem write routines to platform.h that
      are used when CONFIG_PS3_REPOSITORY_WRITE is not defined (a sketch of the
      pattern follows this entry).
      Signed-off-by: Geoff Levand <geoff@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c02d3506
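      A hedged sketch of the pattern described above, with an illustrative
      prototype (the exact routine names and arguments in platform.h may
      differ):

          #if defined(CONFIG_PS3_REPOSITORY_WRITE)
          int ps3_repository_write_highmem_info(unsigned int region_index,
              u64 highmem_base, u64 highmem_size);
          #else
          /* Empty inline stub so callers need no #ifdef of their own. */
          static inline int ps3_repository_write_highmem_info(
              unsigned int region_index, u64 highmem_base, u64 highmem_size)
          {
              return 0;
          }
          #endif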
    • powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared · 0eb13208
      Committed by Shreyas B. Prabhu
      The LPCR_PECE1 bit controls whether decrementer interrupts are allowed to
      cause an exit from power-saving mode. While waking up from winkle,
      restoring LPCR with LPCR_PECE1 set (i.e. decrementer interrupts allowed)
      can cause an issue in the following scenario:
      
      - All the threads in a core are offlined. The core enters deep winkle.
      - A spurious interrupt wakes up a thread in the core. Here LPCR is restored
        with the LPCR_PECE1 bit set.
      - Since it was a spurious interrupt on an offline thread, the thread clears
        the interrupt and goes back to winkle.
      - Before the thread executes winkle and puts the core into deep winkle, if a
        decrementer interrupt occurs on any of the sibling threads in the core,
        that thread wakes up.
      - Since the offline loop only flushes external interrupts, the decrementer
        interrupt does not get flushed. At this stage the thread is stuck in a
        loop: it wakes up at 0x100 due to the decrementer interrupt, does not
        flush it (only external interrupts get flushed), enters winkle, and wakes
        up at 0x100 again.
      
      Fix this by programming PORE to restore LPCR with the LPCR_PECE1 bit
      cleared when waking up from winkle (a minimal sketch follows this entry).
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0eb13208
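      A minimal sketch of the fix, assuming the restore value is built from the
      current LPCR and handed to the wakeup engine through an OPAL call
      (opal_slw_set_reg() is used here as the assumed interface):

          u64 lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;

          /* Have PORE restore LPCR with decrementer wakeups disabled, so an
           * offline thread cannot loop on an unflushed decrementer interrupt. */
          opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);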
  10. 21 Jan, 2015 1 commit
  11. 12 Jan, 2015 1 commit
    • powernv: Fix OPAL tracepoint code · bfe5fda8
      Committed by Anton Blanchard
      Patch c49f6353 ("powernv: Add OPAL tracepoints") has a spurious
      store to the stack:
      
      	ld      r12,opal_tracepoint_refcount@toc(r2);           \
      	std     r12,32(r1);                                     \
      
      The store was originally used to save the current tracepoint status
      so the entry and the exit tracepoints were always balanced. In the
      end I just created a separate path when tracepoints are enabled.
      
      The offset on the stack used for this store is not valid for ABIv2
      and it causes strange issues. I noticed it because OPAL console input
      was broken.
      
      Fixes: c49f6353 ("powernv: Add OPAL tracepoints")
      Cc: <stable@vger.kernel.org> # v3.17+
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bfe5fda8
  12. 29 Dec, 2014 3 commits
  13. 18 Dec, 2014 1 commit
  14. 15 Dec, 2014 4 commits
    • powerpc/powernv: Expose OPAL firmware symbol map · c8742f85
      Committed by Benjamin Herrenschmidt
      Newer versions of OPAL will provide this, so let's expose it to user
      space so tools like perf can use it to properly decode samples in
      firmware space (a sketch of the sysfs export follows this entry).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c8742f85
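      A hedged sketch of one way to export such a blob through sysfs with a
      bin_attribute; the attribute name and the way the map's address and size
      are obtained are assumptions, not taken from the patch:

          static char *sym_buf;     /* firmware symbol map located at boot */
          static size_t sym_size;

          static ssize_t symbol_map_read(struct file *fp, struct kobject *kobj,
                                         struct bin_attribute *attr, char *to,
                                         loff_t pos, size_t count)
          {
              return memory_read_from_buffer(to, count, &pos, sym_buf, sym_size);
          }

          static struct bin_attribute symbol_map_attr = {
              .attr = { .name = "symbol_map", .mode = 0400 },
              .read = symbol_map_read,
          };

          /* registered with sysfs_create_bin_file(opal_kobj, &symbol_map_attr) */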
    • powernv/powerpc: Add winkle support for offline cpus · 77b54e9f
      Committed by Shreyas B. Prabhu
      Winkle is a deep idle state supported in power8 chips. A core enters
      winkle when all the threads of the core enter winkle. In this state the
      power supply to the entire chiplet, i.e. the core, private L2 and private
      L3, is turned off. As a result it gives higher power savings compared to
      sleep.
      
      But entering winkle results in a total hypervisor state loss. Hence the
      hypervisor context has to be preserved before entering winkle and
      restored upon wake up.
      
      The Power-on Reset Engine (PORE) is a dedicated engine which is
      responsible for powering on the chiplet during wake up. It can be
      programmed to restore the contents of a few specific registers. This
      patch uses PORE to restore register state wherever possible and uses the
      stack to save and restore the rest of the necessary registers.
      
      With hypervisor state restore, things fall under three categories:
      per-core state, per-subcore state and per-thread state. To manage this,
      extend the infrastructure introduced for sleep. Mainly we add a paca
      variable subcore_sibling_mask. Using this and the core_idle_state we can
      distinguish the first thread in the core and subcore.
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      77b54e9f
    • powernv/cpuidle: Redesign idle states management · 7cba160a
      Committed by Shreyas B. Prabhu
      Deep idle states like sleep and winkle are per-core idle states. A core
      enters these states only when all of its threads enter either the
      particular idle state or a deeper one. Tasks like the fastsleep hardware
      bug workaround and the hypervisor core state save have to be done only by
      the last thread of the core entering a deep idle state, and, similarly,
      tasks like timebase resync and hypervisor core register restore have to
      be done only by the first thread waking up from these states.
      
      The current idle state management does not have a way to distinguish the
      first/last thread of the core waking/entering idle states. Tasks like
      timebase resync are done for all the threads. This is not only
      suboptimal, but can cause functionality issues when subcores and KVM are
      involved.
      
      This patch adds the necessary infrastructure to track the idle states of
      threads in a per-core structure. It uses this info to perform tasks like
      the fastsleep workaround and timebase resync only once per core (a
      conceptual sketch follows this entry).
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: linux-pm@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7cba160a
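      A conceptual sketch of the per-core bookkeeping, using an illustrative
      atomic counter rather than the exact core_idle_state encoding used by the
      patch:

          struct core_idle_info {
              atomic_t threads_in_deep_idle;   /* out of threads_per_core */
          };

          /* True only for the last thread of the core to enter a deep state;
           * that thread does the core-wide work (e.g. fastsleep workaround). */
          static bool core_enter_deep_idle(struct core_idle_info *ci,
                                           int threads_per_core)
          {
              return atomic_inc_return(&ci->threads_in_deep_idle) == threads_per_core;
          }

          /* True only for the first thread of the core to wake; that thread
           * does the core-wide restore (e.g. timebase resync). */
          static bool core_exit_deep_idle(struct core_idle_info *ci,
                                          int threads_per_core)
          {
              return atomic_dec_return(&ci->threads_in_deep_idle) == threads_per_core - 1;
          }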
    • powerpc/powernv: Enable Offline CPUs to enter deep idle states · 8eb8ac89
      Committed by Shreyas B. Prabhu
      The secondary threads should enter deep idle states so as to gain maximum
      power savings when the entire core is offline. To do so, the offline path
      must be made aware of the deepest available idle state. Hence probe the
      device tree for the possible idle states in powernv core code and
      expose the deepest idle state through flags.
      
      Since the device tree is probed by the cpuidle driver as well, move the
      parameters required to discover the idle states into a place common to
      both the driver and the powernv core code.
      
      Another point is that the fastsleep idle state may require workarounds in
      the kernel to function properly. This workaround is introduced in
      subsequent patches. However, neither the cpuidle driver nor the hotplug
      path needs to be bothered about this workaround; it will be taken care of
      by the core powernv code.
      Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu>
      Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: linux-pm@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8eb8ac89
  15. 14 Dec, 2014 1 commit
  16. 11 Dec, 2014 1 commit
  17. 08 Dec, 2014 1 commit
    • powerpc/powernv: Return to cpu offline loop when finished in KVM guest · 56548fc0
      Committed by Paul Mackerras
      When a secondary hardware thread has finished running a KVM guest, we
      currently put that thread into nap mode using a nap instruction in
      the KVM code.  This changes the code so that, instead of executing a nap
      instruction directly, we cause the call to power7_nap() that put the
      thread into nap mode to return.  The reason for doing this is to avoid
      the KVM code having to know what low-power mode to put the thread into.
      
      In the case of a secondary thread used to run a KVM guest, the thread
      will be offline from the point of view of the host kernel, and the
      relevant power7_nap() call is the one in pnv_smp_cpu_disable().
      In this case we don't want to clear pending IPIs in the offline loop
      in that function, since that might cause us to miss the wakeup for
      the next time the thread needs to run a guest.  To tell whether or
      not to clear the interrupt, we use the SRR1 value returned from
      power7_nap(), and check if it indicates an external interrupt.  We
      arrange for the return from power7_nap() to be 0 when we have finished
      running a guest, so pending interrupts don't get flushed in that case (a
      sketch of the check follows this entry).
      
      Note that it is important that a secondary thread that has finished
      executing in the guest, or that didn't have a guest to run, should
      not return to power7_nap's caller while the kvm_hstate.hwthread_req
      flag in the PACA is non-zero, because the return from power7_nap
      will reenable the MMU, and the MMU might still be in guest context.
      In this situation we spin at low priority in real mode waiting for
      hwthread_req to become zero.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      56548fc0
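      A hedged sketch of the check in the offline loop; the power7_nap()
      argument, SRR1_WAKEMASK/SRR1_WAKEEE and the flush call are used here as
      assumptions about the wake-reason encoding and the interrupt controller,
      not quoted from the patch:

          unsigned long srr1 = power7_nap(1);

          /* A zero return means we came back from running a KVM guest: leave
           * any pending IPI alone so the next guest entry is not missed.
           * Only flush if the wakeup really was an external interrupt. */
          if (srr1 && (srr1 & SRR1_WAKEMASK) == SRR1_WAKEEE)
              icp_native_flush_interrupt();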
  18. 05 Dec, 2014 1 commit
    • powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault · aefa5688
      Committed by Aneesh Kumar K.V
      updatepp can get called for a nohpte fault when we find from the linux
      page table that the translation was hashed before. In that case
      we are sure that there is no existing translation, hence we could
      avoid doing tlbie (a minimal sketch follows this entry).
      
      We could possibly race with a parallel fault filling the TLB. But
      that should be ok because updatepp is only ever relaxing permissions.
      We also look at linux pte permission bits when filling hash pte
      permission bits. We also hold the linux pte busy bits while
      inserting/updating a hashpte entry, hence a parallel update of the
      linux pte is not possible. On the other hand mprotect involves
      ptep_modify_prot_start, which causes an hpte invalidate and not updatepp.
      
      Performance numbers:
      We use random_access_bench written by Anton.
      
      Kernel with THP disabled and smaller hash page table size.
      
          86.60%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_updatepp
           2.10%  random_access_b  random_access_bench              [.] doit
           1.99%  random_access_b  [kernel.kallsyms]                [k] .do_raw_spin_lock
           1.85%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           1.26%  random_access_b  [kernel.kallsyms]                [k] .native_flush_hash_range
           1.18%  random_access_b  [kernel.kallsyms]                [k] .__delay
           0.69%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           0.37%  random_access_b  [kernel.kallsyms]                [k] .clear_user_page
           0.34%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           0.32%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           0.30%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
      
      With Fix:
      
          27.54%  random_access_b  random_access_bench              [.] doit
          22.90%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           5.76%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           5.20%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           5.12%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           4.80%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
           3.31%  random_access_b  [kernel.kallsyms]                [k] data_access_common
           1.84%  random_access_b  [kernel.kallsyms]                [k] .trace_hardirqs_on_caller
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      aefa5688
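      A minimal sketch of the idea, assuming a flag such as HPTE_NOHPTE_UPDATE
      is passed from the fault path down to the updatepp implementation (the
      flag name is an assumption, not quoted from the series):

          /* The fault already told us no HPTE (and hence no TLB entry)
           * exists for this EA, so relaxing permissions in updatepp does
           * not require a tlbie. */
          if (!(flags & HPTE_NOHPTE_UPDATE))
              tlbie(vpn, bpsize, apsize, ssize, local);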
  19. 02 Dec, 2014 2 commits