1. 03 Jun, 2018 2 commits
    • powerpc: use time64_t in read_persistent_clock · 5bfd6435
      Committed by Arnd Bergmann
      Looking through the remaining users of the deprecated mktime()
      function, I found the powerpc rtc handlers, which use it in
      place of rtc_tm_to_time64().
      
      To clean this up, I'm changing over the read_persistent_clock()
      function to the read_persistent_clock64() variant, and changing
      all the platform-specific handlers along with it.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
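The motivation for a 64-bit seconds value can be illustrated with a small userspace sketch. The function name `sketch_mktime64` is hypothetical and is not the kernel's implementation; it only shows that a broken-down UTC date converted to a signed 64-bit count keeps working past the point where a signed 32-bit count would overflow (January 2038).

```c
#include <stdint.h>

/*
 * Hypothetical sketch: convert a Gregorian UTC date and time to seconds
 * since 1970-01-01 00:00:00, returned as a signed 64-bit value.
 * mon is 1..12, day is 1..31.
 */
int64_t sketch_mktime64(unsigned int year, unsigned int mon, unsigned int day,
                        unsigned int hour, unsigned int min, unsigned int sec)
{
    int64_t days;

    /* Count months from March so that February (and its leap day) is last. */
    if (mon <= 2) {
        mon += 10;
        year -= 1;
    } else {
        mon -= 2;
    }

    /* Leap days so far, plus days in the elapsed months, plus whole years. */
    days = (int64_t)(year / 4 - year / 100 + year / 400 + 367 * mon / 12 + day)
           + (int64_t)year * 365 - 719499;

    return ((days * 24 + hour) * 60 + min) * 60 + sec;
}
```

For example, 2038-01-19 03:14:08 UTC maps to 2147483648, one more than the largest value a signed 32-bit integer can hold.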
    • powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET · ee03b9b4
      Committed by Nicholas Piggin
      Although it is often possible to recover a CPU that was interrupted
      from OPAL with a system reset NMI, it's undesirable to interrupt them
      for a few reasons. Firstly because dump/debug code itself needs to
      call firmware, so it could hang on a lock or possibly corrupt a
      per-cpu data structure if it or another CPU was interrupted from
      OPAL. Secondly, the kexec crash dump code will not return from
      interrupt to unwind the OPAL call.
      
      Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to
      another CPU, which waits for it to leave firmware (or time out) to
      avoid this problem in normal conditions. Firmware bugs may still
      result in a timeout and interrupting OPAL, but that is the best
      option (stops the CPU, and possibly allows firmware to be debugged).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 21 May, 2018 1 commit
  3. 10 Apr, 2018 1 commit
    • powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops · 34dd25de
      Committed by Nicholas Piggin
      This is the start of an effort to tidy up and standardise all the
      delays. Existing loops have a range of delay/sleep periods from 1ms
      to 20ms, and some have no delay. They all loop forever except rtc,
      which times out after 10 retries, and that uses 10ms delays. So use
      10ms as our standard delay. The OPAL maintainer agrees 10ms is a
      reasonable starting point.
      
      The idea is to use the same recipe everywhere; once this is proven to
      work, it will be documented as an OPAL API standard. Then both
      firmware and OS can agree, and if a particular call needs something
      else, then that can be documented with reasoning.
      
      This is not the end-all of this effort; it's just a relatively easy
      change that fixes some existing high latency delays. There should be
      provision for standardising timeouts and/or interruptible loops where
      possible, so non-fatal firmware errors don't cause hangs.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
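The retry recipe described in this commit can be sketched roughly as follows. This is a userspace model under stated assumptions, not the kernel code: the `sketch_` names, the return-code values, and the fake firmware structure are all hypothetical. Only the 10ms delay and the idea of an optional retry limit (the rtc case uses 10 retries) come from the message above.

```c
/* Hypothetical return codes; the values are illustrative only. */
enum { SKETCH_SUCCESS = 0, SKETCH_BUSY = -2, SKETCH_TIMEOUT = -100 };

/* The standard delay proposed in the commit message. */
#define SKETCH_BUSY_DELAY_MS 10

/* Fake firmware state so the loop can be exercised without real hardware. */
struct sketch_fw {
    int busy_left;          /* how many more calls will report busy */
    unsigned int slept_ms;  /* total delay requested so far */
};

/* Stand-in for a firmware call that may report busy. */
static int sketch_fw_call(struct sketch_fw *fw)
{
    if (fw->busy_left > 0) {
        fw->busy_left--;
        return SKETCH_BUSY;
    }
    return SKETCH_SUCCESS;
}

/* Stand-in for a sleep; it only records the requested delay. */
static void sketch_delay(struct sketch_fw *fw, unsigned int ms)
{
    fw->slept_ms += ms;
}

/*
 * Retry the call while it reports busy, waiting the standard delay between
 * attempts. A negative max_retries means "retry forever".
 */
int sketch_call_with_retry(struct sketch_fw *fw, int max_retries)
{
    int retries = 0;

    for (;;) {
        int rc = sketch_fw_call(fw);

        if (rc != SKETCH_BUSY)
            return rc;
        if (max_retries >= 0 && retries >= max_retries)
            return SKETCH_TIMEOUT;
        sketch_delay(fw, SKETCH_BUSY_DELAY_MS);
        retries++;
    }
}
```

Keeping the delay in a single constant is what lets every loop share one recipe; a caller that needs a bound passes a retry limit instead of changing the delay.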
  4. 03 Apr, 2018 1 commit
    • powerpc/powernv: Always stop secondaries before reboot/shutdown · f2748bdf
      Committed by Nicholas Piggin
      Currently powernv reboot and shutdown requests just leave secondaries
      to do their own things. This is undesirable because they can trigger
      any number of watchdogs while waiting for reboot, but also we don't
      know what else they might be doing -- they might be causing trouble,
      trampling memory, etc.
      
      The opal scheduled flash update code already ran into watchdog problems
      due to flashing taking a long time, and it was fixed with 2196c6f1
      ("powerpc/powernv: Return secondary CPUs to firmware before FW update"),
      which returns secondaries to opal. It's been found that regular reboots
      can take over 10 seconds, which can result in the hard lockup watchdog
      firing:
      
        reboot: Restarting system
        [  360.038896709,5] OPAL: Reboot request...
        Watchdog CPU:0 Hard LOCKUP
        Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
        Watchdog CPU:16 Hard LOCKUP
        watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
      
      This patch removes the special case for flash update, and calls
      smp_send_stop in all cases before calling reboot/shutdown.
      
      smp_send_stop could return CPUs to OPAL; the main reason not to is
      that the request could come from a NMI that interrupts OPAL code,
      so re-entry to OPAL can cause a number of problems. Putting
      secondaries into simple spin loops improves the chances of a
      successful reboot.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 13 Mar, 2018 1 commit
  6. 24 Jan, 2018 1 commit
    • powerpc/powernv: Add opal calls for opencapi · 74d656d2
      Committed by Frederic Barrat
      Add opal calls to interact with the NPU:
      
      OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA)
      The SPA is a table containing one entry (Process Element) per memory
      context which can be accessed by the opencapi device.
      
      OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache
      The NPU keeps a cache of recently accessed memory contexts. When a
      Process Element is removed from the SPA, the cache for the link must
      be cleared.
      
      OPAL_NPU_TL_SET: configure the Transaction Layer
      The Transaction Layer specification defines several templates for
      messages to be exchanged on the link. During link setup, the host and
      device must negotiate what templates are supported on both sides and
      at what rates those messages can be sent.
      Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 06 Nov, 2017 2 commits
  8. 04 Oct, 2017 1 commit
    • powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET · e36d0a2e
      Committed by Nicholas Piggin
      This allows MSR[EE]=0 lockups to be detected on an OPAL (bare metal)
      system similarly to the hcall NMI IPI on pseries guests, when the
      platform/firmware supports it.
      
      This is an example of CPU10 spinning with interrupts hard disabled:
      
        Watchdog CPU:32 detected Hard LOCKUP other CPUS:10
        Watchdog CPU:10 Hard LOCKUP
        CPU: 10 PID: 4410 Comm: bash Not tainted 4.13.0-rc7-00074-ge89ce1f8-dirty #34
        task: c0000003a82b4400 task.stack: c0000003af55c000
        NIP: c0000000000a7b38 LR: c000000000659044 CTR: c0000000000a7b00
        REGS: c00000000fd23d80 TRAP: 0100   Not tainted  (4.13.0-rc7-00074-ge89ce1f8-dirty)
        MSR: 90000000000c1033 <SF,HV,ME,IR,DR,RI,LE>
        CR: 28422222  XER: 20000000
        CFAR: c0000000000a7b38 SOFTE: 0
        GPR00: c000000000659044 c0000003af55fbb0 c000000001072a00 0000000000000078
        GPR04: c0000003c81b5c80 c0000003c81cc7e8 9000000000009033 0000000000000000
        GPR08: 0000000000000000 c0000000000a7b00 0000000000000001 9000000000001003
        GPR12: c0000000000a7b00 c00000000fd83200 0000000010180df8 0000000010189e60
        GPR16: 0000000010189ed8 0000000010151270 000000001018bd88 000000001018de78
        GPR20: 00000000370a0668 0000000000000001 00000000101645e0 0000000010163c10
        GPR24: 00007fffd14d6294 00007fffd14d6290 c000000000fba6f0 0000000000000004
        GPR28: c000000000f351d8 0000000000000078 c000000000f4095c 0000000000000000
        NIP [c0000000000a7b38] sysrq_handle_xmon+0x38/0x40
        LR [c000000000659044] __handle_sysrq+0xe4/0x270
        Call Trace:
        [c0000003af55fbd0] [c000000000659044] __handle_sysrq+0xe4/0x270
        [c0000003af55fc70] [c000000000659810] write_sysrq_trigger+0x70/0xa0
        [c0000003af55fca0] [c0000000003da650] proc_reg_write+0xb0/0x110
        [c0000003af55fcf0] [c0000000003423bc] __vfs_write+0x6c/0x1b0
        [c0000003af55fd90] [c000000000344398] vfs_write+0xd8/0x240
        [c0000003af55fde0] [c00000000034632c] SyS_write+0x6c/0x110
        [c0000003af55fe30] [c00000000000b220] system_call+0x58/0x6c
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Use kernel types for opal_signal_system_reset()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  9. 31 Aug, 2017 1 commit
    • powerpc/powernv: Flush console before platform error reboot · b746e3e0
      Committed by Nicholas Piggin
      Unrecovered MCE and HMI errors are sent through a special restart OPAL
      call to log the platform error. The downside is that they don't go
      through normal Linux crash paths, so they don't give much information
      to the Linux console.
      
      Change this by providing a special crash function which does some of
      the console flushing from the panic() path before calling firmware to
      reboot.
      
      The downside of this is a little more code to execute before reaching
      the firmware reboot. However in practice, it's critical to get the
      Linux console messages output in order to debug a problem. So this is
      a desirable tradeoff.
      
      Note on the implementation: It is difficult to plumb a custom reboot
      handler into the panic path, because panic does a little bit too much
      work. For example, it will try to delay with the timebase, but that
      may be corrupted in some cases resulting in a hang without reaching
      the platform reboot. Another problem is that panic can invoke the
      crash dump code which is not what we want in the case of a hardware
      platform error. Long-term the best solution will be to rework the
      panic path so it can be suitable for this kind of panic, but for now
      we just duplicate a bit of the code.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 10 Aug, 2017 3 commits
  11. 08 Aug, 2017 1 commit
    • powerpc/powernv: Enable PCI peer-to-peer · 25529100
      Committed by Frederic Barrat
      P9 has support for PCI peer-to-peer, enabling a device to write to the
      MMIO space of another device directly, without interrupting the CPU.
      
      This patch adds support for it on powernv, by adding a new API to be
      called by drivers. The pnv_pci_set_p2p(...) call configures an
      'initiator', i.e. the device which will issue the MMIO operation, and a
      'target', i.e. the device on the receiving side.
      
      P9 really only supports MMIO stores for the time being, but that's
      expected to change in the future, so the API allows both load and
      store operations to be defined.
      
        /* PCI p2p descriptor */
        #define OPAL_PCI_P2P_ENABLE           0x1
        #define OPAL_PCI_P2P_LOAD             0x2
        #define OPAL_PCI_P2P_STORE            0x4
      
        int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                            u64 desc)
      
      It uses a new OPAL call, as the configuration magic is done on the
      PHBs by skiboot.
      Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      [mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
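The descriptor flags quoted in this commit combine as a bitmask. The following is a minimal sketch of how a caller might build the `desc` argument; the helper `sketch_p2p_desc` is hypothetical and not part of the kernel API, and only the three flag values are taken from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI p2p descriptor flags, as quoted in the commit message. */
#define OPAL_PCI_P2P_ENABLE 0x1
#define OPAL_PCI_P2P_LOAD   0x2
#define OPAL_PCI_P2P_STORE  0x4

/* Hypothetical helper: assemble a descriptor from the requested options. */
uint64_t sketch_p2p_desc(bool enable, bool loads, bool stores)
{
    uint64_t desc = 0;

    if (enable)
        desc |= OPAL_PCI_P2P_ENABLE;
    if (loads)
        desc |= OPAL_PCI_P2P_LOAD;
    if (stores)
        desc |= OPAL_PCI_P2P_STORE;

    return desc;
}
```

Since the message says P9 currently supports only stores, a driver would presumably request `sketch_p2p_desc(true, false, true)` (value 0x5) and pass the result as the third argument of pnv_pci_set_p2p().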
  12. 24 Jul, 2017 1 commit
  13. 06 Apr, 2017 1 commit
  14. 04 Apr, 2017 1 commit
    • powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f
      Committed by Alistair Popple
      Nvlink2 supports address translation services (ATS) allowing devices
      to request address translations from an MMU known as the nest MMU
      which is set up to walk the CPU page tables.
      
      To access this functionality, certain firmware calls are required to
      set up and manage hardware context tables in the nvlink processing unit
      (NPU). The NPU also manages forwarding of TLB invalidates (known as
      address translation shootdowns/ATSDs) to attached devices.
      
      This patch exports several methods to allow device drivers to register
      a process id (PASID/PID) in the hardware tables and to receive
      notification of when a device should stop issuing address translation
      requests (ATRs). It also adds a fault handler to allow device drivers
      to demand fault pages in.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      [mpe: Fix up comment formatting, use flush_tlb_mm()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 07 Feb, 2017 1 commit
  16. 30 Jan, 2017 1 commit
    • powerpc/powernv: Initialise nest mmu · 1d0761d2
      Committed by Alistair Popple
      POWER9 contains an off-core MMU called the nest MMU (NMMU). This is
      used by other hardware units on the chip to translate virtual
      addresses into real addresses. The unit attempting an address
      translation provides the majority of the context required for the
      translation request except for the base address of the partition table
      (i.e. the PTCR), which needs to be programmed into the NMMU.
      
      This patch adds a call to OPAL to set the PTCR for the nest mmu in
      opal_init().
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  17. 23 Nov, 2016 1 commit
  18. 12 Sep, 2016 1 commit
    • KVM: PPC: Book3S HV: Set server for passed-through interrupts · 5d375199
      Committed by Paul Mackerras
      When a guest has a PCI pass-through device with an interrupt, it
      will direct the interrupt to a particular guest VCPU.  In fact the
      physical interrupt might arrive on any CPU, and then get routed
      through the emulated XICS (guest interrupt controller) before
      eventually being delivered to the target VCPU.
      
      Now that we have code to handle device interrupts in real mode
      without exiting to the host kernel, there is an advantage to having
      the device interrupt arrive on the same (sub)core as the target
      VCPU is running on.  In this situation, the interrupt can be
      delivered to the target VCPU without any exit to the host kernel
      (using a hypervisor doorbell interrupt between threads if
      necessary).
      
      This patch aims to get passed-through device interrupts arriving
      on the correct core by setting the interrupt server in the real
      hardware XICS for the interrupt to the first thread in the (sub)core
      where its target VCPU is running.  We do this in the real-mode H_EOI
      code because the H_EOI handler already needs to look at the
      emulated ICS state for the interrupt (whereas the H_XIRR handler
      doesn't), and we know we are running in the target VCPU context
      at that point.
      
      We set the server CPU in hardware using an OPAL call, regardless of
      what the IRQ affinity mask for the interrupt says, and without
      updating the affinity mask.  This amounts to saying that when an
      interrupt is passed through to a guest, as a matter of policy we
      allow the guest's affinity for the interrupt to override the host's.
      
      This is inspired by an earlier patch from Suresh Warrier, although
      none of this code came from that earlier patch.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  19. 21 Jul, 2016 1 commit
  20. 17 Jul, 2016 1 commit
  21. 15 Jul, 2016 1 commit
  22. 08 Jul, 2016 1 commit
  23. 29 Jun, 2016 2 commits
  24. 21 Jun, 2016 2 commits
  25. 09 Feb, 2016 1 commit
  26. 13 Jan, 2016 1 commit
  27. 27 Dec, 2015 1 commit
    • powerpc/powernv: Add a kmsg_dumper that flushes console output on panic · affddff6
      Committed by Russell Currey
      On BMC machines, console output is controlled by the OPAL firmware and is
      only flushed when its pollers are called.  When the kernel is in a panic
      state, it no longer calls these pollers and thus console output does not
      completely flush, causing some output from the panic to be lost.
      
      Output is only actually lost when the kernel is configured to not power off
      or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
      flushes the console buffer as part of its power down routines.  Before this
      patch, however, only partial output would be printed during the timeout wait.
      
      This patch adds a new kmsg_dumper which gets called at panic time to ensure
      panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
      in the OPAL API, and if that is not available, the pollers are called enough
      times to (hopefully) completely flush the buffer.
      
      The flushing mechanism will only affect output printed at and before the
      kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
      message may still be truncated as follows:
      
      >Call Trace:
      >[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
      >[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
      >[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
      >[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
      >[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
      >[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
      >[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
      >---[ end Kernel panic - not
      
      This functionality is implemented as a kmsg_dumper as it seems to be the
      most sensible way to introduce platform-specific functionality to the
      panic function.
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
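The two flushing strategies described in this commit (a dedicated flush call when the firmware offers one, otherwise repeated polling) can be modelled with a short sketch. Everything here is hypothetical: the `sketch_` names, the fake console structure, and the poll count of 16, which merely stands in for "enough times".

```c
/* Arbitrary illustrative poll count for the fallback path. */
#define SKETCH_FALLBACK_POLLS 16

/* Fake console state so both paths can be exercised without firmware. */
struct sketch_console {
    int has_flush_call;  /* nonzero if a dedicated flush call is available */
    int pending_chunks;  /* buffered output not yet written out */
    int polls;           /* number of poller invocations so far */
};

/* Stand-in for a flush call: writes one chunk, returns nonzero if more remains. */
static int sketch_flush_call(struct sketch_console *c)
{
    if (c->pending_chunks > 0)
        c->pending_chunks--;
    return c->pending_chunks > 0;
}

/* Stand-in for running the firmware pollers once. */
static void sketch_poll(struct sketch_console *c)
{
    c->polls++;
    if (c->pending_chunks > 0)
        c->pending_chunks--;
}

/* Flush at panic time: prefer the dedicated call, else poll a fixed number of times. */
void sketch_flush_console(struct sketch_console *c)
{
    int i;

    if (c->has_flush_call) {
        while (sketch_flush_call(c))
            ;
        return;
    }

    for (i = 0; i < SKETCH_FALLBACK_POLLS; i++)
        sketch_poll(c);
}
```

The fallback path has no way to know when the buffer is empty, which is why the message says "hopefully": if more output is pending than the fixed number of polls can drain, some of it is still lost.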
  28. 20 Aug, 2015 1 commit
  29. 06 Aug, 2015 1 commit
  30. 16 Jul, 2015 1 commit
  31. 05 Jun, 2015 1 commit
  32. 22 May, 2015 3 commits
    • powerpc/powernv: Add a virtual irqchip for opal events · 9f0fd049
      Committed by Alistair Popple
      Whenever an interrupt is received for OPAL, the Linux kernel gets a
      bitfield indicating certain events that have occurred and need handling
      by the various device drivers. Currently this is handled using a
      notifier interface where we call every device driver that has
      registered to receive opal events.
      
      This approach has several drawbacks. For example each driver has to do
      its own checking to see if the event is relevant as well as event
      masking. There is also no easy method of recording the number of times
      we receive particular events.
      
      This patch solves these issues by exposing opal events via the
      standard interrupt APIs by adding a new interrupt chip and
      domain. Drivers can then register for the appropriate events using
      standard kernel calls such as irq_of_parse_and_map().
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
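The benefit of handling the event bitfield in one place, rather than in every notifier, can be sketched as below. This is a userspace model with hypothetical `sketch_` names, not the irqchip code itself: it only shows masking done once centrally and a per-event counter, the two drawbacks the message says the notifier approach could not address easily.

```c
#include <stdint.h>

#define SKETCH_MAX_EVENTS 64

/* Per-event delivery counters, one slot per bit of the event bitfield. */
struct sketch_event_stats {
    unsigned long count[SKETCH_MAX_EVENTS];
};

/*
 * Walk the event bitfield once, skip events that are masked off, and count
 * each delivered event. Returns the number of events delivered.
 */
unsigned int sketch_dispatch_events(uint64_t events, uint64_t enabled_mask,
                                    struct sketch_event_stats *stats)
{
    uint64_t pending = events & enabled_mask;
    unsigned int bit, delivered = 0;

    for (bit = 0; bit < SKETCH_MAX_EVENTS; bit++) {
        if (pending & (UINT64_C(1) << bit)) {
            stats->count[bit]++;   /* a real irqchip would raise the mapped IRQ here */
            delivered++;
        }
    }

    return delivered;
}
```

With this shape, a driver no longer checks the bitfield itself; it is only invoked for the event it registered for.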
    • powerpc/powernv: Reorder OPAL subsystem initialisation · 96e023e7
      Committed by Alistair Popple
      Most of the OPAL subsystems are always compiled in for PowerNV and
      many of them need to be initialised before or after other OPAL
      subsystems. Rather than trying to control this ordering through
      machine initcalls it is clearer and easier to control initialisation
      order with explicit calls in opal_init.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Cc: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Introduce sysfs control for fastsleep workaround behavior · 5703d2f4
      Committed by Shreyas B. Prabhu
      Fastsleep is one of the idle states which the cpuidle subsystem currently
      uses on power8 machines. In this state L2 cache is brought down to a
      threshold voltage. Therefore when the core is in fastsleep, the
      communication between L2 and L3 needs to be fenced. But there is a bug
      in the current power8 chips surrounding this fencing.
      
      OPAL provides a workaround which precludes the possibility of hitting
      this bug. But running with this workaround applied causes checkstop
      if any correctable error in L2 cache directory is detected. Hence OPAL
      also provides a way to undo the workaround.
      
      In the existing implementation, the workaround is applied by the last thread
      of the core entering fastsleep and undone by the first thread waking up.
      But this has a performance cost. These OPAL calls account for roughly
      4000 cycles every time the core has to enter or wake up from fastsleep.
      
      This patch introduces a sysfs attribute (fastsleep_workaround_applyonce)
      to choose the behavior of this workaround.
      
      By default, fastsleep_workaround_applyonce = 0. In this case, the workaround
      is applied/undone every time the core enters/exits fastsleep.
      
      fastsleep_workaround_applyonce = 1. In this case the workaround is
      applied once on all the cores and never undone. This can be triggered by
      echo 1 > /sys/devices/system/cpu/fastsleep_workaround_applyonce
      
      For simplicity, this attribute can be modified only once. That is, once
      fastsleep_workaround_applyonce is changed to 1, it cannot be reverted
      to the default state.
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
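The one-way behaviour of the attribute can be sketched as a store handler that accepts only the value 1 and never reverts. This is a userspace model under stated assumptions, not the kernel's sysfs handler: the `sketch_` names and the state structure are hypothetical, and the counter merely stands in for applying the workaround on all cores.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of the attribute's state. */
struct sketch_fastsleep {
    int applyonce;      /* 0 = apply/undo on every entry/exit, 1 = applied once */
    int applied_count;  /* how many times the "apply on all cores" step ran */
};

/*
 * Handle a write to the attribute. Only "1" is accepted; once set, the
 * value cannot go back to 0. Returns 0 on success or -EINVAL on bad input.
 */
int sketch_applyonce_store(struct sketch_fastsleep *s, const char *buf)
{
    char *end;
    long val = strtol(buf, &end, 10);

    /* Reject empty input and trailing junk (a trailing newline is allowed). */
    if (end == buf || (*end != '\0' && *end != '\n'))
        return -EINVAL;

    /* Writing anything other than 1, including 0, is refused. */
    if (val != 1)
        return -EINVAL;

    /* Already switched over: nothing more to do. */
    if (s->applyonce == 1)
        return 0;

    s->applyonce = 1;
    s->applied_count++;  /* stands in for applying the workaround on all cores */
    return 0;
}
```

Refusing the write of 0 is what makes the attribute modifiable only once, and the early return on a repeated write of 1 keeps the apply step from running twice.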