1. 22 Jan, 2018 1 commit
    • G
      powerpc/powernv: Add ppc_pci_reset_phbs parameter to issue a PHB reset · 45baee14
      Guilherme G. Piccoli committed
      During a kdump kernel boot on PowerPC, we request a reset of the PHBs
      from the FW. It makes sense, since if we are booting a kdump kernel it
      means we had some trouble before and we cannot rely on the adapters'
      health; they could be in a bad state, hence the reset is needed.
      
      But this reset is useful not only in kdump - there are situations,
      especially when debugging drivers, in which we could break an adapter in
      a way that requires such a reset. One could just go ahead and
      reboot the machine, but doing kexec is often much
      faster, and so preferable to a full power cycle.
      
      This patch adds the ppc_pci_reset_phbs parameter to perform such a reset.
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      45baee14
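The decision this parameter feeds into can be sketched in plain C. This is a userspace model under stated assumptions, not the kernel code: `parse_cmdline()`, `should_reset_phbs()` and the `pci_reset_phbs` flag are hypothetical names chosen for illustration; only the parameter name `ppc_pci_reset_phbs` comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag: set when the parameter appears on the command line. */
static bool pci_reset_phbs;

/* Hypothetical parser: looks for the exact, space-delimited token. */
void parse_cmdline(const char *cmdline)
{
    static const char param[] = "ppc_pci_reset_phbs";
    const size_t len = sizeof(param) - 1;
    const char *p = cmdline;

    assert(cmdline != NULL);
    pci_reset_phbs = false;
    while ((p = strstr(p, param)) != NULL) {
        bool starts_token = (p == cmdline) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');

        if (starts_token && ends_token) {
            pci_reset_phbs = true;
            return;
        }
        p += len;
    }
}

/* Reset on kdump boots (existing behaviour) or when explicitly requested. */
bool should_reset_phbs(bool is_kdump_kernel)
{
    return is_kdump_kernel || pci_reset_phbs;
}
```

With this model, a kexec'ed kernel booted with `ppc_pci_reset_phbs` on its command line takes the same reset path as a kdump kernel.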
  2. 20 Jan, 2018 1 commit
  3. 11 Dec, 2017 1 commit
  4. 04 Dec, 2017 1 commit
  5. 07 Nov, 2017 1 commit
    • A
      powerpc/powernv/ioda: Remove explicit max window size check · 9003a249
      Alexey Kardashevskiy committed
      DMA windows can only have a power-of-two size on IODA2 hardware, and
      using memory_hotplug_max() to determine the upper limit won't work
      correctly if it returns a value that is not a power of two.
      
      This removes the check as the platform code does this check in
      pnv_pci_ioda2_setup_default_config() anyway; the other client is VFIO,
      which checks against the locked_vm limit, preventing userspace
      from locking too much memory.
      
      This is expected to mostly impact DPDK on machines with a non-power-of-two
      RAM size. KVM guests are less likely to be affected, as guests usually get
      less than half of the host's RAM.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9003a249
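A small userspace sketch of why a non-power-of-two limit clashes with power-of-two windows. The two `*_check_accepts()` functions are simplified models written for this example, not the kernel's actual checks, and the 24 GiB RAM size is a made-up figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)

bool is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Smallest power of two >= x; x must not exceed 2^63. */
uint64_t roundup_pow2(uint64_t x)
{
    uint64_t p = 1;

    assert(x <= (1ULL << 63));
    while (p < x)
        p <<= 1;
    return p;
}

/* Model of the removed check: also caps the window at the memory limit. */
bool old_check_accepts(uint64_t window, uint64_t max_mem)
{
    return is_power_of_two(window) && window <= max_mem;
}

/* Model after the removal: only the power-of-two requirement remains here. */
bool new_check_accepts(uint64_t window)
{
    return is_power_of_two(window);
}
```

On a hypothetical machine with 24 GiB of RAM, the smallest window covering all memory is 32 GiB, which the old model rejects and the new one accepts.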
  6. 06 Nov, 2017 1 commit
  7. 26 Sep, 2017 1 commit
    • B
      powerpc/powernv: Rework EEH initialization on powernv · b9fde58d
      Benjamin Herrenschmidt committed
      Remove the post_init callback which is only used
      by powernv, we can just call it explicitly from
      the powernv code.
      
      This partially kills the ability to "disable" eeh at
      runtime via debugfs as this was calling that same
      callback again, but this is both unused and broken
      in several ways. If we want to revive it, we need
      to create a dedicated enable/disable callback on the
      backend that does the right thing.
      
      Let the bulk of eeh initialize normally at
      core_initcall() like it does on pseries by removing
      the hack in eeh_init() that delays it.
      
      Instead we make sure our eeh->probe cleanly bails
      out if the PEs haven't been created yet and we force
      a re-probe where we used to call eeh_init() again.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b9fde58d
  8. 23 Aug, 2017 1 commit
  9. 08 Aug, 2017 1 commit
    • F
      powerpc/powernv: Enable PCI peer-to-peer · 25529100
      Frederic Barrat committed
      P9 has support for PCI peer-to-peer, enabling a device to write in the
      MMIO space of another device directly, without interrupting the CPU.
      
      This patch adds support for it on powernv, by adding a new API to be
      called by drivers. The pnv_pci_set_p2p(...) call configures an
      'initiator', i.e. the device which will issue the MMIO operation, and a
      'target', i.e. the device on the receiving side.
      
      P9 really only supports MMIO stores for the time being but that's
      expected to change in the future, so the API allows defining both
      load and store operations.
      
        /* PCI p2p descriptor */
        #define OPAL_PCI_P2P_ENABLE           0x1
        #define OPAL_PCI_P2P_LOAD             0x2
        #define OPAL_PCI_P2P_STORE            0x4
      
        int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                            u64 desc)
      
      It uses a new OPAL call, as the configuration magic is done on the
      PHBs by skiboot.
      Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      [mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      25529100
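As an illustration of how the descriptor flags above combine, here is a minimal userspace sketch. The three `OPAL_PCI_P2P_*` values are quoted from the commit message; `p2p_make_desc()` is a hypothetical helper invented for this example and is not part of the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted from the commit message. */
#define OPAL_PCI_P2P_ENABLE 0x1
#define OPAL_PCI_P2P_LOAD   0x2
#define OPAL_PCI_P2P_STORE  0x4

#define P2P_ALL_FLAGS \
    (OPAL_PCI_P2P_ENABLE | OPAL_PCI_P2P_LOAD | OPAL_PCI_P2P_STORE)

/* Hypothetical helper: build the desc argument for pnv_pci_set_p2p(). */
uint64_t p2p_make_desc(bool enable, bool load, bool store)
{
    uint64_t desc = 0;

    if (enable)
        desc |= OPAL_PCI_P2P_ENABLE;
    if (load)
        desc |= OPAL_PCI_P2P_LOAD;
    if (store)
        desc |= OPAL_PCI_P2P_STORE;

    /* Sanity check: only the three known bits are ever set. */
    assert((desc & ~(uint64_t)P2P_ALL_FLAGS) == 0);
    return desc;
}
```

A driver enabling stores from an initiator to a target would presumably pass `p2p_make_desc(true, false, true)` as the third argument of `pnv_pci_set_p2p()`. That tearing the mapping down means calling it again without `OPAL_PCI_P2P_ENABLE` is a reading of the flag name, not something the message states.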
  10. 28 Jul, 2017 1 commit
  11. 27 Jun, 2017 3 commits
    • R
      powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space · 8e3f1b1d
      Russell Currey committed
      On PHB3/POWER8 systems, devices can select between two different sections
      of address space, TVE#0 and TVE#1.  TVE#0 is intended for 32-bit devices
      that aren't capable of addressing more than 4GB.  Selecting TVE#1 instead,
      with the capability of addressing over 4GB, is performed by setting bit 59
      of a PCI address.
      
      However, some devices aren't capable of addressing at least 59 bits, but
      still want more than 4GB of DMA space.  In order to enable this, reconfigure
      TVE#0 to be suitable for 64-bit devices by allocating memory past the
      initial 4GB that is inaccessible by 64-bit DMAs.
      
      This bypass mode is only enabled if a device requests 4GB or more of DMA
      address space, if the system has PHB3 (POWER8 systems), and if the device
      does not share a PE with any devices from different vendors.
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      8e3f1b1d
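The conditions listed in this message can be modelled as a small predicate. This is a sketch under assumptions, not the kernel implementation: the enum and function names are hypothetical, "requests 4GB or more" is interpreted here as a DMA mask wider than 32 bits, and the "cannot reach bit 59" test is inferred from the second paragraph rather than listed among the three conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TVE_SELECT_BIT 59 /* bit number taken from the commit message */

enum phb_model { PHB_MODEL_OTHER, PHB_MODEL_PHB3 }; /* hypothetical names */

/* Build a PCI address that selects TVE#1 by setting bit 59. */
uint64_t tve1_address(uint64_t offset)
{
    assert(offset < (1ULL << TVE_SELECT_BIT));
    return offset | (1ULL << TVE_SELECT_BIT);
}

/* A device can only select TVE#1 if its DMA mask covers bit 59. */
bool can_select_tve1(uint64_t dma_mask)
{
    return (dma_mask >> TVE_SELECT_BIT) & 1;
}

/* Should TVE#0 be reconfigured for this device's 64-bit DMA? */
bool use_tve0_for_64bit(uint64_t dma_mask, enum phb_model model,
                        bool single_vendor_pe)
{
    /* Assumed reading of "requests 4GB or more of DMA address space". */
    bool wants_over_4gb = (dma_mask >> 32) != 0;

    return wants_over_4gb && !can_select_tve1(dma_mask) &&
           model == PHB_MODEL_PHB3 && single_vendor_pe;
}
```

For example, a device with a 40-bit DMA mask on PHB3, alone in its PE, qualifies; a device with a full 64-bit mask does not, because it can simply set bit 59 and use TVE#1.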
    • R
      powerpc/powernv/pci: Add helper to check if a PE has a single vendor · a0f98629
      Russell Currey committed
      Add a helper that determines whether all the devices contained in a given PE
      are from the same vendor.  This can be useful in determining
      if it's okay to make PE-wide changes that may be suitable for some
      devices but not for others.
      
      This is used later in the series.
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a0f98629
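The idea behind such a helper can be sketched over a plain array of PCI vendor IDs. `pe_has_single_vendor()` is a hypothetical stand-in for this example; the real helper walks the kernel's PE and device structures, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if every device in the PE reports the same vendor ID.
 * An empty PE is treated as single-vendor (vacuously true); that is a
 * choice made for this sketch.
 */
bool pe_has_single_vendor(const uint16_t *vendor_ids, size_t count)
{
    assert(vendor_ids != NULL || count == 0);

    for (size_t i = 1; i < count; i++) {
        if (vendor_ids[i] != vendor_ids[0])
            return false;
    }
    return true;
}
```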
    • R
      powerpc/powernv/pci: Dynamically allocate PHB diag data · 5cb1f8fd
      Russell Currey committed
      Diagnostic data for PHBs currently works by allocating a fixed-size buffer.
      This is simple, but either wastes memory (though only a few kilobytes) or
      in the case of PHB4 isn't enough to fit the whole data blob.
      
      For machines that don't describe the diagnostic data size in the device
      tree, use the hardcoded buffer size as before.  For those that do, only
      allocate exactly what's needed.
      
      In the special case of P7IOC (which has two types of diag data), the larger
      should be specified in the device tree.
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5cb1f8fd
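The fallback logic described here might look like the following userspace sketch. The function names, the use of a nullable pointer to stand for an optional device-tree property, and the 8192-byte default are all assumptions made for illustration; the message does not give the hardcoded size or the property name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_DIAG_BUF_SIZE 8192u /* assumed fallback, not from the source */

/* dt_size is NULL when the device tree does not describe the size. */
size_t diag_buf_size(const uint32_t *dt_size)
{
    if (dt_size != NULL && *dt_size != 0)
        return *dt_size;
    return DEFAULT_DIAG_BUF_SIZE;
}

/* Allocate a zeroed buffer of exactly the chosen size. */
void *alloc_diag_buf(const uint32_t *dt_size, size_t *out_size)
{
    size_t size = diag_buf_size(dt_size);
    void *buf = calloc(1, size);

    assert(out_size != NULL);
    *out_size = (buf != NULL) ? size : 0;
    return buf;
}
```

For P7IOC, the message asks the device tree to advertise the larger of its two diag data sizes, so a single buffer sized this way fits either type.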
  12. 03 May, 2017 1 commit
    • A
      powerpc/powernv: Fix TCE kill on NVLink2 · 6b3d12a9
      Alistair Popple committed
      Commit 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on
      NVLink2") forced all TCE kills to go via the OPAL call for
      NVLink2. However the PHB3 implementation of TCE kill was still being
      called directly from some functions which in some circumstances caused
      a machine check.
      
      This patch adds an equivalent IODA2 version of the function which uses
      the correct invalidation method depending on PHB model and changes all
      external callers to use it instead.
      
      Fixes: 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      6b3d12a9
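The shape of the fix, a single entry point that picks the invalidation method by PHB model, can be sketched as below. All names are hypothetical and the two "kill" routines are counting stubs; the real functions talk to hardware or firmware.

```c
#include <assert.h>

enum phb_model { PHB_MODEL_PHB3, PHB_MODEL_NVLINK2 };   /* hypothetical */
enum kill_method { KILL_PHB3_DIRECT, KILL_VIA_OPAL };   /* hypothetical */

/* Stub counters standing in for the two invalidation paths. */
static unsigned direct_kills;
static unsigned opal_kills;

static void kill_phb3_direct(void) { direct_kills++; }
static void kill_via_opal(void)    { opal_kills++; }

/* Choose the invalidation method from the PHB model. */
enum kill_method tce_kill_method(enum phb_model model)
{
    switch (model) {
    case PHB_MODEL_NVLINK2:
        return KILL_VIA_OPAL;
    case PHB_MODEL_PHB3:
        return KILL_PHB3_DIRECT;
    }
    assert(!"unknown PHB model");
    return KILL_VIA_OPAL;
}

/* Single entry point that external callers use instead of the PHB3 routine. */
void ioda2_tce_invalidate(enum phb_model model)
{
    if (tce_kill_method(model) == KILL_VIA_OPAL)
        kill_via_opal();
    else
        kill_phb3_direct();
}
```

Routing every external caller through one dispatcher is what keeps the PHB3 path from being reached on NVLink2, which is the failure the message describes.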
  13. 28 Apr, 2017 2 commits
  14. 20 Apr, 2017 1 commit
  15. 11 Apr, 2017 1 commit
  16. 04 Apr, 2017 1 commit
    • A
      powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f
      Alistair Popple committed
      Nvlink2 supports address translation services (ATS) allowing devices
      to request address translations from an MMU known as the nest MMU
      which is set up to walk the CPU page tables.
      
      To access this functionality certain firmware calls are required to
      set up and manage hardware context tables in the nvlink processing unit
      (NPU). The NPU also manages forwarding of TLB invalidates (known as
      address translation shootdowns/ATSDs) to attached devices.
      
      This patch exports several methods to allow device drivers to register
      a process id (PASID/PID) in the hardware tables and to receive
      notification of when a device should stop issuing address translation
      requests (ATRs). It also adds a fault handler to allow device drivers
      to demand fault pages in.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      [mpe: Fix up comment formatting, use flush_tlb_mm()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1ab66d1f
  17. 30 Mar, 2017 3 commits
  18. 20 Mar, 2017 1 commit
  19. 09 Mar, 2017 2 commits
    • A
      powerpc/powernv/ioda2: Update iommu table base on ownership change · db08e1d5
      Alexey Kardashevskiy committed
      On the POWERNV platform, in order to do DMA via IOMMU (i.e. 32-bit DMA in
      our case), a device needs an iommu_table pointer set via
      set_iommu_table_base().
      
      The code flow is:
      - pnv_pci_ioda2_setup_dma_pe()
      	- pnv_pci_ioda2_setup_default_config()
      	- pnv_ioda_setup_bus_dma() [1]
      
      pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
      pnv_pci_ioda2_setup_default_config() does default DMA setup,
      pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
      PEs are bus PEs except NPU), walks through all underlying buses and
      devices, adds all devices to an IOMMU group and sets iommu_table.
      
      On IODA2, when VFIO is used, it takes ownership over a PE which means it
      removes all tables and creates new ones (with a possibility of sharing
      them among PEs). So when the ownership is returned from VFIO to
      the kernel, the iommu_table pointer written to a device at [1] is
      stale and needs an update.
      
      This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
      (in fact re-adds as it used to be there a while ago for different
      reasons) to tell the helper if a device needs to be added to
      an IOMMU group with an iommu_table update or just the latter.
      
      This calls pnv_ioda_setup_bus_dma(..., false) from
      pnv_ioda2_release_ownership() so when the ownership is restored,
      32-bit DMA can work again for a device. This does the same thing
      on obtaining ownership as the iommu_table pointer is stale at this point
      anyway and it is safer to have NULL there.
      
      We did not hit this earlier as all tested devices in recent years were
      only using 64-bit DMA; the rare exception is the MPT3 SAS adapter,
      which uses both 32-bit and 64-bit DMA access and has not been tested
      with VFIO much.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      db08e1d5
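The effect of the "add_to_group" parameter can be illustrated with a toy model. The structures and `setup_bus_dma()` below are hypothetical simplifications: a flat array of devices replaces the bus walk, and a boolean replaces IOMMU group membership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct table { int id; };                 /* stand-in for an iommu_table */

struct dev {
    const struct table *iommu_table;      /* pointer cached on the device */
    bool in_group;                        /* stand-in for group membership */
};

/*
 * Always refresh the cached table pointer; only join the group when asked.
 * Passing tbl == NULL models clearing the pointer when ownership is taken.
 */
void setup_bus_dma(struct dev *devs, size_t n, const struct table *tbl,
                   bool add_to_group)
{
    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        devs[i].iommu_table = tbl;
        if (add_to_group)
            devs[i].in_group = true;
    }
}
```

In this model, initial setup calls the helper with `add_to_group` set to true, taking ownership clears the pointers, and releasing ownership installs the newly created table without touching group membership.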
    • A
      powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested · 7aafac11
      Alexey Kardashevskiy committed
      The IODA2 specification says that a 64-bit DMA address cannot use the top 4 bits
      (3 are reserved and one is a "TVE select"); the bottom page_shift bits
      cannot be used for multilevel table addressing either.
      
      The existing IODA2 table allocation code aligns the minimum TCE table
      size to PAGE_SIZE so in the case of 64K system pages and 4K IOMMU pages,
      we have 64-4-12=48 bits. Since a 64K page stores 8192 TCEs, i.e. needs
      13 bits, the maximum number of levels is 48/13 = 3 so we physically
      cannot address more and EEH happens on DMA accesses.
      
      This adds a check for too many requested levels.
      
      It is still possible to have 5 levels in the case of 4K system page size.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7aafac11
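The arithmetic in this message can be reproduced with a short function. `max_tce_levels()` is a hypothetical name for this sketch; the 8-byte TCE size is derived from the message's own figure of 8192 TCEs per 64K page.

```c
#include <assert.h>

/*
 * Maximum number of TCE table levels that can be addressed.
 *   system_page_shift: log2 of the system page size (16 for 64K, 12 for 4K)
 *   iommu_page_shift:  log2 of the IOMMU page size (12 for 4K)
 */
unsigned max_tce_levels(unsigned system_page_shift, unsigned iommu_page_shift)
{
    unsigned addr_bits;
    unsigned bits_per_level;

    assert(system_page_shift > 3);
    assert(iommu_page_shift < 60);

    /* 64-bit address, minus 4 unusable top bits, minus the page offset. */
    addr_bits = 64 - 4 - iommu_page_shift;

    /* Each TCE is 8 bytes, so one table page indexes (page_shift - 3) bits. */
    bits_per_level = system_page_shift - 3;

    return addr_bits / bits_per_level;
}
```

With 64K system pages and 4K IOMMU pages this gives 48 / 13 = 3 levels, and with 4K system pages it gives 48 / 9 = 5, matching both figures quoted in the message.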
  20. 28 Feb, 2017 1 commit
  21. 17 Feb, 2017 1 commit
  22. 07 Feb, 2017 1 commit
  23. 30 Jan, 2017 1 commit
  24. 25 Jan, 2017 1 commit
  25. 22 Nov, 2016 2 commits
  26. 04 Oct, 2016 1 commit
  27. 29 Sep, 2016 1 commit
  28. 23 Sep, 2016 2 commits
  29. 21 Sep, 2016 1 commit
  30. 15 Sep, 2016 2 commits
  31. 14 Sep, 2016 1 commit