1. 26 9月, 2017 1 次提交
    • B
      powerpc/powernv: Rework EEH initialization on powernv · b9fde58d
      Benjamin Herrenschmidt 提交于
      Remove the post_init callback which is only used
      by powernv, we can just call it explicitly from
      the powernv code.
      
      This partially kills the ability to "disable" eeh at
      runtime via debugfs as this was calling that same
      callback again, but this is both unused and broken
      in several ways. If we want to revive it, we need
      to create a dedicated enable/disable callback on the
      backend that does the right thing.
      
      Let the bulk of eeh initialize normally at
      core_initcall() like it does on pseries by removing
      the hack in eeh_init() that delays it.
      
      Instead we make sure our eeh->probe cleanly bails
      out of the PEs haven't been created yet and we force
      a re-probe where we used to call eeh_init() again.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b9fde58d
  2. 23 8月, 2017 1 次提交
  3. 08 8月, 2017 1 次提交
    • F
      powerpc/powernv: Enable PCI peer-to-peer · 25529100
      Frederic Barrat 提交于
      P9 has support for PCI peer-to-peer, enabling a device to write in the
      MMIO space of another device directly, without interrupting the CPU.
      
      This patch adds support for it on powernv, by adding a new API to be
      called by drivers. The pnv_pci_set_p2p(...) call configures an
      'initiator', i.e the device which will issue the MMIO operation, and a
      'target', i.e. the device on the receiving side.
      
      P9 really only supports MMIO stores for the time being but that's
      expected to change in the future, so the API allows to define both
      load and store operations.
      
        /* PCI p2p descriptor */
        #define OPAL_PCI_P2P_ENABLE           0x1
        #define OPAL_PCI_P2P_LOAD             0x2
        #define OPAL_PCI_P2P_STORE            0x4
      
        int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                            u64 desc)
      
      It uses a new OPAL call, as the configuration magic is done on the
      PHBs by skiboot.
      Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: NRussell Currey <ruscur@russell.cc>
      [mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      25529100
  4. 28 7月, 2017 1 次提交
  5. 27 6月, 2017 3 次提交
    • R
      powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space · 8e3f1b1d
      Russell Currey 提交于
      On PHB3/POWER8 systems, devices can select between two different sections
      of address space, TVE#0 and TVE#1.  TVE#0 is intended for 32bit devices
      that aren't capable of addressing more than 4GB.  Selecting TVE#1 instead,
      with the capability of addressing over 4GB, is performed by setting bit 59
      of a PCI address.
      
      However, some devices aren't capable of addressing at least 59 bits, but
      still want more than 4GB of DMA space.  In order to enable this, reconfigure
      TVE#0 to be suitable for 64-bit devices by allocating memory past the
      initial 4GB that is inaccessible by 64-bit DMAs.
      
      This bypass mode is only enabled if a device requests 4GB or more of DMA
      address space, if the system has PHB3 (POWER8 systems), and if the device
      does not share a PE with any devices from different vendors.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8e3f1b1d
    • R
      powerpc/powernv/pci: Add helper to check if a PE has a single vendor · a0f98629
      Russell Currey 提交于
      Add a helper that determines if all the devices contained in a given PE
      are all from the same vendor or not.  This can be useful in determining
      if it's okay to make PE-wide changes that may be suitable for some
      devices but not for others.
      
      This is used later in the series.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a0f98629
    • R
      powerpc/powernv/pci: Dynamically allocate PHB diag data · 5cb1f8fd
      Russell Currey 提交于
      Diagnostic data for PHBs currently works by allocated a fixed-sized buffer.
      This is simple, but either wastes memory (though only a few kilobytes) or
      in the case of PHB4 isn't enough to fit the whole data blob.
      
      For machines that don't describe the diagnostic data size in the device
      tree, use the hardcoded buffer size as before.  For those that do, only
      allocate exactly what's needed.
      
      In the special case of P7IOC (which has two types of diag data), the larger
      should be specified in the device tree.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5cb1f8fd
  6. 03 5月, 2017 1 次提交
    • A
      powerpc/powernv: Fix TCE kill on NVLink2 · 6b3d12a9
      Alistair Popple 提交于
      Commit 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on
      NVLink2") forced all TCE kills to go via the OPAL call for
      NVLink2. However the PHB3 implementation of TCE kill was still being
      called directly from some functions which in some circumstances caused
      a machine check.
      
      This patch adds an equivalent IODA2 version of the function which uses
      the correct invalidation method depending on PHB model and changes all
      external callers to use it instead.
      
      Fixes: 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6b3d12a9
  7. 28 4月, 2017 2 次提交
  8. 20 4月, 2017 1 次提交
  9. 11 4月, 2017 1 次提交
  10. 04 4月, 2017 1 次提交
    • A
      powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f
      Alistair Popple 提交于
      Nvlink2 supports address translation services (ATS) allowing devices
      to request address translations from an mmu known as the nest MMU
      which is setup to walk the CPU page tables.
      
      To access this functionality certain firmware calls are required to
      setup and manage hardware context tables in the nvlink processing unit
      (NPU). The NPU also manages forwarding of TLB invalidates (known as
      address translation shootdowns/ATSDs) to attached devices.
      
      This patch exports several methods to allow device drivers to register
      a process id (PASID/PID) in the hardware tables and to receive
      notification of when a device should stop issuing address translation
      requests (ATRs). It also adds a fault handler to allow device drivers
      to demand fault pages in.
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      [mpe: Fix up comment formatting, use flush_tlb_mm()]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1ab66d1f
  11. 30 3月, 2017 3 次提交
  12. 20 3月, 2017 1 次提交
  13. 09 3月, 2017 2 次提交
    • A
      powerpc/powernv/ioda2: Update iommu table base on ownership change · db08e1d5
      Alexey Kardashevskiy 提交于
      On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in
      our case), a device needs an iommu_table pointer set via
      set_iommu_table_base().
      
      The codeflow is:
      - pnv_pci_ioda2_setup_dma_pe()
      	- pnv_pci_ioda2_setup_default_config()
      	- pnv_ioda_setup_bus_dma() [1]
      
      pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
      pnv_pci_ioda2_setup_default_config() does default DMA setup,
      pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
      PEs as bus PEs except NPU), walks through all underlying buses and
      devices, adds all devices to an IOMMU group and sets iommu_table.
      
      On IODA2, when VFIO is used, it takes ownership over a PE which means it
      removes all tables and creates new ones (with a possibility of sharing
      them among PEs). So when the ownership is returned from VFIO to
      the kernel, the iommu_table pointer written to a device at [1] is
      stale and needs an update.
      
      This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
      (in fact re-adds as it used to be there a while ago for different
      reasons) to tell the helper if a device needs to be added to
      an IOMMU group with an iommu_table update or just the latter.
      
      This calls pnv_ioda_setup_bus_dma(..., false) from
      pnv_ioda2_release_ownership() so when the ownership is restored,
      32bit DMA can work again for a device. This does the same thing
      on obtaining ownership as the iommu_table point is stale at this point
      anyway and it is safer to have NULL there.
      
      We did not hit this earlier as all tested devices in recent years were
      only using 64bit DMA; the rare exception for this is MPT3 SAS adapter
      which uses both 32bit and 64bit DMA access and it has not been tested
      with VFIO much.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      db08e1d5
    • A
      powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested · 7aafac11
      Alexey Kardashevskiy 提交于
      The IODA2 specification says that a 64 DMA address cannot use top 4 bits
      (3 are reserved and one is a "TVE select"); bottom page_shift bits
      cannot be used for multilevel table addressing either.
      
      The existing IODA2 table allocation code aligns the minimum TCE table
      size to PAGE_SIZE so in the case of 64K system pages and 4K IOMMU pages,
      we have 64-4-12=48 bits. Since 64K page stores 8192 TCEs, i.e. needs
      13 bits, the maximum number of levels is 48/13 = 3 so we physically
      cannot address more and EEH happens on DMA accesses.
      
      This adds a check that too many levels were requested.
      
      It is still possible to have 5 levels in the case of 4K system page size.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7aafac11
  14. 28 2月, 2017 1 次提交
  15. 17 2月, 2017 1 次提交
  16. 07 2月, 2017 1 次提交
  17. 30 1月, 2017 1 次提交
  18. 25 1月, 2017 1 次提交
  19. 22 11月, 2016 2 次提交
  20. 04 10月, 2016 1 次提交
  21. 29 9月, 2016 1 次提交
  22. 23 9月, 2016 2 次提交
  23. 21 9月, 2016 1 次提交
  24. 15 9月, 2016 2 次提交
  25. 14 9月, 2016 1 次提交
  26. 09 9月, 2016 1 次提交
    • S
      powerpc/powernv: Provide facilities for EOI, usable from real mode · 4ee11c1a
      Suresh Warrier 提交于
      This adds a new function pnv_opal_pci_msi_eoi() which does the part of
      end-of-interrupt (EOI) handling of an MSI which involves doing an
      OPAL call.  This function can be called in real mode.  This doesn't
      just export pnv_ioda2_msi_eoi() because that does a call to
      icp_native_eoi(), which does not work in real mode.
      
      This also adds a function, is_pnv_opal_msi(), which KVM can call to
      check whether an interrupt is one for which we should be calling
      pnv_opal_pci_msi_eoi() when we need to do an EOI.
      
      [paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
       from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
       interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4ee11c1a
  27. 08 9月, 2016 1 次提交
  28. 06 9月, 2016 1 次提交
    • G
      powerpc/powernv: Fix crash on releasing compound PE · b314427a
      Gavin Shan 提交于
      The compound PE is created to accommodate the devices attached to
      one specific PCI bus that consume multiple M64 segments. The compound
      PE is made up of one master PE and possibly multiple slave PEs. The
      slave PEs should be destroyed when releasing the master PE. A kernel
      crash happens when derferencing @pe->pdev on releasing the slave PE
      in pnv_ioda_deconfigure_pe().
      
        # echo 0 > /sys/bus/pci/slots/C7/power
        iommu: Removing device 0000:01:00.1 from group 0
        iommu: Removing device 0000:01:00.0 from group 0
        Unable to handle kernel paging request for data at address 0x00000010
        Faulting instruction address: 0xc00000000005d898
        cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
            pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
            lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
            sp: c000000fe82178a0
           msr: 9000000000009033
           dar: 10
         dsisr: 40000000
          current = 0xc000000fe815ab80
          paca    = 0xc00000000ff00400	 softe: 0	 irq_happened: 0x01
            pid   = 2709, comm = sh
        Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
        (gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
        Tue Sep 6 13:37:29 AEST 2016
        enter ? for help
        [c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
        [c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
        [c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
        [c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
        [c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
        [c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
        [c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
        [c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
        [c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
        [c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
        [c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
        [c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
        [c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
        [c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
        [c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
        [c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
        [c000000fe8217e30] c000000000009524 system_call+0x38/0x108
      
      It fixes the kernel crash by bypassing releasing resources (DMA,
      IO and memory segments, PELTM) because there are no resources assigned
      to the slave PE.
      
      Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE")
      Reported-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b314427a
  29. 22 8月, 2016 1 次提交
  30. 09 8月, 2016 2 次提交