1. 09 Aug, 2016 2 commits
  2. 04 Aug, 2016 1 commit
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski committed
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is simpler, both for reading the code and for setting
         attributes.  Instead of initializing local attributes on the stack
         and passing a pointer to them to dma_set_attr(), just set the bits.
      
      2. It adds safety and const-correctness checking because the
         attributes are passed by value.
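The two points above can be illustrated with a minimal user-space sketch. The `EX_` flag names and both helper functions are hypothetical and are not the kernel's definitions; the sketch only shows attributes carried as bits in an `unsigned long` that is passed by value.

```c
/* Hypothetical attribute bits for illustration; not the kernel's values. */
#define EX_DMA_ATTR_WRITE_COMBINE     (1UL << 0)
#define EX_DMA_ATTR_NO_KERNEL_MAPPING (1UL << 1)

/*
 * The attributes arrive by value, so the callee cannot modify the
 * caller's copy and no const-qualified pointer is needed.
 */
static int ex_wants_write_combine(unsigned long attrs)
{
	return (attrs & EX_DMA_ATTR_WRITE_COMBINE) != 0;
}

/* A caller sets bits directly instead of filling in a struct on the stack. */
static unsigned long ex_build_attrs(void)
{
	unsigned long attrs = 0;	/* 0 takes the place of the old NULL */

	attrs |= EX_DMA_ATTR_WRITE_COMBINE;
	return attrs;
}
```

Passing `0` where `NULL` used to be passed is the same substitution that the semantic patches in this commit perform on existing callers.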
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00085f1e
  3. 17 Jul, 2016 7 commits
  4. 14 Jul, 2016 3 commits
    • I
      cxl: Add support for interrupts on the Mellanox CX4 · a2f67d5e
      Ian Munsie committed
      The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
      interrupts are routed from the networking hardware to the XSL using the
      MSIX table, and from there will be transformed back into an MSIX
      interrupt using the cxl style interrupts (i.e. using IVTE entries and
      ranges to map a PE and AFU interrupt number to an MSIX address).
      
      We want to hide the implementation details of cxl interrupts as much as
      possible. To this end, we use a special version of the MSI setup &
      teardown routines in the PHB while in cxl mode to allocate the cxl
      interrupts and configure the IVTE entries in the process element.
      
      This function does not configure the MSIX table - the CX4 card uses a
      custom format in that table and it would not be appropriate to fill that
      out in generic code. The rest of the functionality is similar to the
      "Full MSI-X mode" described in the CAIA, and this could be easily
      extended to support other adapters that use that mode in the future.
      
      The interrupts will be associated with the default context. If the
      maximum number of interrupts per context has been limited (e.g. by the
      mlx5 driver), it will automatically allocate additional kernel contexts
      to associate extra interrupts as required. These contexts will be
      started using the same WED that was used to start the default context.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a2f67d5e
    • I
      powerpc/powernv: Add support for the cxl kernel api on the real phb · 4361b034
      Ian Munsie committed
      This adds support for the peer model of the cxl kernel api to the
      PowerNV PHB, in which physical function 0 represents the cxl function on
      the card (an XSL in the case of the CX4), which other physical functions
      will use for memory access and interrupt services. It is referred to as
      the peer model as these functions are peers of one another, as opposed
      to the Virtual PHB model which forms a hierarchy.
      
      This patch exports APIs to enable the peer mode, check if a PCI device
      is attached to a PHB in this mode, and to set and get the peer AFU for
      this mode.
      
      The cxl driver will enable this mode for supported cards by calling
      pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
      that this mode is enabled, and switch out its controller_ops for the
      cxl version.
      
      The cxl version of the controller_ops struct implements its own
      versions of the enable_device_hook and release_device to handle
      refcounting on the peer AFU and to allocate a default context for the
      device.
      
      Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
      there is no safe way to disable cxl mode short of a reboot, so until
      that changes there is no reason to support the disable path.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4361b034
    • I
      powerpc/powernv: Split cxl code out into a separate file · f456834a
      Ian Munsie committed
      The support for using the Mellanox CX4 in cxl mode will require
      additions to the PHB code. In preparation for this, move the existing
      cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
      more organised.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f456834a
  5. 21 Jun, 2016 9 commits
    • G
      powerpc/powernv: Print correct PHB type names · 9497a1c1
      Gavin Shan committed
      We report "IODA1" and "IODA2" PHBs during initialization even though
      they are IODA2 and NPU PHBs, as the kernel log below shows.
      
         Initializing IODA1 OPAL PHB /pciex@3fffe40700000
         Initializing IODA2 OPAL PHB /pciex@3fff000400000
      
      This fixes the PHB names. After it's applied, we get:
      
         Initializing IODA2 PHB (/pciex@3fffe40700000)
         Initializing NPU PHB (/pciex@3fff000400000)
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9497a1c1
    • G
      powerpc/powernv: Dynamically release PE · c5f7700b
      Gavin Shan committed
      This supports releasing PEs dynamically. A reference count is
      introduced to the PE, representing the number of PCI devices associated
      with the PE. The reference count is increased when a PCI device
      joins the PE and decreased when a PCI device leaves the PE in
      pnv_pci_release_device(). When the count becomes zero, the PE
      and its consumed resources are released. Note that the count
      is not accessed concurrently, so a counter of plain "int" type is
      enough here.
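The counting scheme can be sketched as follows. This is an illustration only: the `ex_pe` structure and its helpers are hypothetical, and the sketch assumes callers are serialized by some outer lock, which is the only case where a plain `int` counter is safe.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PE; not the kernel's struct pnv_ioda_pe. */
struct ex_pe {
	int device_count;	/* number of PCI devices bound to this PE */
	bool released;		/* set once the PE's resources are dropped */
};

/* Called when a PCI device joins the PE. */
static void ex_pe_get(struct ex_pe *pe)
{
	pe->device_count++;
}

/* Called when a PCI device leaves the PE; the last one out releases it. */
static void ex_pe_put(struct ex_pe *pe)
{
	if (--pe->device_count == 0)
		pe->released = true;	/* stands in for freeing windows and segments */
}
```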
      
      In order to release the resources consumed by the PE, a couple of
      helper functions are introduced, as below:
      
         * pnv_pci_ioda1_unset_window() - Unset IODA1 DMA32 window
         * pnv_pci_ioda1_release_dma_pe() - Release IODA1 DMA32 segments
         * pnv_pci_ioda2_release_dma_pe() - Release IODA2 DMA resource
         * pnv_ioda_release_pe_seg() - Unmap IO/M32/M64 segments
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c5f7700b
    • G
      powerpc/powernv: Make pnv_ioda_deconfigure_pe() visible · 93e01a50
      Gavin Shan committed
      pnv_ioda_deconfigure_pe() is visible only when CONFIG_PCI_IOV is
      enabled. The function will be used to tear down the PE's associated
      mappings in the PCI hotplug path, which doesn't depend on CONFIG_PCI_IOV.
      
      This makes pnv_ioda_deconfigure_pe() visible and not dependent on
      CONFIG_PCI_IOV.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      93e01a50
    • G
      powerpc/powernv: Extend PCI bridge resources · 40e2a47e
      Gavin Shan committed
      The PCI slots are associated with the root port or with downstream
      ports of the PCIe switch connected to the root port. When an adapter is
      hot added to a PCI slot, it usually requests more IO or memory
      resources from the directly connected parent bridge (port) and
      updates the bridge's windows accordingly. The resource windows
      of upstream bridges can't be updated automatically. This can
      leave resources unbalanced across the bridges: the window of the
      downstream bridge overruns that of the upstream bridge, and the
      IO or MMIO path won't work.
      
      This resolves the above issue by extending bridge windows of
      root port and upstream port of the PCIe switch connected to
      the root port to PHB's windows.
      
      The windows of the root port and the bridge behind it are extended to
      the PHB's windows to accommodate PCI hotplug happening in the
      future. The PHB's 64KB 32-bit MSI region is included in the bridge's
      M32 windows (in hardware) though it's excluded in the corresponding
      resource, as the bridge's M32 windows have 1MB as their minimal
      alignment. We observed EEH errors during system boot when the MSI
      region is included in the bridge's M32 window.
      
      This excludes the top 1MB region (including the 64KB 32-bit MSI region)
      from the bridge's M32 windows when extending them.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      40e2a47e
    • G
      powerpc/powernv: Setup PE for root bus · 63803c39
      Gavin Shan committed
      There is no parent bridge for the root bus, meaning pcibios_setup_bridge()
      isn't invoked for the root bus. The PE for the root bus is the ancestor of
      other PEs in PELTV. It means we need the PE for the root bus populated
      before all others.
      
      This populates the PE for the root bus in the pcibios_setup_bridge() path
      if it's not populated yet. The PE number next to the reserved one
      is used as the PE# to avoid holes in continuous M64 space.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      63803c39
    • G
      powerpc/powernv: Create PEs in pcibios_setup_bridge() · ccd1c191
      Gavin Shan committed
      Currently, the PEs and their associated resources are assigned in
      ppc_md.pcibios_fixup(), except those used by SRIOV VFs. The function
      is called once, after PCI probing and resource assignment are
      completed, so it is not hotplug friendly.
      
      This creates PEs dynamically in pcibios_setup_bridge(), which is
      called both during system bootup and on PCI hotplug to update a
      PCI bridge's windows after resource assignment/reassignment is done.
      In the partial hotplug case, where not all PCI devices belonging to one
      particular PE are unplugged and plugged again, we just need to unbind/bind
      the hot added PCI devices with the corresponding PE without creating a new
      one. The change is applied to IODA1 and IODA2 PHBs only. The behaviour
      on NPU PHBs isn't changed. There are no PCI bridges on NPU PHBs,
      meaning pcibios_setup_bridge() won't be invoked there. We have to use
      the old path (pnv_pci_ioda_fixup()) to set up PEs on NPU PHBs.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ccd1c191
    • G
      powerpc/powernv: Allocate PE# in reverse order · 9fcd6f4a
      Gavin Shan committed
      The PE number for one particular PE can be allocated dynamically or
      reserved according to the M64 (64-bit prefetchable) segments
      consumed by the PE. An M64 segment can't be remapped to an arbitrary
      PE, meaning the PE number is determined by the index of the
      consumed M64 segment. As the figure below shows, the M64 resource
      grows from the low to the high end, meaning the PE numbers reserved
      according to M64 segments grow from the low to the high end as well,
      and so do the dynamically allocated PE numbers. This leads to a
      conflict: a PE number (M64 segment) taken by dynamic allocation
      is required by a hot added PCI adapter at a later point. The PCI
      hotplug then fails because the PE number can't be reserved
      based on the index of the consumed M64 segment.
      
        +---+---+---+---+---+--------------------------------+-----+
        | 0 | 1 | 2 | 3 | 4 |      .......                   | 255 |
        +---+---+---+---+---+--------------------------------+-----+
      
        PE number for dynamic allocation          ----------------->
        PE number reserved for M64 segment        ----------------->
      
      To resolve the above conflict, this forces the PE number to be
      allocated dynamically in reverse order. With this patch applied,
      the PE numbers are reserved in ascending order, but allocated
      dynamically in reverse order.
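A minimal sketch of the two allocation directions follows. It assumes a 256-entry PE space as in the figure; the `ex_` names are hypothetical, and a plain bool array stands in for whatever bitmap the kernel actually uses.

```c
#include <stdbool.h>

#define EX_TOTAL_PE 256

static bool ex_pe_used[EX_TOTAL_PE];

/* Reserve the PE# dictated by an M64 segment index; -1 if unavailable. */
static int ex_reserve_pe(int m64_segment)
{
	if (m64_segment < 0 || m64_segment >= EX_TOTAL_PE ||
	    ex_pe_used[m64_segment])
		return -1;
	ex_pe_used[m64_segment] = true;
	return m64_segment;
}

/* Dynamic allocation scans from the highest PE# downwards. */
static int ex_alloc_pe(void)
{
	for (int pe = EX_TOTAL_PE - 1; pe >= 0; pe--) {
		if (!ex_pe_used[pe]) {
			ex_pe_used[pe] = true;
			return pe;
		}
	}
	return -1;	/* no free PE left */
}
```

Because dynamic allocation consumes the high end first, low-numbered PEs stay free for later M64-based reservations until the two ranges finally meet.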
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9fcd6f4a
    • G
      powerpc/powernv: Increase PE# capacity · c127562a
      Gavin Shan committed
      Each PHB maintains an array that helps translate the 2-byte Request
      ID (RID) to a PE#, with the assumption that the PE# takes one byte,
      meaning that we can't have more than 256 PEs. However,
      pci_dn->pe_number already has 4 bytes for the PE#.
      
      This extends the PE# capacity for every PHB. After that, the PE number
      is represented by a 4-byte value. Then we can reuse IODA_INVALID_PE to
      check whether the PE# in phb->pe_rmap[] is valid or not.
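As a rough illustration, the reverse map can be pictured as one 4-byte slot per possible RID with a sentinel marking unused slots. The names and the sentinel value below are assumptions made for this sketch, not the kernel's definitions.

```c
#include <stdint.h>

#define EX_INVALID_PE 0xFFFFFFFFu	/* hypothetical stand-in for IODA_INVALID_PE */
#define EX_RID_COUNT  0x10000		/* a 2-byte RID has 65536 possible values */

/* One 4-byte PE# per RID, so PE numbers above 255 fit. */
static uint32_t ex_pe_rmap[EX_RID_COUNT];

/* Mark every RID as having no PE assigned. */
static void ex_rmap_init(void)
{
	for (uint32_t rid = 0; rid < EX_RID_COUNT; rid++)
		ex_pe_rmap[rid] = EX_INVALID_PE;
}

/* A slot is valid once it holds something other than the sentinel. */
static int ex_rid_has_pe(uint16_t rid)
{
	return ex_pe_rmap[rid] != EX_INVALID_PE;
}
```

With a one-byte slot there would be no spare value to act as a sentinel once all 256 PE numbers are in use, which is why widening the slot also makes the validity check possible.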
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c127562a
    • G
      powerpc/powernv: Move pnv_pci_ioda_setup_opal_tce_kill() around · 577c8c88
      Gavin Shan committed
      pnv_pci_ioda_setup_opal_tce_kill() is called by pnv_ioda_setup_dma()
      to remap the TCE kill register. What's done in pnv_ioda_setup_dma()
      will be covered in pcibios_setup_bridge(), which is invoked on each
      PCI bridge. It means we would possibly remap the TCE kill register
      multiple times, which is unnecessary.
      
      This moves pnv_pci_ioda_setup_opal_tce_kill() to where the PHB is
      initialized (pnv_pci_init_ioda_phb()) to avoid above issue.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      577c8c88
  6. 16 Jun, 2016 1 commit
  7. 14 Jun, 2016 1 commit
  8. 12 May, 2016 2 commits
    • A
      powerpc/powernv/npu: Add PE to PHB's list · 1d4e89cf
      Alexey Kardashevskiy committed
      Before commit 3e68dc57 "powerpc/powernv: Remove DMA32 PE list", NPU PEs
      were linked to the NPU PHB via phb->ioda.pe_dma_list; after that fix,
      the phb->ioda.pe_list is used.
      
      During the pe_dma_list removal, list_add_tail(&phb->ioda.pe_dma_list)
      was removed, however no list_add() was added; this patch adds it.
      
      Fixes: 3e68dc57219a ("powerpc/powernv: Remove DMA32 PE list")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1d4e89cf
    • A
      powerpc/powernv: Fix insufficient memory allocation · 92a86756
      Alexey Kardashevskiy committed
      The pnv_pci_init_ioda_phb() helper allocates a blob to store auxiliary
      data such as PE and M32/M64 segment allocation maps; this single blob has
      a few partitions, the size of each being derived from the PE number -
      phb->ioda.total_pe_num.
      
      It was assumed that the minimum PE number is 8, however it is 4 for NPU
      so the pe_alloc part was missing in the allocated blob. It was invisible
      till recently as we were not tracking used M64 segments and NPUs do not
      use M32 segments so the phb->ioda.m32_segmap (which was pointing to the
      same address as phb->ioda.pe_alloc) has never been written to leaving
      the pe_alloc memory intact.
      
      After commit 401203ac2d "powerpc/powernv: Track M64 segment consumption"
      the pe_alloc gets corrupted and PE allocation cannot work. This fixes
      the issue by enforcing the minimum PE number to 8.
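The arithmetic behind the bug can be sketched as below. This assumes the pe_alloc partition is a one-bit-per-PE map sized as `total_pe_num / 8` bytes and rounded up to whole `unsigned long` words; that sizing rule and the `ex_pe_alloc_bytes()` helper are assumptions for illustration, not a copy of the kernel code.

```c
#include <stddef.h>

/*
 * Bytes needed for a one-bit-per-PE map, rounded up to whole words.
 * min_pe is the floor applied to the PE number before sizing.
 */
static size_t ex_pe_alloc_bytes(unsigned int total_pe_num, unsigned int min_pe)
{
	unsigned int n = total_pe_num < min_pe ? min_pe : total_pe_num;
	size_t bytes = n / 8;			/* 4 PEs alone would give 0 bytes */
	size_t word = sizeof(unsigned long);

	return (bytes + word - 1) / word * word;
}
```

Under these assumptions, four PEs without a floor yield a zero-sized partition, so the next partition starts at the same address; a floor of 8 guarantees at least one word.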
      
      Fixes: 401203ac2d15 ("powerpc/powernv: Track M64 segment consumption")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      92a86756
  9. 11 May, 2016 14 commits
    • A
      powerpc/powernv/npu: Enable NVLink pass through · b5cb9ab1
      Alexey Kardashevskiy committed
      IBM POWER8 NVLink systems come with Tesla K40-ish GPUs, each of which
      also has a couple of high-speed links (NVLink). The interface to the links
      is exposed as an emulated PCI bridge which is included in the same
      IOMMU group as the corresponding GPU.
      
      In the kernel, NPUs get a separate PHB of the PNV_PHB_NPU type and a PE
      which behave pretty much as the standard IODA2 PHB, except that the NPU PHB
      has just a single TVE in the hardware, which means it can have either a
      32bit window or a 64bit window or DMA bypass but never two of these.
      
      In order to make these links work when GPU is passed to the guest,
      these bridges need to be passed as well; otherwise performance will
      degrade.
      
      This implements and exports API to manage NPU state in regard to VFIO;
      it replicates iommu_table_group_ops.
      
      This defines a new pnv_pci_ioda2_npu_ops which is assigned to
      the IODA2 bridge if there are NPUs for a GPU on the bridge.
      The new callbacks call the default IODA2 callbacks plus new NPU API.
      This adds a gpe_table_group_to_npe() helper to find NPU PE for the IODA2
      table_group, it is not expected to fail as the helper is only called
      from the pnv_pci_ioda2_npu_ops.
      
      This does not define NPU-specific .release_ownership() so after
      VFIO is finished, DMA on NPU is disabled, which is ok as the nvidia
      driver sets the DMA mask when probing, which enables 32 or 64bit DMA on NPU.
      
      This adds a pnv_pci_npu_setup_iommu() helper which adds NPUs to
      the GPU group if any found. The helper uses helpers to look for
      the "ibm,gpu" property in the device tree which is a phandle of
      the corresponding GPU.
      
      This adds an additional loop over PEs in pnv_ioda_setup_dma() as the main
      loop skips NPU PEs as they do not have 32bit DMA segments.
      
      As pnv_npu_set_window() and pnv_npu_unset_window() are now used
      by the new IODA2-NPU IOMMU group, this makes the helpers public and
      adds the DMA window number parameter.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-By: Alistair Popple <alistair@popple.id.au>
      [mpe: Add pnv_pci_ioda_setup_iommu_api() to fix build with IOMMU_API=n]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b5cb9ab1
    • A
      powerpc/powernv/npu: Rework TCE Kill handling · 85674868
      Alexey Kardashevskiy committed
      The pnv_ioda_pe struct keeps an array of peers. At the moment it is only
      used to link GPU and NPU for 2 purposes:
      
      1. Access NPU quickly when configuring DMA for GPU - this was addressed
      in the previous patch by removing use of it as DMA setup is not what
      the kernel would constantly do.
      
      2. Invalidate TCE cache for NPU when it is invalidated for GPU.
      GPU and NPU are in different PEs. There is already a mechanism to
      attach multiple iommu_table_group to the same iommu_table (used for VFIO);
      this patch reuses it here.
      
      This gets rid of peers[] array and PNV_IODA_PE_PEER flag as they are
      not needed anymore.
      
      While we are here, add TCE cache invalidation after enabling bypass.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-By: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      85674868
    • A
      powerpc/powernv/ioda2: Export debug helper pe_level_printk() · 7d623e42
      Alexey Kardashevskiy committed
      This exports the debugging helper pe_level_printk() and the corresponding
      macros so they can be used in npu-dma.c.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-By: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7d623e42
    • A
      powerpc/powernv/npu: Simplify DMA setup · f9f83456
      Alexey Kardashevskiy committed
      NPU devices are emulated in firmware and mainly used for NPU NVLink
      training; there is one NPU device per hardware link. Their DMA/TCE setup
      must match the GPU which is connected via PCIe and NVLink, so any changes
      to the DMA/TCE setup on the GPU PCIe device need to be propagated to
      the NVLink device as this is what device drivers expect and it doesn't
      make much sense to do anything else.
      
      This makes NPU DMA setup explicit.
      pnv_npu_ioda_controller_ops::pnv_npu_dma_set_mask is moved to pci-ioda,
      made static and prints warning as dma_set_mask() should never be called
      on this function as in any case it will not configure GPU; so we make
      this explicit.
      
      Instead of using PNV_IODA_PE_PEER and peers[] (which the next patch will
      remove), we test every PCI device to see if there are corresponding NVLink
      devices. If there are any, we propagate bypass mode to the NPU devices
      just found by calling the setup helper directly (which takes @bypass) and
      avoid guessing (i.e. calculating from DMA mask) whether we need bypass
      or not on NPU devices. Since DMA setup happens only on rare occasions,
      this will not slow down booting or VFIO start/stop much.
      
      This renames pnv_npu_disable_bypass to pnv_npu_dma_set_32 to make it
      more clear what the function really does which is programming 32bit
      table address to the TVT ("disabling bypass" means writing zeroes to
      the TVT).
      
      This removes pnv_npu_dma_set_bypass() from pnv_npu_ioda_fixup() as
      the DMA configuration on NPU does not matter until dma_set_mask() is
      called on GPU and that will do the NPU DMA configuration.
      
      This removes phb->dma_dev_setup initialization for NPU as
      pnv_pci_ioda_dma_dev_setup is no-op for it anyway.
      
      This stops using npe->tce_bypass_base as it never changes and values
      other than zero are not supported.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f9f83456
    • A
      powerpc/powernv/npu: TCE Kill helpers cleanup · 0bbcdb43
      Alexey Kardashevskiy committed
      NPU PHB TCE Kill register is exactly the same as in the rest of POWER8
      so let's reuse the existing code for NPU. The only bit missing is
      a helper to reset the entire TCE cache so this moves such a helper
      from NPU code and renames it.
      
      Since pnv_npu_tce_invalidate() does really invalidate the entire cache,
      this uses pnv_pci_ioda2_tce_invalidate_entire() directly for NPU.
      This adds an explicit comment for workaround for invalidating NPU TCE
      cache.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      0bbcdb43
    • A
      powerpc/powernv: Define TCE Kill flags · bef9253f
      Alexey Kardashevskiy committed
      This replaces magic constants for the TCE Kill IODA2 register with macros.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bef9253f
    • A
      powerpc/powernv: Rename pnv_pci_ioda2_tce_invalidate_entire · a7cf13ca
      Alexey Kardashevskiy committed
      As in fact pnv_pci_ioda2_tce_invalidate_entire() invalidates TCEs for
      the specific PE rather than the entire cache, rename it to
      pnv_pci_ioda2_tce_invalidate_pe(). In later patches we will add
      a proper pnv_pci_ioda2_tce_invalidate_entire().
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a7cf13ca
    • G
      powerpc/powernv: Use PE instead of number during setup and release · 1e916772
      Gavin Shan committed
      In current implementation, the PEs that are allocated or picked
      from the reserved list are identified by PE number. The PE instance
      has to be picked according to the PE number eventually. We have
      same issue when PE is released.
      
      For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns
      PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
      or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
      returns the reserved/allocated PE instance to be used in subsequent
      patches. On the other hand, pnv_ioda_free_pe() uses PE instance
      (not number) as its argument. No logical changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1e916772
    • G
      powerpc/powernv/ioda1: Improve DMA32 segment track · 2b923ed1
      Gavin Shan committed
      In the current implementation, the DMA32 segments required by one specific
      PE aren't calculated from the information held in the PE independently.
      This conflicts with the PCI hotplug design: PE centralized, meaning the
      PE's DMA32 segments should be calculated from the information held in
      the PE independently.
      
      This introduces an array (@dma32_segmap) for every PHB to track the
      DMA32 segment usage. Besides, this moves the logic calculating the PE's
      consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that the PE's
      DMA32 segments are calculated/allocated from the information held in
      the PE (DMA32 weight). Also the logic is improved: we try to allocate
      as many DMA32 segments as we can. It's acceptable for fewer DMA32
      segments than the expected number to be allocated.
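The "allocate as many as we can" policy can be sketched with a small per-PHB segment map. The map size, the `ex_` names, and the requirement that a PE's segments be contiguous are all assumptions made for this illustration.

```c
#define EX_SEG_COUNT 8
#define EX_SEG_FREE  (-1)

/* Owner PE# of each DMA32 segment, or EX_SEG_FREE (hypothetical map). */
static int ex_segmap[EX_SEG_COUNT] = {
	EX_SEG_FREE, EX_SEG_FREE, EX_SEG_FREE, EX_SEG_FREE,
	EX_SEG_FREE, EX_SEG_FREE, EX_SEG_FREE, EX_SEG_FREE,
};

/*
 * Try to give 'pe' a run of 'wanted' contiguous segments; if no run of
 * that length is free, retry with one segment fewer. Returns the number
 * of segments granted (0 if none) and stores the first index in *base_out.
 */
static int ex_alloc_dma32_segs(int pe, int wanted, int *base_out)
{
	if (wanted > EX_SEG_COUNT)
		wanted = EX_SEG_COUNT;

	for (int segs = wanted; segs > 0; segs--) {
		for (int base = 0; base + segs <= EX_SEG_COUNT; base++) {
			int i;

			for (i = base; i < base + segs; i++)
				if (ex_segmap[i] != EX_SEG_FREE)
					break;
			if (i < base + segs)
				continue;	/* run is partly taken */

			for (i = base; i < base + segs; i++)
				ex_segmap[i] = pe;
			*base_out = base;
			return segs;
		}
	}
	return 0;
}
```

A PE that asks for more than is left simply receives a shorter run, matching the statement that fewer segments than expected is acceptable.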
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      2b923ed1
    • G
      powerpc/powernv: Remove DMA32 PE list · 801846d1
      Gavin Shan committed
      PEs are put into the PHB DMA32 list (phb->ioda.pe_dma_list) according
      to their DMA32 weight. The PEs on the list are iterated to set up
      their TCE32 tables at system boot time. The list is used only
      once at boot time, so there is no need to keep it.
      
      This moves the logic calculating the DMA32 weight of PHB and PE to
      pnv_ioda_setup_dma() to drop the PHB's DMA32 list. Also, the fields
      @tce32_seg and @tce32_segcount, with which every PE traced its consumed
      DMA32 segments, are now useless and are removed.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      801846d1
    • G
      powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE · acce971c
      Gavin Shan committed
      Currently, there is one macro (TCE32_TABLE_SIZE) representing the
      TCE table size for one DMA32 segment. The constant representing
      the DMA32 segment size (1 << 28) is still used in the code.
      
      This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
      segment size. The TCE table size can be calculated when the page
      has a fixed 4KB size. So all the related calculations depend on one
      macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.
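The derivation can be worked through in a few lines. The segment size (1 << 28) and the 4KB page size come from the message above; the 8-byte entry size and the `EX_` names are assumptions of this sketch.

```c
#define EX_DMA32_SEGSIZE  (1UL << 28)	/* 256MB, the value quoted above */
#define EX_TCE_PAGE_SIZE  (1UL << 12)	/* fixed 4KB page */
#define EX_TCE_ENTRY_SIZE 8UL		/* assumption: one 64-bit entry per page */

/* TCE table bytes needed to map one DMA32 segment. */
static unsigned long ex_tce32_table_size(void)
{
	return (EX_DMA32_SEGSIZE / EX_TCE_PAGE_SIZE) * EX_TCE_ENTRY_SIZE;
}
```

Under these assumptions one segment needs 65536 entries, or 512KB of table, and changing the segment size macro is enough to change every derived value.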
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-By: Alistair Popple <alistair@popple.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      acce971c
    • G
      powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe() · b30d936f
      Gavin Shan committed
      This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe()
      as it's the counterpart of IODA2's pnv_pci_ioda2_setup_dma_pe().
      No logical changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b30d936f
    • G
      powerpc/powernv/ioda1: M64 support on P7IOC · 99451551
      Gavin Shan committed
      This enables the M64 window on P7IOC, which has already been enabled on
      PHB3. Unlike PHB3, where 16 M64 BARs are supported and each of
      them can be owned by one particular PE# exclusively or divided
      evenly into 256 segments, every P7IOC PHB has 16 M64 BARs and each
      of them is divided into 8 segments. So every P7IOC PHB supports
      128 M64 segments in total. P7IOC has M64DT, which helps map
      one particular M64 segment# to an arbitrary PE#. PHB3 doesn't have
      M64DT, indicating that one M64 segment can only be pinned to the
      fixed PE#.
      
      In order to unify M64 support on P7IOC and PHB3, we just
      provide 128 M64 segments on every P7IOC PHB and each of them is
      pinned to the fixed PE# by bypassing the function of M64DT. In
      turn, we just need a different phb->init_m64() for P7IOC and PHB3
      and to map M64 segments in pnv_ioda_reserve_m64_pe() for P7IOC; most
      of the code is shared by them.
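The segment arithmetic for P7IOC can be sketched as follows. The BAR and segment counts come from the message above; the identity mapping from global segment index to PE# and the `ex_` names are assumptions used to illustrate the "pinned to the fixed PE#" behaviour.

```c
#define EX_P7IOC_M64_BARS     16
#define EX_P7IOC_SEGS_PER_BAR 8

/* Total M64 segments offered by one P7IOC PHB. */
static int ex_p7ioc_total_segments(void)
{
	return EX_P7IOC_M64_BARS * EX_P7IOC_SEGS_PER_BAR;
}

/*
 * Fixed mapping: the global segment index is used as the PE#, so the
 * M64DT is never asked to point a segment at some other PE.
 * Returns -1 for an out-of-range BAR or segment.
 */
static int ex_segment_to_pe(int bar, int seg_in_bar)
{
	if (bar < 0 || bar >= EX_P7IOC_M64_BARS ||
	    seg_in_bar < 0 || seg_in_bar >= EX_P7IOC_SEGS_PER_BAR)
		return -1;
	return bar * EX_P7IOC_SEGS_PER_BAR + seg_in_bar;
}
```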
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      99451551
    • G
      powerpc/powernv: Rename M64 related functions · c430670a
      Gavin Shan committed
      This renames the functions that pick the PE number based on consumed
      M64 segments and map M64 segments to PEs, as those functions are
      going to be shared by IODA1/IODA2 in the next patch. No logical changes
      introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c430670a