1. 28 4月, 2017 1 次提交
  2. 11 4月, 2017 1 次提交
  3. 04 4月, 2017 1 次提交
    • A
      powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f
      Alistair Popple 提交于
      Nvlink2 supports address translation services (ATS) allowing devices
      to request address translations from an mmu known as the nest MMU
      which is setup to walk the CPU page tables.
      
      To access this functionality certain firmware calls are required to
      setup and manage hardware context tables in the nvlink processing unit
      (NPU). The NPU also manages forwarding of TLB invalidates (known as
      address translation shootdowns/ATSDs) to attached devices.
      
      This patch exports several methods to allow device drivers to register
      a process id (PASID/PID) in the hardware tables and to receive
      notification of when a device should stop issuing address translation
      requests (ATRs). It also adds a fault handler to allow device drivers
      to demand fault pages in.
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      [mpe: Fix up comment formatting, use flush_tlb_mm()]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1ab66d1f
  4. 30 3月, 2017 3 次提交
  5. 20 3月, 2017 1 次提交
  6. 09 3月, 2017 2 次提交
    • A
      powerpc/powernv/ioda2: Update iommu table base on ownership change · db08e1d5
      Alexey Kardashevskiy 提交于
      On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in
      our case), a device needs an iommu_table pointer set via
      set_iommu_table_base().
      
      The codeflow is:
      - pnv_pci_ioda2_setup_dma_pe()
      	- pnv_pci_ioda2_setup_default_config()
      	- pnv_ioda_setup_bus_dma() [1]
      
      pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
      pnv_pci_ioda2_setup_default_config() does default DMA setup,
      pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
      PEs as bus PEs except NPU), walks through all underlying buses and
      devices, adds all devices to an IOMMU group and sets iommu_table.
      
      On IODA2, when VFIO is used, it takes ownership over a PE which means it
      removes all tables and creates new ones (with a possibility of sharing
      them among PEs). So when the ownership is returned from VFIO to
      the kernel, the iommu_table pointer written to a device at [1] is
      stale and needs an update.
      
      This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
      (in fact re-adds as it used to be there a while ago for different
      reasons) to tell the helper if a device needs to be added to
      an IOMMU group with an iommu_table update or just the latter.
      
      This calls pnv_ioda_setup_bus_dma(..., false) from
      pnv_ioda2_release_ownership() so when the ownership is restored,
      32bit DMA can work again for a device. This does the same thing
      on obtaining ownership as the iommu_table point is stale at this point
      anyway and it is safer to have NULL there.
      
      We did not hit this earlier as all tested devices in recent years were
      only using 64bit DMA; the rare exception for this is MPT3 SAS adapter
      which uses both 32bit and 64bit DMA access and it has not been tested
      with VFIO much.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      db08e1d5
    • A
      powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested · 7aafac11
      Alexey Kardashevskiy 提交于
      The IODA2 specification says that a 64 DMA address cannot use top 4 bits
      (3 are reserved and one is a "TVE select"); bottom page_shift bits
      cannot be used for multilevel table addressing either.
      
      The existing IODA2 table allocation code aligns the minimum TCE table
      size to PAGE_SIZE so in the case of 64K system pages and 4K IOMMU pages,
      we have 64-4-12=48 bits. Since 64K page stores 8192 TCEs, i.e. needs
      13 bits, the maximum number of levels is 48/13 = 3 so we physically
      cannot address more and EEH happens on DMA accesses.
      
      This adds a check that too many levels were requested.
      
      It is still possible to have 5 levels in the case of 4K system page size.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7aafac11
  7. 28 2月, 2017 1 次提交
  8. 17 2月, 2017 1 次提交
  9. 07 2月, 2017 1 次提交
  10. 30 1月, 2017 1 次提交
  11. 25 1月, 2017 1 次提交
  12. 22 11月, 2016 2 次提交
  13. 04 10月, 2016 1 次提交
  14. 29 9月, 2016 1 次提交
  15. 23 9月, 2016 2 次提交
  16. 21 9月, 2016 1 次提交
  17. 15 9月, 2016 2 次提交
  18. 14 9月, 2016 1 次提交
  19. 09 9月, 2016 1 次提交
    • S
      powerpc/powernv: Provide facilities for EOI, usable from real mode · 4ee11c1a
      Suresh Warrier 提交于
      This adds a new function pnv_opal_pci_msi_eoi() which does the part of
      end-of-interrupt (EOI) handling of an MSI which involves doing an
      OPAL call.  This function can be called in real mode.  This doesn't
      just export pnv_ioda2_msi_eoi() because that does a call to
      icp_native_eoi(), which does not work in real mode.
      
      This also adds a function, is_pnv_opal_msi(), which KVM can call to
      check whether an interrupt is one for which we should be calling
      pnv_opal_pci_msi_eoi() when we need to do an EOI.
      
      [paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
       from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
       interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4ee11c1a
  20. 08 9月, 2016 1 次提交
  21. 06 9月, 2016 1 次提交
    • G
      powerpc/powernv: Fix crash on releasing compound PE · b314427a
      Gavin Shan 提交于
      The compound PE is created to accommodate the devices attached to
      one specific PCI bus that consume multiple M64 segments. The compound
      PE is made up of one master PE and possibly multiple slave PEs. The
      slave PEs should be destroyed when releasing the master PE. A kernel
      crash happens when derferencing @pe->pdev on releasing the slave PE
      in pnv_ioda_deconfigure_pe().
      
        # echo 0 > /sys/bus/pci/slots/C7/power
        iommu: Removing device 0000:01:00.1 from group 0
        iommu: Removing device 0000:01:00.0 from group 0
        Unable to handle kernel paging request for data at address 0x00000010
        Faulting instruction address: 0xc00000000005d898
        cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
            pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
            lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
            sp: c000000fe82178a0
           msr: 9000000000009033
           dar: 10
         dsisr: 40000000
          current = 0xc000000fe815ab80
          paca    = 0xc00000000ff00400	 softe: 0	 irq_happened: 0x01
            pid   = 2709, comm = sh
        Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
        (gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
        Tue Sep 6 13:37:29 AEST 2016
        enter ? for help
        [c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
        [c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
        [c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
        [c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
        [c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
        [c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
        [c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
        [c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
        [c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
        [c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
        [c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
        [c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
        [c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
        [c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
        [c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
        [c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
        [c000000fe8217e30] c000000000009524 system_call+0x38/0x108
      
      It fixes the kernel crash by bypassing releasing resources (DMA,
      IO and memory segments, PELTM) because there are no resources assigned
      to the slave PE.
      
      Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE")
      Reported-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b314427a
  22. 22 8月, 2016 1 次提交
  23. 09 8月, 2016 2 次提交
  24. 04 8月, 2016 1 次提交
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski 提交于
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      
      2. It brings safeness and checking for const correctness because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NRobin Murphy <robin.murphy@arm.com>
      Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00085f1e
  25. 17 7月, 2016 7 次提交
  26. 14 7月, 2016 2 次提交
    • I
      cxl: Add support for interrupts on the Mellanox CX4 · a2f67d5e
      Ian Munsie 提交于
      The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
      interrupts are routed from the networking hardware to the XSL using the
      MSIX table, and from there will be transformed back into an MSIX
      interrupt using the cxl style interrupts (i.e. using IVTE entries and
      ranges to map a PE and AFU interrupt number to an MSIX address).
      
      We want to hide the implementation details of cxl interrupts as much as
      possible. To this end, we use a special version of the MSI setup &
      teardown routines in the PHB while in cxl mode to allocate the cxl
      interrupts and configure the IVTE entries in the process element.
      
      This function does not configure the MSIX table - the CX4 card uses a
      custom format in that table and it would not be appropriate to fill that
      out in generic code. The rest of the functionality is similar to the
      "Full MSI-X mode" described in the CAIA, and this could be easily
      extended to support other adapters that use that mode in the future.
      
      The interrupts will be associated with the default context. If the
      maximum number of interrupts per context has been limited (e.g. by the
      mlx5 driver), it will automatically allocate additional kernel contexts
      to associate extra interrupts as required. These contexts will be
      started using the same WED that was used to start the default context.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a2f67d5e
    • I
      powerpc/powernv: Add support for the cxl kernel api on the real phb · 4361b034
      Ian Munsie 提交于
      This adds support for the peer model of the cxl kernel api to the
      PowerNV PHB, in which physical function 0 represents the cxl function on
      the card (an XSL in the case of the CX4), which other physical functions
      will use for memory access and interrupt services. It is referred to as
      the peer model as these functions are peers of one another, as opposed
      to the Virtual PHB model which forms a hierarchy.
      
      This patch exports APIs to enable the peer mode, check if a PCI device
      is attached to a PHB in this mode, and to set and get the peer AFU for
      this mode.
      
      The cxl driver will enable this mode for supported cards by calling
      pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
      that this mode is enabled, and switch out it's controller_ops for the
      cxl version.
      
      The cxl version of the controller_ops struct implements it's own
      versions of the enable_device_hook and release_device to handle
      refcounting on the peer AFU and to allocate a default context for the
      device.
      
      Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
      there is no safe way to disable cxl mode short of a reboot, so until
      that changes there is no reason to support the disable path.
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4361b034