1. 11 June 2015, 5 commits
    • powerpc/spapr: vfio: Replace iommu_table with iommu_table_group · b348aa65
      Committed by Alexey Kardashevskiy
      Modern IBM POWERPC systems support multiple (currently two) TCE tables
      per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
      for TCE tables. Right now just one table is supported.
      
      This defines an iommu_table_group struct which stores pointers to the
      iommu_group and the iommu_table(s). It replaces iommu_table with
      iommu_table_group wherever iommu_table was used to identify a group:
      - iommu_register_group();
      - the iommu data of the generic iommu_group.
      
      This removes @data from iommu_table as it_table_group provides the same
      access to pnv_ioda_pe.
      
      For IODA, instead of embedding the iommu_table, the new iommu_table_group
      keeps pointers to the tables, and the iommu_table structs are allocated
      dynamically.
      
      For P5IOC2, both iommu_table_group and iommu_table are embedded in the
      PE struct. As there is no EEH or SRIOV support for P5IOC2,
      iommu_free_table() should not be called on these iommu_table struct
      pointers, so the table can stay embedded in pnv_phb::p5ioc2.
      
      For pSeries, this replaces multiple calls of kzalloc_node() with a new
      iommu_pseries_alloc_group() helper and stores the table group struct
      pointer in the pci_dn struct. For release, an iommu_table_free_group()
      helper is added.
      
      This moves the iommu_table struct allocation from the SR-IOV code to the
      generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and
      pnv_pci_ioda2_setup_dma_pe, as this is where DMA is actually initialized.
      The change is included here because those lines had to be changed anyway.
      
      This should cause no behavioural change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      b348aa65
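      
      As a rough illustration of the container described above, a minimal
      sketch might look like this (field names and the table count are
      assumptions for illustration, not the kernel's actual definitions):
      
      /* Simplified sketch only: a per-PE container tying one IOMMU group
       * to its TCE tables. */
      #define SKETCH_MAX_TABLES 2          /* "currently two" tables per PE */
      
      struct iommu_group;                  /* generic IOMMU group, opaque here */
      struct iommu_table;                  /* one TCE table / DMA window, opaque here */
      
      struct iommu_table_group_sketch {
              struct iommu_group *group;                      /* from iommu_register_group() */
              struct iommu_table *tables[SKETCH_MAX_TABLES];  /* only tables[0] is used so far */
      };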
    • powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table · da004c36
      Committed by Alexey Kardashevskiy
      This adds an iommu_table_ops struct and puts a pointer to it into
      the iommu_table struct. This moves the tce_build/tce_free/tce_get/tce_flush
      callbacks from ppc_md to the new struct, where they really belong.
      
      This adds the requirement for @it_ops to be initialized before calling
      iommu_init_table(), to make sure that we do not leave any IOMMU table
      with iommu_table_ops uninitialized. It is not a parameter of
      iommu_init_table(), though, as there will be cases when iommu_init_table()
      is not called on TCE tables at all, for example with VFIO.
      
      This does s/tce_build/set/ and s/tce_free/clear/ and removes the
      redundant "tce_" prefixes.
      
      This removes the tce_xxx_rm handlers from ppc_md but does not add them
      to iommu_table_ops; that will be done later if we decide to support TCE
      hypercalls in real mode. It also removes the _vm callbacks, as only
      virtual mode is supported for now, so the @rm parameter goes away too.
      
      For pSeries, this always uses tce_buildmulti_pSeriesLP/
      tce_freemulti_pSeriesLP. It changes the multi callbacks to fall back to
      tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
      present. The reason for this is that we still have to support the
      "multitce=off" boot parameter in disable_multitce(), and we do not want
      to walk through all IOMMU tables in the system and replace the "multi"
      callbacks with single ones.
      
      For powernv, this defines one _ops struct per PHB type (P5IOC2/IODA1/
      IODA2) and makes their callbacks public. Later patches will extend the
      callbacks for IODA1/2.
      
      No change in behaviour is expected.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      da004c36
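      
      A minimal sketch of the per-table callback struct described above (the
      signatures are simplified assumptions, not the kernel's exact prototypes):
      
      /* Simplified sketch only: callbacks moved from ppc_md into each table. */
      struct iommu_table;
      
      struct iommu_table_ops_sketch {
              /* was tce_build: install npages translations starting at index */
              int  (*set)(struct iommu_table *tbl, long index, long npages,
                          unsigned long uaddr, int direction);
              /* was tce_free: clear npages translations starting at index */
              void (*clear)(struct iommu_table *tbl, long index, long npages);
              /* was tce_get: read back the entry at index */
              unsigned long (*get)(struct iommu_table *tbl, long index);
              /* was tce_flush: flush hardware caches after updates, if needed */
              void (*flush)(struct iommu_table *tbl);
      };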
    • powerpc/powernv: Do not set "read" flag if direction==DMA_NONE · 10b35b2b
      Committed by Alexey Kardashevskiy
      Normally a bitmap in the iommu_table is used to track which TCE entries
      are in use. Since we are going to use the iommu_table without its locks
      and do xchg() instead, it becomes essential not to set bits which are not
      implied by the direction flag, as the old TCE value (more precisely, its
      permission bits) will be used to decide whether to put the page or not.
      
      This adds iommu_direction_to_tce_perm() (its counterpart already exists)
      and uses it in powernv's pnv_tce_build().
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      10b35b2b
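      
      The helper boils down to a pure mapping from DMA direction to TCE
      permission bits; a self-contained sketch (the enum layout and bit values
      are illustrative assumptions):
      
      /* Simplified sketch only: DMA_NONE must yield no permission bits at all. */
      enum dma_dir_sketch { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_NONE };
      
      #define TCE_READ_SKETCH  0x1UL    /* device may read from memory */
      #define TCE_WRITE_SKETCH 0x2UL    /* device may write to memory */
      
      static unsigned long direction_to_tce_perm_sketch(enum dma_dir_sketch dir)
      {
              switch (dir) {
              case DIR_BIDIRECTIONAL:
                      return TCE_READ_SKETCH | TCE_WRITE_SKETCH;
              case DIR_TO_DEVICE:
                      return TCE_READ_SKETCH;   /* the device reads memory */
              case DIR_FROM_DEVICE:
                      return TCE_WRITE_SKETCH;  /* the device writes memory */
              default:
                      return 0;                 /* DMA_NONE: neither read nor write */
              }
      }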
    • vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver · 9b14a1ff
      Committed by Alexey Kardashevskiy
      This moves the page pinning (get_user_pages_fast()/put_page()) code out
      of the platform IOMMU code and into the VFIO IOMMU driver, where it
      belongs, as the platform code does not deal with page pinning.
      
      This makes iommu_take_ownership()/iommu_release_ownership() deal with
      the IOMMU table bitmap only.
      
      This removes page unpinning from iommu_take_ownership() as the actual
      TCE table might contain garbage and doing put_page() on it is undefined
      behaviour.
      
      Besides the last part, the rest of the patch is mechanical.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9b14a1ff
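      
      For orientation, a kernel-style sketch of where the pinning now lives
      (the helper names are hypothetical; it assumes the usual
      get_user_pages_fast()/put_page() primitives and kernel headers):
      
      /* Simplified sketch only: the VFIO IOMMU driver pins the user page and
       * records its host physical address before a TCE is programmed. */
      static long sketch_pin_user_page(unsigned long uaddr, int writable,
                                       unsigned long *hpa)
      {
              struct page *page = NULL;
      
              if (get_user_pages_fast(uaddr & PAGE_MASK, 1, writable, &page) != 1)
                      return -EFAULT;                   /* could not pin the page */
      
              *hpa = (unsigned long)page_to_pfn(page) << PAGE_SHIFT;
              return 0;
      }
      
      static void sketch_unpin_user_page(unsigned long hpa)
      {
              put_page(pfn_to_page(hpa >> PAGE_SHIFT)); /* drop the pin taken above */
      }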
    • powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group · 4617082e
      Committed by Alexey Kardashevskiy
      The set_iommu_table_base_and_group() name suggests that the function
      sets the table base and adds a device to an IOMMU group.
      
      The actual purpose of setting the table base is to put a reference into
      the device so that iommu_add_device() can later get the IOMMU group
      reference and add the device to the group.
      
      At the moment a group cannot be explicitly passed to iommu_add_device()
      as we want it to work from the bus notifier; we can fix this later and
      remove the confusing calls of set_iommu_table_base().
      
      This replaces set_iommu_table_base_and_group() with a pair of
      set_iommu_table_base() + iommu_add_device() calls, which makes the code
      easier to read.
      
      It also adds a few comments explaining why set_iommu_table_base() and
      iommu_add_device() are called where they are.
      
      For IODA1/2, this effectively removes the iommu_add_device() call from
      pnv_pci_ioda_dma_dev_setup(), as it would always fail at that particular
      point:
      - for a physical PE, the device is already attached by iommu_add_device()
      in pnv_pci_ioda_setup_dma_pe();
      - for a virtual PE, the sysfs entries are not yet ready for all symlinks
      to be created, so the actual adding happens in tce_iommu_bus_notifier.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4617082e
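      
      An illustrative sketch of the resulting pattern at a hypothetical call
      site (the wrapper name is made up; set_iommu_table_base() and
      iommu_add_device() are the functions named above):
      
      /* Simplified sketch only: the combined call becomes two explicit steps. */
      static void sketch_dma_dev_setup(struct pci_dev *pdev, struct iommu_table *tbl)
      {
              /* previously: set_iommu_table_base_and_group(&pdev->dev, tbl); */
              set_iommu_table_base(&pdev->dev, tbl);  /* stash the table reference in the device */
              iommu_add_device(&pdev->dev);           /* later derives the group from that reference */
      }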
  2. 11 April 2015, 1 commit
  3. 31 March 2015, 1 commit
    • powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically · 9e8d4a19
      Committed by Wei Yang
      Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
      and was embedded in it. The pnv_ioda_pe was assigned to a PE at boot
      time. Since PEs are based on the hardware layout, which is static in the
      system, they never get released, which means the iommu_table in the
      pnv_ioda_pe never gets released either.
      
      This no longer works for VF PEs. VF PEs are created and released
      dynamically when VFs are created and released, so we need to assign a
      pnv_ioda_pe to each VF PE when VFs are enabled and clean up those
      resources when VFs are disabled. The iommu_table is one of the resources
      that must be handled dynamically.
      
      Currently the iommu_table is an embedded field of pnv_ioda_pe, which is
      a problem when freeing it: while disabling a VF,
      pnv_pci_ioda2_release_dma_pe calls iommu_free_table to release the
      iommu_table for that PE, and an embedded iommu_table will fail in
      iommu_free_table.
      
      To satisfy these requirements, this patch allocates the iommu_table
      dynamically.
      Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      9e8d4a19
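      
      A minimal sketch of the switch from an embedded table to a heap-allocated
      one (the function names and NUMA-node argument are illustrative
      assumptions):
      
      /* Simplified sketch only: heap allocation makes iommu_free_table() legal later. */
      static struct iommu_table *sketch_alloc_pe_table(int nid)
      {
              /* replaces the iommu_table that used to be embedded in pnv_ioda_pe */
              return kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, nid);
      }
      
      static void sketch_release_pe_table(struct iommu_table *tbl)
      {
              iommu_free_table(tbl, "pnv");   /* only valid because the table was kzalloc'ed */
      }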
  4. 04 March 2015, 1 commit
  5. 18 November 2014, 1 commit
  6. 11 February 2014, 1 commit
    • powerpc/powernv: Add iommu DMA bypass support for IODA2 · cd15b048
      Committed by Benjamin Herrenschmidt
      This patch adds support for creating a direct iommu "bypass" window on
      IODA2 bridges (such as POWER8), allowing 64-bit DMA capable devices to
      bypass iommu page translation completely, thus significantly improving
      DMA performance.
      
      Additionally, this adds a hook to the struct iommu_table so that
      the IOMMU API / VFIO can disable the bypass when external ownership
      is requested, since in that case, the device will be used by an
      environment such as userspace or a KVM guest which must not be
      allowed to bypass translations.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      cd15b048
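      
      A rough sketch of the hook idea (struct and field names are illustrative,
      not the kernel's actual layout):
      
      /* Simplified sketch only: the table carries a callback so generic code can
       * turn the 64-bit bypass window off when a device is handed to VFIO/userspace. */
      struct iommu_table_sketch {
              void (*set_bypass)(struct iommu_table_sketch *tbl, int enable);
              /* translation-window fields elided */
      };
      
      static void sketch_take_ownership(struct iommu_table_sketch *tbl)
      {
              if (tbl->set_bypass)
                      tbl->set_bypass(tbl, 0);    /* guests must go through TCE translation */
      }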
  7. 30 December 2013, 3 commits
  8. 05 December 2013, 1 commit
    • PPC: POWERNV: move iommu_add_device earlier · d905c5df
      Committed by Alexey Kardashevskiy
      The current implementation of the IOMMU on sPAPR does not use iommu_ops
      and therefore does not call the IOMMU API's bus_set_iommu(), which
      1) sets iommu_ops for a bus;
      2) registers a bus notifier.
      Instead, PCI devices are added to IOMMU groups from
      subsys_initcall_sync(tce_iommu_init), which does basically the same
      thing without using iommu_ops callbacks.
      
      However, the Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
      implements iommu_ops, and by the time tce_iommu_init is called every PCI
      device has already been added to some group, so there is a conflict.
      
      This patch does two things:
      1. removes the loop in which PCI devices were added to groups and adds
      explicit iommu_add_device() calls to add devices as soon as they get
      the iommu_table pointer assigned to them;
      2. moves the bus notifier to powernv code in order to avoid a conflict
      with the notifier from the Freescale driver.
      
      iommu_add_device() and iommu_del_device() are public now.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      d905c5df
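      
      For illustration, a kernel-style sketch of the kind of bus notifier
      involved (the function and variable names are hypothetical;
      iommu_add_device()/iommu_del_device() are the now-public helpers):
      
      /* Simplified sketch only: attach/detach devices to their IOMMU group as
       * they appear on or leave the PCI bus, instead of looping over them later. */
      static int sketch_tce_iommu_bus_notifier(struct notifier_block *nb,
                                               unsigned long action, void *data)
      {
              struct device *dev = data;
      
              switch (action) {
              case BUS_NOTIFY_ADD_DEVICE:
                      return iommu_add_device(dev);   /* the device already has its iommu_table set */
              case BUS_NOTIFY_DEL_DEVICE:
                      iommu_del_device(dev);
                      return 0;
              default:
                      return 0;
              }
      }
      
      static struct notifier_block sketch_tce_iommu_nb = {
              .notifier_call = sketch_tce_iommu_bus_notifier,
      };
      /* registered from platform init code, e.g.:
       *   bus_register_notifier(&pci_bus_type, &sketch_tce_iommu_nb); */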
  9. 01 July 2013, 1 commit
  10. 20 June 2013, 1 commit
    • powerpc/vfio: Enable on PowerNV platform · 4e13c1ac
      Committed by Alexey Kardashevskiy
      This initializes IOMMU groups based on the IOMMU configuration
      discovered during the PCI scan on the PowerNV (POWER non-virtualized)
      platform.  The IOMMU groups are to be used later by the VFIO driver,
      which is used for PCI passthrough.
      
      It also implements an API for mapping/unmapping pages for
      guest PCI drivers and providing DMA window properties.
      This API is going to be used later by QEMU-VFIO to handle
      h_put_tce hypercalls from the KVM guest.
      
      iommu_put_tce_user_mode() only maps a single page; an API for adding
      many mappings at once is going to be added later.
      
      Although this driver has been tested only on the PowerNV platform, it
      should work on any platform which supports TCE tables.  As the h_put_tce
      hypercall is received by the host kernel and processed by QEMU (which
      involves calling the host kernel again), performance is not the best:
      circa 220MB/s on a 10Gb ethernet network.
      
      To enable VFIO on POWER, enable the SPAPR_TCE_IOMMU config option and
      configure VFIO as required.
      
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      4e13c1ac
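      
      Since the interface maps one page per call, a caller that needs a range
      has to loop; a rough sketch under the assumption of 4kB IOMMU pages and
      a single-entry iommu_put_tce_user_mode() signature (the wrapper and the
      assumed signature are illustrative):
      
      /* Simplified sketch only: map a contiguous range one IOMMU page at a time. */
      #define SKETCH_IOMMU_PAGE_SHIFT 12      /* assuming 4kB IOMMU pages */
      
      static long sketch_map_range(struct iommu_table *tbl, unsigned long entry,
                                   unsigned long tce, unsigned long npages)
      {
              unsigned long i;
              long ret = 0;
      
              for (i = 0; i < npages && ret == 0; i++)
                      ret = iommu_put_tce_user_mode(tbl, entry + i,
                                      tce + (i << SKETCH_IOMMU_PAGE_SHIFT));
              return ret;
      }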
  11. 03 July 2012, 1 commit
    • powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance · b4c3a872
      Committed by Anton Blanchard
      At the moment all queues in a multiqueue adapter will serialise
      against the IOMMU table lock. This is proving to be a big issue,
      especially with 10Gbit ethernet.
      
      This patch creates 4 pools and tries to spread the load across them. If
      the table is under 1GB in size we revert to the original behaviour of
      one pool and one largealloc pool.
      
      We create a hash to map CPUs to pools. Since we prefer interrupts to be
      affinitised to primary CPUs, without some form of hashing we are very
      likely to end up using the same pool. As an example, POWER7 has 4-way
      SMT, and with 4 pools all primary threads would map to the same pool.
      
      The largealloc pool is reduced from 1/2 to 1/4 of the space to
      partially offset the overhead of breaking the table up into pools.
      
      Some performance numbers were obtained with a Chelsio T3 adapter on
      two POWER7 boxes, running a 100 session TCP round robin test.
      
      Performance improved 69% with this patch applied.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      b4c3a872
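      
      A self-contained sketch of the pool-spreading idea (the hash shown here
      is an illustrative assumption, not the patch's implementation):
      
      /* Simplified sketch only: pick one of 4 pools per CPU so that SMT primary
       * threads (CPU 0, 4, 8, ... with 4-way SMT) do not all land in the same pool. */
      #define SKETCH_NR_POOLS 4
      
      static unsigned int sketch_cpu_to_pool(unsigned int cpu, unsigned int threads_per_core)
      {
              unsigned int core = cpu / threads_per_core;   /* hash on the core, not the thread */
      
              return core % SKETCH_NR_POOLS;
      }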
  12. 28 March 2012, 1 commit
  13. 24 September 2009, 1 commit
  14. 20 August 2009, 1 commit
  15. 15 June 2009, 1 commit
  16. 31 October 2008, 1 commit
    • powerpc: Update remaining dma_mapping_ops to use map/unmap_page · f9226d57
      Committed by Mark Nelson
      After the merge of the 32-bit and 64-bit DMA code, dma_direct_ops lost
      its map/unmap_single() functions but gained map/unmap_page().  This
      caused a problem for Cell because Cell's dma_iommu_fixed_ops called
      dma_direct_ops if the fixed linear mapping was to be used, or the iommu
      ops if the dynamic window was to be used.  So, to fix this problem, we
      need to update the 64-bit DMA code to use map/unmap_page.
      
      First, we update the generic IOMMU code so that iommu_map_single()
      becomes iommu_map_page() and iommu_unmap_single() becomes
      iommu_unmap_page().  Then we propagate these changes up through all the
      callers of these two functions, and in the process update all the
      dma_mapping_ops so that they have map/unmap_page rather than
      map/unmap_single.  We can do this because on 64-bit there is no HIGHMEM,
      so map/unmap_page ends up performing exactly the same function as
      map/unmap_single, just taking different arguments.
      
      This has no effect on drivers because dma_map_single_attrs() just ends
      up calling the map_page() function of the appropriate dma_mapping_ops,
      and similarly dma_unmap_single_attrs() calls unmap_page().
      
      This fixes an oops on Cell blades, which oops on boot without this
      change because they call dma_direct_ops.map_single, which is NULL.
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      f9226d57
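      
      The reasoning above, that map_page() can do exactly what map_single() did
      because 64-bit has no HIGHMEM, is captured by a sketch like this (names
      are illustrative; the forward-declared helper stands in for the old
      single-mapping path):
      
      /* Simplified sketch only: on 64-bit, page_address() is always valid, so a
       * (page, offset) pair reduces to the old virtual-address argument. */
      static dma_addr_t sketch_map_single(struct device *dev, void *vaddr,
                                          size_t size, enum dma_data_direction dir);
      
      static dma_addr_t sketch_map_page(struct device *dev, struct page *page,
                                        unsigned long offset, size_t size,
                                        enum dma_data_direction dir)
      {
              void *vaddr = page_address(page) + offset;   /* no HIGHMEM on 64-bit */
      
              return sketch_map_single(dev, vaddr, size, dir);
      }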
  17. 04 August 2008, 1 commit
  18. 09 July 2008, 2 commits
  19. 06 February 2008, 2 commits
  20. 11 December 2007, 1 commit
  21. 20 October 2007, 1 commit
  22. 07 May 2007, 1 commit
  23. 07 February 2007, 1 commit
  24. 04 December 2006, 2 commits
  25. 13 November 2006, 1 commit
  26. 01 November 2006, 1 commit
    • [POWERPC] Use 4kB iommu pages even on 64kB-page systems · 5d2efba6
      Committed by Linas Vepstas
      The 10Gigabit ethernet device drivers appear to be able to chew
      up all 256MB of TCE mappings on pSeries systems, as evidenced by
      numerous error messages:
      
       iommu_alloc failed, tbl c0000000010d5c48 vaddr c0000000d875eff0 npages 1
      
      Some experimentation indicates that this is essentially because
      one 1500 byte ethernet MTU gets mapped as a 64K DMA region when
      the large 64K pages are enabled. Thus, it doesn't take much to
      exhaust all of the available DMA mappings for a high-speed card.
      
      This patch changes the iommu allocator to work with its own
      unique, distinct page size. Although the patch is long, it's actually
      quite simple: it just #defines a distinct IOMMU_PAGE_SIZE and then uses
      this in all the places that matter.
      
      As a side effect, it also dramatically improves network performance on
      platforms with H-calls on iommu translation inserts/removes (since we no
      longer make 16 calls for a 1500-byte packet when the iommu HW is still
      4k).
      
      In the future, we might want to make the IOMMU_PAGE_SIZE a variable
      in the iommu_table instance, thus allowing support for different HW
      page sizes in the iommu itself.
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Olof Johansson <olof@lixom.net>
      Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      5d2efba6
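      
      A short sketch of the central #define idea (the 4kB value follows the
      message; the macro names mirror, but are not guaranteed to match, the
      actual patch):
      
      /* Simplified sketch only: the IOMMU page size is fixed at 4kB, independent
       * of the kernel's PAGE_SIZE (which may be 64kB on these systems). */
      #define SKETCH_IOMMU_PAGE_SHIFT   12
      #define SKETCH_IOMMU_PAGE_SIZE    (1UL << SKETCH_IOMMU_PAGE_SHIFT)
      #define SKETCH_IOMMU_PAGE_MASK    (~(SKETCH_IOMMU_PAGE_SIZE - 1))
      #define SKETCH_IOMMU_PAGE_ALIGN(a) \
              (((a) + SKETCH_IOMMU_PAGE_SIZE - 1) & SKETCH_IOMMU_PAGE_MASK)
      
      /* e.g. a 1500-byte MTU buffer now takes one 4kB TCE rather than a 64kB one */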
  27. 15 June 2006, 1 commit
  28. 09 June 2006, 1 commit
  29. 26 April 2006, 1 commit
  30. 21 April 2006, 1 commit
  31. 12 January 2006, 1 commit