1. 11 Jun, 2015 14 commits
    • powerpc/iommu/powernv: Release replaced TCE · 05c6cfb9
      Alexey Kardashevskiy committed
      At the moment writing new TCE value to the IOMMU table fails with EBUSY
      if there is a valid entry already. However PAPR specification allows
      the guest to write new TCE value without clearing it first.
      
      Another problem this patch is addressing is the use of pool locks for
      external IOMMU users such as VFIO. The pool locks are to protect
      DMA page allocator rather than entries and since the host kernel does
      not control what pages are in use, there is no point in pool locks and
      exchange()+put_page(oldtce) is sufficient to avoid possible races.
      
      This adds an exchange() callback to iommu_table_ops which does the same
      thing as set() plus it returns replaced TCE and DMA direction so
      the caller can release the pages afterwards. The exchange() receives
      a physical address unlike set() which receives linear mapping address;
      and returns a physical address as the clear() does.
      
      This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
      for a platform to have exchange() implemented in order to support VFIO.
      
      This replaces iommu_tce_build() and iommu_clear_tce() with
      a single iommu_tce_xchg().
      
      This makes sure that TCE permission bits are not set in TCE passed to
      IOMMU API as those are to be calculated by platform code from
      DMA direction.
      
      This moves SetPageDirty() to the IOMMU code to make it work for both
      the VFIO ioctl interface and in-kernel TCE acceleration (when it becomes
      available later).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
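The exchange() semantics described in this commit can be sketched in userspace C. This is a minimal illustration, not the kernel code: every `sketch_` name is hypothetical, the permission bit values are assumptions, and C11 `atomic_exchange()` stands in for the kernel's xchg(). The caller passes a physical address and a direction, and gets the replaced address and direction back so it can release the old page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed permission bits kept in the low bits of an entry (illustrative). */
#define SKETCH_TCE_READ  0x1ULL
#define SKETCH_TCE_WRITE 0x2ULL
#define SKETCH_TCE_PERM  (SKETCH_TCE_READ | SKETCH_TCE_WRITE)

enum sketch_dir {
    SKETCH_BIDIRECTIONAL,
    SKETCH_TO_DEVICE,
    SKETCH_FROM_DEVICE,
    SKETCH_NONE
};

/* The platform side derives permission bits from the DMA direction. */
static uint64_t sketch_dir_to_perm(enum sketch_dir dir)
{
    switch (dir) {
    case SKETCH_BIDIRECTIONAL: return SKETCH_TCE_READ | SKETCH_TCE_WRITE;
    case SKETCH_TO_DEVICE:     return SKETCH_TCE_READ;
    case SKETCH_FROM_DEVICE:   return SKETCH_TCE_WRITE;
    default:                   return 0;
    }
}

static enum sketch_dir sketch_perm_to_dir(uint64_t perm)
{
    if (perm == (SKETCH_TCE_READ | SKETCH_TCE_WRITE))
        return SKETCH_BIDIRECTIONAL;
    if (perm == SKETCH_TCE_READ)
        return SKETCH_TO_DEVICE;
    if (perm == SKETCH_TCE_WRITE)
        return SKETCH_FROM_DEVICE;
    return SKETCH_NONE;
}

/*
 * Atomically install a new entry. On return, *hpa and *dir describe the
 * entry that was replaced, so the caller can put the old page if needed.
 */
void sketch_tce_xchg(_Atomic uint64_t *entry, uint64_t *hpa,
                     enum sketch_dir *dir)
{
    assert(entry != NULL && hpa != NULL && dir != NULL);

    uint64_t newtce = (*hpa & ~SKETCH_TCE_PERM) | sketch_dir_to_perm(*dir);
    uint64_t oldtce = atomic_exchange(entry, newtce);

    *hpa = oldtce & ~SKETCH_TCE_PERM;
    *dir = sketch_perm_to_dir(oldtce & SKETCH_TCE_PERM);
}
```

Because the swap is a single atomic operation, two writers racing on the same entry each receive a distinct old value, which is why the message argues that no pool lock is needed on this path.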
    • powerpc/powernv: Implement accessor to TCE entry · c5bb44ed
      Alexey Kardashevskiy committed
      This replaces direct accesses to the TCE table with a helper which
      returns a TCE entry address. This makes no difference now but will
      when multi-level TCE tables are introduced.
      
      No change in behavior is expected.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
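As a rough illustration of the accessor idea, the sketch below hides the index arithmetic for a flat table behind one helper. The struct layout and all `sketch_` names are assumptions made for the example; the point is only that callers stop indexing the table directly, so a later multi-level layout would need changes in one place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat table: entries cover indexes [offset, offset + size). */
struct sketch_table {
    uint64_t *base;        /* first entry in memory */
    unsigned long offset;  /* index that the first entry corresponds to */
    unsigned long size;    /* number of entries */
};

/* Return the address of the entry for a given index. */
uint64_t *sketch_tce_entry(struct sketch_table *tbl, unsigned long idx)
{
    assert(tbl != NULL);
    assert(idx >= tbl->offset && idx - tbl->offset < tbl->size);
    return &tbl->base[idx - tbl->offset];
}
```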
    • powerpc/powernv/ioda2: Add TCE invalidation for all attached groups · e57080f1
      Alexey Kardashevskiy committed
      The iommu_table struct keeps a list of IOMMU groups it is used for.
      At the moment there is just a single group attached but further
      patches will add TCE table sharing. When sharing is enabled, the TCE
      cache in each PE needs to be invalidated, which is what this patch does.
      
      This does not change pnv_pci_ioda1_tce_invalidate() as there is no plan
      to enable TCE table sharing on PHBs older than IODA2.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
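The per-group invalidation loop can be pictured with a small userspace sketch. The list layout and the `sketch_` names are hypothetical, and a counter stands in for the write to a PE's TCE kill register; the only point is that an update to a shared table has to visit every attached PE.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a PE; the counter replaces a real cache invalidation. */
struct sketch_pe {
    unsigned int invalidations;
};

/* One link per IOMMU group attached to the table. */
struct sketch_group_link {
    struct sketch_pe *pe;
    struct sketch_group_link *next;
};

struct sketch_table {
    struct sketch_group_link *groups;
};

/* Invalidate the cache of every PE sharing the table; returns the count. */
unsigned int sketch_invalidate_all(struct sketch_table *tbl)
{
    unsigned int visited = 0;

    assert(tbl != NULL);
    for (struct sketch_group_link *l = tbl->groups; l != NULL; l = l->next) {
        l->pe->invalidations++;
        visited++;
    }
    return visited;
}
```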
    • powerpc/powernv/ioda2: Move TCE kill register address to PE · 5780fb04
      Alexey Kardashevskiy committed
      At the moment the DMA setup code looks for the "ibm,opal-tce-kill"
      property which contains the TCE kill register address. Writing to
      this register invalidates TCE cache on IODA/IODA2 hub.
      
      This moves the register address from iommu_table to pnv_phb as this
      register belongs to the PHB and invalidates the TCE cache for all tables
      of all attached PEs.
      
      This moves the property reading/remapping code to a helper which is
      called when DMA is being configured for PE and which does DMA setup
      for both IODA1 and IODA2.
      
      This adds a new pnv_pci_ioda2_tce_invalidate_entire() helper which
      invalidates cache for the entire table. It should be called after
      every call to opal_pci_map_pe_dma_window(). It was not required before
      because there was just a single TCE table and 64bit DMA was handled via
      bypass window (which has no table so no cache was used) but this is going
      to change with Dynamic DMA windows (DDW).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control · f87a8864
      Alexey Kardashevskiy committed
      This adds tce_iommu_take_ownership() and tce_iommu_release_ownership(),
      which call iommu_take_ownership()/iommu_release_ownership() in a loop
      for every table in the group. As there is just one now, no change in
      behaviour is expected.
      
      At the moment the iommu_table struct has a set_bypass() callback which
      enables/disables DMA bypass on an IODA2 PHB. This is exposed to POWERPC
      IOMMU code which calls this callback when external IOMMU users such as
      VFIO are about to take over a PHB.
      
      The set_bypass() callback is not really an iommu_table function but
      an IOMMU/PE function. This introduces an iommu_table_group_ops struct and
      adds take_ownership()/release_ownership() callbacks to it which are
      called when an external user takes/releases control over the IOMMU.
      
      This replaces set_bypass() with ownership callbacks as it is not
      necessarily just bypass enabling, it can be something else/more,
      so let's give it a more generic name.
      
      The callbacks are implemented for IODA2 only. Other platforms (P5IOC2,
      IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
      The following patches will replace iommu_take_ownership/
      iommu_release_ownership calls in IODA2 with full IOMMU table release/
      create.
      
      As we are here and touching bypass control, this removes
      pnv_pci_ioda2_setup_bypass_pe() as it does not do much
      more compared to pnv_pci_ioda2_set_bypass. This moves tce_bypass_base
      initialization to pnv_pci_ioda2_setup_dma_pe.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
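A simplified picture of the ownership callbacks follows. It assumes a group-level ops struct with optional take/release hooks and a fallback path for platforms that do not provide them; the fields, the `sketch_` names, and the choice to model bypass as a boolean are all inventions for this example rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_table_group;

/* Group-level callbacks, invoked when an external user takes over. */
struct sketch_table_group_ops {
    void (*take_ownership)(struct sketch_table_group *grp);
    void (*release_ownership)(struct sketch_table_group *grp);
};

struct sketch_table_group {
    const struct sketch_table_group_ops *ops; /* NULL: no platform hooks */
    bool bypass_enabled;
    bool externally_owned;
};

/* Assumed IODA2-like behaviour: bypass is off while externally owned. */
static void sketch_ioda2_take(struct sketch_table_group *grp)
{
    grp->bypass_enabled = false;
}

static void sketch_ioda2_release(struct sketch_table_group *grp)
{
    grp->bypass_enabled = true;
}

const struct sketch_table_group_ops sketch_ioda2_ops = {
    .take_ownership = sketch_ioda2_take,
    .release_ownership = sketch_ioda2_release,
};

void sketch_take_ownership(struct sketch_table_group *grp)
{
    assert(grp != NULL);
    if (grp->ops && grp->ops->take_ownership)
        grp->ops->take_ownership(grp);
    grp->externally_owned = true;
}

void sketch_release_ownership(struct sketch_table_group *grp)
{
    assert(grp != NULL);
    if (grp->ops && grp->ops->release_ownership)
        grp->ops->release_ownership(grp);
    grp->externally_owned = false;
}
```

Naming the hooks after ownership rather than bypass leaves room for a platform to do more than toggle bypass, which is the motivation the message gives for the rename.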
    • powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group · 0eaf4def
      Alexey Kardashevskiy committed
      So far one TCE table could only be used by one IOMMU group. However,
      IODA2 hardware allows programming the same TCE table address into
      multiple PEs, which allows tables to be shared.
      
      This replaces a single pointer to a group in an iommu_table struct
      with a linked list of groups which provides a way of invalidating
      the TCE cache for every PE when an actual TCE table is updated. This adds
      pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group()
      helpers to manage the list. However, without VFIO, it is still going
      to be a single IOMMU group per iommu_table.
      
      This changes iommu_add_device() to add a device to the first group
      in the group list of a table as it is only called from the platform
      init code or PCI bus notifier and at these moments there is only
      one group per table.
      
      This does not change TCE invalidation code to loop through all
      attached groups in order to simplify this patch and because
      it is not really needed in most cases. IODA2 is fixed in a later
      patch.
      
      This should cause no behavioural change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/spapr: vfio: Replace iommu_table with iommu_table_group · b348aa65
      Alexey Kardashevskiy committed
      Modern IBM POWERPC systems support multiple (currently two) TCE tables
      per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
      for TCE tables. Right now just one table is supported.
      
      This defines iommu_table_group struct which stores pointers to
      iommu_group and iommu_table(s). This replaces iommu_table with
      iommu_table_group where iommu_table was used to identify a group:
      - iommu_register_group();
      - iommudata of generic iommu_group;
      
      This removes @data from iommu_table as it_table_group provides
      same access to pnv_ioda_pe.
      
      For IODA, instead of embedding iommu_table, the new iommu_table_group
      keeps pointers to those. The iommu_table structs are allocated
      dynamically.
      
      For P5IOC2, both iommu_table_group and iommu_table are embedded into
      PE struct. As there is no EEH and SRIOV support for P5IOC2,
      iommu_free_table() should not be called on iommu_table struct pointers
      so we can keep it embedded in pnv_phb::p5ioc2.
      
      For pSeries, this replaces multiple calls of kzalloc_node() with a new
      iommu_pseries_alloc_group() helper and stores the table group struct
      pointer into the pci_dn struct. For release, an iommu_table_free_group()
      helper is added.
      
      This moves iommu_table struct allocation from SR-IOV code to
      the generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and
      pnv_pci_ioda2_setup_dma_pe as this is where DMA is actually initialized.
      This change is here because those lines had to be changed anyway.
      
      This should cause no behavioural change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free() · decbda25
      Alexey Kardashevskiy committed
      The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
      supposed to be called on IODA1/2 and not called on p5ioc2. It receives
      start and end host addresses of TCE table.
      
      IODA2 actually needs PCI addresses to invalidate the cache. Those
      can be calculated from host addresses but since we are going
      to implement multi-level TCE tables, calculating PCI address from
      a host address might get either tricky or ugly as TCE table remains flat
      on PCI bus but not in RAM.
      
      This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/
      pnv_tce_free and defines IODA1/2-specific callbacks which call generic
      ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps
      using generic callbacks as before.
      
      This changes pnv_pci_ioda2_tce_invalidate() to receive a TCE index and
      a number of pages, which are PCI addresses shifted by the IOMMU page shift.
      
      No change in behaviour is expected.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
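The relationship between a TCE index and the PCI addresses it covers is plain shift arithmetic, sketched below. The helper and struct names are hypothetical, and the result is only the address range; how that range is encoded into an invalidation register write is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* PCI addresses of the first and last IOMMU page in a range. */
struct sketch_pci_range {
    uint64_t start;
    uint64_t end;
};

/* Convert (index, npages) into bus addresses using the IOMMU page shift. */
struct sketch_pci_range sketch_index_to_pci(unsigned long index,
                                            unsigned long npages,
                                            unsigned int page_shift)
{
    struct sketch_pci_range r;

    assert(npages > 0 && page_shift < 64);
    r.start = (uint64_t)index << page_shift;
    r.end = ((uint64_t)index + npages - 1) << page_shift;
    return r;
}
```

With 4 KiB IOMMU pages (a shift of 12), index 2 and three pages give the range 0x2000 to 0x4000. Working from the index avoids deriving a PCI address from a host address, which the message notes becomes awkward once the table is no longer flat in RAM.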
    • powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table · da004c36
      Alexey Kardashevskiy committed
      This adds an iommu_table_ops struct and puts a pointer to it into
      the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush
      callbacks from ppc_md to the new struct where they really belong.
      
      This adds the requirement for @it_ops to be initialized before calling
      iommu_init_table() to make sure that we do not leave any IOMMU table
      with iommu_table_ops uninitialized. This is not a parameter of
      iommu_init_table() though as there will be cases when iommu_init_table()
      will not be called on TCE tables, for example - VFIO.
      
      This does s/tce_build/set/, s/tce_free/clear/ and removes "tce_"
      redundant prefixes.
      
      This removes tce_xxx_rm handlers from ppc_md but does not add
      them to iommu_table_ops as this will be done later if we decide to
      support TCE hypercalls in real mode. This removes _vm callbacks as
      only virtual mode is supported for now, so this also removes the @rm
      parameter.
      
      For pSeries, this always uses tce_buildmulti_pSeriesLP/
      tce_freemulti_pSeriesLP. This changes the multi callbacks to fall back to
      tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
      present. The reason for this is we still have to support "multitce=off"
      boot parameter in disable_multitce() and we do not want to walk through
      all IOMMU tables in the system and replace "multi" callbacks with single
      ones.
      
      For powernv, this defines _ops per PHB type which are P5IOC2/IODA1/IODA2.
      This makes the callbacks for them public. Later patches will extend
      callbacks for IODA1/2.
      
      No change in behaviour is expected.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
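The move from global callbacks to a per-table ops pointer can be sketched as follows. The callback names follow the renaming in the message (set, clear, get, flush), but the signatures, the fixed-size table, the 4 KiB page step, and every `sketch_` identifier are assumptions for illustration. The sketch reports a missing ops pointer as an error return; how the kernel enforces the requirement is not stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRIES 8

struct sketch_table;

/* Per-table callbacks replacing machine-wide function pointers. */
struct sketch_table_ops {
    int (*set)(struct sketch_table *tbl, long index, long npages,
               uint64_t first);
    void (*clear)(struct sketch_table *tbl, long index, long npages);
    uint64_t (*get)(struct sketch_table *tbl, long index);
    void (*flush)(struct sketch_table *tbl); /* optional */
};

struct sketch_table {
    const struct sketch_table_ops *ops;
    uint64_t entries[SKETCH_ENTRIES];
    bool ready;
};

static int flat_set(struct sketch_table *tbl, long index, long npages,
                    uint64_t first)
{
    if (index < 0 || npages < 0 || index + npages > SKETCH_ENTRIES)
        return -1;
    for (long i = 0; i < npages; i++)
        tbl->entries[index + i] = first + ((uint64_t)i << 12);
    return 0;
}

static void flat_clear(struct sketch_table *tbl, long index, long npages)
{
    if (index < 0 || npages < 0 || index + npages > SKETCH_ENTRIES)
        return;
    for (long i = 0; i < npages; i++)
        tbl->entries[index + i] = 0;
}

static uint64_t flat_get(struct sketch_table *tbl, long index)
{
    assert(index >= 0 && index < SKETCH_ENTRIES);
    return tbl->entries[index];
}

const struct sketch_table_ops sketch_flat_ops = {
    .set = flat_set,
    .clear = flat_clear,
    .get = flat_get,
    .flush = NULL,
};

/* Refuse to initialize a table whose ops were not set up beforehand. */
int sketch_init_table(struct sketch_table *tbl)
{
    if (!tbl->ops || !tbl->ops->set || !tbl->ops->clear)
        return -1;
    tbl->ready = true;
    return 0;
}
```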
    • powerpc/powernv: Do not set "read" flag if direction==DMA_NONE · 10b35b2b
      Alexey Kardashevskiy committed
      Normally a bitmap from the iommu_table is used to track what TCE entry
      is in use. Since we are going to use iommu_table without its locks and
      do xchg() instead, it becomes essential not to put bits which are not
      implied in the direction flag as the old TCE value (more precisely -
      the permission bits) will be used to decide whether to put the page or not.
      
      This adds iommu_direction_to_tce_perm() (its counterpart is there already)
      and uses it for powernv's pnv_tce_build().
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
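Why stray permission bits matter can be shown with a short sketch. It assumes read and write permissions occupy the two low bits of an entry and that a replaced entry is treated as holding a page whenever either bit is set; the bit values and all `sketch_` names are illustrative, not taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TCE_READ  0x1ULL
#define SKETCH_TCE_WRITE 0x2ULL
#define SKETCH_TCE_PERM  (SKETCH_TCE_READ | SKETCH_TCE_WRITE)

enum sketch_dir {
    SKETCH_BIDIRECTIONAL,
    SKETCH_TO_DEVICE,
    SKETCH_FROM_DEVICE,
    SKETCH_NONE
};

/* Map a DMA direction to permission bits; "none" yields no bits. */
uint64_t sketch_direction_to_perm(enum sketch_dir dir)
{
    switch (dir) {
    case SKETCH_BIDIRECTIONAL: return SKETCH_TCE_READ | SKETCH_TCE_WRITE;
    case SKETCH_TO_DEVICE:     return SKETCH_TCE_READ;
    case SKETCH_FROM_DEVICE:   return SKETCH_TCE_WRITE;
    default:                   return 0;
    }
}

/* Build an entry; the address must arrive without permission bits. */
uint64_t sketch_make_tce(uint64_t hpa, enum sketch_dir dir)
{
    assert((hpa & SKETCH_TCE_PERM) == 0);
    return hpa | sketch_direction_to_perm(dir);
}

/* A replaced entry references a page only if it granted some access. */
bool sketch_old_tce_holds_page(uint64_t oldtce)
{
    return (oldtce & SKETCH_TCE_PERM) != 0;
}
```

If a "none" mapping carried the read bit anyway, a later exchange would see that bit in the old value and release a page that was never pinned.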
    • powerpc/iommu: Put IOMMU group explicitly · ac9a5889
      Alexey Kardashevskiy committed
      So far an iommu_table lifetime was the same as PE. Dynamic DMA windows
      will change this and iommu_free_table() will not always require
      the group to be released.
      
      This moves iommu_group_put() out of iommu_free_table().
      
      This adds an iommu_pseries_free_table() helper which does
      iommu_group_put() and iommu_free_table(). Later it will be
      changed to receive a table_group and we will have to change fewer
      lines then.
      
      This should cause no behavioural change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/ioda: Clean up IOMMU group registration · c5773822
      Alexey Kardashevskiy committed
      The existing code has 3 calls to iommu_register_group() and
      all 3 branches actually cover all possible cases.
      
      This replaces 3 calls with one and moves the registration earlier;
      the latter will make more sense when we add TCE table sharing.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group · 4617082e
      Alexey Kardashevskiy committed
      The set_iommu_table_base_and_group() name suggests that the function
      sets the table base and adds a device to an IOMMU group.
      
      The actual purpose of setting the table base is to put some reference
      into a device so that iommu_add_device() can later get the IOMMU group
      reference and add the device to the group.
      
      At the moment a group cannot be explicitly passed to iommu_add_device()
      as we want it to work from the bus notifier; we can fix it later and
      remove the confusing calls of set_iommu_table_base().
      
      This replaces set_iommu_table_base_and_group() with a pair of
      set_iommu_table_base() + iommu_add_device() calls, which makes the code
      easier to read.
      
      This adds a few comments explaining why set_iommu_table_base() and
      iommu_add_device() are called where they are called.
      
      For IODA1/2, this essentially removes iommu_add_device() call from
      the pnv_pci_ioda_dma_dev_setup() as it will always fail at this particular
      place:
      - for physical PE, the device is already attached by iommu_add_device()
      in pnv_pci_ioda_setup_dma_pe();
      - for virtual PE, the sysfs entries are not ready to create all symlinks
      so actual adding is happening in tce_iommu_bus_notifier.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU group · ea30e99e
      Alexey Kardashevskiy committed
      This relies on the fact that a PCI device always has an IOMMU table,
      which may not be the case when we get dynamic DMA windows, so
      let's use a more reliable check for the IOMMU group here.
      
      As we do not rely on the table presence here, remove the workaround
      from pnv_pci_ioda2_set_bypass(); also remove the @add_to_iommu_group
      parameter from pnv_ioda_setup_bus_dma().
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 05 Jun, 2015 3 commits
  3. 04 Jun, 2015 1 commit
  4. 03 Jun, 2015 2 commits
  5. 02 Jun, 2015 5 commits
  6. 22 May, 2015 11 commits
  7. 18 May, 2015 1 commit
  8. 13 May, 2015 2 commits
  9. 11 May, 2015 1 commit