1. 28 Jun 2017, 1 commit
  2. 20 Apr 2017, 1 commit
  3. 30 Mar 2017, 2 commits
  4. 04 Aug 2016, 1 commit
    • dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Committed by Krzysztof Kozlowski
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler, both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing a pointer to them to dma_set_attr(), just set the bits.
      
      2. It brings safety and const-correctness checking because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
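The commit above replaces a struct dma_attrs pointer with a plain unsigned long bitmask. The userspace C sketch below contrasts the two calling styles. It is not kernel code: the EX_-prefixed macros, struct ex_dma_attrs, ex_dma_set_attr() and both ex_map_buffer*() helpers are hypothetical stand-ins for the kernel's DMA_ATTR_* bits, struct dma_attrs and dma_set_attr(), and the bit positions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute bits in the style of DMA_ATTR_*; values are illustrative. */
#define EX_DMA_ATTR_WRITE_COMBINE (1UL << 2)
#define EX_DMA_ATTR_SKIP_CPU_SYNC (1UL << 5)

/* New style: attributes arrive by value, so the callee cannot alter the caller's copy. */
static unsigned long ex_map_buffer(unsigned long attrs)
{
	unsigned long honoured = 0;

	if (attrs & EX_DMA_ATTR_WRITE_COMBINE)
		honoured |= EX_DMA_ATTR_WRITE_COMBINE;
	if (attrs & EX_DMA_ATTR_SKIP_CPU_SYNC)
		honoured |= EX_DMA_ATTR_SKIP_CPU_SYNC;
	return honoured;
}

/* Old style (emulated): a struct on the caller's stack, filled in by a setter. */
struct ex_dma_attrs {
	unsigned long flags;
};

static void ex_dma_set_attr(unsigned long bit, struct ex_dma_attrs *attrs)
{
	assert(bit != 0);
	attrs->flags |= bit;
}

/* NULL meant "no attributes"; the semantic patch rewrites such NULLs to 0. */
static unsigned long ex_map_buffer_old(const struct ex_dma_attrs *attrs)
{
	return ex_map_buffer(attrs ? attrs->flags : 0);
}
```

With the new style a caller writes ex_map_buffer(EX_DMA_ATTR_WRITE_COMBINE | EX_DMA_ATTR_SKIP_CPU_SYNC) directly: no local struct or setter call is needed, and passing 0 takes the place of passing NULL.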
  5. 21 Jul 2016, 1 commit
    • powerpc/dart: Use a cachable DART · c40785ad
      Committed by Benjamin Herrenschmidt
      Instead of punching a hole in the linear mapping, just use normal
      cacheable memory, and apply the flush sequence documented in the
      CPC625 (aka U3) user manual.
      
      This allows us to remove quite a bit of code related to the early
      allocation of the DART and the hole in the linear mapping. We can
      also get rid of the copy of the DART for suspend/resume as the
      original memory can just be saved/restored now, as long as we
      properly sync the caches.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Integrate dart_init() fix to return ENODEV when DART disabled]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 13 Jul 2015, 1 commit
  7. 11 Jun 2015, 13 commits
    • vfio: powerpc/spapr: Support Dynamic DMA windows · e633bc86
      Committed by Alexey Kardashevskiy
      This adds create/remove window ioctls to create and remove DMA windows.
      sPAPR defines a Dynamic DMA windows capability which allows
      para-virtualized guests to create additional DMA windows on a PCI bus.
      Existing Linux kernels use this new window to map the entire guest
      memory and switch to direct DMA operations, saving the time that
      would otherwise be spent on a large number of map/unmap requests.
      
      This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
      VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
      Up to 2 windows are supported now by the hardware and by this driver.
      
      This changes the VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional
      information such as the number of supported windows and the maximum number
      of TCE table levels.
      
      DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature
      as we still want to support v2 on platforms which cannot do DDW for
      the sake of TCE acceleration in KVM (coming soon).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • vfio: powerpc/spapr: Register memory and define IOMMU v2 · 2157e7b8
      Committed by Alexey Kardashevskiy
      The existing implementation accounts the whole DMA window in
      the locked_vm counter. This is going to be worse with multiple
      containers and huge DMA windows. Also, real-time accounting would require
      additional tracking of accounted pages due to the page size difference:
      the IOMMU uses 4K pages and the system uses 4K or 64K pages.
      
      Another issue is that actual page pinning/unpinning happens on every
      DMA map/unmap request. This does not affect performance much now, as
      we spend far too much time switching context between
      guest/userspace/host, but this will start to matter when we add in-kernel
      DMA map/unmap acceleration.
      
      This introduces a new IOMMU type for SPAPR: VFIO_SPAPR_TCE_v2_IOMMU.
      The new IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
      2 new ioctls to register/unregister DMA memory,
      VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY,
      which receive the userspace address and size of a memory region which
      needs to be pinned/unpinned and counted in locked_vm.
      The new IOMMU splits physical page pinning and TCE table update
      into 2 different operations. It requires:
      1) guest pages to be registered first
      2) subsequent map/unmap requests to work only with pre-registered memory.
      For the default single window case this means that the entire guest
      (instead of 2GB) needs to be pinned before using VFIO.
      When a huge DMA window is added, no additional pinning will be
      required; otherwise it would be guest RAM + 2GB.
      
      The new memory registration ioctls are not supported by
      VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
      will require memory to be preregistered in order to work.
      
      Accounting is done per user process.
      
      This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
      can do with v1 or v2 IOMMUs.
      
      In order to support memory pre-registration, we need a way to track
      the use of every registered memory region and only allow unregistration
      if a region is no longer in use. So we need a way to tell which
      region a just-cleared TCE came from.
      
      This adds a userspace view of the TCE table into the iommu_table struct.
      It contains one userspace address per TCE entry. The table is only
      allocated when ownership of an IOMMU group is taken, which means
      it is only used from outside of the powernv code (such as VFIO).
      
      As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
      DDW API), this creates a default DMA window for IODA2 for consistency.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
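The registration rule described above (a region may only be unregistered once no TCE refers to it any more) can be sketched with a per-region usage counter. This is a minimal userspace model under stated assumptions: the ex_ names, the fixed-size region array and the chosen error codes are hypothetical, and the caller of ex_unmap() is assumed to know the userspace address behind the TCE being cleared, which is the information the per-entry userspace view provides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_REGIONS 8

/* Hypothetical record of one pre-registered userspace region. */
struct ex_region {
	uint64_t ua;       /* userspace start address */
	uint64_t size;     /* length in bytes */
	unsigned mapped;   /* TCEs currently pointing into this region */
	int used;          /* slot occupied */
};

static struct ex_region ex_regions[EX_MAX_REGIONS];

static int ex_register(uint64_t ua, uint64_t size)
{
	assert(size > 0);
	for (int i = 0; i < EX_MAX_REGIONS; i++) {
		if (!ex_regions[i].used) {
			ex_regions[i] = (struct ex_region){ ua, size, 0, 1 };
			return 0;
		}
	}
	return -ENOMEM;
}

/* Find the registered region fully containing [ua, ua + size). */
static struct ex_region *ex_lookup(uint64_t ua, uint64_t size)
{
	for (int i = 0; i < EX_MAX_REGIONS; i++) {
		struct ex_region *r = &ex_regions[i];

		if (r->used && ua >= r->ua && ua + size <= r->ua + r->size)
			return r;
	}
	return NULL;
}

/* Map requests only succeed against pre-registered memory. */
static int ex_map(uint64_t ua, uint64_t size)
{
	struct ex_region *r = ex_lookup(ua, size);

	if (!r)
		return -EINVAL;
	r->mapped++;
	return 0;
}

static int ex_unmap(uint64_t ua, uint64_t size)
{
	struct ex_region *r = ex_lookup(ua, size);

	if (!r || r->mapped == 0)
		return -EINVAL;
	r->mapped--;
	return 0;
}

/* Unregistration is refused while any TCE still uses the region. */
static int ex_unregister(uint64_t ua, uint64_t size)
{
	for (int i = 0; i < EX_MAX_REGIONS; i++) {
		struct ex_region *r = &ex_regions[i];

		if (r->used && r->ua == ua && r->size == size) {
			if (r->mapped)
				return -EBUSY;
			r->used = 0;
			return 0;
		}
	}
	return -ENOENT;
}
```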
    • powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table · 00547193
      Committed by Alexey Kardashevskiy
      This adds a way for the IOMMU user to know how much memory a new table
      will use so it can be accounted in the locked_vm limit before allocation
      happens.
      
      This stores the allocated table size in pnv_pci_ioda2_get_table_size()
      so the locked_vm counter can be updated correctly when a table is
      being disposed.
      
      This defines an iommu_table_group_ops callback to let VFIO know
      how much memory will be locked if a table is created.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
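A rough model of the account-before-allocate flow: the table size is computed first, charged against a locked-memory limit, and the stored size is reused when the table is disposed of. Assumptions: single-level tables of 8-byte TCEs, 64K system pages, and hypothetical ex_ names. The real pnv_pci_ioda2_get_table_size() also handles multi-level tables.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 16   /* assumption: 64K system pages */
#define EX_TCE_SIZE   8u   /* one TCE entry is 8 bytes */

/* Hypothetical per-process locked-memory accounting, in system pages. */
struct ex_mm {
	uint64_t locked_vm;
	uint64_t lock_limit;
};

struct ex_table {
	uint64_t allocated_size;   /* remembered at creation for disposal */
};

/* Single-level estimate: one TCE per IOMMU page in the window. */
static uint64_t ex_get_table_size(unsigned page_shift, uint64_t window_size)
{
	return (window_size >> page_shift) * EX_TCE_SIZE;
}

static uint64_t ex_bytes_to_pages(uint64_t bytes)
{
	return (bytes + (1ULL << EX_PAGE_SHIFT) - 1) >> EX_PAGE_SHIFT;
}

/* Charge the limit first; only then would the table be allocated. */
static int ex_create_table(struct ex_mm *mm, struct ex_table *tbl,
			   unsigned page_shift, uint64_t window_size)
{
	uint64_t bytes, pages;

	assert(page_shift >= 12);
	bytes = ex_get_table_size(page_shift, window_size);
	pages = ex_bytes_to_pages(bytes);
	if (mm->locked_vm + pages > mm->lock_limit)
		return -ENOMEM;
	mm->locked_vm += pages;
	tbl->allocated_size = bytes;
	return 0;
}

/* Disposal uncharges exactly what creation charged. */
static void ex_free_table(struct ex_mm *mm, struct ex_table *tbl)
{
	mm->locked_vm -= ex_bytes_to_pages(tbl->allocated_size);
	tbl->allocated_size = 0;
}
```

For example, a 2GB window with 4K IOMMU pages needs 2^19 TCEs, i.e. 4MB, which is 64 pages of 64K charged before anything is allocated.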
    • vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA windows API · 4793d65d
      Committed by Alexey Kardashevskiy
      This extends iommu_table_group_ops by a set of callbacks to support
      dynamic DMA windows management.
      
      create_table() creates a TCE table with specific parameters.
      It receives iommu_table_group to know nodeid in order to allocate
      TCE table memory closer to the PHB. The exact format of the allocated
      multi-level table might also be specific to the PHB model (not
      the case now though).
      This callback calculates the DMA window offset on a PCI bus from @num
      and stores it in the newly created table.
      
      set_window() sets the window at specified TVT index + @num on PHB.
      
      unset_window() unsets the window from specified TVT.
      
      This adds a free() callback to iommu_table_ops to free the memory
      (potentially a tree of tables) allocated for the TCE table.
      
      create_table() and free() are supposed to be called once per
      VFIO container and set_window()/unset_window() are supposed to be
      called for every group in a container.
      
      This adds IOMMU capabilities to iommu_table_group such as default
      32bit window parameters and others. This makes use of new values in
      vfio_iommu_spapr_tce. IODA1/P5IOC2 do not support DDW so they do not
      advertise pagemasks to the userspace.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Implement multilevel TCE tables · bbb845c4
      Committed by Alexey Kardashevskiy
      TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
      on huge guests (hundreds of GB of RAM), so the kernel might be unable to
      allocate a contiguous chunk of physical memory to store the TCE table.
      
      To address this, POWER8 CPU (actually, IODA2) supports multi-level
      TCE tables, up to 5 levels, which split the table into a tree of
      smaller subtables.
      
      This adds multi-level TCE tables support to
      pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages()
      helpers.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
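Why multiple levels help can be shown with arithmetic: splitting the TCE index bits evenly across levels shrinks the largest contiguous allocation from the whole table to one subtable. The sketch assumes 8-byte TCEs, an even (rounded-up) split of index bits and hypothetical ex_ names; it is not the kernel's allocator and ignores the minimum page-size clamp a real implementation would apply.

```c
#include <assert.h>
#include <stdint.h>

#define EX_TCE_SHIFT 3   /* each TCE is 8 bytes */

/* Index bits handled per level when entries_shift bits are split evenly. */
static unsigned ex_level_shift(unsigned entries_shift, unsigned levels)
{
	assert(levels >= 1 && levels <= 5);           /* up to 5 levels */
	return (entries_shift + levels - 1) / levels; /* round up */
}

/* Largest single contiguous allocation: one (sub)table. */
static uint64_t ex_largest_chunk(unsigned entries_shift, unsigned levels)
{
	return 1ULL << (ex_level_shift(entries_shift, levels) + EX_TCE_SHIFT);
}

/* Total bytes across all levels of the tree. */
static uint64_t ex_total_bytes(unsigned entries_shift, unsigned levels)
{
	unsigned shift = ex_level_shift(entries_shift, levels);
	uint64_t chunk = 1ULL << (shift + EX_TCE_SHIFT);
	uint64_t need = 1ULL << entries_shift;   /* entries needed at the leaf level */
	uint64_t total = 0;

	for (unsigned l = 0; l < levels; l++) {
		/* Subtables needed at this level, rounded up. */
		uint64_t tables = (need + (1ULL << shift) - 1) >> shift;

		total += tables * chunk;
		need = tables;   /* each subtable takes one entry one level up */
	}
	return total;
}
```

For a 1TB window with 4K IOMMU pages (2^28 entries), ex_largest_chunk(28, 1) is 2GB of contiguous memory, while ex_largest_chunk(28, 2) is 128KB, at the cost of one extra 128KB top-level table.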
    • powerpc/iommu/powernv: Release replaced TCE · 05c6cfb9
      Committed by Alexey Kardashevskiy
      At the moment writing a new TCE value to the IOMMU table fails with EBUSY
      if there is a valid entry already. However, the PAPR specification allows
      the guest to write a new TCE value without clearing it first.
      
      Another problem this patch is addressing is the use of pool locks for
      external IOMMU users such as VFIO. The pool locks are to protect
      DMA page allocator rather than entries and since the host kernel does
      not control what pages are in use, there is no point in pool locks and
      exchange()+put_page(oldtce) is sufficient to avoid possible races.
      
      This adds an exchange() callback to iommu_table_ops which does the same
      thing as set() but also returns the replaced TCE and DMA direction so
      the caller can release the pages afterwards. exchange() receives
      a physical address, unlike set() which receives a linear mapping address,
      and returns a physical address as clear() does.
      
      This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
      for a platform to have exchange() implemented in order to support VFIO.
      
      This replaces iommu_tce_build() and iommu_clear_tce() with
      a single iommu_tce_xchg().
      
      This makes sure that TCE permission bits are not set in TCE passed to
      IOMMU API as those are to be calculated by platform code from
      DMA direction.
      
      This moves SetPageDirty() to the IOMMU code to make it work for both
      the VFIO ioctl interface and in-kernel TCE acceleration (when it becomes
      available later).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
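A userspace model of the exchange() idea: each entry is swapped atomically with no pool lock, the old value comes back to the caller, and the caller releases the old page only if the old entry carried permission bits. The ex_ names, the table size and the page-release stub are hypothetical; the permission-bit values are assumed to match TCE_PCI_READ/TCE_PCI_WRITE. Unlike the kernel callback, which hands back a DMA direction, this sketch returns the raw permission bits to stay short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define EX_TCE_READ   0x1ULL   /* assumed to match TCE_PCI_READ */
#define EX_TCE_WRITE  0x2ULL   /* assumed to match TCE_PCI_WRITE */
#define EX_TCE_PERMS  (EX_TCE_READ | EX_TCE_WRITE)
#define EX_TABLE_SIZE 16

static _Atomic uint64_t ex_tces[EX_TABLE_SIZE];
static unsigned ex_pages_released;

/* Stand-in for put_page(): only counts releases. */
static void ex_put_page(uint64_t hpa)
{
	(void)hpa;
	ex_pages_released++;
}

/*
 * Hypothetical exchange(): atomically install (*hpa | *perms) at idx and
 * return the old physical address and permission bits through the same
 * pointers.
 */
static void ex_exchange(unsigned idx, uint64_t *hpa, uint64_t *perms)
{
	uint64_t old;

	assert(idx < EX_TABLE_SIZE);
	assert((*hpa & EX_TCE_PERMS) == 0);   /* no permission bits in the address */
	old = atomic_exchange(&ex_tces[idx], *hpa | *perms);
	*hpa = old & ~EX_TCE_PERMS;
	*perms = old & EX_TCE_PERMS;
}

/* Caller side: overwrite without clearing first, then drop the old page. */
static void ex_put_tce(unsigned idx, uint64_t hpa, uint64_t perms)
{
	ex_exchange(idx, &hpa, &perms);
	if (perms)   /* the old entry was a live mapping */
		ex_put_page(hpa);
}
```

Clearing an entry is just ex_put_tce(idx, 0, 0): the same exchange path returns the previous mapping so its page can be released.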
    • vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control · f87a8864
      Committed by Alexey Kardashevskiy
      This adds tce_iommu_take_ownership() and tce_iommu_release_ownership()
      which call iommu_take_ownership()/iommu_release_ownership() in a loop
      for every table on the group. As there is just one now, no change in
      behaviour is expected.
      
      At the moment the iommu_table struct has a set_bypass() which enables/
      disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
      which calls this callback when external IOMMU users such as VFIO are
      about to take over a PHB.
      
      The set_bypass() callback is not really an iommu_table function but an
      IOMMU/PE function. This introduces an iommu_table_group_ops struct and
      adds take_ownership()/release_ownership() callbacks to it which are
      called when an external user takes/releases control over the IOMMU.
      
      This replaces set_bypass() with ownership callbacks as it is not
      necessarily just bypass enabling; it can be something else/more,
      so let's give it a more generic name.
      
      The callbacks are implemented for IODA2 only. Other platforms (P5IOC2,
      IODA1) will use the old iommu_take_ownership/iommu_release_ownership API.
      The following patches will replace iommu_take_ownership/
      iommu_release_ownership calls in IODA2 with full IOMMU table release/
      create.
      
      As we are here and touching bypass control, this removes
      pnv_pci_ioda2_setup_bypass_pe() as it does not do much
      more than pnv_pci_ioda2_set_bypass(). This moves tce_bypass_base
      initialization to pnv_pci_ioda2_setup_dma_pe().
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
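The ownership design above (use a per-group callback when the platform provides one, otherwise fall back to the old per-table API) can be sketched with a function-pointer ops struct. All ex_ names are hypothetical; the IODA2-like callbacks just flip two flags to stand in for disabling and re-enabling bypass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_group;

/* Hypothetical per-group ops in the spirit of iommu_table_group_ops. */
struct ex_group_ops {
	void (*take_ownership)(struct ex_group *grp);
	void (*release_ownership)(struct ex_group *grp);
};

struct ex_group {
	const struct ex_group_ops *ops;   /* NULL where no callbacks exist */
	bool bypass_enabled;
	bool externally_owned;
};

/* IODA2-like: an external owner must not be able to bypass translation. */
static void ex_ioda2_take(struct ex_group *grp)
{
	grp->bypass_enabled = false;
	grp->externally_owned = true;
}

static void ex_ioda2_release(struct ex_group *grp)
{
	grp->externally_owned = false;
	grp->bypass_enabled = true;
}

static const struct ex_group_ops ex_ioda2_ops = {
	.take_ownership = ex_ioda2_take,
	.release_ownership = ex_ioda2_release,
};

/* Stand-ins for the old per-table API used by P5IOC2/IODA1-like platforms. */
static void ex_legacy_take(struct ex_group *grp) { grp->externally_owned = true; }
static void ex_legacy_release(struct ex_group *grp) { grp->externally_owned = false; }

/* Generic entry points: prefer the group callback, else fall back. */
static void ex_take_ownership(struct ex_group *grp)
{
	assert(!grp->externally_owned);
	if (grp->ops && grp->ops->take_ownership)
		grp->ops->take_ownership(grp);
	else
		ex_legacy_take(grp);
}

static void ex_release_ownership(struct ex_group *grp)
{
	assert(grp->externally_owned);
	if (grp->ops && grp->ops->release_ownership)
		grp->ops->release_ownership(grp);
	else
		ex_legacy_release(grp);
}
```

Keeping the callbacks on the group rather than the table matches the commit's argument that bypass control is a PE property, not a table property.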
    • powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group · 0eaf4def
      Committed by Alexey Kardashevskiy
      So far one TCE table could only be used by one IOMMU group. However
      IODA2 hardware allows programming the same TCE table address to
      multiple PEs, allowing tables to be shared.
      
      This replaces a single pointer to a group in an iommu_table struct
      with a linked list of groups, which provides a way of invalidating
      the TCE cache for every PE when an actual TCE table is updated. This adds
      pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group()
      helpers to manage the list. However, without VFIO, there is still going
      to be a single IOMMU group per iommu_table.
      
      This changes iommu_add_device() to add a device to the first group
      from the group list of a table as it is only called from the platform
      init code or PCI bus notifier and at these moments there is only
      one group per table.
      
      This does not change TCE invalidation code to loop through all
      attached groups in order to simplify this patch and because
      it is not really needed in most cases. IODA2 is fixed in a later
      patch.
      
      This should cause no behavioural change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
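A minimal sketch of a table that links to a list of groups and the per-PE invalidation this makes possible. The commit itself defers the invalidation loop (IODA2 is fixed in a later patch), so ex_invalidate_all() shows the intended end state. All names are hypothetical, and a counter stands in for a TCE cache kill.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PE/group; the counter stands in for a TCE cache kill. */
struct ex_group {
	int pe_number;
	unsigned invalidations;
};

/* One node per (table, group) pair, similar to a list_head entry. */
struct ex_link {
	struct ex_group *grp;
	struct ex_link *next;
};

struct ex_table {
	struct ex_link *groups;   /* previously: a single group pointer */
};

static void ex_link_table_and_group(struct ex_table *tbl, struct ex_link *node,
				    struct ex_group *grp)
{
	node->grp = grp;
	node->next = tbl->groups;
	tbl->groups = node;
}

static void ex_unlink_table_and_group(struct ex_table *tbl, struct ex_group *grp)
{
	for (struct ex_link **pp = &tbl->groups; *pp; pp = &(*pp)->next) {
		if ((*pp)->grp == grp) {
			*pp = (*pp)->next;
			return;
		}
	}
}

/* After updating a shared table, every attached PE drops stale TCEs. */
static void ex_invalidate_all(struct ex_table *tbl)
{
	for (struct ex_link *l = tbl->groups; l; l = l->next)
		l->grp->invalidations++;
}

/* What iommu_add_device() relies on: the first group on the list. */
static struct ex_group *ex_first_group(struct ex_table *tbl)
{
	assert(tbl->groups != NULL);
	return tbl->groups->grp;
}
```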
    • powerpc/spapr: vfio: Replace iommu_table with iommu_table_group · b348aa65
      Committed by Alexey Kardashevskiy
      Modern IBM POWERPC systems support multiple (currently two) TCE tables
      per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
      for TCE tables. Right now just one table is supported.
      
      This defines iommu_table_group struct which stores pointers to
      iommu_group and iommu_table(s). This replaces iommu_table with
      iommu_table_group where iommu_table was used to identify a group:
      - iommu_register_group();
      - iommudata of generic iommu_group;
      
      This removes @data from iommu_table as it_table_group provides the
      same access to pnv_ioda_pe.
      
      For IODA, instead of embedding iommu_table, the new iommu_table_group
      keeps pointers to those. The iommu_table structs are allocated
      dynamically.
      
      For P5IOC2, both iommu_table_group and iommu_table are embedded into
      PE struct. As there is no EEH and SRIOV support for P5IOC2,
      iommu_free_table() should not be called on iommu_table struct pointers
      so we can keep it embedded in pnv_phb::p5ioc2.
      
      For pSeries, this replaces multiple calls of kzalloc_node() with a new
      iommu_pseries_alloc_group() helper and stores the table group struct
      pointer into the pci_dn struct. For release, an iommu_table_free_group()
      helper is added.
      
      This moves iommu_table struct allocation from SR-IOV code to
      the generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and
      pnv_pci_ioda2_setup_dma_pe as this is where DMA is actually initialized.
      This change is here because those lines had to be changed anyway.
      
      This should cause no behavioural change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table · da004c36
      Committed by Alexey Kardashevskiy
      This adds an iommu_table_ops struct and puts a pointer to it into
      the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush
      callbacks from ppc_md to the new struct where they really belong.
      
      This adds the requirement for @it_ops to be initialized before calling
      iommu_init_table() to make sure that we do not leave any IOMMU table
      with iommu_table_ops uninitialized. This is not a parameter of
      iommu_init_table() though as there will be cases when iommu_init_table()
      will not be called on TCE tables, for example - VFIO.
      
      This does s/tce_build/set/, s/tce_free/clear/ and removes "tce_"
      redundant prefixes.
      
      This removes tce_xxx_rm handlers from ppc_md but does not add
      them to iommu_table_ops as this will be done later if we decide to
      support TCE hypercalls in real mode. This removes _vm callbacks as
      only virtual mode is supported for now, so this also removes the @rm parameter.
      
      For pSeries, this always uses tce_buildmulti_pSeriesLP/
      tce_freemulti_pSeriesLP. This changes the multi callbacks to fall back to
      tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
      present. The reason for this is that we still have to support the "multitce=off"
      boot parameter in disable_multitce() and we do not want to walk through
      all IOMMU tables in the system and replace "multi" callbacks with single
      ones.
      
      For powernv, this defines _ops per PHB type which are P5IOC2/IODA1/IODA2.
      This makes the callbacks for them public. Later patches will extend
      callbacks for IODA1/2.
      
      No change in behaviour is expected.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
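The pSeries approach described above (always install the "multi" callbacks and let them degrade to one entry per call when the feature is absent, so no table's ops ever need swapping) can be modelled as follows. The ex_ names, the 4K page step and the hypercall counter are hypothetical; ex_multitce_available stands in for FW_FEATURE_MULTITCE being present, and ex_init_table() mirrors the rule that a table's ops must be set before initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_TABLE_SIZE 32

struct ex_table;

/* Hypothetical per-table ops, in the spirit of iommu_table_ops. */
struct ex_table_ops {
	int (*set)(struct ex_table *tbl, long index, long npages, uint64_t uaddr);
	void (*clear)(struct ex_table *tbl, long index, long npages);
};

struct ex_table {
	const struct ex_table_ops *it_ops;
	uint64_t tces[EX_TABLE_SIZE];
	unsigned hcalls;   /* simulated hypervisor calls */
};

static bool ex_multitce_available = true;   /* false after e.g. "multitce=off" */

/* Single-entry primitive: one call per TCE. */
static void ex_put_one(struct ex_table *tbl, long index, uint64_t tce)
{
	tbl->tces[index] = tce;
	tbl->hcalls++;
}

/* "Multi" set that degrades to single-entry calls when the feature is off. */
static int ex_set_multi(struct ex_table *tbl, long index, long npages, uint64_t uaddr)
{
	assert(index >= 0 && index + npages <= EX_TABLE_SIZE);
	if (!ex_multitce_available) {
		for (long i = 0; i < npages; i++)
			ex_put_one(tbl, index + i, uaddr + (uint64_t)i * 4096);
		return 0;
	}
	for (long i = 0; i < npages; i++)
		tbl->tces[index + i] = uaddr + (uint64_t)i * 4096;
	tbl->hcalls++;   /* one call covers the whole batch */
	return 0;
}

static void ex_clear_multi(struct ex_table *tbl, long index, long npages)
{
	assert(index >= 0 && index + npages <= EX_TABLE_SIZE);
	for (long i = 0; i < npages; i++)
		tbl->tces[index + i] = 0;
	tbl->hcalls += ex_multitce_available ? 1u : (unsigned)npages;
}

static const struct ex_table_ops ex_pseries_ops = {
	.set = ex_set_multi,
	.clear = ex_clear_multi,
};

/* Ops must be installed before the table is initialized. */
static int ex_init_table(struct ex_table *tbl)
{
	return tbl->it_ops ? 0 : -1;
}
```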
    • powerpc/powernv: Do not set "read" flag if direction==DMA_NONE · 10b35b2b
      Committed by Alexey Kardashevskiy
      Normally a bitmap from the iommu_table is used to track what TCE entry
      is in use. Since we are going to use iommu_table without its locks and
      do xchg() instead, it becomes essential not to put bits which are not
      implied in the direction flag as the old TCE value (more precisely -
      the permission bits) will be used to decide whether to put the page or not.
      
      This adds iommu_direction_to_tce_perm() (its counterpart is there already)
      and uses it for powernv's pnv_tce_build().
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
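A sketch of the direction-to-permission mapping and its counterpart. The constant values are assumed to match the kernel's TCE_PCI_READ/TCE_PCI_WRITE and enum dma_data_direction, and the ex_ names are hypothetical. The property that matters for the xchg() scheme is that DMA_NONE yields no permission bits, so a cleared entry can never be mistaken for a live mapping whose page must be put.

```c
#include <assert.h>

#define EX_TCE_PCI_READ  0x1UL   /* assumed to match TCE_PCI_READ */
#define EX_TCE_PCI_WRITE 0x2UL   /* assumed to match TCE_PCI_WRITE */

/* Assumed to follow the kernel's enum dma_data_direction ordering. */
enum ex_dma_dir {
	EX_DMA_BIDIRECTIONAL = 0,
	EX_DMA_TO_DEVICE = 1,
	EX_DMA_FROM_DEVICE = 2,
	EX_DMA_NONE = 3,
};

/* Direction -> permission bits; DMA_NONE gets no bits at all. */
static unsigned long ex_direction_to_tce_perm(enum ex_dma_dir dir)
{
	switch (dir) {
	case EX_DMA_BIDIRECTIONAL:
		return EX_TCE_PCI_READ | EX_TCE_PCI_WRITE;
	case EX_DMA_TO_DEVICE:
		return EX_TCE_PCI_READ;    /* device reads from memory */
	case EX_DMA_FROM_DEVICE:
		return EX_TCE_PCI_WRITE;   /* device writes to memory */
	default:
		return 0;
	}
}

/* Counterpart: recover the direction from an old TCE value. */
static enum ex_dma_dir ex_tce_direction(unsigned long tce)
{
	if ((tce & EX_TCE_PCI_READ) && (tce & EX_TCE_PCI_WRITE))
		return EX_DMA_BIDIRECTIONAL;
	if (tce & EX_TCE_PCI_READ)
		return EX_DMA_TO_DEVICE;
	if (tce & EX_TCE_PCI_WRITE)
		return EX_DMA_FROM_DEVICE;
	return EX_DMA_NONE;
}

/* Build a TCE from a 4K-aligned address and a direction. */
static unsigned long ex_build_tce(unsigned long hpa, enum ex_dma_dir dir)
{
	assert((hpa & 0xfffUL) == 0);
	return hpa | ex_direction_to_tce_perm(dir);
}
```

Because the two functions are exact inverses on the permission bits, the direction recovered from an old TCE always equals the direction it was built with.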
    • vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver · 9b14a1ff
      Committed by Alexey Kardashevskiy
      This moves page pinning (get_user_pages_fast()/put_page()) code out of
      the platform IOMMU code and puts it in the VFIO IOMMU driver where it
      belongs, as the platform code does not deal with page pinning.
      
      This makes iommu_take_ownership()/iommu_release_ownership() deal with
      the IOMMU table bitmap only.
      
      This removes page unpinning from iommu_take_ownership() as the actual
      TCE table might contain garbage and doing put_page() on it is undefined
      behaviour.
      
      Besides the last part, the rest of the patch is mechanical.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group · 4617082e
      Committed by Alexey Kardashevskiy
      The set_iommu_table_base_and_group() name suggests that the function
      sets table base and adds a device to an IOMMU group.
      
      The actual purpose for table base setting is to put some reference
      into a device so later iommu_add_device() can get the IOMMU group
      reference and add the device to the group.
      
      At the moment a group cannot be explicitly passed to iommu_add_device()
      as we want it to work from the bus notifier; we can fix it later and
      remove confusing calls of set_iommu_table_base().
      
      This replaces set_iommu_table_base_and_group() with a couple of
      set_iommu_table_base() + iommu_add_device() which makes reading the code
      easier.
      
      This adds a few comments explaining why set_iommu_table_base() and
      iommu_add_device() are called where they are called.
      
      For IODA1/2, this essentially removes iommu_add_device() call from
      the pnv_pci_ioda_dma_dev_setup() as it will always fail at this particular
      place:
      - for physical PE, the device is already attached by iommu_add_device()
      in pnv_pci_ioda_setup_dma_pe();
      - for virtual PE, the sysfs entries are not ready to create all symlinks
      so actual adding is happening in tce_iommu_bus_notifier.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 11 Apr 2015, 1 commit
  9. 31 Mar 2015, 1 commit
    • powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically · 9e8d4a19
      Committed by Wei Yang
      Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
      and was embedded in it. The pnv_ioda_pe was assigned to a PE at boot
      time. Since PEs are based on the hardware layout, which is static in the
      system, they never get released. This means the iommu_table in the
      pnv_ioda_pe never gets released either.
      
      This no longer works for VF PEs. VF PEs are created and released dynamically
      when VFs are created and released. So we need to assign a pnv_ioda_pe to each
      VF PE when VFs are enabled and clean up those resources for VF
      PEs when VFs are disabled. The iommu_table is one of the resources we need
      to handle dynamically.
      
      The current iommu_table is a static field in pnv_ioda_pe, which causes a
      problem when freeing it. During the disabling of a VF,
      pnv_pci_ioda2_release_dma_pe will call iommu_free_table to release the
      iommu_table for this PE. A static iommu_table will fail in
      iommu_free_table.
      
      To meet these requirements, this patch allocates iommu_table
      dynamically.
      Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  10. 04 Mar 2015, 1 commit
  11. 18 Nov 2014, 1 commit
  12. 11 Feb 2014, 1 commit
    • powerpc/powernv: Add iommu DMA bypass support for IODA2 · cd15b048
      Committed by Benjamin Herrenschmidt
      This patch adds support for creating a direct iommu "bypass"
      window on IODA2 bridges (such as Power8), allowing iommu page
      translation to be bypassed completely for 64-bit DMA capable devices,
      thus significantly improving DMA performance.
      
      Additionally, this adds a hook to the struct iommu_table so that
      the IOMMU API / VFIO can disable the bypass when external ownership
      is requested, since in that case, the device will be used by an
      environment such as userspace or a KVM guest which must not be
      allowed to bypass translations.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
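The bypass decision can be modelled simply: a device whose DMA mask reaches the top of the bypass window may skip translation entirely, and its bus address is just the bypass base plus the physical address. The base value 1ULL << 59 and all ex_ names are assumptions for illustration; ex_set_bypass() stands in for the hook the IOMMU API/VFIO uses to switch bypass off when external ownership is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bypass window base, placed high in the PCI address space. */
#define EX_BYPASS_BASE (1ULL << 59)

struct ex_pe {
	bool bypass_enabled;   /* cleared while userspace or a guest owns the PE */
};

/* Can a device with this mask reach every byte of RAM via the bypass window? */
static bool ex_can_bypass(uint64_t dma_mask, uint64_t ram_top)
{
	assert(ram_top > 0);
	return dma_mask >= EX_BYPASS_BASE + ram_top - 1;
}

/* No TCE lookup: the bus address is a fixed offset from the physical one. */
static uint64_t ex_bypass_dma_addr(uint64_t phys)
{
	return EX_BYPASS_BASE + phys;
}

/* Hook used when external ownership is requested or released. */
static void ex_set_bypass(struct ex_pe *pe, bool enable)
{
	pe->bypass_enabled = enable;
}

static bool ex_use_bypass(const struct ex_pe *pe, uint64_t dma_mask,
			  uint64_t ram_top)
{
	return pe->bypass_enabled && ex_can_bypass(dma_mask, ram_top);
}
```

A 32-bit device fails ex_can_bypass() and keeps using the translated window, which is why bypass only benefits 64-bit DMA capable devices.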
  13. 30 Dec 2013, 3 commits
  14. 05 Dec 2013, 1 commit
    • PPC: POWERNV: move iommu_add_device earlier · d905c5df
      Committed by Alexey Kardashevskiy
      The current implementation of IOMMU on sPAPR does not use iommu_ops
      and therefore does not call IOMMU API's bus_set_iommu() which
      1) sets iommu_ops for a bus
      2) registers a bus notifier
      Instead, PCI devices are added to IOMMU groups from
      subsys_initcall_sync(tce_iommu_init) which does basically the same
      thing without using iommu_ops callbacks.
      
      However, the Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
      implements iommu_ops, and when tce_iommu_init is called, every PCI device
      is already added to some group, so there is a conflict.
      
      This patch does 2 things:
      1. removes the loop in which PCI devices were added to groups and
      adds explicit iommu_add_device() calls to add devices as soon as they get
      the iommu_table pointer assigned to them.
      2. moves a bus notifier to powernv code in order to avoid conflict with
      the notifier from Freescale driver.
      
      iommu_add_device() and iommu_del_device() are public now.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  15. 01 Jul 2013, 1 commit
  16. 20 Jun 2013, 1 commit
    • powerpc/vfio: Enable on PowerNV platform · 4e13c1ac
      Committed by Alexey Kardashevskiy
      This initializes IOMMU groups based on the IOMMU configuration
      discovered during the PCI scan on POWERNV (POWER non virtualized)
      platform.  The IOMMU groups are to be used later by the VFIO driver,
      which is used for PCI pass through.
      
      It also implements an API for mapping/unmapping pages for
      guest PCI drivers and providing DMA window properties.
      This API is going to be used later by QEMU-VFIO to handle
      h_put_tce hypercalls from the KVM guest.
      
      The iommu_put_tce_user_mode() does only a single page mapping
      as an API for adding many mappings at once is going to be
      added later.
      
      Although this driver has been tested only on the POWERNV
      platform, it should work on any platform which supports
      TCE tables.  As the h_put_tce hypercall is received by the host
      kernel and processed by QEMU (which involves calling
      the host kernel again), performance is not the best:
      circa 220MB/s on a 10Gb Ethernet network.
      
      To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
      option and configure VFIO as required.
      
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  17. 03 Jul 2012, 1 commit
    • powerpc/iommu: Implement IOMMU pools to improve multiqueue adapter performance · b4c3a872
      Committed by Anton Blanchard
      At the moment all queues in a multiqueue adapter will serialise
      against the IOMMU table lock. This is proving to be a big issue,
      especially with 10Gbit ethernet.
      
      This patch creates 4 pools and tries to spread the load across
      them. If the table is under 1GB in size we revert back to the
      original behaviour of 1 pool and 1 largealloc pool.
      
      We create a hash to map CPUs to pools. Since we prefer interrupts to
      be affinitised to primary CPUs, without some form of hashing we are
      very likely to end up using the same pool. As an example, POWER7
      has 4 way SMT and with 4 pools all primary threads will map to the
      same pool.
      
      The largealloc pool is reduced from 1/2 to 1/4 of the space to
      partially offset the overhead of breaking the table up into pools.
      
      Some performance numbers were obtained with a Chelsio T3 adapter on
      two POWER7 boxes, running a 100 session TCP round robin test.
      
      Performance improved 69% with this patch applied.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
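The pool scheme can be illustrated in userspace. The sketch shows why a plain cpu % 4 mapping sends every primary thread of a 4-way SMT machine to pool 0 while a multiplicative hash spreads them out, and how a table is split into regular pools plus a large-allocation quarter. The hash constant 0x9e370001, measuring the 1GB threshold on the DMA window the table covers, and all ex_ names are assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NR_POOLS  4
#define EX_POOL_BITS 2             /* log2(EX_NR_POOLS) */
#define EX_HASH_MULT 0x9e370001u   /* assumed multiplicative-hash constant */

/* Naive mapping: with 4-way SMT, primary threads 0, 4, 8, ... all hit pool 0. */
static unsigned ex_pool_naive(unsigned cpu)
{
	return cpu % EX_NR_POOLS;
}

/* Multiplicative hash: keep the top EX_POOL_BITS bits of the 32-bit product. */
static unsigned ex_pool_hashed(unsigned cpu)
{
	return (uint32_t)(cpu * EX_HASH_MULT) >> (32 - EX_POOL_BITS);
}

/* How many distinct pools do the primaries 0, smt, 2*smt, ... use? */
static unsigned ex_distinct_pools(unsigned (*map)(unsigned), unsigned nprimaries,
				  unsigned smt)
{
	unsigned seen = 0, count = 0;

	for (unsigned i = 0; i < nprimaries; i++) {
		unsigned bit = 1u << map(i * smt);

		if (!(seen & bit)) {
			seen |= bit;
			count++;
		}
	}
	return count;
}

struct ex_layout {
	unsigned nr_pools;
	uint64_t poolsize;      /* entries per regular pool */
	uint64_t large_start;   /* first entry of the large-allocation pool */
};

/* Split a table of 'entries' TCEs covering (entries << page_shift) bytes. */
static struct ex_layout ex_pool_layout(uint64_t entries, unsigned page_shift)
{
	struct ex_layout l;
	uint64_t window = entries << page_shift;

	assert(entries > 0);
	/* Below 1GB keep the original behaviour: one regular pool. */
	l.nr_pools = window >= (1ULL << 30) ? EX_NR_POOLS : 1;
	/* The top quarter (previously half) is kept for large allocations. */
	l.poolsize = (entries * 3 / 4) / l.nr_pools;
	l.large_start = l.poolsize * l.nr_pools;
	return l;
}
```

With four primary threads (CPUs 0, 4, 8 and 12), the modulo mapping uses a single pool while the hashed mapping uses three. The hash does not guarantee a perfect spread, only a much better one than the modulo.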
  18. 28 Mar 2012, 1 commit
  19. 24 Sep 2009, 1 commit
  20. 20 Aug 2009, 1 commit
  21. 15 Jun 2009, 1 commit
  22. 31 Oct 2008, 1 commit
    • powerpc: Update remaining dma_mapping_ops to use map/unmap_page · f9226d57
      Committed by Mark Nelson
      After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
      their map/unmap_single() functions but gained map/unmap_page().  This
      caused a problem for Cell because Cell's dma_iommu_fixed_ops called
      the dma_direct_ops if the fixed linear mapping was to be used or the
      iommu ops if the dynamic window was to be used.  So in order to fix
      this problem we need to update the 64bit DMA code to use
      map/unmap_page.
      
      First, we update the generic IOMMU code so that iommu_map_single()
      becomes iommu_map_page() and iommu_unmap_single() becomes
      iommu_unmap_page().  Then we propagate these changes up through all
      the callers of these two functions and in the process update all the
      dma_mapping_ops so that they have map/unmap_page rather than
      map/unmap_single.  We can do this because on 64bit there is no HIGHMEM
      memory so map/unmap_page ends up performing exactly the same function
      as map/unmap_single, just taking different arguments.
      
      This has no effect on drivers because the dma_map_single_attrs() just
      ends up calling the map_page() function of the appropriate
      dma_mapping_ops and similarly the dma_unmap_single_attrs() calls
      unmap_page().
      
      This fixes an oops on Cell blades, which oops on boot without this
      because they call dma_direct_ops.map_single, which is NULL.
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
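The claim that map_page and map_single do the same job when there is no HIGHMEM can be shown in a toy model: every page has a linear-map address, so a virtual address splits losslessly into a (page, offset) pair and map_single becomes a thin wrapper around map_page. The identity linear map, the 0x80000000 bus offset and all ex_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1UL << EX_PAGE_SHIFT)

/* Hypothetical struct page stand-in: just a page frame number. */
struct ex_page {
	uintptr_t pfn;
};

/* Without HIGHMEM every page has a linear address (identity map assumed). */
static uintptr_t ex_page_address(struct ex_page pg)
{
	return pg.pfn << EX_PAGE_SHIFT;
}

static struct ex_page ex_virt_to_page(uintptr_t vaddr)
{
	struct ex_page pg = { vaddr >> EX_PAGE_SHIFT };

	return pg;
}

static unsigned long ex_offset_in_page(uintptr_t vaddr)
{
	return vaddr & (EX_PAGE_SIZE - 1);
}

/* Toy IOMMU mapping: bus address = fake window base + linear address. */
static uintptr_t ex_map_page(struct ex_page pg, unsigned long offset, size_t size)
{
	assert(offset < EX_PAGE_SIZE && size > 0);
	return 0x80000000u + ex_page_address(pg) + offset;
}

/* map_single expressed through map_page, as the commit does. */
static uintptr_t ex_map_single(uintptr_t vaddr, size_t size)
{
	return ex_map_page(ex_virt_to_page(vaddr), ex_offset_in_page(vaddr), size);
}
```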
  23. 04 Aug 2008, 1 commit
  24. 09 Jul 2008, 2 commits