1. 12 7月, 2016 1 次提交
  2. 05 7月, 2016 1 次提交
    • A
      spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) · ae4de14c
      Alexey Kardashevskiy 提交于
      This adds support for Dynamic DMA Windows (DDW) option defined by
      the SPAPR specification which allows to have additional DMA window(s)
      
      The "ddw" property is enabled by default on a PHB but for compatibility
      the pseries-2.6 machine and older disable it.
      This also creates a single DMA window for the older machines to
      maintain backward migration.
      
      This implements DDW for PHB with emulated and VFIO devices. The host
      kernel support is required. The advertised IOMMU page sizes are 4K and
      64K; 16M pages are supported but not advertised by default, in order to
      enable them, the user has to specify "pgsz" property for PHB and
      enable huge pages for RAM.
      
      The existing linux guests try creating one additional huge DMA window
      with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
      the guest switches to dma_direct_ops and never calls TCE hypercalls
      (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
      and not waste time on map/unmap later. This adds a "dma64_win_addr"
      property which is a bus address for the 64bit window and by default
      set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
      uses and this allows having emulated and VFIO devices on the same bus.
      
      This adds 4 RTAS handlers:
      * ibm,query-pe-dma-window
      * ibm,create-pe-dma-window
      * ibm,remove-pe-dma-window
      * ibm,reset-pe-dma-window
      These are registered from type_init() callback.
      
      These RTAS handlers are implemented in a separate file to avoid polluting
      spapr_iommu.c with PCI.
      
      This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
      and updates all references to dma_liobn. However this does not add
      64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
      rather pointless there (as it is a PHB property and the management
      software can/should pass LIOBNs via CLI) but we keep it for the backward
      migration support.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      ae4de14c
  3. 04 7月, 2016 2 次提交
  4. 01 7月, 2016 1 次提交
  5. 29 6月, 2016 1 次提交
  6. 07 6月, 2016 1 次提交
  7. 16 3月, 2016 4 次提交
    • D
      spapr_pci: Remove finish_realize hook · a36304fd
      David Gibson 提交于
      Now that spapr-pci-vfio-host-bridge is reduced to just a stub, there is
      only one implementation of the finish_realize hook in sPAPRPHBClass.  So,
      we can fold that implementation into its (single) caller, and remove the
      hook.  That's the last thing left in sPAPRPHBClass, so that can go away as
      well.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      a36304fd
    • D
      spapr_pci: (Mostly) remove spapr-pci-vfio-host-bridge · 72700d7e
      David Gibson 提交于
      Now that the regular spapr-pci-host-bridge can handle EEH, there are only
      two things that spapr-pci-vfio-host-bridge does differently:
          1. automatically sizes its DMA window to match the host IOMMU
          2. checks if the attached VFIO container is backed by the
             VFIO_SPAPR_TCE_IOMMU type on the host
      
      (1) is not particularly useful, since the default window used by the
      regular host bridge will work with the host IOMMU configuration on all
      current systems anyway.
      
      Plus, automatically changing guest visible configuration (such as the DMA
      window) based on host settings is generally a bad idea.  It's not
      definitively broken, since spapr-pci-vfio-host-bridge is only supposed to
      support VFIO devices which can't be migrated anyway, but still.
      
      (2) is not really useful, because if a guest tries to configure EEH on a
      different host IOMMU, the first call will fail and that will be that.
      
      It's possible there are scripts or tools out there which expect
      spapr-pci-vfio-host-bridge, so we don't remove it entirely.  This patch
      reduces it to just a stub for backwards compatibility.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      72700d7e
    • D
      spapr_pci: Allow EEH on spapr-pci-host-bridge · c1fa017c
      David Gibson 提交于
      Now that the EEH code is independent of the special
      spapr-vfio-pci-host-bridge device, we can allow it on all spapr PCI
      host bridges instead.  We do this by changing spapr_phb_eeh_available()
      to be based on the vfio_eeh_as_ok() call instead of the host bridge class.
      
      Because the value of vfio_eeh_as_ok() can change with devices being
      hotplugged or unplugged, this can potentially lead to some strange edge
      cases where the guest starts using EEH, then it starts failing because
      of a change in status.
      
      However, it's not really any worse than the current situation.  Cases that
      would have worked previously will still work (i.e. VFIO devices from at
      most one VFIO IOMMU group per vPHB), it's just that it's no longer
      necessary to use spapr-vfio-pci-host-bridge with the groupid pre-specified.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      c1fa017c
    • D
      spapr_pci: Eliminate class callbacks · fbb4e983
      David Gibson 提交于
      The EEH operations in the spapr-vfio-pci-host-bridge no longer rely on the
      special groupid field in sPAPRPHBVFIOState.  So we can simplify, removing
      the class specific callbacks with direct calls based on a simple
      spapr_phb_eeh_enabled() helper.  For now we implement that in terms of
      a boolean in the class, but we'll continue to clean that up later.
      
      On its own this is a rather strange way of doing things, but it's a useful
      intermediate step to further cleanups.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      fbb4e983
  8. 22 12月, 2015 1 次提交
  9. 23 10月, 2015 1 次提交
  10. 07 7月, 2015 1 次提交
    • D
      spapr: Merge sPAPREnvironment into sPAPRMachineState · 28e02042
      David Gibson 提交于
      The code for -machine pseries maintains a global sPAPREnvironment structure
      which keeps track of general state information about the guest platform.
      This predates the existence of the MachineState structure, but performs
      basically the same function.
      
      Now that we have the generic MachineState, fold sPAPREnvironment into
      sPAPRMachineState, the pseries specific subclass of MachineState.
      
      This is mostly a matter of search and replace, although a few places which
      relied on the global spapr variable are changed to find the structure via
      qdev_get_machine().
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      28e02042
  11. 06 6月, 2015 1 次提交
  12. 05 6月, 2015 7 次提交
  13. 04 6月, 2015 3 次提交
  14. 30 4月, 2015 1 次提交
  15. 09 3月, 2015 3 次提交
    • G
      sPAPR: Implement EEH RTAS calls · ee954280
      Gavin Shan 提交于
      The emulation for EEH RTAS requests from guest isn't covered
      by QEMU yet and the patch implements them.
      
      The patch defines constants used by EEH RTAS calls and adds
      callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset,
      eeh_configure}, which are going to be used as follows:
      
        * RTAS calls are received in spapr_pci.c, sanity check is done
          there.
        * RTAS handlers handle what they can. If there is something it
          cannot handle and the corresponding sPAPRPHBClass callback is
          defined, it is called.
        * Those callbacks are only implemented for VFIO now. They do ioctl()
          to the IOMMU container fd to complete the calls. Error codes from
          that ioctl() are transferred back to the guest.
      
      [aik: defined RTAS tokens for EEH RTAS calls]
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ee954280
    • A
      spapr-pci: Enable huge BARs · b194df47
      Alexey Kardashevskiy 提交于
      At the moment sPAPR only supports 512MB window for MMIO BARs. However
      modern devices might want bigger 64bit BARs.
      
      This extends MMIO window from 512MB to 62GB (aligned to
      SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in
      the PHB "ranges" property. 32bit gets the space from
      SPAPR_PCI_MEM_WIN_BUS_OFFSET till the end of 4GB, 64bit gets the rest
      of the space. If no space is left, 64bit range is not advertised.
      
      The MMIO space size is set to old value of 0x20000000 by default
      for pseries machines older than 2.3.
      
      The approach changes the device tree which is a guest visible change, however
      it won't break migration as:
      1. we do not support migration to older QEMU versions
      2. migration to newer QEMU will migrate the device tree as well and since
      the new layout only extends the old one and does not change address mappigns,
      no breakage is expected here too.
      
      SLOF change is required to utilize this extension.
      Suggested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b194df47
    • D
      pseries: Limit PCI host bridge "index" value · 3e4ac968
      David Gibson 提交于
      pseries guests can have large numbers of PCI host bridges.  To avoid the
      user having to specify a number of different configuration values for every
      one, the device supports an "index" property which is a shorthand setting
      the various window and configuration addresses from a predefined sensible
      set.
      
      There are some problems with the details at present:
        * The "index" propery is signed, but negative values will create PCI
      windows below where we expect, potentially colliding with other devices
        * No limit is imposed on the "index" property and large values can
      translate to extremely large window addresses.  With PCI passthrough in
      particular this can mean we exceed various mapping and physical address
      limits causing the guest host bridge to not work in strange ways.
      
      This patch addresses this, by making "index" unsigned, and imposing a
      limit.  Currently the limit allows indices from 0..255 which is probably
      enough host bridges for the time being.  It's fairly easy to extend if
      we discover we need more.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3e4ac968
  16. 13 2月, 2015 1 次提交
  17. 08 9月, 2014 1 次提交
    • G
      spapr_pci: map the MSI window in each PHB · 8c46f7ec
      Greg Kurz 提交于
      On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
      Commit cc943c36 has modified MSI-X
      so that writes are made using the bus master address space and follow
      the IOMMU path.
      
      Unfortunately, the IOMMU address space address space does not have an
      MSI window: the notification is silently dropped in unassigned_mem_write
      instead of reaching the guest... The most visible effect is that all
      virtio devices are non-functional on sPAPR since then. :(
      
      This patch does the following:
      1) map the MSI window into the IOMMU address space for each PHB
         - since each PHB instantiates its own IOMMU address space, we
           can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
         - no real need to keep the MSI window setup in a separate function,
           the spapr_pci_msi_init() code moves to spapr_phb_realize().
      
      2) kill the global MSI window as it is not needed in the end
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8c46f7ec
  18. 29 8月, 2014 1 次提交
  19. 15 8月, 2014 1 次提交
  20. 27 6月, 2014 2 次提交
    • A
      spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB · 9a321e92
      Alexey Kardashevskiy 提交于
      Currently SPAPR PHB keeps track of all allocated MSI (here and below
      MSI stands for both MSI and MSIX) interrupt because
      XICS used to be unable to reuse interrupts. This is a problem for
      dynamic MSI reconfiguration which happens when guest reloads a driver
      or performs PCI hotplug. Another problem is that the existing
      implementation can enable MSI on 32 devices maximum
      (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
      
      This makes use of new XICS ability to reuse interrupts.
      
      This reorganizes MSI information storage in sPAPRPHBState. Instead of
      static array of 32 descriptors (one per a PCI function), this patch adds
      a GHashTable when @config_addr is a key and (first_irq, num) pair is
      a value. GHashTable can dynamically grow and shrink so the initial limit
      of 32 devices is gone.
      
      This changes migration stream as @msi_table was a static array while new
      @msi_devs is a dynamic hash table. This adds temporary array which is
      used for migration, it is populated in "spapr_pci"::pre_save() callback
      and expanded into the hash table in post_load() callback. Since
      the destination side does not know the number of MSI-enabled devices
      in advance and cannot pre-allocate the temporary array to receive
      migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
      which allocates the array automatically.
      
      This resets the MSI configuration space when interrupts are released by
      the ibm,change-msi RTAS call.
      
      This fixed traces to be more informative.
      
      This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
      was incorrect by accident. As the internal representation changed,
      thus bumps migration version number.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [agraf: drop g_malloc_n usage]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9a321e92
    • A
      spapr_pci_vfio: Add spapr-pci-vfio-host-bridge to support vfio · 9fc34ada
      Alexey Kardashevskiy 提交于
      The patch adds a spapr-pci-vfio-host-bridge device type
      which is a PCI Host Bridge with VFIO support. The new device
      inherits from the spapr-pci-host-bridge device and adds an "iommu"
      property which is an IOMMU id. This ID represents a minimal entity
      for which IOMMU isolation can be guaranteed. In SPAPR architecture IOMMU
      group is called a Partitionable Endpoint (PE).
      
      Current implementation supports one IOMMU id per QEMU VFIO PHB. Since
      SPAPR allows multiple PHB for no extra cost, this does not seem to
      be a problem. This limitation may change in the future though.
      
      Example of use:
      Configure and Add 3 functions of a multifunctional device to QEMU:
      (the NEC PCI USB card is used as an example here):
      -device spapr-pci-vfio-host-bridge,id=USB,iommu=4,index=7 \
      -device vfio-pci,host=4:0:1.0,addr=1.0,bus=USB,multifunction=true
      -device vfio-pci,host=4:0:1.1,addr=1.1,bus=USB
      -device vfio-pci,host=4:0:1.2,addr=1.2,bus=USB
      
      where:
      * index=7 is a QEMU PHB index (used as source for MMIO/MSI/IO windows
      offset);
      * iommu=4 is an IOMMU id which can be found in sysfs:
      [aik@vpl2 ~]$ cd /sys/bus/pci/devices/0004:00:00.0/
      [aik@vpl2 0004:00:00.0]$ ls -l iommu_group
      lrwxrwxrwx 1 root root 0 Jun  5 12:49 iommu_group -> ../../../kernel/iommu_groups/4
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9fc34ada
  21. 16 6月, 2014 4 次提交
    • A
      spapr_iommu: Get rid of window_size in sPAPRTCETable · 523e7b8a
      Alexey Kardashevskiy 提交于
      This removes window_size as it is basically a copy of nb_table
      shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
      going to support windows as big as the entire RAM and this number
      will be bigger that 32 capacity, we will have to do something
      about @window_size anyway and removal seems to be the right way to go.
      
      This removes dma_window_start/dma_window_size from sPAPRPHBState as
      they are no longer used.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      523e7b8a
    • A
      spapr_pci: Allow multiple TCE tables per PHB · e28c16f6
      Alexey Kardashevskiy 提交于
      At the moment sPAPRPHBState contains a @tcet pointer to the only
      TCE table. However sPAPR spec allows having more than one DMA window.
      
      Since the TCE object is already a child of SPAPR PHB object, there is
      no need to keep an additional pointer to it in sPAPRPHBState so remove it.
      
      This changes the way sPAPRPHBState::reset performs reset of sPAPRTCETable
      objects.
      
      This changes the default DMA window properties calculation.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e28c16f6
    • A
      spapr_pci: spapr_iommu: Make DMA window a subregion · cca7fad5
      Alexey Kardashevskiy 提交于
      Currently the default DMA window is represented by a single MemoryRegion.
      However there can be more than just one window so we need
      a "root" memory region to be separated from the actual DMA window(s).
      
      This introduces a "root" IOMMU memory region and adds a subregion for
      the default DMA 32bit window. Following patches will add other
      subregion(s).
      
      This initializes a default DMA window subregion size to the guest RAM
      size as this window can be switched into "bypass" mode which implements
      direct DMA mapping.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      cca7fad5
    • A
      spapr_pci: Introduce a finish_realize() callback · da6ccee4
      Alexey Kardashevskiy 提交于
      The spapr-pci PHB initializes IOMMU for emulated devices only.
      The upcoming VFIO support will do it different. However both emulated
      and VFIO PHB types share most of the initialization code.
      For the type specific things a new finish_realize() callback is
      introduced.
      
      This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
      adds the callback pointer.
      
      This implements finish_realize() for emulated devices.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [agraf: Fix compilation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      da6ccee4
  22. 11 3月, 2014 1 次提交