1. 04 6月, 2015 6 次提交
  2. 09 3月, 2015 4 次提交
    • G
      sPAPR: Implement EEH RTAS calls · ee954280
      Gavin Shan 提交于
      The emulation for EEH RTAS requests from guest isn't covered
      by QEMU yet and the patch implements them.
      
      The patch defines constants used by EEH RTAS calls and adds
      callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset,
      eeh_configure}, which are going to be used as follows:
      
        * RTAS calls are received in spapr_pci.c, sanity check is done
          there.
        * RTAS handlers handle what they can. If there is something it
          cannot handle and the corresponding sPAPRPHBClass callback is
          defined, it is called.
        * Those callbacks are only implemented for VFIO now. They do ioctl()
          to the IOMMU container fd to complete the calls. Error codes from
          that ioctl() are transferred back to the guest.
      
      [aik: defined RTAS tokens for EEH RTAS calls]
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ee954280
    • D
      pseries: Switch VGA endian on H_SET_MODE · eefaccc0
      David Gibson 提交于
      When the guest switches the interrupt endian mode, which essentially
      means a global machine endian switch, we want to change the VGA
      framebuffer endian mode as well in order to be backward compatible
      with existing guests who don't know about the new endian control
      register.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      eefaccc0
    • A
      spapr-pci: Enable huge BARs · b194df47
      Alexey Kardashevskiy 提交于
      At the moment sPAPR only supports 512MB window for MMIO BARs. However
      modern devices might want bigger 64bit BARs.
      
      This extends MMIO window from 512MB to 62GB (aligned to
      SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in
      the PHB "ranges" property. 32bit gets the space from
      SPAPR_PCI_MEM_WIN_BUS_OFFSET till the end of 4GB, 64bit gets the rest
      of the space. If no space is left, 64bit range is not advertised.
      
      The MMIO space size is set to old value of 0x20000000 by default
      for pseries machines older than 2.3.
      
      The approach changes the device tree which is a guest visible change, however
      it won't break migration as:
      1. we do not support migration to older QEMU versions
      2. migration to newer QEMU will migrate the device tree as well and since
      the new layout only extends the old one and does not change address mappigns,
      no breakage is expected here too.
      
      SLOF change is required to utilize this extension.
      Suggested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b194df47
    • D
      pseries: Limit PCI host bridge "index" value · 3e4ac968
      David Gibson 提交于
      pseries guests can have large numbers of PCI host bridges.  To avoid the
      user having to specify a number of different configuration values for every
      one, the device supports an "index" property which is a shorthand setting
      the various window and configuration addresses from a predefined sensible
      set.
      
      There are some problems with the details at present:
        * The "index" propery is signed, but negative values will create PCI
      windows below where we expect, potentially colliding with other devices
        * No limit is imposed on the "index" property and large values can
      translate to extremely large window addresses.  With PCI passthrough in
      particular this can mean we exceed various mapping and physical address
      limits causing the guest host bridge to not work in strange ways.
      
      This patch addresses this, by making "index" unsigned, and imposing a
      limit.  Currently the limit allows indices from 0..255 which is probably
      enough host bridges for the time being.  It's fairly easy to extend if
      we discover we need more.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3e4ac968
  3. 05 11月, 2014 1 次提交
  4. 08 9月, 2014 2 次提交
    • G
      spapr_pci: map the MSI window in each PHB · 8c46f7ec
      Greg Kurz 提交于
      On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
      Commit cc943c36 has modified MSI-X
      so that writes are made using the bus master address space and follow
      the IOMMU path.
      
      Unfortunately, the IOMMU address space address space does not have an
      MSI window: the notification is silently dropped in unassigned_mem_write
      instead of reaching the guest... The most visible effect is that all
      virtio devices are non-functional on sPAPR since then. :(
      
      This patch does the following:
      1) map the MSI window into the IOMMU address space for each PHB
         - since each PHB instantiates its own IOMMU address space, we
           can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
         - no real need to keep the MSI window setup in a separate function,
           the spapr_pci_msi_init() code moves to spapr_phb_realize().
      
      2) kill the global MSI window as it is not needed in the end
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8c46f7ec
    • A
      spapr_pci: Fix config space corruption · 32420522
      Alexey Kardashevskiy 提交于
      When disabling MSI/MSIX via "ibm,change-msi" RTAS call, no check was made
      if MSI or MSIX is actually supported and the MSI message was reset
      unconditionally. If this happened on a device which does not support MSI
      (but does support MSIX, otherwise "ibm,change-msi" would not be called),
      this device would have PCIDevice::msi_cap field (MSI capability offset)
      set to zero and writing a vector would actually clear PCI status.
      
      This clears MSI message only if MSI or MSIX is present on a device.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      32420522
  5. 27 6月, 2014 4 次提交
    • A
      spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB · 9a321e92
      Alexey Kardashevskiy 提交于
      Currently SPAPR PHB keeps track of all allocated MSI (here and below
      MSI stands for both MSI and MSIX) interrupt because
      XICS used to be unable to reuse interrupts. This is a problem for
      dynamic MSI reconfiguration which happens when guest reloads a driver
      or performs PCI hotplug. Another problem is that the existing
      implementation can enable MSI on 32 devices maximum
      (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
      
      This makes use of new XICS ability to reuse interrupts.
      
      This reorganizes MSI information storage in sPAPRPHBState. Instead of
      static array of 32 descriptors (one per a PCI function), this patch adds
      a GHashTable when @config_addr is a key and (first_irq, num) pair is
      a value. GHashTable can dynamically grow and shrink so the initial limit
      of 32 devices is gone.
      
      This changes migration stream as @msi_table was a static array while new
      @msi_devs is a dynamic hash table. This adds temporary array which is
      used for migration, it is populated in "spapr_pci"::pre_save() callback
      and expanded into the hash table in post_load() callback. Since
      the destination side does not know the number of MSI-enabled devices
      in advance and cannot pre-allocate the temporary array to receive
      migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
      which allocates the array automatically.
      
      This resets the MSI configuration space when interrupts are released by
      the ibm,change-msi RTAS call.
      
      This fixed traces to be more informative.
      
      This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
      was incorrect by accident. As the internal representation changed,
      thus bumps migration version number.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [agraf: drop g_malloc_n usage]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9a321e92
    • A
      spapr: Move interrupt allocator to xics · bee763db
      Alexey Kardashevskiy 提交于
      The current allocator returns IRQ numbers from a pool and does not
      support IRQs reuse in any form as it did not keep track of what it
      previously returned, it only keeps the last returned IRQ. Some use
      cases such as PCI hot(un)plug may require IRQ release and reallocation.
      
      This moves an allocator from SPAPR to XICS.
      
      This switches IRQ users to use new API.
      
      This uses LSI/MSI flags to know if interrupt is allocated.
      
      The interrupt release function will be posted as a separate patch.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      bee763db
    • A
      spapr_iommu: Make in-kernel TCE table optional · 9bb62a07
      Alexey Kardashevskiy 提交于
      POWER KVM supports an KVM_CAP_SPAPR_TCE capability which allows allocating
      TCE tables in the host kernel memory and handle H_PUT_TCE requests
      targeted to specific LIOBN (logical bus number) right in the host without
      switching to QEMU. At the moment this is used for emulated devices only
      and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE
      handler finds a LIOBN and corresponding table, it will put a TCE to
      the table and complete hypercall execution. The user space will not be
      notified.
      
      Upcoming VFIO support is going to use the same sPAPRTCETable device class
      so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
      tables for VFIO are going to be allocated in the host as well.
      However VFIO operates with real IOMMU tables and simple copying of
      a TCE to the real hardware TCE table will not work as guest physical
      to host physical address translation is requited.
      
      So until the host kernel gets VFIO support for H_PUT_TCE, we better not
      to register VFIO's TCE in the host.
      
      This adds a place holder for KVM_CAP_SPAPR_TCE_VFIO capability. It is not
      in upstream yet and being discussed so now it is always false which means
      that in-kernel VFIO acceleration is not supported.
      
      This adds a bool @vfio_accel flag to the sPAPRTCETable device telling
      that sPAPRTCETable should not try allocating TCE table in the host kernel
      for VFIO. The flag is false now as at the moment there is no VFIO.
      
      This adds an vfio_accel parameter to spapr_tce_new_table(), the semantic
      is the same. Since there is only emulated PCI and VIO now, the flag is set
      to false. Upcoming VFIO support will set it to true.
      
      This is a preparation patch so no change in behaviour is expected
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9bb62a07
    • A
      spapr: Fix RTAS token numbers · 3a3b8502
      Alexey Kardashevskiy 提交于
      At the moment spapr_rtas_register() allocates a new token number for every
      new RTAS callback so numbers are not fixed and depend on the number of
      supported RTAS handlers and the exact order of spapr_rtas_register() calls.
      These tokens are copied into the device tree and remain the same during
      the guest lifetime.
      
      When we start another guest to receive a migration, it calls
      spapr_rtas_register() as well. If the number of RTAS handlers or their
      order is different in QEMU on source and destination sides, the "/rtas"
      node in the device tree will differ. Since migration overwrites the device
      tree (as it overwrites the entire RAM), the actual RTAS config on
      the destination side gets broken.
      
      This defines global contant values for every RTAS token which QEMU
      is using today.
      
      This changes spapr_rtas_register() to accept a token number instead of
      allocating one. This changes all users of spapr_rtas_register().
      
      This changes XICS-KVM not to cache tokens registered with KVM as they
      constant now.
      
      This makes TOKEN_BASE global as RTAS_XXX use TOKEN_BASE as
      a base. TOKEN_MAX is moved and renamed too and its value is changed
      to the last token + 1. Boundary checks for token values are adjusted.
      
      This reserves token numbers for "os-term" handlers and PCI hotplug
      which we are working on.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3a3b8502
  6. 16 6月, 2014 11 次提交
  7. 13 3月, 2014 3 次提交
  8. 05 3月, 2014 2 次提交
    • A
      PPC: sPAPR: Only use getpagesize() when we run with kvm · 3c3b0dde
      Alexander Graf 提交于
      We currently size the msi window trap page according to the host's page
      size so that we poke a working hole into a memory slot in case we overlap.
      
      However, this is only ever necessary with KVM active. Without KVM, we should
      rather try to be host platform agnostic and use a constant size: 4k.
      
      This fixes a build breakage on win32 hosts.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3c3b0dde
    • A
      spapr-pci: enable adding PHB via -device · 09aa9a52
      Alexey Kardashevskiy 提交于
      Recent changes introduced cannot_instantiate_with_device_add_yet
      and removed capability of adding yet another PCI host bridge via
      command line for SPAPR platform (POWERPC64 server).
      
      This brings the capability back and puts SPAPR PHB into "bridge"
      category.
      
      This is not much use for emulated PHB but it is absolutely required
      for VFIO as we put an IOMMU group onto a separate PHB on SPAPR.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      09aa9a52
  9. 15 2月, 2014 1 次提交
  10. 20 12月, 2013 1 次提交
  11. 10 12月, 2013 1 次提交
    • M
      spapr_pci: s/INT64_MAX/UINT64_MAX/ · 92b8e39c
      Michael S. Tsirkin 提交于
      It doesn't make sense for a region to be INT64_MAX in size:
      memory core uses UINT64_MAX as a special value meaning
      "all 64 bit" this is what was meant here.
      
      While this should never affect the spapr system which at the moment always
      has < 63 bit size, this makes us hit all kind of corner case bugs with
      sub-pages, so users are probably better off if we just use UINT64_MAX
      instead.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NAlexander Graf <agraf@suse.de>
      92b8e39c
  12. 26 10月, 2013 1 次提交
  13. 02 9月, 2013 2 次提交
    • A
      spapr-pci: rework MSI/MSIX · f1c2dc7c
      Alexey Kardashevskiy 提交于
      On the sPAPR platform a guest allocates MSI/MSIX vectors via RTAS
      hypercalls which return global IRQ numbers to a guest so it only
      operates with those and never touches MSIMessage.
      
      Therefore MSIMessage handling is completely hidden in QEMU.
      
      Previously every sPAPR PCI host bridge implemented its own MSI window
      to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
      or vfio) and route them to the guest via qemu_pulse_irq().
      MSIMessage used to be encoded as:
      	.addr - address within the PHB MSI window;
      	.data - the device index on PHB plus vector number.
      The MSI MR write function translated this MSIMessage to a global IRQ
      number and called qemu_pulse_irq().
      
      However the total number of IRQs is not really big (at the moment it is
      1024 IRQs starting from 4096) and even 16bit data field of MSIMessage
      seems to be enough to store an IRQ number there.
      
      This simplifies MSI handling in sPAPR PHB. Specifically, this does:
      1. remove a MSI window from a PHB;
      2. add a single memory region for all MSIs to sPAPREnvironment
      and spapr_pci_msi_init() to initialize it;
      3. encode MSIMessage as:
          * .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
          * .data as an IRQ number.
      4. change IRQ allocator to align first IRQ number in a block for MSI.
      MSI uses lower bits to specify the vector number so the first IRQ has to
      be aligned. MSIX does not need any special allocator though.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NAnthony Liguori <aliguori@us.ibm.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f1c2dc7c
    • A
      spapr-pci: fix config space access to support bridges · 5dac82ce
      Alexey Kardashevskiy 提交于
      spapr-pci config space accessors use find_dev() to find a PCI device.
      However find_dev() only searched on a primary bus and did not do
      recursive search through secondary buses so config space access was not
      possible for devices other that on a primary bus.
      
      This fixed find_dev() by using the PCI API pci_find_device() function.
      This effectively enabled pci bridges on spapr.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5dac82ce
  14. 30 7月, 2013 1 次提交