1. 23 Oct, 2015 2 commits
    • spapr_iommu: Provide a function to switch a TCE table to allowing VFIO · c10325d6
      David Gibson committed
      Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
      all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
      this is decided at creation time.
      
      To support hotplug of VFIO devices, we need to allow a TCE table which
      previously didn't allow VFIO devices to be switched so that it can.  This
      patch adds an spapr_tce_set_need_vfio() function to do this, by
      reallocating the table in userspace if necessary.
      
      Currently this doesn't allow the KVM acceleration to be re-enabled if all
      the VFIO devices are removed.  That's an optimization for another time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
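As a rough illustration of the switch described in this commit, here is a toy Python model. It is a sketch under assumptions, not QEMU's C code: the `TceTable` class, its fields, and `set_need_vfio()` are hypothetical stand-ins for the real table structure and `spapr_tce_set_need_vfio()`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TceTable:
    """Toy stand-in for a guest IOMMU context (hypothetical, not QEMU's struct)."""
    nb_entries: int
    need_vfio: bool = False
    kvm_accelerated: bool = True  # assumed: table is handled in the kernel
    entries: List[int] = field(default_factory=list)

    def __post_init__(self):
        if not self.entries:
            self.entries = [0] * self.nb_entries
        if self.need_vfio:
            # A table that must support VFIO cannot use the accelerated path.
            self.kvm_accelerated = False


def set_need_vfio(table: TceTable, need_vfio: bool) -> None:
    """Switch a table so that it can support VFIO devices."""
    if need_vfio == table.need_vfio:
        return  # already in the requested state
    if not need_vfio:
        # Per the commit message, re-enabling acceleration is not supported yet.
        return
    # "Reallocate in userspace": keep the contents, drop the accelerated table.
    table.entries = list(table.entries)
    table.kvm_accelerated = False
    table.need_vfio = True
```

Calling `set_need_vfio(table, False)` after the switch is a no-op in this sketch, mirroring the commit's note that re-enabling acceleration is left for later.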
    • spapr_iommu: Rename vfio_accel parameter · 6a81dd17
      David Gibson committed
      The vfio_accel parameter used when creating a new TCE table (guest IOMMU
      context) has a confusing name.  What it really means is whether we need the
      TCE table created to be able to support VFIO devices.
      
      VFIO is relevant, because when available we use in-kernel acceleration of
      the TCE table, but that may not work with VFIO devices because updates to
      the table are handled in the kernel, bypassing qemu, and so don't hit qemu's
      infrastructure for keeping the VFIO host IOMMU state in sync with the guest
      IOMMU state.
      
      Rename the parameter to "need_vfio" throughout.  This is a cosmetic change,
      with no impact on the logic.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
  2. 23 Sep, 2015 8 commits
  3. 07 Jul, 2015 5 commits
  4. 04 Jun, 2015 10 commits
  5. 09 Mar, 2015 7 commits
    • sPAPR: Implement EEH RTAS calls · ee954280
      Gavin Shan committed
      QEMU does not yet emulate EEH RTAS requests from the guest; this
      patch implements them.
      
      The patch defines constants used by EEH RTAS calls and adds
      callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset,
      eeh_configure}, which are going to be used as follows:
      
        * RTAS calls are received in spapr_pci.c, where sanity checks are
          done.
        * RTAS handlers handle what they can. If there is something they
          cannot handle and the corresponding sPAPRPHBClass callback is
          defined, it is called.
        * Those callbacks are only implemented for VFIO now. They do ioctl()
          to the IOMMU container fd to complete the calls. Error codes from
          that ioctl() are transferred back to the guest.
      
      [aik: defined RTAS tokens for EEH RTAS calls]
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
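The dispatch flow listed in this commit (sanity check, then an optional per-class callback) can be sketched roughly as follows. This is a Python model under assumptions: the class names, the `rtas_set_eeh_option()` helper, the accepted option values, and the numeric return codes are illustrative, and the real callbacks issue an ioctl() on the IOMMU container fd rather than returning a canned value.

```python
# Illustrative return codes (assumed values for this sketch).
RTAS_OUT_SUCCESS = 0
RTAS_OUT_PARAM_ERROR = -3


class Phb:
    """Generic PHB: no EEH callback defined."""
    eeh_set_option = None


class VfioPhb(Phb):
    """VFIO-backed PHB: the only kind that implements the callback."""

    def eeh_set_option(self, addr, option):
        # Real code would ioctl() the container fd and pass its error back;
        # here we simply accept a small, made-up set of option values.
        return RTAS_OUT_SUCCESS if option in (0, 1, 2, 3) else RTAS_OUT_PARAM_ERROR


def rtas_set_eeh_option(phbs, buid, addr, option):
    """Handler: sanity-check first, then defer to the class callback if any."""
    phb = phbs.get(buid)
    if phb is None:
        return RTAS_OUT_PARAM_ERROR  # unknown PHB
    callback = getattr(phb, "eeh_set_option", None)
    if callback is None:
        return RTAS_OUT_PARAM_ERROR  # nothing can complete the request
    return callback(addr, option)
```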
    • pseries: Switch VGA endian on H_SET_MODE · eefaccc0
      David Gibson committed
      When the guest switches the interrupt endian mode, which essentially
      means a global machine endian switch, we want to change the VGA
      framebuffer endian mode as well in order to be backward compatible
      with existing guests who don't know about the new endian control
      register.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • pseries: Move rtc_offset into RTC device's state structure · 880ae7de
      David Gibson committed
      The initial creation of the PAPR RTC qdev class left a wart - the rtc's
      offset was left in the sPAPREnvironment structure, accessed via a global.
      
      This patch moves it into the RTC device's own state structure, where it
      belongs.  This requires a small change to the migration stream format.  In
      order to handle incoming streams from older versions, we also need to
      retain the rtc_offset field in the sPAPREnvironment structure, so that it
      can be loaded into via the vmsd, then pushed into the RTC device.
      
      Since we're changing the migration format, this also takes the opportunity
      to:
      
        * Change the rtc offset from a value in seconds to a value in
          nanoseconds, allowing nanosecond offsets between host and guest
          rtc time, if desired.
      
        * Remove both the already unused "next_irq" field and now unused
          "rtc_offset" field from the new version of the spapr migration
          stream
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
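The compatibility path described in this commit (load the legacy seconds offset into the old field, then push it into the RTC device in nanoseconds) might look roughly like the Python sketch below. The class names, the `post_load()` hook, and the version cutoff of 3 are assumptions made for illustration only.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


class RtcDevice:
    """Toy RTC device that owns its offset, in nanoseconds."""

    def __init__(self):
        self.ns_offset = 0


class Environment:
    """Toy machine state; rtc_offset is kept only to receive old streams."""

    def __init__(self):
        self.rtc_offset = 0  # legacy field, in seconds
        self.rtc = RtcDevice()


def post_load(env, version_id):
    """After loading a stream, push a legacy offset into the RTC device."""
    if version_id < 3:  # assumed cutoff: older streams carry seconds here
        env.rtc.ns_offset = env.rtc_offset * NANOSECONDS_PER_SECOND
```

With a newer stream version the legacy field is ignored and the device's own state is expected to arrive through the device's section of the stream.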
    • pseries: Make the PAPR RTC a qdev device · 28df36a1
      David Gibson committed
      At present the PAPR RTC isn't a "device" as such - it's accessed only via
      firmware/hypervisor calls, and is handled in the sPAPR core code.  This
      becomes inconvenient as we extend it in various ways.
      
      This patch makes the PAPR RTC a separate device in the qemu device model.
      
      For now, the only piece of device state - the rtc_offset - is still kept in
      the global sPAPREnvironment structure.  That's clearly wrong, but leaving
      it to be fixed in a following patch makes for a clearer separation between
      the internal re-organization of the device, and the behavioural changes
      (because the migration stream format needs to change slightly when the
      offset is moved into the device's own state).
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • pseries: Add spapr_rtc_read() helper function · e5dad1d7
      David Gibson committed
      The virtual RTC time is used in two places in the pseries machine.  First
      is in the RTAS get-time-of-day function which returns the RTC time to the
      guest.  Second is in the spapr events code which is used to timestamp
      event messages from the hypervisor to the guest.
      
      Currently both call qemu_get_timedate() directly, but we want to change
      that so we can properly handle the various -rtc options.  In preparation,
      create a helper function to return the virtual RTC time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • pseries: Move sPAPR RTC code into its own file · 12f42174
      David Gibson committed
      At the moment the RTAS (firmware/hypervisor) time of day functions are
      implemented in spapr_rtas.c along with a bunch of other things.  Since
      we're going to be expanding these a bit, move the RTAS RTC related code
      out into a new file, spapr_rtc.c.  Also add its own initialization
      function, spapr_rtc_init(), called from the main machine init routine.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • spapr_vio/spapr_iommu: Move VIO bypass where it belongs · ee9a569a
      Alexey Kardashevskiy committed
      Instead of tweaking a TCE table device by adding a bypass flag to it,
      let's add aliases to the RAM and IOMMU memory regions, and enable/disable
      those according to the selected bypass mode.
      This way the IOMMU memory region can have the size of the actual window
      rather than ram_size, which is essential for upcoming DDW support.
      
      This moves bypass logic to VIO layer and keeps @bypass flag in TCE table
      for migration compatibility only. This replaces spapr_tce_set_bypass()
      calls with explicit assignment to avoid confusion as the function could
      do something more than just syncing the @bypass flag.
      
      This adds a pointer to VIO device into the sPAPRTCETable struct to provide
      the sPAPRTCETable device a way to update bypass mode for the VIO device.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  6. 07 Jan, 2015 1 commit
  7. 08 Sep, 2014 3 commits
    • spapr_pci: map the MSI window in each PHB · 8c46f7ec
      Greg Kurz committed
      On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
      Commit cc943c36 has modified MSI-X
      so that writes are made using the bus master address space and follow
      the IOMMU path.
      
      Unfortunately, the IOMMU address space does not have an
      MSI window: the notification is silently dropped in unassigned_mem_write
      instead of reaching the guest... The most visible effect is that all
      virtio devices are non-functional on sPAPR since then. :(
      
      This patch does the following:
      1) map the MSI window into the IOMMU address space for each PHB
         - since each PHB instantiates its own IOMMU address space, we
           can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
         - no real need to keep the MSI window setup in a separate function,
           the spapr_pci_msi_init() code moves to spapr_phb_realize().
      
      2) kill the global MSI window as it is not needed in the end
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • spapr: Locate RTAS and device-tree based on real RMA · b7d1f77a
      Benjamin Herrenschmidt committed
      We currently calculate the final RTAS and FDT location based on
      the early estimate of the RMA size, cropped to 256M on KVM since
      we only know the real RMA size at reset time which happens much
      later in the boot process.
      
      This means the FDT and RTAS end up right below 256M while they
      could be much higher, using precious RMA space and limiting
      what the OS bootloader can put there, which has proved to be
      a problem with some OSes (such as when using very large initrds).
      
      Fortunately, we do the actual copy of the device-tree into guest
      memory much later, during reset, late enough to be able to do it
      using the final RMA value; we just need to move the calculation
      to the right place.
      
      However, RTAS is still loaded too early, so we change the code to
      load the tiny blob into qemu memory early on, and then copy it into
      guest memory at reset time. It's small enough that the memory usage
      doesn't matter.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [aik: fixed errors from checkpatch.pl, defined RTAS_MAX_ADDR]
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [agraf: fix compilation on 32bit hosts]
      Signed-off-by: Alexander Graf <agraf@suse.de>
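To make the placement arithmetic concrete, here is a small Python sketch of computing the RTAS and FDT addresses from the final RMA size at reset time. The constant values and the `place_rtas_and_fdt()` helper are assumptions for illustration; only the name RTAS_MAX_ADDR comes from the commit message, and its value here is a guess.

```python
MiB = 1 << 20

# Assumed values, chosen only to make the arithmetic visible.
RTAS_MAX_ADDR = 0x80000000  # upper bound on where RTAS may live
RTAS_MAX_SIZE = 0x10000
FDT_MAX_SIZE = 0x10000


def place_rtas_and_fdt(rma_size):
    """Put RTAS at the top of the usable RMA and the FDT just below it."""
    limit = min(rma_size, RTAS_MAX_ADDR)
    rtas_addr = limit - RTAS_MAX_SIZE
    fdt_addr = rtas_addr - FDT_MAX_SIZE
    return rtas_addr, fdt_addr


if __name__ == "__main__":
    for rma in (256 * MiB, 1024 * MiB):
        rtas, fdt = place_rtas_and_fdt(rma)
        print(f"RMA {rma // MiB}M: rtas=0x{rtas:08x} fdt=0x{fdt:08x}")
```

With an early 256M estimate both blobs sit just under 0x10000000; with a real RMA of 1024M they move up to just under 0x40000000, leaving the low region free for the bootloader.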
    • ppc: spapr-rtas - implement os-term rtas call · 2e14072f
      Nikunj A Dadhania committed
      A PAPR-compliant guest calls this in the absence of kdump. This finally
      reaches the guest and can be handled according to the policies set by
      higher-level tools (such as taking a dump) for further analysis by tools
      like crash.
      
      The Linux kernel calls ibm,os-term when the extended property of os-term
      is set. This makes sure that a return to the Linux kernel is guaranteed.
      Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
      [agraf: reduce RTAS_TOKEN_MAX]
      Signed-off-by: Alexander Graf <agraf@suse.de>
  8. 27 Jun, 2014 4 commits
    • spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB · 9a321e92
      Alexey Kardashevskiy committed
      Currently the SPAPR PHB keeps track of all allocated MSI (here and below
      MSI stands for both MSI and MSIX) interrupts because
      XICS used to be unable to reuse interrupts. This is a problem for
      dynamic MSI reconfiguration, which happens when the guest reloads a driver
      or performs PCI hotplug. Another problem is that the existing
      implementation can enable MSI on at most 32 devices
      (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
      
      This makes use of the new XICS ability to reuse interrupts.
      
      This reorganizes MSI information storage in sPAPRPHBState. Instead of a
      static array of 32 descriptors (one per PCI function), this patch adds
      a GHashTable where @config_addr is the key and a (first_irq, num) pair is
      the value. A GHashTable can dynamically grow and shrink, so the initial
      limit of 32 devices is gone.
      
      This changes the migration stream as @msi_table was a static array while
      the new @msi_devs is a dynamic hash table. This adds a temporary array
      which is used for migration; it is populated in the "spapr_pci"::pre_save()
      callback and expanded into the hash table in the post_load() callback.
      Since the destination side does not know the number of MSI-enabled devices
      in advance and cannot pre-allocate the temporary array to receive
      migration state, this makes use of the new VMSTATE_STRUCT_VARRAY_ALLOC
      macro, which allocates the array automatically.
      
      This resets the MSI configuration space when interrupts are released by
      the ibm,change-msi RTAS call.
      
      This fixes traces to be more informative.
      
      This changes the vmstate_spapr_pci_msi name from "...lsi", which was
      incorrect by accident, to "...msi". As the internal representation has
      changed, this also bumps the migration version number.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [agraf: drop g_malloc_n usage]
      Signed-off-by: Alexander Graf <agraf@suse.de>
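The pre_save()/post_load() round trip described in this commit can be modelled in a few lines of Python. This is a sketch under assumptions: a dict stands in for the GHashTable, `MsiMig` is a hypothetical name for one element of the temporary migration array, and the sorting is only there to make the output deterministic.

```python
from typing import Dict, List, NamedTuple, Tuple


class MsiMig(NamedTuple):
    """One flattened entry of the temporary migration array (hypothetical)."""
    key: int        # the device's config_addr
    first_irq: int
    num: int


def pre_save(msi_devs: Dict[int, Tuple[int, int]]) -> List[MsiMig]:
    """Source side: flatten the table into an array before sending."""
    return [MsiMig(key, first_irq, num)
            for key, (first_irq, num) in sorted(msi_devs.items())]


def post_load(mig: List[MsiMig]) -> Dict[int, Tuple[int, int]]:
    """Destination side: rebuild the table from the received array."""
    return {entry.key: (entry.first_irq, entry.num) for entry in mig}
```

Because the array length travels with the stream, the destination does not need to know the number of MSI-enabled devices in advance, which is the role the commit assigns to VMSTATE_STRUCT_VARRAY_ALLOC.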
    • spapr: Move interrupt allocator to xics · bee763db
      Alexey Kardashevskiy committed
      The current allocator returns IRQ numbers from a pool and does not
      support IRQ reuse in any form because it does not keep track of what it
      previously returned; it only keeps the last returned IRQ. Some use
      cases such as PCI hot(un)plug may require IRQ release and reallocation.
      
      This moves the allocator from SPAPR to XICS.
      
      This switches IRQ users to the new API.
      
      This uses LSI/MSI flags to know whether an interrupt is allocated.
      
      The interrupt release function will be posted as a separate patch.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alexander Graf <agraf@suse.de>
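A flag-based allocator of the kind described in this commit could be sketched as follows. This Python model is illustrative only: `IrqPool`, the flag values, and the `free()` method are assumptions (the commit notes that the release function arrives in a separate patch), but it shows how per-IRQ LSI/MSI flags let a released number be handed out again.

```python
# Assumed flag values for this sketch.
IRQ_LSI = 0x1
IRQ_MSI = 0x2
IRQ_MASK = IRQ_LSI | IRQ_MSI


class IrqPool:
    """Toy allocator: an IRQ is 'allocated' when its LSI or MSI flag is set."""

    def __init__(self, offset, nr_irqs):
        self.offset = offset
        self.flags = [0] * nr_irqs

    def alloc(self, lsi=False):
        """Return the first free IRQ number, or -1 if the pool is exhausted."""
        for index, flags in enumerate(self.flags):
            if not (flags & IRQ_MASK):
                self.flags[index] = IRQ_LSI if lsi else IRQ_MSI
                return self.offset + index
        return -1

    def free(self, irq):
        """Hypothetical release: clear the flags so the IRQ can be reused."""
        self.flags[irq - self.offset] = 0
```

An allocator that only remembers the last number it returned cannot do this, because it has no record of which earlier IRQs have since been released.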
    • spapr: Add RTAS sysparm SPLPAR Characteristics · 3b50d897
      Sam Bobroff committed
      Add support for the SPLPAR Characteristics parameter to the emulated
      RTAS call ibm,get-system-parameter.
      
      The support provides just enough information to allow "cat
      /proc/powerpc/lparcfg" to succeed without generating a kernel error
      message.
      
      Without this patch the above command will produce the following kernel
      message: arch/powerpc/platforms/pseries/lparcfg.c \
      parse_system_parameter_string Error calling get-system-parameter \
      (0xfffffffd)
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • spapr: Add RTAS sysparm UUID · b907d7b0
      Sam Bobroff committed
      Add support for the UUID parameter to the emulated RTAS call
      ibm,get-system-parameter.
      
      Return the guest's UUID as the value for the RTAS UUID system
      parameter, or null (a zero length result) if it is not set.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>