1. 01 Mar, 2017 8 commits
    • ppc/xics: use the QOM interface to resend irqs · 2cd908d0
      Committed by Cédric Le Goater
      Also change the ICPState 'xics' backlink to be a XICSFabric, this
      removes the need of using qdev_get_machine() to get the QOM interface
      in some of the routines.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc/xics: use the QOM interface under the sPAPR machine · 7844e12b
      Committed by Cédric Le Goater
      Add 'ics_get' and 'ics_resend' handlers to the sPAPR machine. These
      are relatively simple for a single ICS.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc/xics: store the ICS object under the sPAPR machine · 681bfade
      Committed by Cédric Le Goater
      A list of ICS objects was introduced under the XICS object for the
      PowerNV machine but, for the sPAPR machine, it brings extra complexity
      as there is only a single ICS. To simplify the code, let's add the ICS
      pointer under the sPAPR machine and try to reduce the use of this list
      where possible.
      
      Also, change the xics_spapr_*() routines to use an ICS object instead
      of an XICSState and change their name to reflect that these are
      specific to the sPAPR ICS object.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc/xics: remove set_nr_servers() handler from XICSStateClass · 817bb6a4
      Committed by Cédric Le Goater
      Today, the ICP (Interrupt Controller Presenter) objects are created by
      the 'nr_servers' property handler of the XICS object and a class
      handler. They are realized in the XICS object realize routine.
      
      Let's simplify the process by creating the ICP objects along with the
      XICS object at the machine level.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc/xics: remove set_nr_irqs() handler from XICSStateClass · 4e4169f7
      Committed by Cédric Le Goater
      Today, the ICS (Interrupt Controller Source) object is created and
      realized by the init and realize routines of the XICS object, but some
      of the parameters are only known at the machine level.
      
      These parameters are passed from the sPAPR machine to the ICS object
      in a rather convoluted way using property handlers and a class handler
      of the XICS object. The number of irqs required to allocate the IRQ
      state objects in the ICS realize routine is one of them.
      
      Let's simplify the process by creating the ICS object along with the
      XICS object at the machine level and link the ICS into the XICS list
      of ICSs at this level also. In the sPAPR machine, there is only a
      single ICS but that will change with the PowerNV machine.
      
      Also, QOMify the creation of the objects and get rid of the
      superfluous code.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • xics: XICS should not be a SysBusDevice · 738d5db8
      Committed by David Gibson
      Currently xics - the component of the IBM POWER interrupt controller
      representing the overall interrupt fabric / architecture - is
      represented as a descendent of SysBusDevice.  However, this is not
      really correct - the xics presents nothing in MMIO space so it should
      be an "unattached" device in the current QOM model.
      
      Since this device will always be created by the machine type, not created
      specifically from the command line, and because it has no migrated state
      it should be safe to move it around the device composition tree.
      
      Therefore this patch changes it to a descendent of TYPE_DEVICE, and
      makes it an unattached device.  So that its reset handler still gets
      called correctly, we add a qdev_set_parent_bus() to attach it to
      sysbus.  It's not really clear that's correct (instead of using
      register_reset()) but it appears to be a common technique.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      [clg corrected problems with reset]
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      [dwg folded together and updated commit message]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • target/ppc: Manage external HPT via virtual hypervisor · e57ca75c
      Committed by David Gibson
      The pseries machine type implements the behaviour of a PAPR compliant
      hypervisor, without actually executing such a hypervisor on the virtual
      CPU.  To do this we need some hooks in the CPU code to make hypervisor
      facilities get redirected to the machine instead of emulated internally.
      
      For hypercalls this is managed through the cpu->vhyp field, which points
      to a QOM interface with a method implementing the hypercall.
      
      For the hashed page table (HPT) - also a hypervisor resource - we use an
      older hack.  CPUPPCState has an 'external_htab' field which when non-NULL
      indicates that the HPT is stored in qemu memory, rather than within the
      guest's address space.
      
      For consistency - and to make some future extensions easier - this merges
      the external HPT mechanism into the vhyp mechanism.  Methods are added
      to vhyp for the basic operations the core hash MMU code needs: map_hptes()
      and unmap_hptes() for reading the HPT, store_hpte() for updating it and
      hpt_mask() to retrieve its size.
      
      To match this, the pseries machine now sets these vhyp fields in its
      existing vhyp class, rather than reaching into the cpu object to set the
      external_htab field.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
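A minimal C sketch of the idea behind e57ca75c, assuming a plain struct of function pointers in place of the real QOM interface. Only the method names map_hptes(), unmap_hptes(), store_hpte() and hpt_mask() come from the commit message; the types, signatures and the toy backend that keeps the HPT in a host array are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct VHyp VHyp;

/* Hypothetical method table standing in for the vhyp QOM interface. */
typedef struct VHypOps {
    uint64_t (*hpt_mask)(VHyp *vhyp);
    const uint64_t *(*map_hptes)(VHyp *vhyp, size_t index, size_t n);
    void (*unmap_hptes)(VHyp *vhyp, const uint64_t *hptes);
    void (*store_hpte)(VHyp *vhyp, size_t index, uint64_t pte0, uint64_t pte1);
} VHypOps;

struct VHyp {
    const VHypOps *ops;
    uint64_t *htab;     /* HPT held in host memory, two words per entry */
    size_t n_entries;   /* assumed to be a power of two */
};

/* Toy backend: the "machine" owns the table and serves it directly. */
static uint64_t toy_hpt_mask(VHyp *vhyp)
{
    return vhyp->n_entries - 1;
}

static const uint64_t *toy_map_hptes(VHyp *vhyp, size_t index, size_t n)
{
    (void)n;
    return &vhyp->htab[2 * index];
}

static void toy_unmap_hptes(VHyp *vhyp, const uint64_t *hptes)
{
    (void)vhyp;
    (void)hptes;    /* nothing to release for a host array */
}

static void toy_store_hpte(VHyp *vhyp, size_t index,
                           uint64_t pte0, uint64_t pte1)
{
    vhyp->htab[2 * index] = pte0;
    vhyp->htab[2 * index + 1] = pte1;
}

static const VHypOps toy_ops = {
    .hpt_mask = toy_hpt_mask,
    .map_hptes = toy_map_hptes,
    .unmap_hptes = toy_unmap_hptes,
    .store_hpte = toy_store_hpte,
};

/* MMU-side helper: reaches the HPT only through the interface. */
static uint64_t mmu_read_pte0(VHyp *vhyp, size_t index)
{
    const uint64_t *hptes;
    uint64_t pte0;

    index &= vhyp->ops->hpt_mask(vhyp);
    hptes = vhyp->ops->map_hptes(vhyp, index, 1);
    pte0 = hptes[0];
    vhyp->ops->unmap_hptes(vhyp, hptes);
    return pte0;
}
```

The point of the design is that the MMU code never needs to know where the table lives; it only calls through the method table.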
    • sysemu: support up to 1024 vCPUs · 6244bb7e
      Committed by Greg Kurz
      Some systems can already provide more than 255 hardware threads.
      
      Bumping the QEMU limit to 1024 seems reasonable:
      - it has no visible overhead in top;
      - the limit itself has no effect on hot paths.
      
      Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  2. 24 Feb, 2017 1 commit
    • tcg: drop global lock during TCG code execution · 8d04fb55
      Committed by Jan Kiszka
      This finally allows TCG to benefit from the iothread introduction: Drop
      the global mutex while running pure TCG CPU code. Reacquire the lock
      when entering MMIO or PIO emulation, or when leaving the TCG loop.
      
      We have to revert a few optimizations for the current TCG threading
      model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
      kicking it in qemu_cpu_kick. We also need to disable RAM block
      reordering until we have a more efficient locking mechanism at hand.
      
      Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
      These numbers demonstrate where we gain something:
      
      20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
      20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
      
      The guest CPU was fully loaded, but the iothread could still run mostly
      independently on a second core. Without the patch we don't get beyond
      
      32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
      32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
      
      We don't benefit significantly, though, when the guest is not fully
      loading a host CPU.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
      [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
      Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
      [EGC: fixed iothread lock for cpu-exec IRQ handling]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      [AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
      [PM: target-arm changes]
      Acked-by: Peter Maydell <peter.maydell@linaro.org>
  3. 22 Feb, 2017 6 commits
  4. 01 Feb, 2017 1 commit
  5. 31 Jan, 2017 10 commits
    • ppc: switch to constants within BUILD_BUG_ON · 25e6a118
      Committed by Michael S. Tsirkin
      We are switching BUILD_BUG_ON to verify that its parameter is a
      compile-time constant, and it turns out that some gcc versions
      (specifically gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609) are
      not smart enough to figure it out for expressions involving local
      variables. This is harmless but means that the check is ineffective for
      these platforms.  To fix, replace the variable with macros.
      Reported-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      [dwg: Correct a printf format warning]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
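The change in 25e6a118 can be illustrated with a small C sketch. The macro below is an assumed stand-in built on C11 `_Static_assert`, not QEMU's actual BUILD_BUG_ON definition, and `N_SLOTS`, `SLOT_BYTES` and `table_bytes()` are hypothetical names. Because the checked expression is made only of macros, every compiler can evaluate it at build time.

```c
/* Assumed stand-in for BUILD_BUG_ON: rejects the build when cond is true
 * and requires cond to be an integer constant expression. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), "BUILD_BUG_ON: " #cond)

/* Hypothetical sizes, expressed as macros rather than local variables. */
#define N_SLOTS     32
#define SLOT_BYTES  8

static inline int table_bytes(void)
{
    /* Checked at compile time; a local variable here would not be a
     * constant expression and the check could not be enforced. */
    BUILD_BUG_ON(N_SLOTS * SLOT_BYTES > 4096);
    return N_SLOTS * SLOT_BYTES;
}
```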
    • spapr: clock should count only if vm is running · 42043e4f
      Committed by Laurent Vivier
      This is a port to ppc of the i386 commit:
          00f4d64e kvmclock: clock should count only if vm is running
      
      We remove timebase_post_load function, and use the VM state
      change handler to save and restore the guest_timebase (on stop
      and continue).
      
      We keep timebase_pre_save to reduce the clock difference on
      migration like in:
          6053a86f kvmclock: reduce kvmclock difference on migration
      
      Time base offset has originally been introduced by commit
          98a8b524 spapr: Add support for time base offset migration
      
      So while the VM is paused, time is stopped. This gives the same
      result with date (based on the Time Base Register) and
      hwclock (based on the "get-time-of-day" RTAS call).
      
      Moreover, in TCG mode the Time Base is always paused, so this
      patch also adjusts the behavior between TCG and KVM.
      
      VM state field "time_of_the_day_ns" is now useless but we keep
      it to be able to migrate to older versions of the machine.
      
      As vmstate_ppc_timebase structure (with timebase_pre_save() and
      timebase_post_load() functions) was only used by vmstate_spapr,
      we register the VM state change handler only in ppc_spapr_init().
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
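As a rough illustration of the stop/continue handling described in 42043e4f, the sketch below models a guest timebase as host timebase plus an offset. The struct, field and function names are hypothetical and do not reflect the actual QEMU code; only the idea of saving the guest timebase on stop and restoring it on continue comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: guest_tb = host_tb + tb_offset (modulo 2^64). */
typedef struct TimebaseModel {
    uint64_t tb_offset;
    uint64_t saved_guest_tb;
} TimebaseModel;

static uint64_t guest_tb(const TimebaseModel *m, uint64_t host_tb)
{
    return host_tb + m->tb_offset;
}

/* Stand-in for a VM state change handler. */
static void vm_state_change(TimebaseModel *m, bool running, uint64_t host_tb)
{
    if (!running) {
        /* On stop: remember where the guest clock was. */
        m->saved_guest_tb = guest_tb(m, host_tb);
    } else {
        /* On continue: pick the offset that resumes from the saved value,
         * so time spent paused is not visible to the guest. */
        m->tb_offset = m->saved_guest_tb - host_tb;
    }
}
```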
    • ppc: Rewrite ppc_get_compat_smt_threads() · 12dbeb16
      Committed by David Gibson
      To continue consolidation of compatibility mode information, this rewrites
      the ppc_get_compat_smt_threads() function using the table of compatibility
      modes in target-ppc/compat.c.
      
      It's not a direct replacement: the new ppc_compat_max_threads() function
      has simpler semantics - it just returns the number of threads the cpu
      model has, taking into account any compatibility mode it is in.
      
      This no longer takes into account kvmppc_smt_threads() as the previous
      version did.  That check wasn't useful because we check in
      ppc_cpu_realizefn() that CPUs aren't instantiated with more threads
      than kvm allows (or if we didn't things will already be broken and
      this won't make it any worse).
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • D
      fa325e6c
    • hw/ppc/spapr: Fix boot path of usb-host storage devices · b99260eb
      Committed by Thomas Huth
      When passing through a USB storage device to a pseries guest, it
      is currently not possible to automatically boot from the device
      if the "bootindex" property has been specified, too (e.g. when using
      "-device nec-usb-xhci -device usb-host,hostbus=1,hostaddr=2,bootindex=0"
      at the command line). The problem is that QEMU builds a device tree path
      like "/pci@800000020000000/usb@0/usb-host@1" and passes it to SLOF
      in the /chosen/qemu,boot-list property. SLOF, however, probes the
      USB device, recognizes that it is a storage device and thus changes
      its name to "storage", and additionally adds a child node for the
      SCSI LUN, so the correct boot path in SLOF is something like
      "/pci@800000020000000/usb@0/storage@1/disk@101000000000000" instead.
      So when we detect a USB mass storage device with SCSI interface,
      we've got to adjust the firmware boot-device path so that
      SLOF can automatically boot from the device.
      
      Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1354177
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
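The path adjustment described in b99260eb could be sketched as a small string rewrite in C. This is an illustration only: `adjust_usb_boot_path()` is a hypothetical helper, it assumes the device has already been identified as SCSI mass storage, and it appends a bare "disk" child node without the LUN unit address that SLOF shows in the example path.

```c
#include <stdio.h>
#include <string.h>

/* Rewrites ".../usb-host@N" into ".../storage@N/disk".
 * Returns 0 on success, -1 if the last path component is not a usb-host
 * node or the output buffer is too small. Hypothetical helper. */
static int adjust_usb_boot_path(const char *path, char *out, size_t outlen)
{
    static const char old_name[] = "usb-host@";
    const size_t old_len = sizeof(old_name) - 1;
    const char *last = strrchr(path, '/');
    const char *unit;
    int n;

    if (!last || strncmp(last + 1, old_name, old_len) != 0) {
        return -1;
    }
    unit = last + 1 + old_len;    /* unit address after the '@' */
    n = snprintf(out, outlen, "%.*s/storage@%s/disk",
                 (int)(last - path), path, unit);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```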
    • ppc/spapr: implement H_SIGNAL_SYS_RESET · 1c7ad77e
      Committed by Nicholas Piggin
      The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset
      exception on CPUs within the same guest -- all CPUs, all-but-self, or a
      specific CPU (including self).
      
      This has not made its way to a PAPR release yet, but we have an hcall
      number assigned.
      
        H_SIGNAL_SYS_RESET = 0x380
      
        Syntax:
          hcall(uint64 H_SIGNAL_SYS_RESET, int64 target);
      
        Generate a system reset NMI on the threads indicated by target.
      
        Values for target:
          -1 = target all online threads including the caller
          -2 = target all online threads except for the caller
          All other negative values: reserved
          Positive values: The thread to be targeted, obtained from the value
          of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF
          device tree.
      
        Semantics:
          - Invalid target: return H_Parameter.
          - Otherwise: Generate a system reset NMI on target thread(s),
            return H_Success.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
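The target semantics listed in 1c7ad77e can be sketched in C as follows. This is a simplified model, not the QEMU handler: threads are represented by an array of interrupt server numbers and a parallel array of "NMI raised" flags, the constant and function names are illustrative, and a target of 0 or above is assumed to be a server number.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants for this sketch. */
#define H_SUCCESS              0
#define H_PARAMETER          (-4)
#define SYS_RESET_ALL        (-1)   /* all online threads, caller included */
#define SYS_RESET_ALL_OTHERS (-2)   /* all online threads except the caller */

/* server_ids[i] is the interrupt server number of online thread i;
 * nmi[i] is set when a system reset is raised on that thread. */
static int64_t signal_sys_reset(int64_t target, size_t caller,
                                const int64_t *server_ids, bool *nmi,
                                size_t n_threads)
{
    size_t i;

    if (target < 0) {
        if (target != SYS_RESET_ALL && target != SYS_RESET_ALL_OTHERS) {
            return H_PARAMETER;     /* other negative values are reserved */
        }
        for (i = 0; i < n_threads; i++) {
            if (target == SYS_RESET_ALL_OTHERS && i == caller) {
                continue;
            }
            nmi[i] = true;
        }
        return H_SUCCESS;
    }

    for (i = 0; i < n_threads; i++) {
        if (server_ids[i] == target) {
            nmi[i] = true;
            return H_SUCCESS;
        }
    }
    return H_PARAMETER;             /* no such thread */
}
```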
    • ppc: Rename cpu_version to compat_pvr · d6e166c0
      Committed by David Gibson
      The 'cpu_version' field in PowerPCCPU is badly named.  It's named after the
      'cpu-version' device tree property where it is advertised, but that meaning
      may not be obvious in most places it appears.
      
      Worse, it doesn't even really correspond to that device tree property.  The
      property contains either the processor's PVR, or, if the CPU is running in
      a compatibility mode, a special "logical PVR" representing which mode.
      
      Rename the cpu_version field, and a number of related variables to
      compat_pvr to make this clearer.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
    • ppc: Clean up and QOMify hypercall emulation · 1d1be34d
      Committed by David Gibson
      The pseries machine type is a bit unusual in that it runs a paravirtualized
      guest.  The guest expects to interact with a hypervisor, and qemu
      emulates the functions of that hypervisor directly, rather than executing
      hypervisor code within the emulated system.
      
      To implement this in TCG, we need to intercept hypercall instructions and
      direct them to the machine's hypercall handlers, rather than attempting to
      perform a privilege change within TCG.  This is controlled by a global
      hook - cpu_ppc_hypercall.
      
      This cleanup makes the handling a little cleaner and more extensible than
      a single global variable.  Instead, each CPU to have hypercalls intercepted
      has a pointer set to a QOM object implementing a new virtual hypervisor
      interface.  A method in that interface is called by TCG when it sees a
      hypercall instruction.  It's possible we may want to add other methods in
      future.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • pseries: Make cpu_update during CAS unconditional · 5b120785
      Committed by David Gibson
      spapr_h_cas_compose_response() includes a cpu_update parameter which
      controls whether it includes updated information on the CPUs in the device
      tree fragment returned from the ibm,client-architecture-support (CAS) call.
      
      Providing the updated information is essential when CAS has negotiated
      compatibility options which require different cpu information to be
      presented to the guest.  However, it should be safe to provide in other
      cases (it will just override the existing data in the device tree with
      identical data).  This simplifies the code by removing the parameter and
      always providing the cpu update information.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • pseries: Always use core objects for CPU construction · 0c86d0fd
      Committed by David Gibson
      Currently the pseries machine has two paths for constructing CPUs.  On
      newer machine type versions, which support cpu hotplug, it constructs
      cpu core objects, which in turn construct CPU threads.  For older machine
      versions it individually constructs the CPU threads.
      
      This division is going to make some future changes to the cpu construction
      harder, so this patch unifies them.  Now cpu core objects are always
      created.  This requires some updates to allow core objects to be created
      without a full complement of threads (since older versions allowed a
      number of cpus not a multiple of the threads-per-core).  Likewise it needs
      some changes to the cpu core hot/cold plug path so as not to choke on the
      old machine types without hotplug support.
      
      For good measure, we move the cpu construction to its own subfunction,
      spapr_init_cpus().
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Greg Kurz <groug@kaod.org>
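The "core without a full complement of threads" case from 0c86d0fd comes down to simple arithmetic, sketched here in C. The helper names are hypothetical; the sketch only assumes that cores are filled in order and that the last core receives whatever threads remain.

```c
/* Number of core objects needed for n_cpus threads. */
static unsigned core_count(unsigned n_cpus, unsigned threads_per_core)
{
    return (n_cpus + threads_per_core - 1) / threads_per_core;
}

/* Threads instantiated in a given core; the last core may be partial
 * when n_cpus is not a multiple of threads_per_core. */
static unsigned threads_in_core(unsigned core, unsigned n_cpus,
                                unsigned threads_per_core)
{
    unsigned first = core * threads_per_core;

    if (first >= n_cpus) {
        return 0;
    }
    return (n_cpus - first < threads_per_core) ? n_cpus - first
                                               : threads_per_core;
}
```

For example, 10 CPUs with 4 threads per core would give three cores holding 4, 4 and 2 threads.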
  6. 20 Jan, 2017 1 commit
  7. 01 Dec, 2016 1 commit
    • spapr: fix default DRC state for coldplugged LMBs · 5c0139a8
      Committed by Michael Roth
      Currently we set the initial isolation/allocation state for DRCs
      associated with coldplugged LMBs to ISOLATED/UNUSABLE,
      respectively, under the assumption that the guest will move this
      state to UNISOLATED/USABLE.
      
      In fact, this is only the case for LMBs added via hotplug. For
      coldplugged LMBs, the guest actually assumes the initial state to
      be UNISOLATED/USABLE.
      
      In practice, this only becomes an issue when we attempt to unplug
      one of these LMBs, where the guest kernel will issue an
      rtas-get-sensor-state call to check that the corresponding DRC is
      in a USABLE state before it will release the LMB back to
      QEMU. If the returned state is otherwise, the guest will assume no
      further action is needed, which bypasses the QEMU-side cleanup that
      occurs during the USABLE->UNUSABLE transition. As a result,
      LMBs and their corresponding pc-dimm devices stick around
      indefinitely.
      
      This patch fixes the issue by manually setting DRCs associated with
      cold-plugged LMBs to UNISOLATED/ALLOCATED, but leaving the hotplug
      state untouched. As it turns out, this is analogous to the handling
      for cold-plugged CPUs in spapr_core_plug().
      
      Cc: qemu-ppc@nongnu.org
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  8. 23 Nov, 2016 3 commits
    • spapr: Fix 2.7<->2.8 migration of PCI host bridge · 5c4537bd
      Committed by David Gibson
      daa23699 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration
      from qemu-2.7 to the current version.  It split the device's MMIO
      window into two pieces for 32-bit and 64-bit MMIO.
      
      The patch included backwards compatibility code to convert the old
      property into the new format.  However, the property value was also
      transferred in the migration stream and compared with a (probably
      unwise) VMSTATE_EQUAL.  So, the "raw" value from 2.7 is compared to
      the new style converted value from (pre-)2.8 giving a mismatch and
      migration failure.
      
      Along with the actual field that caused the breakage, there are
      several other ill-advised VMSTATE_EQUAL()s.  To fix forwards
      migration, we read the values in the stream into scratch variables and
      ignore them, instead of comparing for equality.  To fix backwards
      migration, we populate those scratch variables in pre_save() with
      adjusted values to match the old behaviour.
      
      To permit the eventual possibility of removing this cruft from the
      stream, we only include these compatibility fields if a new
      'pre-2.8-migration' property is set.  We clear it on the pseries-2.8
      machine type, which obviously can't be migrated backwards, but set it
      on earlier machine type versions.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • target-ppc: Allow eventual removal of old migration mistakes · 146c11f1
      Committed by David Gibson
      Until very recently, the vmstate for ppc cpus included some poorly
      thought out VMSTATE_EQUAL() components, that can easily break
      migration compatibility, and did so between qemu-2.6 and later
      versions.  A hack was recently added which fixes this migration
      breakage, but it leaves the unhelpful cruft of these fields in the
      migration stream.
      
      This patch adds a new cpu property allowing these fields to be removed
      from the stream entirely.  For the pseries-2.8 machine type - which
      comes after the fix - and for all non-pseries machine types - which
      aren't mature enough to care about cross-version migration - we remove
      the fields from the stream.
      
      For pseries-2.7 and earlier, the migration hack remains in place,
      allowing backwards and forwards migration with the older machine
      types.
      
      This restricts the migration compatibility cruft to older machine
      types, and at least opens the possibility of eventually deprecating
      and removing it entirely.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • spapr: migration support for CAS-negotiated option vectors · 62ef3760
      Committed by Michael Roth
      With the addition of the OV5_HP_EVT option vector, we now have
      certain functionality (namely, memory unplug) that checks at run-time
      for whether or not the guest negotiated the option via CAS. Because
      we don't currently migrate these negotiated values, we are unable
      to unplug memory from a guest after it's been migrated until after
      the guest is rebooted and CAS-negotiation is repeated.
      
      This patch fixes this by adding CAS-negotiated options to the
      migration stream. We do this using a subsection, since the
      negotiated value of OV5_HP_EVT is the only option currently needed
      to maintain proper functionality for a running guest.
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  9. 31 Oct, 2016 1 commit
  10. 28 Oct, 2016 8 commits
    • clean-up: removed duplicate #includes · 814bb12a
      Committed by Anand J
      Some files contain multiple #includes of the same header file.
      Removed most of those unnecessary duplicate entries using
      scripts/clean-includes.
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Anand J <anand.indukala@gmail.com>
      Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    • spapr: Memory hot-unplug support · cf632463
      Committed by Bharata B Rao
      Add support to hot remove pc-dimm memory devices.
      
      Since we're introducing a machine-level unplug_request hook, we also
      add handling for CPU unplug there as well to ensure CPU unplug
      continues to work as it did before.
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      * add hooks to CAS/cmdline enablement of hotplug ACR support
      * add hook for CPU unplug
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: use count+index for memory hotplug · 79b78a6b
      Committed by Michael Roth
      Commit 0a417869:
      
          spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type
      
      dropped per-DRC/per-LMB hotplug events in favor of a bulk add via a
      single LMB count value. This was to avoid overrunning the guest EPOW
      event queue with hotplug events. This works fine, but relies on the
      guest exhaustively scanning for pluggable LMBs to satisfy the
      requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
      until all the LMBs associated with the DIMM are identified.
      
      With newer support for dedicated hotplug event source, this queue
      exhaustion is no longer as much of an issue due to implementation
      details on the guest side, but we still try to avoid excessive hotplug
      events by now supporting both a count and a starting index to avoid
      unnecessary work. This patch makes use of that approach when the
      capability is available.
      
      Cc: bharata@linux.vnet.ibm.com
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: add hotplug interrupt machine options · f6229214
      Committed by Michael Roth
      This adds machine options of the form:
      
        -machine pseries,modern-hotplug-events=true
        -machine pseries,modern-hotplug-events=false
      
      If false, QEMU will force the use of "legacy" style hotplug events,
      which are surfaced through EPOW events instead of a dedicated
      hot plug event source, and lack certain features necessary, mainly,
      for memory unplug support.
      
      If true, QEMU will enable support for "modern" dedicated hot plug
      event source. Note that we will still default to "legacy" style unless
      the guest advertises support for the "modern" hotplug events via
      ibm,client-architecture-support hcall during early boot.
      
      For pseries-2.7 and earlier we default to false, for newer machine
      types we default to true.
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
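The decision rule described in f6229214 can be summarised in a short C sketch. Both helper names are hypothetical, and the version check assumes machine types are identified by a major/minor pair as in "pseries-2.7".

```c
#include <stdbool.h>

/* Default of modern-hotplug-events for a pseries-<major>.<minor> machine:
 * false for 2.7 and earlier, true for newer machine types. */
static bool modern_hotplug_default(int major, int minor)
{
    return major > 2 || (major == 2 && minor > 7);
}

/* The dedicated event source is only used when the machine option allows
 * it AND the guest advertised support during CAS; otherwise hotplug events
 * fall back to the "legacy" EPOW path. */
static bool use_dedicated_hotplug_source(bool option_enabled,
                                         bool guest_advertised)
{
    return option_enabled && guest_advertised;
}
```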
    • spapr_events: add support for dedicated hotplug event source · ffbb1705
      Committed by Michael Roth
      Hotplug events were previously delivered using an EPOW interrupt
      and were queued by linux guests into a circular buffer. For traditional
      EPOW events like shutdown/resets, this isn't an issue, but for hotplug
      events there are cases where this buffer can be exhausted, resulting
      in the loss of hotplug events, resets, etc.
      
      Newer-style hotplug events are delivered using a dedicated event source.
      We enable this in supported guests by adding an additional
      event source in the guest device-tree via /event-sources, and, if
      the guest advertises support for the newer-style hotplug events,
      using the corresponding interrupt to signal the availability of
      hotplug/unplug events.
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: improve ibm,architecture-vec-5 property handling · 417ece33
      Committed by Michael Roth
      ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
      negotiated between platform/guest. Currently we hardcode this property
      in the boot-time device tree to advertise a single negotiated
      capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
      has been invoked or that capability has actually been negotiated.
      
      Improve this by generating ibm,architecture-vec-5 based on the full
      set of option vector 5 capabilities negotiated via CAS.
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: add option vector handling in CAS-generated resets · 6787d27b
      Committed by Michael Roth
      In some cases, ibm,client-architecture-support calls can fail. This
      could happen in the current code for situations where the modified
      device tree segment exceeds the buffer size provided by the guest
      via the call parameters. In these cases, QEMU will reset, allowing
      an opportunity to regenerate the device tree from scratch via
      boot-time handling. There are potentially other scenarios as well,
      not currently reachable in the current code, but possible in theory,
      such as cases where device-tree properties or nodes need to be removed.
      
      We currently don't handle either of these properly for option vector
      capabilities however. Instead of carrying the negotiated capability
      beyond the reset and creating the boot-time device tree accordingly,
      we start from scratch, generating the same boot-time device tree as we
      did prior to the CAS-generated reset and the same device tree updates as we
      did before. This could (in theory) cause us to get stuck in a reset
      loop. This hasn't been observed, but depending on the extensiveness
      of CAS-induced device tree updates in the future, could eventually
      become an issue.
      
      Address this by pulling capability-related device tree
      updates resulting from CAS calls into a common routine,
      spapr_dt_cas_updates(), and adding an sPAPROptionVector*
      parameter that allows us to test for newly-negotiated capabilities.
      We invoke it as follows:
      
      1) When ibm,client-architecture-support gets called, we
         call spapr_dt_cas_updates() with the set of capabilities
         added since the previous call to ibm,client-architecture-support.
         For the initial boot, or a system reset generated by something
         other than the CAS call itself, this set will consist of *all*
         options supported by both the platform and the guest. For calls
         to ibm,client-architecture-support immediately after a CAS-induced
         reset, we call spapr_dt_cas_updates() with only the set
         of capabilities added since the previous call, since the other
         capabilities will have already been addressed by the boot-time
         device-tree this time around. In the unlikely event that
         capabilities are *removed* since the previous CAS, we will
         generate a CAS-induced reset. In the unlikely event that we
         cannot fit the device-tree updates into the buffer provided
         by the guest, we'll generate a CAS-induced reset.
      
      2) When a CAS update results in the need to reset the machine and
         include the updates in the boot-time device tree, we call the
         spapr_dt_cas_updates() using the full set of negotiated
         capabilities as part of the reset path. At initial boot, or after
         a reset generated by something other than the CAS call itself,
         this set will be empty, resulting in what should be the same
         boot-time device-tree as we generated prior to this patch. For
         CAS-induced reset, this routine will be called with the full set of
         capabilities negotiated by the platform/guest in the previous
         CAS call, which should result in CAS updates from previous call
         being accounted for in the initial boot-time device tree.
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Changed an int -> bool conversion to be more explicit]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
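The decision made at CAS time in 6787d27b can be sketched in C with capability sets held as bitmasks. This is a simplified model: `cas_negotiate()` and `CasAction` are hypothetical names, and the real code works on sPAPROptionVector objects rather than a single 64-bit word.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { CAS_UPDATE_DT, CAS_RESET } CasAction;

/* prev: capabilities negotiated by the previous CAS call (0 at first boot)
 * now:  capabilities negotiated by this call
 * On CAS_UPDATE_DT, *added holds the capabilities that still need
 * device-tree updates returned to the guest. */
static CasAction cas_negotiate(uint64_t prev, uint64_t now,
                               size_t update_size, size_t guest_buf_size,
                               uint64_t *added)
{
    uint64_t removed = prev & ~now;

    *added = now & ~prev;
    if (removed != 0) {
        return CAS_RESET;   /* capabilities were dropped since last CAS */
    }
    if (update_size > guest_buf_size) {
        return CAS_RESET;   /* updates do not fit in the guest's buffer */
    }
    return CAS_UPDATE_DT;
}
```

After a CAS-induced reset, the boot-time device tree would be built from the full negotiated set, so a repeat of the same CAS call yields an empty `added` set and no further reset.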
    • spapr_hcall: use spapr_ovec_* interfaces for CAS options · facdb8b6
      Committed by Michael Roth
      Currently we access individual bytes of an option vector via
      ldub_phys() to test for the presence of a particular capability
      within that byte. Currently this is only done for the "dynamic
      reconfiguration memory" capability bit. If that bit is present,
      we pass a boolean value to spapr_h_cas_compose_response()
      to generate a modified device tree segment with the additional
      properties required to enable this functionality.
      
      As more capability bits are added, we would need to modify the
      code to add additional option vector accesses and extend the
      param list for spapr_h_cas_compose_response() to include similar
      boolean values for these parameters.
      
      Avoid this by switching to spapr_ovec_* helpers so we can do all
      the parsing in one shot and then test for these additional bits
      within spapr_h_cas_compose_response() directly.
      
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
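The "parse once, then test bits" approach of facdb8b6 might look like the following C sketch. The `OptVec` type and `optvec_*` helpers are hypothetical and are not the real spapr_ovec_* API; the sketch also assumes bit 0 is the most significant bit of byte 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OV_BYTES 4   /* arbitrary size for this sketch */

typedef struct OptVec {
    uint8_t bytes[OV_BYTES];
} OptVec;

static void optvec_set(OptVec *ov, unsigned bit)
{
    if (bit < OV_BYTES * 8) {
        ov->bytes[bit / 8] |= (uint8_t)(0x80u >> (bit % 8));
    }
}

static bool optvec_test(const OptVec *ov, unsigned bit)
{
    return bit < OV_BYTES * 8 &&
           (ov->bytes[bit / 8] & (0x80u >> (bit % 8))) != 0;
}

/* Negotiated set: capabilities offered by the platform AND requested
 * by the guest. */
static OptVec optvec_intersect(const OptVec *a, const OptVec *b)
{
    OptVec out;
    size_t i;

    for (i = 0; i < OV_BYTES; i++) {
        out.bytes[i] = a->bytes[i] & b->bytes[i];
    }
    return out;
}
```

With a helper set like this, a response routine can take the negotiated vector as a single argument and test any bit it cares about, instead of growing one boolean parameter per capability.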