1. 17 Jul 2017: 18 commits
    • pseries: Use smaller default hash page tables when guest can resize · 2772cf6b
      David Gibson committed
      We've now implemented a PAPR extension allowing PAPR guests to resize
      their hash page table (HPT) at runtime.
      
      This patch makes use of that facility to allocate smaller HPTs by default.
      Specifically, when a guest is aware of the HPT resize facility, qemu sizes
      the HPT to the initial memory size, rather than the maximum memory size, on
      the assumption that the guest will resize its HPT if necessary for
      hotplugged memory.
      
      When the initial memory size is much smaller than the maximum memory size
      (a common configuration with e.g. oVirt / RHEV) then this can save
      significant memory on the HPT.
      
      If the guest does *not* advertise HPT resize awareness when it makes the
      ibm,client-architecture-support call, qemu resizes the HPT for the maximum
      memory size (unless it's been configured not to allow such guests at all).
      
      For now we make that reallocation assuming the guest has not yet used the
      HPT at all.  That's true in practice, but not, strictly, an architectural
      or PAPR requirement.  If we need to in future we can fix this by having
      the client-architecture-support call reboot the guest with the revised
      HPT size (the client-architecture-support call is explicitly permitted to
      trigger a reboot in this way).
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
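
      The sizing policy above can be sketched in a few lines of C. This is a
      sketch under stated assumptions, not qemu's code. The HPT is taken to be
      1/128 of RAM rounded up to a power of two, which is consistent with the
      ~2GiB HPT for a 256GiB guest quoted in the "Implement HPT resizing"
      commit. The result is clamped to an assumed range of 2^18 to 2^46 bytes.
      resize_mode, hpt_shift_for_ramsize(), boot_hpt_shift() and
      cas_hpt_shift() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the machine's HPT-resize configuration. */
typedef enum { RESIZE_DISABLED, RESIZE_ENABLED, RESIZE_REQUIRED } resize_mode;

/* log2 of the HPT size for a RAM size: 1/128 of RAM rounded up to a power
 * of two, clamped to an assumed architected range of 2^18..2^46 bytes. */
static int hpt_shift_for_ramsize(uint64_t ramsize)
{
    int shift = 0;

    while (shift < 63 && (UINT64_C(1) << shift) < ramsize) {
        shift++;                        /* ceil(log2(ramsize)) */
    }
    shift -= 7;                         /* divide by 128 */
    if (shift < 18) {
        shift = 18;
    }
    if (shift > 46) {
        shift = 46;
    }
    return shift;
}

/* At boot: size for initial memory unless resizing is disabled outright. */
static int boot_hpt_shift(resize_mode mode, uint64_t initial, uint64_t max)
{
    assert(initial <= max);
    return hpt_shift_for_ramsize(mode == RESIZE_DISABLED ? max : initial);
}

/* At ibm,client-architecture-support: the shift to use from now on, or -1
 * if a guest that is not resize-aware must be refused. */
static int cas_hpt_shift(resize_mode mode, bool guest_resize_aware,
                         int current_shift, uint64_t max)
{
    if (guest_resize_aware || mode == RESIZE_DISABLED) {
        return current_shift;           /* keep the boot-time table */
    }
    if (mode == RESIZE_REQUIRED) {
        return -1;                      /* configured to reject such guests */
    }
    return hpt_shift_for_ramsize(max);  /* reallocate for maximum memory */
}
```

      Take a guest with 4GiB initial and 256GiB maximum memory. It starts with
      a 2^25 byte (32MiB) HPT instead of a 2^31 byte (2GiB) one. It pays for
      the larger table only if it turns out not to be resize-aware.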
    • pseries: Enable HPT resizing for 2.10 · 52b81ab5
      David Gibson committed
      We've now implemented a PAPR extension which allows PAPR guests (i.e.
      the "pseries" machine type) to resize their hash page table at runtime.
      
      However, that extension is only enabled if explicitly chosen on the
      command line.  This patch enables it by default for spapr-2.10, but leaves
      it disabled (by default) for older machine types.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
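
      The per-version default can be sketched as below. The sketch is modeled
      loosely on qemu's pattern of each older machine type starting from the
      next newer type's options and then restoring its own legacy settings.
      The struct and function names are hypothetical. The real setting is a
      machine property, not a bool field.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, cut-down machine class: only what this commit touches. */
typedef struct {
    const char *name;
    bool resize_hpt_default;
} machine_class;

/* Newest machine type: HPT resizing on by default. */
static void spapr_2_10_options(machine_class *mc)
{
    mc->name = "pseries-2.10";
    mc->resize_hpt_default = true;
}

/* Older type: inherit the newer defaults, then undo the change so guests
 * started with this machine type keep their old behaviour. */
static void spapr_2_9_options(machine_class *mc)
{
    spapr_2_10_options(mc);
    mc->name = "pseries-2.9";
    mc->resize_hpt_default = false;
}

/* An explicit command-line choice (0 or 1) wins; -1 means "not given". */
static bool resize_hpt_enabled(const machine_class *mc, int cmdline)
{
    assert(mc->name != NULL && strncmp(mc->name, "pseries-", 8) == 0);
    return cmdline < 0 ? mc->resize_hpt_default : cmdline != 0;
}
```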
    • pseries: Implement HPT resizing · 0b0b8310
      David Gibson committed
      This patch implements hypercalls allowing a PAPR guest to resize its own
      hash page table.  This will eventually allow for more flexible memory
      hotplug.
      
      The implementation is partially asynchronous, handled in a special thread
      running the hpt_prepare_thread() function.  The state of a pending resize
      is stored in SPAPR_MACHINE->pending_hpt.
      
      The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
      if one is already in progress, monitor it for completion.  If there is an
      existing HPT resize in progress that doesn't match the size specified in
      the call, it will cancel it, replacing it with a new one matching the
      given size.
      
      The H_RESIZE_HPT_COMMIT hypercall completes the transition to a resized HPT, and can only
      be called successfully once H_RESIZE_HPT_PREPARE has successfully
      completed initialization of a new HPT.  The guest must ensure that there
      are no concurrent accesses to the existing HPT while this is called (this
      effectively means stop_machine() for Linux guests).
      
      For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
      HPTE into the new HPT.  This can have quite high latency, but it seems to
      be of the order of typical migration downtime latencies for HPTs of size
      up to ~2GiB (which would be used in a 256GiB guest).
      
      In future we probably want to move more of the rehashing to the "prepare"
      phase, by having H_ENTER and other hcalls update both current and
      pending HPTs.  That's a project for another day, but should be possible
      without any changes to the guest interface.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
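
      The prepare/commit protocol and the rehash-on-commit step can be modeled
      in C. This is a single-threaded sketch with heavy simplifications:
        * an HPTE is reduced to a valid bit plus a key;
        * the hash is the key masked to the group count;
        * each group has 8 slots;
        * prepare_thread_run() stands in for the background
          hpt_prepare_thread();
        * the RESIZE_* result codes are hypothetical, not the PAPR H_* values
          the real hypercalls return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOTS_PER_GROUP 8           /* assumed slots per hash group */

typedef struct { bool valid; uint64_t key; } hpte_t;   /* simplified HPTE */
typedef struct { uint32_t ngroups; hpte_t *slots; } hpt_t;

/* Hypothetical result codes, not the PAPR H_* values. */
typedef enum { RESIZE_OK, RESIZE_BUSY, RESIZE_NO_PENDING, RESIZE_FULL } resize_rc;

typedef struct {
    bool active;                    /* a resize has been requested */
    bool complete;                  /* the new table is ready */
    hpt_t new_hpt;
} pending_hpt_t;

/* Insert into the entry's hash group; fail if every slot is taken. */
static bool hpt_insert(hpt_t *hpt, uint64_t key)
{
    hpte_t *group = &hpt->slots[(size_t)(key & (hpt->ngroups - 1)) * SLOTS_PER_GROUP];

    for (int i = 0; i < SLOTS_PER_GROUP; i++) {
        if (!group[i].valid) {
            group[i] = (hpte_t){ true, key };
            return true;
        }
    }
    return false;
}

static void pending_cancel(pending_hpt_t *p)
{
    free(p->new_hpt.slots);
    *p = (pending_hpt_t){ 0 };
}

/* Stand-in for the background prepare thread: allocate the new table. */
static void prepare_thread_run(pending_hpt_t *p)
{
    p->new_hpt.slots = calloc((size_t)p->new_hpt.ngroups * SLOTS_PER_GROUP,
                              sizeof(hpte_t));
    p->complete = p->new_hpt.slots != NULL;
}

static resize_rc resize_prepare(pending_hpt_t *p, uint32_t ngroups)
{
    assert(ngroups != 0 && (ngroups & (ngroups - 1)) == 0);
    if (p->active && p->new_hpt.ngroups == ngroups) {
        return p->complete ? RESIZE_OK : RESIZE_BUSY;   /* just monitor */
    }
    if (p->active) {
        pending_cancel(p);          /* different size: replace the request */
    }
    p->active = true;
    p->new_hpt.ngroups = ngroups;
    return RESIZE_BUSY;             /* work continues asynchronously */
}

/* The caller guarantees no concurrent access to the current HPT. */
static resize_rc resize_commit(hpt_t *cur, pending_hpt_t *p, uint32_t ngroups)
{
    if (!p->active || p->new_hpt.ngroups != ngroups) {
        return RESIZE_NO_PENDING;
    }
    if (!p->complete) {
        return RESIZE_BUSY;
    }
    /* Rehash every valid entry of the old table into the new one. */
    for (size_t i = 0; i < (size_t)cur->ngroups * SLOTS_PER_GROUP; i++) {
        if (cur->slots[i].valid && !hpt_insert(&p->new_hpt, cur->slots[i].key)) {
            pending_cancel(p);      /* old table is untouched, so just bail */
            return RESIZE_FULL;
        }
    }
    free(cur->slots);
    *cur = p->new_hpt;
    *p = (pending_hpt_t){ 0 };
    return RESIZE_OK;
}
```

      The commit loop touches every slot of the old table. That is the
      full-table walk whose latency the message compares to migration
      downtime.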
    • pseries: Stubs for HPT resizing · 30f4b05b
      David Gibson committed
      This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
      H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
      extension to allow run time resizing of a guest's hash page table.  It
      also adds a new machine property for controlling whether this new
      facility is available.
      
      For now we only allow resizing with TCG; allowing it with KVM will require
      kernel changes as well.
      
      Finally, it adds a new string to the hypertas property in the device
      tree, advertising to the guest the availability of the HPT resizing
      hypercalls.  This is a tentative suggested value, and would need to be
      standardized by PAPR before being merged.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • ppc/pnv: Remove unused XICSState reference · 2ee77040
      Alexey Kardashevskiy committed
      e6f7e110 "ppc/xics: remove the XICSState classes" got rid of
      XICSState; this is just a leftover.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: fix potential memory leak in spapr_core_plug() · e49c63d5
      Greg Kurz committed
      Since commit 5c1da812 ("spapr: Remove unnecessary differences between
      hotplug and coldplug paths"), the CPU DT for the DRC is always allocated.
      This causes a memory leak for pseries-2.6 and older machine types, which
      don't support CPU hotplug and don't allocate DRCs for CPUs.
      Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: Implement DR-indicator for physical DRCs only · 67fea71b
      David Gibson committed
      According to PAPR, the DR-indicator should only be valid for physical DRCs,
      not logical DRCs.  At the moment we implement it for all DRCs, so restrict
      it to physical ones only.
      
      We move the state to the physical DRC subclass, which means adding some
      QOM boilerplate to handle the newly distinct type.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
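
      Moving the DR-indicator into a physical-only subclass can be sketched
      with plain C struct embedding, roughly how QOM subclasses lay out their
      state. The types are hypothetical. So are the `physical` flag, which
      stands in for a QOM type check, and the -1 error value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical base DRC; `physical` stands in for a QOM type check. */
typedef struct {
    bool physical;
    uint32_t state;
} drc_t;

/* Physical subclass: the parent comes first, so a drc_t * pointing at the
 * parent member can be converted back to the subclass. */
typedef struct {
    drc_t parent;
    uint32_t dr_indicator;          /* only physical DRCs carry this */
} drc_physical_t;

/* Checked downcast: NULL for logical DRCs. */
static drc_physical_t *drc_as_physical(drc_t *drc)
{
    assert(drc != NULL);
    return drc->physical ? (drc_physical_t *)drc : NULL;
}

/* Set the DR-indicator; rejected (hypothetically with -1) for logical DRCs. */
static int drc_set_dr_indicator(drc_t *drc, uint32_t value)
{
    drc_physical_t *pdrc = drc_as_physical(drc);

    if (pdrc == NULL) {
        return -1;
    }
    pdrc->dr_indicator = value;
    return 0;
}
```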
    • spapr: Remove sPAPRConfigureConnectorState sub-structure · 4445b1d2
      David Gibson committed
      Most of the time, the state of a DRC object is contained in the single
      'state' variable.  However, the transition from UNISOLATE to
      CONFIGURED state requires multiple calls to the ibm,configure-connector
      RTAS call to retrieve the device tree for the attached device.  We need
      some extra state to keep track of where we're up to in delivering the
      device tree information to the guest.
      
      Currently that extra state is in a sPAPRConfigureConnectorState
      substructure which is only allocated when we're in the middle of the
      configure connector process.  That sounds like a good idea, but the extra
      state is only two integers - on many platforms that will take up the same
      room as the (maybe NULL) ccs pointer even before malloc() overhead.  Plus
      it's another object whose lifetime we need to manage.  In short, it's not
      worth it.
      
      So, fold the sPAPRConfigureConnectorState substructure directly into the
      DRC object.
      
      Previously the structure was allocated lazily when the configure-connector
      call discovers it's not there.  Now, we need to initialize the subfields
      pre-emptively, as soon as we enter UNISOLATE state.
      
      Although it's not strictly necessary (the field values should only ever
      be consulted when in UNISOLATE state), we try to keep them at -1 when in
      other states, as a debugging aid.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
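
      Folding the configure-connector progress into the DRC itself might look
      like the sketch below. The fields are initialized eagerly on entering
      UNISOLATE and held at -1 in other states. Several details are assumptions
      made for illustration:
        * the state names;
        * the ccs_offset/ccs_depth field names;
        * the starting offset of 0;
        * the `total` piece count.

```c
#include <assert.h>

/* Hypothetical, reduced set of DRC states. */
typedef enum { DRC_AVAILABLE, DRC_UNISOLATE, DRC_CONFIGURED } drc_state;

/* The two integers that used to live in a separately allocated
 * sub-structure now sit directly in the DRC. */
typedef struct {
    drc_state state;
    int ccs_offset;                 /* progress through the device tree */
    int ccs_depth;                  /* current node nesting depth */
} drc_t;

static void drc_set_state(drc_t *drc, drc_state next)
{
    drc->state = next;
    if (next == DRC_UNISOLATE) {
        drc->ccs_offset = 0;        /* initialized eagerly, not on first call */
        drc->ccs_depth = 0;
    } else {
        drc->ccs_offset = -1;       /* never consulted here; debugging aid */
        drc->ccs_depth = -1;
    }
}

/* One configure-connector call delivering the next of `total` pieces of
 * the device tree; returns 1 once the device is fully configured. */
static int drc_configure_connector_step(drc_t *drc, int total)
{
    assert(drc->state == DRC_UNISOLATE && drc->ccs_offset >= 0);
    if (++drc->ccs_offset < total) {
        return 0;
    }
    drc_set_state(drc, DRC_CONFIGURED);
    return 1;
}
```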
    • spapr: Consolidate DRC state variables · 9d4c0f4f
      David Gibson committed
      Each DRC has three fields describing its state: isolation_state,
      allocation_state and configured.  At first this seems like a reasonable
      representation, since it's based directly on the PAPR-defined
      isolation-state and allocation-state indicators.  However:
        * Only a few combinations of the two fields' values are permitted
        * allocation_state isn't used at all for physical DRCs
        * The indicators are write only so they don't really have a well
          defined current value independent of each other
      
      This replaces these variables with a single state variable, whose names
      and numbers are based on the diagram in LoPAPR section 13.4.  Along with
      this we add code to check the current state on various operations and make
      sure the requested transition is permitted.
      
      Strictly speaking, this makes guest visible changes to behaviour (since we
      probably allowed some transitions we shouldn't have before).  However, a
      hypothetical guest broken by that wasn't PAPR compliant, and probably
      wouldn't have worked under PowerVM.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
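
      A single state variable with checked transitions can be expressed as a
      small rule table. The states and rules below are an assumed, simplified
      subset for logical DRCs only. They are not a transcription of the LoPAPR
      section 13.4 diagram, and the operation names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One state variable for a logical DRC (simplified, assumed subset). */
typedef enum {
    LOGICAL_UNUSABLE,
    LOGICAL_AVAILABLE,
    LOGICAL_UNISOLATE,
    LOGICAL_CONFIGURED,
} drc_state;

/* Guest requests that previously poked separate indicator variables. */
typedef enum {
    OP_ALLOC_USABLE,                /* allocation-state := usable */
    OP_ALLOC_UNUSABLE,              /* allocation-state := unusable */
    OP_UNISOLATE,                   /* isolation-state := unisolate */
    OP_ISOLATE,                     /* isolation-state := isolate */
    OP_CONFIGURE_DONE,              /* configure-connector finished */
} drc_op;

typedef struct { drc_op op; drc_state from, to; } drc_rule;

static const drc_rule rules[] = {
    { OP_ALLOC_USABLE,   LOGICAL_UNUSABLE,   LOGICAL_AVAILABLE  },
    { OP_ALLOC_UNUSABLE, LOGICAL_AVAILABLE,  LOGICAL_UNUSABLE   },
    { OP_UNISOLATE,      LOGICAL_AVAILABLE,  LOGICAL_UNISOLATE  },
    { OP_ISOLATE,        LOGICAL_UNISOLATE,  LOGICAL_AVAILABLE  },
    { OP_ISOLATE,        LOGICAL_CONFIGURED, LOGICAL_AVAILABLE  },
    { OP_CONFIGURE_DONE, LOGICAL_UNISOLATE,  LOGICAL_CONFIGURED },
};

/* Apply `op` only if a rule permits it from the current state; otherwise
 * reject it and leave the state unchanged. */
static bool drc_transition(drc_state *state, drc_op op)
{
    assert(state != NULL);
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (rules[i].op == op && rules[i].from == *state) {
            *state = rules[i].to;
            return true;
        }
    }
    return false;
}
```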
    • spapr: Cleanups relating to DRC awaiting_release field · f1c52354
      David Gibson committed
      'awaiting_release' indicates that the host has requested an unplug of the
      device attached to the DRC, but the guest has not (yet) put the device
      into a state where it is safe to complete removal.
      
      1. Rename it to 'unplug_requested', which, to me at least, is clearer.
      
      2. Remove the ->release_pending() method used to check this from outside
      spapr_drc.c.  The method only plausibly has one implementation, so use
      a plain function (spapr_drc_unplug_requested()) instead.
      
      3. Remove it from the migration stream.  Attempting to migrate mid-unplug
      is broken not just for spapr - in general management has no good way to
      determine if the device should be present on the destination or not.  So,
      until that's fixed, there's no point adding extra things to the stream.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
    • spapr: Refactor spapr_drc_detach() · a8dc47fd
      David Gibson committed
      This function has two unused parameters - remove them.
      
      It also sets awaiting_release on all paths, except one.  On that path
      setting it is harmless, since it will be immediately cleared by
      spapr_drc_release().  So factor it out of the if statements.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
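
      The refactored detach logic reduces to one rule: always set
      awaiting_release, and release immediately only if the guest has already
      let go. Setting the flag on that path is harmless because release clears
      it again. Below is a minimal sketch with hypothetical types and a
      simplified two-state DRC. drc_guest_released() is a hypothetical stand-in
      for the guest finishing its side of the unplug.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { DRC_IN_USE, DRC_UNUSABLE } drc_state;   /* simplified */

typedef struct {
    drc_state state;
    bool awaiting_release;
    bool device_present;
} drc_t;

/* Completes the unplug; clears the flag that detach just set. */
static void drc_release(drc_t *drc)
{
    drc->device_present = false;
    drc->awaiting_release = false;
}

/* Host-side unplug request: the flag is set on every path, and the
 * release happens immediately only if the guest has already let go. */
static void drc_detach(drc_t *drc)
{
    assert(drc->device_present);
    drc->awaiting_release = true;
    if (drc->state == DRC_UNUSABLE) {
        drc_release(drc);
    }
}

/* Guest finished isolating/deallocating: finish any deferred unplug. */
static void drc_guest_released(drc_t *drc)
{
    drc->state = DRC_UNUSABLE;
    if (drc->awaiting_release) {
        drc_release(drc);
    }
}
```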
    • spapr: Abort on delete failure in spapr_drc_release() · ba50822f
      David Gibson committed
      We currently ignore errors from the object_property_del() in
      spapr_drc_release().  But the only way that could fail is if the property
      doesn't exist, in which case it's a bug that we're in spapr_drc_release()
      at all.  So change from ignoring to abort()ing on errors.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: Simplify unplug path · 765d1bdd
      David Gibson committed
      spapr_lmb_release() and spapr_core_release() call hotplug_handler_unplug()
      which after a bunch of indirection calls spapr_memory_unplug() or
      spapr_core_unplug().  But we already know which is the appropriate thing
      to call here, so we can just fold it directly into the release function.
      
      Once that's done, there's no need for an hc->unplug method in the spapr
      machine at all: since we also have an hc->unplug_request method, the
      hotplug core will never use ->unplug.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
    • spapr: Remove 'awaiting_allocation' DRC flag · 82a93a1d
      David Gibson committed
      The awaiting_allocation flag in the DRC was introduced by aab99135
      "spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to
      prevent a guest crash on racing attach and detach.  Except... information
      from the BZ actually suggests a qemu crash, not a guest crash.  And there
      shouldn't be a problem here anyway: if the guest has already moved the DRC
      away from UNUSABLE state, the detach would already be deferred, and if it
      hadn't it should be safe to detach it (the guest should fail gracefully
      when it attempts to change the allocation state).
      
      I think this was probably just a bandaid for some other problem in the
      state management.  So, remove awaiting_allocation and associated code.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Tested-by: Greg Kurz <groug@kaod.org>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
    • spapr: Treat devices added before inbound migration as coldplugged · 94fd9cba
      Laurent Vivier committed
      When migrating a guest which has already had devices hotplugged,
      libvirt typically starts the destination qemu with -incoming defer,
      adds those hotplugged devices with qmp, then initiates the incoming
      migration.
      
      This causes problems for the management of spapr DRC state.  Because
      the device is treated as hotplugged, it goes into a DRC state for a
      device immediately after it's plugged, but before the guest has
      acknowledged its presence.  However, chances are the guest on the
      source machine *has* acknowledged the device's presence and configured
      it.
      
      If the source has fully configured the device, then DRC state won't be
      sent in the migration stream: for maximum migration compatibility with
      earlier versions we don't migrate DRCs in coldplug-equivalent state.
      That means that the DRC effectively changes state over the migrate,
      causing problems later on.
      
      In addition, logging hotplug events for these devices isn't what we
      want because a) those events should already have been issued on the
      source host and b) the event queue should get wiped out by the
      incoming state anyway.
      
      In short, what we really want is to treat devices added before an
      incoming migration as if they were coldplugged.
      
      To do this, we first add a spapr_drc_hotplugged() helper which
      determines if the device is hotplugged in the sense relevant for DRC
      state management.  We only send hotplug events when this is true.
      Second, when we add a device which isn't hotplugged in this sense, we
      force a reset of the DRC state - this ensures the DRC is in a
      coldplug-equivalent state (there isn't usually a system reset between
      these device adds and the incoming migration).
      
      This is based on an earlier patch by Laurent Vivier, cleaned up and
      extended.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
    • spapr: Minor cleanups to events handling · 5341258e
      David Gibson committed
      The rtas_error_log structure is marked packed, which strongly suggests its
      precise layout is important to match an external interface.  Along with
      that, one could expect it to have a fixed endianness to match the same
      interface.  That used to be the case - matching the layout of PAPR RTAS
      event format and requiring BE fields.
      
      Now, however, it's only used embedded within sPAPREventLogEntry with the
      fields in native order, since they're processed internally.
      
      Clear that up by removing the nested structure in sPAPREventLogEntry.
      struct rtas_error_log is moved back to spapr_events.c where it is used as
      a temporary to help convert the fields in sPAPREventLogEntry to the correct
      in memory format when delivering an event to the guest.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
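
      This sketch keeps the fields in native order internally and converts them
      only when delivering an event to the guest. It assumes a header of two
      32-bit fields, summary and extended_length. The fields are written
      big-endian byte by byte, which is correct regardless of host endianness.
      The names are illustrative, not the exact qemu definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internal copy of the event header: native byte order, no packing needed. */
typedef struct {
    uint32_t summary;
    uint32_t extended_length;
} event_header;

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write the header in the big-endian wire layout expected by the guest;
 * returns the number of bytes written. */
static size_t event_header_to_guest(const event_header *h,
                                    uint8_t *buf, size_t len)
{
    assert(len >= 8);
    put_be32(buf, h->summary);
    put_be32(buf + 4, h->extended_length);
    return 8;
}
```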
    • spapr: migrate pending_events of spapr state · fd38804b
      Daniel Henrique Barboza committed
      In racing situations between hotplug events and migration operation,
      an RTAS hotplug event may not yet have been delivered to the source
      guest when migration is started. In this case the pending_events of
      the spapr state need to be transmitted to the target so that the hotplug
      event can be finished on the target.
      
      To keep the VMSD needed to migrate the pending_events list as small as
      possible, this patch makes the following changes in spapr_events.c:
      
      - 'log_type' of sPAPREventLogEntry struct deleted. This information can be
      derived by inspecting the rtas_error_log summary field. A new function
      called 'spapr_event_log_entry_type' was added to retrieve the type of
      a given sPAPREventLogEntry.
      
      - sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The
      only data we're going to migrate in the VMSD is the event log data itself,
      which can be divided into two parts: an rtas_error_log header and an extended
      event log field. The rtas_error_log header contains information about the
      size of the extended log field, which can be used inside VMSD as the size
      parameter of the VBUFFER_ALLOC field that will store it. To allow this use,
      the header.extended_length field must be exposed inline to the VMSD instead
      of being embedded in a 'data' field that holds everything. With this in
      mind, the following changes were made:
      
          * a new 'header' field was added to sPAPREventLogEntry. This field
            holds a struct rtas_error_log inline.
          * the declaration of the 'rtas_error_log' struct was moved to spapr.h
            to be visible to the VMSD macros.
          * the 'data' field of sPAPREventLogEntry was renamed to 'extended_log'
            and now holds only the contents of the extended event log.
          * 'struct rtas_error_log hdr' was removed from both epow_log_full
            and hp_log_full. This information is now available in the header
            field of sPAPREventLogEntry.
          * epow_log_full and hp_log_full were renamed to epow_extended_log and
            hp_extended_log respectively. This rename makes the new purpose of
            both structures clearer: to hold the information of an extended
            event log field.
          * spapr_powerdown_req and spapr_hotplug_req_event now create a
            sPAPREventLogEntry structure that contains the full rtas log entry.
          * rtas_event_log_queue and rtas_event_log_dequeue now receive a
            sPAPREventLogEntry pointer as a parameter instead of a void pointer.
      
      - the endianness of the sPAPREventLogEntry header is now native instead
      of be32. We can use the fields in native endianness internally and write
      them as be32 to the guest physical memory inside 'check_exception'. This
      allows the VMSD inside spapr.c to read the correct size of the
      extended_log field.
      
      - inside spapr.c, pending_events is put in a subsection in the spapr state
      VMSD to make sure migration across different versions is not broken.
      
      A small change to rtas_event_log_queue and rtas_event_log_dequeue was also
      made: instead of calling qdev_get_machine(), both functions now receive
      a pointer to the sPAPRMachineState. This pointer is already available in
      the callers of these functions, so we don't need to waste resources
      calling qdev_get_machine() again.
      Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
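
      extended_length must be visible in the header because the receiving side
      needs it to size the buffer before reading the extended log. A hand-rolled
      C sketch of that save/load pairing follows. It imitates what a
      VBUFFER_ALLOC-style VMSD field does, but it is not the VMState API. The
      stream format (a big-endian header followed by the raw extended log) is
      an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t summary;
    uint32_t extended_length;       /* native order; sizes extended_log */
} event_header;

typedef struct {
    event_header header;
    uint8_t *extended_log;          /* header.extended_length bytes */
} event_entry;

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

static uint32_t get_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Save: the header, then exactly extended_length bytes of payload. */
static size_t entry_save(const event_entry *e, uint8_t *out)
{
    put_be32(out, e->header.summary);
    put_be32(out + 4, e->header.extended_length);
    memcpy(out + 8, e->extended_log, e->header.extended_length);
    return 8 + (size_t)e->header.extended_length;
}

/* Load: read the header first, then allocate a buffer of the advertised
 * size and fill it. Returns -1 on a truncated stream. */
static int entry_load(event_entry *e, const uint8_t *in, size_t len)
{
    uint32_t ext;

    if (len < 8) {
        return -1;
    }
    ext = get_be32(in + 4);
    if (len - 8 < ext) {
        return -1;
    }
    e->header.summary = get_be32(in);
    e->header.extended_length = ext;
    e->extended_log = malloc(ext ? ext : 1);
    assert(e->extended_log != NULL);
    memcpy(e->extended_log, in + 8, ext);
    return 0;
}
```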
    • spapr: Remove unnecessary instance_size specifications from DRC subtypes · 3579d606
      David Gibson committed
      All the DRC subtypes explicitly list instance_size in TypeInfo (all as
      sizeof(sPAPRDRConnector)).  This isn't necessary, since if it's not listed
      it will be derived from the parent type.
      
      Worse, this is dangerous, because if a subtype is changed in future to
      have a larger structure, then subtypes of that subtype also need to have
      instance_size changed, or it will lead to hard to track memory corruption
      bugs.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
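
      A tiny model of a QOM-style type table shows why an inherited size is
      safer than an explicit one. In this model an instance_size of 0 means
      "use the parent's". The type_info struct is hypothetical, not qemu's
      TypeInfo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a QOM-style type description. */
typedef struct type_info {
    const struct type_info *parent;
    size_t instance_size;           /* 0 means "inherit from the parent" */
} type_info;

/* Walk up the hierarchy until some ancestor states a size. */
static size_t type_instance_size(const type_info *t)
{
    while (t != NULL && t->instance_size == 0) {
        t = t->parent;
    }
    assert(t != NULL);              /* the root type must set a size */
    return t->instance_size;
}

typedef struct { int state; } base_obj;
typedef struct { base_obj parent; long extra[4]; } bigger_obj;
```

      Suppose a leaf type copied its ancestor's explicit size and its parent
      later grew. That leaf would then be allocated too small, while a leaf
      that leaves instance_size at 0 picks up the new size automatically.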
  2. 15 Jul 2017: 1 commit
  3. 14 Jul 2017: 21 commits