1. 22 Dec, 2018 22 commits
  2. 21 Dec, 2018 18 commits
    • P
      Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' into staging · 891ff9f4
      Committed by Peter Maydell
      ppc patch queue 2018-12-21
      
      This pull request supersedes the one from 2018-12-13.
      
      This is a revised first ppc pull request for qemu-4.0.  Highlights
      are:
      
       * Most of the code for the POWER9 "XIVE" interrupt controller
         (not complete yet, but we're getting there)
       * A number of g_new vs. g_malloc cleanups
       * Some IRQ wiring cleanups
       * A fix for how we advertise NUMA nodes to the guest for pseries
      
      # gpg: Signature made Fri 21 Dec 2018 05:34:12 GMT
      # gpg:                using RSA key 6C38CACA20D9B392
      # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
      # gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
      # gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
      # gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
      # Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392
      
      * remotes/dgibson/tags/ppc-for-4.0-20181221: (40 commits)
        MAINTAINERS: PPC: add a XIVE section
        spapr: change default CPU type to POWER9
        spapr: introduce an 'ic-mode' machine option
        spapr: add an extra OV5 field to the sPAPR IRQ backend
        spapr: add a 'reset' method to the sPAPR IRQ backend
        spapr: extend the sPAPR IRQ backend for XICS migration
        spapr: allocate the interrupt thread context under the CPU core
        spapr: add device tree support for the XIVE exploitation mode
        spapr: add hcalls support for the XIVE exploitation interrupt mode
        spapr: introduce a new machine IRQ backend for XIVE
        spapr-iommu: Always advertise the maximum possible DMA window size
        spapr/xive: use the VCPU id as a NVT identifier
        spapr/xive: introduce a XIVE interrupt controller
        ppc/xive: notify the CPU when the interrupt priority is more privileged
        ppc/xive: introduce a simplified XIVE presenter
        ppc/xive: introduce the XIVE interrupt thread context
        ppc/xive: add support for the END Event State Buffers
        Changes requirement for "vsubsbs" instruction
        spapr: export and rename the xics_max_server_number() routine
        spapr: introduce a spapr_irq_init() routine
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • P
      Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging · 15763776
      Committed by Peter Maydell
      pci, pc, virtio: fixes, features
      
      VTD fixes
      IR and split irqchip are now the default for Q35
      ACPI refactoring
      hotplug refactoring
      new names for virtio devices
      multiple pcie link width/speeds
      PCI fixes
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      
      # gpg: Signature made Thu 20 Dec 2018 18:26:03 GMT
      # gpg:                using RSA key 281F0DB8D28D5469
      # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
      # gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
      # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
      #      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469
      
      * remotes/mst/tags/for_upstream: (44 commits)
        x86-iommu: turn on IR by default if proper
        x86-iommu: switch intr_supported to OnOffAuto type
        q35: set split kernel irqchip as default
        pci: Adjust PCI config limit based on bus topology
        spapr_pci: perform unplug via the hotplug handler
        pci/shpc: perform unplug via the hotplug handler
        pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge
        pci/pcie: perform unplug via the hotplug handler
        pci/pcihp: perform unplug via the hotplug handler
        pci/pcihp: overwrite hotplug handler recursively from the start
        pci/pcihp: perform check for bus capability in pre_plug handler
        s390x/pci: rename hotplug handler callbacks
        pci/shpc: rename hotplug handler callbacks
        pci/pcie: rename hotplug handler callbacks
        hw/i386: Remove deprecated machines pc-0.10 and pc-0.11
        hw: acpi: Remove AcpiRsdpDescriptor and fix tests
        hw: acpi: Export and share the ARM RSDP build
        hw: arm: Support both legacy and current RSDP build
        hw: arm: Convert the RSDP build to the buid_append_foo() API
        hw: arm: Carry RSDP specific data through AcpiRsdpData
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • C
      MAINTAINERS: PPC: add a XIVE section · b62c6e12
      Committed by Cédric Le Goater
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      34a6b015
    • C
      spapr: introduce an 'ic-mode' machine option · 3ba3d0bc
      Committed by Cédric Le Goater
      This option selects the interrupt controller mode (XICS or XIVE)
      with which the machine will operate. XICS is the default mode for
      now.
      
      When running a machine with the XIVE interrupt mode backend, the guest
      OS must support the XIVE exploitation mode. A legacy OS will select
      XICS through CAS and should then fail to boot. QEMU could detect this
      case, terminate the boot process and reset to stop in the SLOF
      firmware, but this is not yet handled.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr: add an extra OV5 field to the sPAPR IRQ backend · db592b5b
      Committed by Cédric Le Goater
      The interrupt modes supported by the hypervisor are advertised to the
      guest with new bit definitions in option vector 5 (OV5) of the
      "ibm,arch-vec-5-platform-support" property. Bits 0-1 of byte 23 of
      OV5 are defined as follows:
      
        0b00   PAPR 2.7 and earlier (Legacy systems)
        0b01   XIVE Exploitation mode only
        0b10   Either available
      
      If the client/guest selects the XIVE interrupt mode, it informs the
      hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00
      value indicates the use of the XICS interrupt mode (Legacy systems).
      
      The sPAPR IRQ backend is extended with these definitions and the
      values are directly used to populate the "ibm,arch-vec-5-platform-support"
      property. The interrupt mode is advertised under TCG and under KVM.
      Although a KVM XIVE device is not yet available, the machine can still
      operate with kernel_irqchip=off. However, we apply a restriction on
      the CPU which is required to be a POWER9 when a XIVE interrupt
      controller is in use.
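
The two-bit encoding described above can be decoded as in the following minimal Python sketch. It assumes the Power convention of numbering bits from the most significant end, so that bits 0-1 are the top two bits of byte 23. The constant and function names are illustrative and are not QEMU identifiers.

```python
# Values of OV5 byte 23, bits 0-1, as listed in the commit message.
OV5_XIVE_LEGACY = 0b00   # PAPR 2.7 and earlier (XICS)
OV5_XIVE_EXPLOIT = 0b01  # XIVE exploitation mode only
OV5_XIVE_BOTH = 0b10     # either mode available


def ov5_interrupt_mode(byte23: int) -> int:
    """Return the 2-bit interrupt mode field of OV5 byte 23.

    Assumption: bit 0 is the most significant bit of the byte,
    so the field occupies the two highest bits.
    """
    return (byte23 >> 6) & 0b11


def guest_selected_xive(byte23: int) -> bool:
    """True if the guest answered with the XIVE exploitation value."""
    return ov5_interrupt_mode(byte23) == OV5_XIVE_EXPLOIT
```

Under that assumption, a hypervisor advertising "either available" would set the byte to 0x80 and a guest choosing XIVE would return 0x40.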
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr: add a 'reset' method to the sPAPR IRQ backend · b2e22477
      Committed by Cédric Le Goater
      For the time being, the XIVE reset handler updates the OS CAM line of
      the vCPU as it is done under a real hypervisor when a vCPU is
      scheduled to run on a HW thread. This will let the XIVE presenter
      engine find a match among the NVTs dispatched on the HW threads.
      
      This handler will become even more useful when we introduce the
      machine supporting both interrupt modes, XIVE and XICS. In this
      machine, the interrupt mode is chosen by the CAS negotiation process
      and activated after a reset.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      [dwg: Fix style nits]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr: extend the sPAPR IRQ backend for XICS migration · 1c53b06c
      Committed by Cédric Le Goater
      Introduce a new sPAPR IRQ handler to handle resend after migration
      when the machine is using a KVM XICS interrupt controller model.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr: allocate the interrupt thread context under the CPU core · 1a937ad7
      Committed by Cédric Le Goater
      Each interrupt mode has its own specific interrupt presenter object,
      that we store under the CPU object, one for XICS and one for XIVE.
      
      Extend the sPAPR IRQ backend with a new handler to support them both.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr: add device tree support for the XIVE exploitation mode · 6e21de4a
      Committed by Cédric Le Goater
      The XIVE interface for the guest is described in the device tree under
      the "interrupt-controller" node. A few new properties are specific to
      XIVE:
      
       - "reg"
      
         contains the base address and size of the thread interrupt
         management areas (TIMA), for the User level and for the Guest OS
         level. Only the Guest OS level is taken into account today.
      
       - "ibm,xive-eq-sizes"
      
         the sizes of the event queues. One cell per supported size, each
         containing the log2 of the size, in ascending order.
      
       - "ibm,xive-lisn-ranges"
      
         the IRQ interrupt number ranges assigned to the guest for the IPIs.
      
      and also under the root node :
      
       - "ibm,plat-res-int-priorities"
      
         contains a list of priorities that the hypervisor has reserved for
         its own use. OPAL uses the priority 7 queue to automatically
         escalate interrupts for all other queues (DD2.X POWER9). So only
         priorities [0..6] are allowed for the guest.
      
      Extend the sPAPR IRQ backend with a new handler to populate the DT
      with the appropriate "interrupt-controller" node.
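
As an illustration of the property contents described above, the following Python sketch computes the cells of "ibm,xive-eq-sizes" from a set of queue sizes and derives the priorities left to the guest once the hypervisor reserves some. The helper names and the example sizes are hypothetical, and the sketch does not reproduce the binary layout of either property.

```python
def eq_size_cells(sizes_in_bytes):
    """Return log2 of each supported queue size, in ascending order."""
    cells = []
    for size in sorted(sizes_in_bytes):
        # Each supported size must be a power of two to have an exact log2.
        if size <= 0 or size & (size - 1):
            raise ValueError(f"queue size {size} is not a power of two")
        cells.append(size.bit_length() - 1)
    return cells


def guest_priorities(reserved, total=8):
    """Priorities available to the guest, given the reserved ones.

    Assumption: priorities are numbered 0..total-1 (8 priority queues).
    """
    reserved = set(reserved)
    return [p for p in range(total) if p not in reserved]
```

With priority 7 reserved, `guest_priorities([7])` yields the range [0..6] mentioned in the commit message.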
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      [dwg: Fix style nits]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr: add hcalls support for the XIVE exploitation interrupt mode · 23bcd5eb
      Committed by Cédric Le Goater
      The different XIVE virtualization structures (sources and event queues)
      are configured with a set of hypervisor calls:
      
       - H_INT_GET_SOURCE_INFO
      
         used to obtain the address of the MMIO page of the Event State
         Buffer (ESB) entry associated with the source.
      
       - H_INT_SET_SOURCE_CONFIG
      
         assigns a source to a "target".
      
       - H_INT_GET_SOURCE_CONFIG
      
         determines which "target" and "priority" are assigned to a source.
      
       - H_INT_GET_QUEUE_INFO
      
         returns the address of the notification management page associated
         with the specified "target" and "priority".
      
       - H_INT_SET_QUEUE_CONFIG
      
         sets or resets the event queue for a given "target" and "priority".
         It is also used to set the notification configuration associated
         with the queue; only unconditional notification is supported for
         the moment. Reset is performed with a queue size of 0, and queueing
         is disabled in that case.
      
       - H_INT_GET_QUEUE_CONFIG
      
         returns the queue settings for a given "target" and "priority".
      
       - H_INT_RESET
      
         resets all of the guest's internal interrupt structures to their
         initial state, losing all configuration set via the hcalls
         H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
      
       - H_INT_SYNC
      
         issues a synchronisation on a source to make sure all notifications
         have reached their queue.
      
      Calls that still need to be addressed:
      
         H_INT_SET_OS_REPORTING_LINE
         H_INT_GET_OS_REPORTING_LINE
      
      See the code for more documentation on each hcall.
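
The queue semantics described for H_INT_SET_QUEUE_CONFIG, H_INT_GET_QUEUE_CONFIG and H_INT_RESET can be sketched with a toy Python model. This is not the hcall interface: the class, method and field names are hypothetical, and only the "size 0 resets the queue and disables queueing" rule is taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class EventQueue:
    address: int = 0
    size_shift: int = 0    # log2 of the queue size; 0 means no queue
    enabled: bool = False


class QueueTable:
    """Toy model: one event queue per (target, priority) pair."""

    def __init__(self):
        self._queues = {}

    def set_queue_config(self, target, priority, address, size_shift):
        queue = self._queues.setdefault((target, priority), EventQueue())
        if size_shift == 0:
            # A size of 0 resets the queue and disables queueing.
            queue.address, queue.size_shift, queue.enabled = 0, 0, False
        else:
            queue.address, queue.size_shift, queue.enabled = (
                address, size_shift, True)

    def get_queue_config(self, target, priority):
        return self._queues.get((target, priority), EventQueue())

    def reset(self):
        # Drop every configuration, as a full reset would.
        self._queues.clear()
```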
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Folded in fix for field accessors]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr: introduce a new machine IRQ backend for XIVE · dcc345b6
      Committed by Cédric Le Goater
      The XIVE IRQ backend uses the same layout as the new XICS backend but
      covers the full range of the IRQ number space. The IRQ numbers for the
      CPU IPIs are allocated at the bottom of this space, below 4K, to
      preserve compatibility with XICS which does not use that range.
      
      This should be enough given that the maximum number of CPUs is 1024
      for the sPAPR machine under QEMU. For the record, the biggest POWER8
      or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
      cores, SMT8).
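
The sizing argument above is plain arithmetic, checked here in a short Python sketch. It assumes one IPI per CPU thread, which the commit message implies but does not state; the names are illustrative.

```python
IPI_RANGE_SIZE = 4096   # IPIs are allocated at the bottom, "below 4K"
QEMU_MAX_CPUS = 1024    # maximum number of CPUs for the sPAPR machine


def hw_threads(cores, threads_per_core):
    """Total number of hardware threads of a system."""
    return cores * threads_per_core


def ipis_fit(cpus, range_size=IPI_RANGE_SIZE):
    """True if one IPI per CPU fits in the reserved range (assumption)."""
    return cpus <= range_size
```

Both the QEMU limit of 1024 CPUs and the 192-core SMT8 system quoted above fit in the range.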
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • A
      spapr-iommu: Always advertise the maximum possible DMA window size · 8994e91e
      Committed by Alexey Kardashevskiy
      When sizing the huge DMA window, the typical Linux pseries guest uses
      the maximum allowed RAM size as the upper limit. QEMU did the same to
      match that logic. We are now going to support GPU RAM passthrough;
      this memory is not available at guest boot time because it requires
      guest driver interaction. As a result, the guest requests a smaller
      window than it should. Therefore the guest needs to be patched to
      understand this new memory, and so does QEMU.
      
      Instead of reimplementing here whatever solution we choose for the guest,
      this advertises the biggest possible window size, limited to 32 bits
      (as defined by LoPAPR). Since the window size has to be a power of two
      (the create RTAS call receives a window shift, not a size),
      this uses 0x8000.0000 as the maximum number of TCEs possible (rather than
      the 32-bit maximum of 0xffff.ffff).
      
      This is safe as:
      1. The guest visible emulated table is allocated in KVM (actual pages
      are allocated in page fault handler) and QEMU (actual pages are allocated
      when updated);
      2. The hardware table (and corresponding userspace address table)
      supports sparse allocation and also checks for locked_vm limit so
      it is unable to cause the host any damage.
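
The choice of 0x8000.0000 follows from rounding the 32-bit limit down to a power of two, as the short Python sketch below shows. The function names and the `page_shift` parameter are illustrative assumptions; the sketch is not the QEMU code.

```python
def max_tce_count(limit=0xFFFFFFFF):
    """Largest power of two that does not exceed the given limit."""
    return 1 << (limit.bit_length() - 1)


def window_size_bytes(tce_count, page_shift):
    """Window size covered by tce_count entries of 2**page_shift bytes each."""
    return tce_count << page_shift
```

For example, with a hypothetical 4 KiB IOMMU page size (`page_shift=12`), 0x8000.0000 TCEs would cover a window of 2**43 bytes.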
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr/xive: use the VCPU id as a NVT identifier · 0cddee8d
      Committed by Cédric Le Goater
      The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts
      to find a matching Notification Virtual Target (NVT) among the NVTs
      dispatched on the HW processor threads.
      
      On a real system, the thread interrupt contexts are updated by the
      hypervisor when a Virtual Processor is scheduled to run on a HW
      thread. Under QEMU, the model will emulate the same behavior by
      hardwiring the NVT identifier in the thread context registers at
      reset.
      
      The NVT identifier used by the sPAPRXive model is the VCPU id. The END
      identifier is also derived from the VCPU id. A set of helpers
      converting between identifiers is provided for the hcalls configuring
      the sources and the ENDs.
      
      The model does not need a NVT table but the XiveRouter NVT operations
      are provided to perform some extra checks in the routing algorithm.
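
A minimal Python sketch of what such conversion helpers could look like follows. The encoding is an assumption made for illustration (a fixed base added to the VCPU id for the NVT, and eight ENDs per VCPU, one per priority); it is not claimed to be the encoding used by the sPAPRXive model, and `NVT_BASE` is a hypothetical value.

```python
NVT_BASE = 0x400   # hypothetical offset of the first NVT index
PRIORITIES = 8     # one END per priority queue (assumed)


def vcpu_to_nvt(vcpu_id):
    """NVT index derived from the VCPU id."""
    return NVT_BASE + vcpu_id


def nvt_to_vcpu(nvt_idx):
    """Inverse of vcpu_to_nvt()."""
    if nvt_idx < NVT_BASE:
        raise ValueError("not a valid NVT index")
    return nvt_idx - NVT_BASE


def vcpu_to_end(vcpu_id, priority):
    """END index derived from the VCPU id and a priority."""
    if not 0 <= priority < PRIORITIES:
        raise ValueError("priority out of range")
    return vcpu_id * PRIORITIES + priority


def end_to_vcpu(end_idx):
    """Return the (vcpu_id, priority) pair encoded in an END index."""
    return divmod(end_idx, PRIORITIES)
```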
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      spapr/xive: introduce a XIVE interrupt controller · 3aa597f6
      Committed by Cédric Le Goater
      sPAPRXive models the XIVE interrupt controller of the sPAPR machine.
      It inherits from the XiveRouter and provisions storage for the routing
      tables:
      
        - Event Assignment Structure (EAS)
        - Event Notification Descriptor (END)
      
      The sPAPRXive model incorporates an internal XiveSource for the IPIs
      and for the interrupts of the virtual devices of the guest. This model
      is consistent with the XIVE architecture, which also incorporates an
      internal IVSE for IPIs and accelerator interrupts in the IVRE
      sub-engine.
      
      The sPAPRXive model exports two memory regions, one for the ESB
      trigger and management pages used to control the sources and one for
      the TIMA pages. They are mapped by default at the addresses found on
      chip 0 of a baremetal system. This is also consistent with the XIVE
      architecture, which defines a Virtualization Controller BAR for the
      internal IVSE ESB pages and a Thread Management BAR for the TIMA.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Fold in field accessor fixes]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      ppc/xive: notify the CPU when the interrupt priority is more privileged · cdd4de68
      Committed by Cédric Le Goater
      After the event data has been enqueued in the O/S Event Queue, the
      IVPE raises the bit corresponding to the priority of the pending
      interrupt in the Interrupt Pending Buffer (IPB) register to indicate
      that an event is pending in one of the 8 priority queues. The Pending
      Interrupt Priority Register (PIPR) is also updated from the IPB. This
      register represents the priority of the most favored pending
      notification.
      
      The PIPR is then compared to the Current Processor Priority Register
      (CPPR). If it is more favored (numerically less than), the CPU
      interrupt line is raised and the EO bit of the Notification Source
      Register (NSR) is updated to notify the presence of an exception for
      the O/S. The check needs to be done whenever the PIPR or the CPPR is
      changed.
      
      The O/S acknowledges the interrupt with a special load in the Thread
      Interrupt Management Area. If the EO bit of the NSR is set, the CPPR
      takes the value of the PIPR. The bit in the IPB corresponding to the
      priority of the pending interrupt is reset, and so is the EO bit of
      the NSR.
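
The notify and acknowledge sequence described above can be sketched as a small Python model. Several details are assumptions made for illustration and are labeled in the comments: the IPB bit layout (priority 0 in the top bit), the PIPR value used when nothing is pending, and the position of the EO bit. The class and method names are hypothetical.

```python
NO_PENDING = 0xFF  # PIPR value used here when nothing is pending (assumed)
NSR_EO = 0x80      # position of the EO bit in the NSR (assumed)


def priority_to_ipb(priority):
    """IPB bit for a priority 0..7; priority 0 is the top bit (assumed)."""
    return 0x80 >> priority if 0 <= priority <= 7 else 0


def ipb_to_pipr(ipb):
    """Most favored (numerically lowest) priority pending in the IPB."""
    for priority in range(8):
        if ipb & priority_to_ipb(priority):
            return priority
    return NO_PENDING


class ThreadInterruptContext:
    def __init__(self, cppr=0):
        self.ipb = 0
        self.pipr = NO_PENDING
        self.cppr = cppr
        self.nsr = 0
        self.irq_raised = False

    def _check(self):
        # Raise the line only if the pending priority is more favored.
        if self.pipr < self.cppr:
            self.nsr |= NSR_EO
            self.irq_raised = True

    def notify(self, priority):
        self.ipb |= priority_to_ipb(priority)
        self.pipr = ipb_to_pipr(self.ipb)
        self._check()

    def set_cppr(self, cppr):
        self.cppr = cppr
        self._check()

    def accept(self):
        """Model of the O/S acknowledge load; returns the accepted priority."""
        if not self.nsr & NSR_EO:
            return None
        self.cppr = self.pipr
        self.ipb &= ~priority_to_ipb(self.cppr)
        self.pipr = ipb_to_pipr(self.ipb)
        self.nsr &= ~NSR_EO
        self.irq_raised = False
        return self.cppr
```

In this model, a notification at priority 5 followed by one at priority 3 leaves priority 5 pending after the first acknowledge, and the line stays low because 5 is less favored than the new CPPR of 3.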
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Fix style nits]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      ppc/xive: introduce a simplified XIVE presenter · af53dbf6
      Committed by Cédric Le Goater
      The last sub-engine of the XIVE architecture is the Interrupt
      Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
      IVPE share elements, the Power Bus interface (CQ), the routing table
      descriptors, and they can be combined in the same HW logic. We do the
      same in QEMU and combine both engines in the XiveRouter for
      simplicity.
      
      When the IVRE has completed its job of matching an event source with a
      Notification Virtual Target (NVT) to notify, it forwards the event
      notification to the IVPE sub-engine. The IVPE scans the thread
      interrupt contexts of the Notification Virtual Targets (NVT)
      dispatched on the HW processor threads and if a match is found, it
      signals the thread. If not, the IVPE escalates the notification to
      some other targets and records the notification in a backlog queue.
      
      The IVPE maintains the thread interrupt context state for each of its
      NVTs not dispatched on HW processor threads in the Notification
      Virtual Target table (NVTT).
      
      The model currently only supports single NVT notifications.
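
The matching step performed by the presenter can be illustrated with a simplified Python sketch. It models only single NVT notifications and the backlog; escalation to other targets is left out. The `HwThread` type, its `valid` flag and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HwThread:
    os_cam: int            # NVT identifier dispatched on this HW thread
    valid: bool = True     # assumed flag: the CAM line holds a valid NVT
    pending: list = field(default_factory=list)


def find_dispatched(threads, nvt_id):
    """Scan the thread contexts for one whose CAM line matches the NVT."""
    for thread in threads:
        if thread.valid and thread.os_cam == nvt_id:
            return thread
    return None


def present(threads, nvt_id, priority, backlog):
    """Signal the matching thread, or record the notification in the backlog."""
    thread = find_dispatched(threads, nvt_id)
    if thread is None:
        backlog.append((nvt_id, priority))
        return False
    thread.pending.append(priority)
    return True
```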
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      [dwg: Folded in fix for field accessors]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • C
      ppc/xive: introduce the XIVE interrupt thread context · 207d9fe9
      Committed by Cédric Le Goater
      Each POWER9 processor chip has a XIVE presenter that can generate four
      different exceptions to its threads:
      
        - hypervisor exception
        - O/S exception
        - Event-Based Branch (EBB)
        - msgsnd (doorbell)
      
      Each exception has a state independent from the others called a Thread
      Interrupt Management context. This context is a set of registers which
      lets the thread handle priority management and interrupt acknowledgment
      among other things. The most important ones are:
      
        - Pending Interrupt Priority Register (PIPR)
        - Interrupt Pending Buffer            (IPB)
        - Current Processor Priority Register (CPPR)
        - Notification Source Register        (NSR)
      
      These registers are accessible through a specific MMIO region, called
      the Thread Interrupt Management Area (TIMA), which consists of four
      aligned pages, each exposing a different view of the registers. The
      first page (page address ending in 0b00) gives access to the entire
      context and is reserved for the ring 0 view for the physical thread
      context. The second (page address ending in 0b01) is for the
      hypervisor, ring 1 view. The third (page address ending in 0b10) is
      for the operating system, ring 2 view. The fourth (page address ending
      in 0b11) is for user level, ring 3 view.
      
      The thread interrupt context is modeled with a XiveTCTX object
      containing the values of the different exception registers. The TIMA
      region is mapped at the same address for each CPU.
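
Selecting the view from an offset into the TIMA amounts to extracting the two low bits of the page number, as in this Python sketch. The page size is not given in the text, so `page_shift` is a parameter with an assumed default; the names are illustrative.

```python
TIMA_VIEWS = (
    "physical thread (ring 0)",
    "hypervisor (ring 1)",
    "operating system (ring 2)",
    "user (ring 3)",
)


def tima_view(offset, page_shift=12):
    """Ring number selected by an offset into the four-page TIMA.

    Assumption: pages are 2**page_shift bytes; the default is illustrative.
    """
    return (offset >> page_shift) & 0b11


def tima_view_name(offset, page_shift=12):
    return TIMA_VIEWS[tima_view(offset, page_shift)]
```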
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>