1. 16 Nov 2017: 2 commits
    • NUMA: Enable adding NUMA node implicitly · 7b8be49d
      Dou Liyang committed
      Linux and Windows need an ACPI SRAT table to make memory hotplug work
      properly; however, QEMU currently doesn't create an SRAT table if NUMA
      options aren't present on the CLI.
      
      This breaks both Linux and Windows guests under certain conditions:
       * Windows: won't enable memory hotplug at all without an SRAT table
       * Linux: if QEMU is started with all initial memory below 4GB and no SRAT
         table present, the guest kernel will use nommu DMA ops, which breaks
         32-bit hw drivers when memory is hotplugged and the guest tries to use
         it with those drivers.

      Fix the above issues by automatically creating a NUMA node when QEMU is
      started with memory hotplug enabled but without '-numa' options on the CLI.
      (PS: the NUMA node is auto-created only for new machine types, so as not
      to break migration.)
      
      This provides an SRAT table to guests without explicit -numa options on
      the CLI and allows:
       * Windows: to enable memory hotplug
       * Linux: to switch to SWIOTLB DMA ops, bouncing DMA transfers to 32-bit
         allocated buffers that legacy drivers/hw can handle.
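      The trigger condition described above can be sketched in C as follows.
      This is a minimal illustration, not QEMU's code: MachineConfig and
      should_add_implicit_numa_node() are hypothetical names. Treating
      ram_slots > 0 (from -m ...,slots=N) as "memory hotplug enabled" is an
      assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the settings that matter here. */
typedef struct {
    unsigned ram_slots;     /* -m ...,slots=N; > 0 assumed to mean memory hotplug is on */
    unsigned nb_numa_nodes; /* number of '-numa node' options given on the CLI */
    bool new_machine_type;  /* only new machine types opt in, for migration safety */
} MachineConfig;

/* True when an implicit NUMA node should be created so that the guest
 * gets an SRAT table even without any '-numa' option. */
static bool should_add_implicit_numa_node(const MachineConfig *mc)
{
    return mc->new_machine_type && mc->ram_slots > 0 && mc->nb_numa_nodes == 0;
}
```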
      
      [Rewritten by Igor]
      Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Suggested-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Marcel Apfelbaum <marcel@redhat.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Thomas Huth <thuth@redhat.com>
      Cc: Alistair Francis <alistair23@gmail.com>
      Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
      Cc: Izumi Taku <izumi.taku@jp.fujitsu.com>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • hw/pci-host: Fix x86 Host Bridges 64bit PCI hole · 9fa99d25
      Marcel Apfelbaum committed
      Currently there is no MMIO range above 4G reserved for PCI hotplug.
      Since the 32bit PCI hole depends on the number of cold-plugged PCI
      devices and other factors, it is quite possible that it is too small
      to hotplug PCI devices with large BARs.

      Fix it by reserving 2G for the i440FX chipset, in order to comply
      with older Win32 guest OSes, and 32G for the Q35 chipset.
      
      Even though the new pci-hole64-size defaults will also appear in
      "info qtree" for older machines, the property was not implemented
      before, so no changes will be visible to guests.

      Note that this is a regression, since previous QEMU versions had
      some range reserved for 64bit PCI hotplug.
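      The new defaults reduce to a per-chipset value. The sketch below uses
      a hypothetical default_pci_hole64_size() helper keyed by a chipset
      string. In QEMU the value is carried by the pci-hole64-size property,
      and that real wiring is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GiB (1ULL << 30)

/* Default 64-bit PCI hole reserved for hotplug, per chipset, as stated in
 * the commit message. Chipset names are plain strings for illustration. */
static uint64_t default_pci_hole64_size(const char *chipset)
{
    if (strcmp(chipset, "i440fx") == 0) {
        return 2 * GiB;   /* kept small for older Win32 guests */
    }
    if (strcmp(chipset, "q35") == 0) {
        return 32 * GiB;
    }
    return 0;             /* unknown chipset: reserve nothing (assumption) */
}
```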
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  2. 05 Nov 2017: 1 commit
  3. 27 Oct 2017: 2 commits
  4. 15 Oct 2017: 4 commits
  5. 12 Oct 2017: 1 commit
    • pc: make sure that plugged CPUs are of the same type · 6970c5ff
      Igor Mammedov committed
      Heterogeneous CPUs are not supported, and hotplugging a different
      CPU model crashes QEMU:
      
        qemu-system-x86_64 -cpu qemu64 -smp 1,maxcpus=2
        (qemu) device_add host-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=foo
        (qemu) info cpus
        error: failed to get MSR 0x38d
        qemu-system-x86_64: target/i386/kvm.c:2121: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
        Aborted (core dumped)
      
      Gracefully fail the hotplug process in case of user mistake.
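      A sketch of the kind of pre-plug check that turns the crash above into
      a clean error. check_cpu_type() and its message text are hypothetical.
      The only idea shown is that the hotplugged CPU's type name must equal
      the type the machine was started with.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pre-plug check: reject a CPU whose type differs from the
 * machine's CPU type instead of letting KVM hit the assertion above.
 * Returns 0 on success, -1 on mismatch (message written into err). */
static int check_cpu_type(const char *machine_cpu_type, const char *plugged_type,
                          char *err, size_t errlen)
{
    if (strcmp(machine_cpu_type, plugged_type) != 0) {
        snprintf(err, errlen, "Invalid CPU type, expected cpu type: '%s'",
                 machine_cpu_type);
        return -1;
    }
    return 0;
}
```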
      Reported-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Message-Id: <1507638879-200718-1-git-send-email-imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 02 Oct 2017: 1 commit
  7. 27 Sep 2017: 1 commit
  8. 20 Sep 2017: 2 commits
    • hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM · 4926403c
      Eduardo Habkost committed
      Currently, leaving the first node (node 0) without memory breaks the
      NUMA topology QEMU presents to the guest. With this example command line:
        ... \
        -m 1024M,slots=4,maxmem=32G \
        -numa node,nodeid=0 \
        -numa node,mem=1024M,nodeid=1 \
        -numa node,nodeid=2 \
        -numa node,nodeid=3 \
      the guest reports "No NUMA configuration found" and the NUMA topology
      is wrong.
      
      This is because, when QEMU builds the ACPI SRAT, it treats node 0 as the
      default node for handling the memory hole (640K-1M). This means node 0
      must have some memory (>1M), but in fact it may have no memory at all.

      Fix this problem by cutting out the 640K hole in the same way as is
      done for the PCI 4G hole.
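      The hole-cutting idea can be sketched as follows. MemRange and
      cut_hole() are hypothetical and are not QEMU's SRAT builder. Each
      node's memory range is split around the 640K-1M hole, so node 0 no
      longer has to own at least 1MB of RAM, and a node with no memory
      simply yields no entries.

```c
#include <assert.h>
#include <stdint.h>

#define HOLE_640K_START 0xA0000ULL   /* 640 KiB */
#define HOLE_640K_END   0x100000ULL  /* 1 MiB */

typedef struct {
    uint64_t base;
    uint64_t len;
} MemRange;

/* Emit the parts of [base, base + len) lying outside [hole_start, hole_end).
 * Writes at most 2 ranges to out and returns how many were written. */
static int cut_hole(uint64_t base, uint64_t len,
                    uint64_t hole_start, uint64_t hole_end, MemRange out[2])
{
    uint64_t end = base + len;
    int n = 0;

    if (len == 0) {
        return 0;                     /* memoryless node: nothing to describe */
    }
    if (base < hole_start) {          /* part below the hole */
        uint64_t e = end < hole_start ? end : hole_start;
        out[n++] = (MemRange){ base, e - base };
    }
    if (end > hole_end) {             /* part above the hole */
        uint64_t b = base > hole_end ? base : hole_end;
        out[n++] = (MemRange){ b, end - b };
    }
    return n;
}
```

      A 1GB node at address 0 yields two ranges, 0-A0000h and
      100000h-40000000h. A node starting at 1GB yields its range unchanged.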
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Message-Id: <1504231805-30957-2-git-send-email-douly.fnst@cn.fujitsu.com>
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    • numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed · 79e07936
      Igor Mammedov committed
      Calculating default node-ids for CPUs in possible_cpu_arch_ids()
      is rather fragile, since the defaults calculation uses nb_numa_nodes,
      but the callback might be called early, before all -numa CLI options
      are parsed. That would lead to CPUs being assigned only up to the
      nb_numa_nodes value at the time possible_cpu_arch_ids() is called.
      
      The issue was introduced by commit 7c88e65d ("numa: mirror cpu to node
      mapping in MachineState::possible_cpus"). For example, the CLI:
        -smp 4 -numa node,cpus=0 -numa node
      would set props.node-id in the possible_cpus array to the first node for
      every CPU that is not explicitly mapped.
      
      The issue is visible neither to the guest nor to the management
      interface because:
        1) implicitly mapped CPUs are forced to the first node in
           case of partial mapping
        2) in case of default mapping, possible_cpu_arch_ids() is
           called after all -numa options are parsed (resulting
           in correct mapping).
      
      However, it's fragile to rely on late execution of
      possible_cpu_arch_ids(), so add a machine-specific callback that
      returns the node-id for a CPU, and use it to calculate/set defaults
      at machine_numa_finish_init() time, when all -numa options have
      been parsed.
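      A sketch of setting defaults only after all -numa options are known.
      CpuSlot, numa_finish_init() and default_cpu_node_id() are hypothetical
      stand-ins for the possible_cpus array, machine_numa_finish_init() and
      the new machine-specific callback. Spreading CPUs by socket_id modulo
      the node count is an assumed policy.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool has_node_id; /* true if '-numa node,cpus=...' mapped this CPU */
    int node_id;
    int socket_id;
} CpuSlot;

/* Hypothetical machine-specific callback returning a CPU's default node. */
static int default_cpu_node_id(const CpuSlot *cpu, int nb_numa_nodes)
{
    return cpu->socket_id % nb_numa_nodes;
}

/* Runs only after every -numa option is parsed, so nb_numa_nodes is final. */
static void numa_finish_init(CpuSlot *cpus, int ncpus, int nb_numa_nodes)
{
    bool any_explicit = false;

    assert(nb_numa_nodes > 0);
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i].has_node_id) {
            any_explicit = true;
        }
    }
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i].has_node_id) {
            continue;
        }
        /* Partial mapping: unmapped CPUs go to the first node.
         * No explicit mapping at all: ask the machine callback. */
        cpus[i].node_id = any_explicit ? 0 : default_cpu_node_id(&cpus[i], nb_numa_nodes);
        cpus[i].has_node_id = true;
    }
}
```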
      Reported-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com>
      Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
  9. 19 Sep 2017: 6 commits
    • General warn report fixups · b62e39b4
      Alistair Francis committed
      Tidy up some of the warn_report() messages after having converted them
      to use warn_report().
      Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <9cb1d23551898c9c9a5f84da6773e99871285120.1505158760.git.alistair.francis@xilinx.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Convert multi-line fprintf() to warn_report() · 8297be80
      Alistair Francis committed
      Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"...
      to use warn_report() instead. This helps standardise on a single
      method of printing warnings to the user.
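      As an illustration, the single helper that replaces hand-written
      "warning: ...\n" strings might look like the sketch below.
      my_warn_report() is a hypothetical stand-in, not QEMU's warn_report().
      It takes an explicit FILE * so it can be exercised without stderr.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* One place adds the "warning: " prefix and the trailing newline, so call
 * sites no longer spell them out in every fprintf(stderr, ...). */
static void my_warn_report(FILE *out, const char *fmt, ...)
{
    va_list ap;

    fputs("warning: ", out);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
}
```

      Under this scheme, a call such as fprintf(stderr, "warning: ignoring %s\n", opt)
      becomes warn_report("ignoring %s", opt). The prefix and the newline move
      into the helper, which matches what the sed expressions strip from
      each call site.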
      
      All of the warnings were changed using these commands:
        find ./* -type f -exec sed -i \
          'N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
          {} +
        find ./* -type f -exec sed -i \
          'N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
          {} +
        find ./* -type f -exec sed -i \
          'N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
          {} +
        find ./* -type f -exec sed -i \
          'N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
          {} +
        find ./* -type f -exec sed -i \
          'N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
          {} +
        find ./* -type f -exec sed -i \
          'N;N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
          {} +
        find ./* -type f -exec sed -i \
          'N;N;N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
          {} +
      
      Indentation fixed up manually afterwards.
      
      Some of the lines were manually edited to reduce the line length to below
      80 characters. Some of the lines with newlines in the middle of the
      string were also manually edited to avoid checkpatch errors.
      
      The #include lines were manually updated to allow the code to compile.
      
      Several of the warning messages can be improved after this patch; to
      keep this patch mechanical, those improvements were moved into a later patch.
      Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Kevin Wolf <kwolf@redhat.com>
      Cc: Max Reitz <mreitz@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Peter Maydell <peter.maydell@linaro.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Anthony Perard <anthony.perard@citrix.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Aurelien Jarno <aurelien@aurel32.net>
      Cc: Yongbok Kim <yongbok.kim@imgtec.com>
      Cc: Cornelia Huck <cohuck@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Gerd Hoffmann <kraxel@redhat.com>
      Acked-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Convert single line fprintf(.../n) to warn_report() · 2ab4b135
      Alistair Francis committed
      Convert all the single line uses of fprintf(stderr, "warning:"..."\n"...
      to use warn_report() instead. This helps standardise on a single
      method of printing warnings to the user.
      
      All of the warnings were changed using this command:
        find ./* -type f -exec sed -i \
          's|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig' \
          {} +
      
      Some of the lines were manually edited to reduce the line length to below
      80 characters.
      
      The #include lines were manually updated to allow the code to compile.
      Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
      Cc: Kevin Wolf <kwolf@redhat.com>
      Cc: Max Reitz <mreitz@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Gerd Hoffmann <kraxel@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Aurelien Jarno <aurelien@aurel32.net>
      Cc: Yongbok Kim <yongbok.kim@imgtec.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com> [mips]
      Message-Id: <ae8f8a7f0a88ded61743dff2adade21f8122a9e7.1505158760.git.alistair.francis@xilinx.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • hw/i386: Improve some of the warning messages · 9e5d2c52
      Alistair Francis committed
      Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
      Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Message-Id: <1d6ef2ccd9667878ed5820fcf17eef35957ea5d8.1505158760.git.alistair.francis@xilinx.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • multiboot: validate multiboot header address values · ed4f86e8
      Prasad J Pandit committed
      While loading a kernel via a multiboot-v1 image, (flags & 0x00010000)
      indicates that the multiboot header contains valid addresses to load
      the kernel image. These addresses are used to compute the kernel
      size and the kernel text offset in the OS image. Validate these
      address values to avoid an out-of-bounds (OOB) access issue.
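      A sketch of what such validation can look like. It assumes the
      multiboot-v1 rule that a load_end_addr of 0 means "load the rest of
      the file". MbAddrs and mb_validate() are hypothetical. The checks
      (load address not above the header address, non-negative text offset,
      kernel size within the image) are an illustration, not the patch's
      exact code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address fields of a multiboot-v1 header, valid when flags & 0x00010000. */
typedef struct {
    uint32_t header_addr;
    uint32_t load_addr;
    uint32_t load_end_addr;   /* 0 means "load the rest of the file" */
} MbAddrs;

/* mh_offset: where the header was found in the image; file_size: image
 * size. On success, stores the kernel text offset and size. */
static bool mb_validate(const MbAddrs *h, uint32_t mh_offset, uint32_t file_size,
                        uint32_t *text_offset, uint32_t *kernel_size)
{
    uint32_t delta, off, size;

    if (h->load_addr > h->header_addr) {
        return false;               /* header cannot precede the load address */
    }
    delta = h->header_addr - h->load_addr;
    if (delta > mh_offset) {
        return false;               /* kernel text offset would be negative */
    }
    off = mh_offset - delta;
    if (off > file_size) {
        return false;
    }
    if (h->load_end_addr == 0) {
        size = file_size - off;     /* load the rest of the image */
    } else if (h->load_end_addr < h->load_addr) {
        return false;               /* end before start */
    } else {
        size = h->load_end_addr - h->load_addr;
    }
    if (size > file_size - off) {
        return false;               /* would read past the end of the image */
    }
    *text_offset = off;
    *kernel_size = size;
    return true;
}
```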
      
      This is CVE-2017-14167.
      Reported-by: Thomas Garnier <thgarnie@google.com>
      Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
      Message-Id: <20170907063256.7418-1-ppandit@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • pc: use generic cpu_model parsing · 311ca98d
      Igor Mammedov committed
      Define the default CPU type generically in pc_machine_class_init()
      and let common machine code handle cpu_model parsing.
      
      The patch also introduces a TARGET_DEFAULT_CPU_TYPE define for 2 purposes:
        * make foo_machine_class_init() look uniform on every target
        * use the define in [bsd|linux]-user targets to pick the default
          cpu type
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Message-Id: <1505318697-77161-5-git-send-email-imammedo@redhat.com>
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
  10. 08 Sep 2017: 3 commits
  11. 31 Aug 2017: 1 commit
  12. 23 Aug 2017: 1 commit
  13. 22 Aug 2017: 1 commit
    • hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev' · 04790978
      Thomas Huth committed
      QEMU currently crashes when trying to use a 'pc-dimm' on the pseries
      machine without specifying its 'memdev' property. This happens because
      pc_dimm_get_memory_region() does not check whether the 'memdev' property
      has properly been set by the user. Looking closer at this function, it's
      also obvious that it calls another function with &error_abort, which is
      bad in a function used in the hotplug call chain, since it can also
      cause QEMU to exit unexpectedly.
      
      So let's fix these issues properly now: add an "Error **errp"
      parameter to pc_dimm_get_memory_region(), which is used when the 'memdev'
      property has not been set by the user and which replaces the
      &error_abort, and change the callers of get_memory_region() to make
      use of this "errp" parameter for proper error checking.
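      The errp pattern described above can be sketched with a minimal
      stand-in for QEMU's Error type. Error, error_set_msg(), Dimm,
      dimm_get_memory_region() and dimm_plug() are all hypothetical
      simplifications, with a string standing in for the MemoryRegion. The
      point is that a missing 'memdev' becomes an error returned to the
      caller instead of an abort.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for QEMU's Error object (hypothetical). */
typedef struct Error {
    char msg[128];
} Error;

/* Record the first error only, like the usual errp convention. */
static void error_set_msg(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(Error));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

typedef struct {
    const char *memdev;   /* NULL when the user did not set 'memdev' */
} Dimm;

/* Report a missing 'memdev' through errp instead of aborting. */
static const char *dimm_get_memory_region(const Dimm *dimm, Error **errp)
{
    if (!dimm->memdev) {
        error_set_msg(errp, "'memdev' property is not set");
        return NULL;
    }
    return dimm->memdev;
}

/* The caller checks the result and fails the plug gracefully. */
static int dimm_plug(const Dimm *dimm, Error **errp)
{
    const char *mr = dimm_get_memory_region(dimm, errp);

    if (!mr) {
        return -1;
    }
    /* ... map the region into the guest ... */
    return 0;
}
```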
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  14. 08 Aug 2017: 1 commit
  15. 02 Aug 2017: 4 commits
  16. 01 Aug 2017: 2 commits
  17. 31 Jul 2017: 1 commit
  18. 22 Jul 2017: 2 commits
    • xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings · 7fb394ad
      Alexey G committed
      Under certain circumstances, normal xen-mapcache functioning may be broken
      by the guest's actions. This may lead either to QEMU performing exit() due
      to a caught bad pointer (and with the QEMU process gone, the guest domain
      simply appears hung afterwards) or to actual use of the incorrect pointer
      inside the QEMU address space -- a write to unmapped memory is possible.
      The bug is hard to reproduce on an i440 machine, as multiple DMA sources
      are required (though it's possible in theory, using multiple emulated
      devices), but it can be reproduced fairly easily on a Q35 machine using
      an emulated AHCI controller: each NCQ queue command slot may be used as
      an independent DMA source, e.g. using the READ FPDMA QUEUED command, so a
      single storage device on an AHCI controller port is enough to produce
      multiple DMAs (up to 32). A detailed description of the issue follows.
      
      Xen-mapcache provides the ability to map parts of guest memory into
      QEMU's own address space to work with.
      
      There are two types of cache lookups:
       - translating a guest physical address into a pointer in QEMU's address
         space, mapping a part of guest domain memory if necessary (while trying
         to keep the number of such (re)mappings to a minimum)
       - translating a QEMU's pointer back to its physical address in guest RAM
      
      These lookups are managed via two linked-lists of structures.
      MapCacheEntry is used for forward cache lookups, while MapCacheRev -- for
      reverse lookups.
      
      Every guest physical address is broken down into 2 parts:
          address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
          address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
      
      MCACHE_BUCKET_SHIFT depends on the system (32/64-bit) and is equal to 20
      for a 64-bit system (which is assumed for the rest of this description).
      Basically, this means that we deal with 1MB chunks and offsets within
      those 1MB chunks. All mappings are created with 1MB granularity, i.e.
      1MB/2MB/3MB etc. Most DMA transfers are typically less than 1MB; however,
      if a transfer crosses any 1MB border(s), then the next larger mapping size
      will be used, so e.g. a 512-byte DMA transfer with the start address
      700FFF80h will actually require a 2MB range.
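      The arithmetic in the example above can be checked with a small sketch.
      address_index() and mapping_size() are hypothetical helpers that mirror
      the rounding described here, with MCACHE_BUCKET_SHIFT fixed at 20.

```c
#include <assert.h>
#include <stdint.h>

#define MCACHE_BUCKET_SHIFT 20   /* value for a 64-bit system, as above */
#define MCACHE_BUCKET_SIZE  (1ULL << MCACHE_BUCKET_SHIFT)

static uint64_t address_index(uint64_t phys_addr)
{
    return phys_addr >> MCACHE_BUCKET_SHIFT;
}

/* Mapping size for a transfer: offset inside the first bucket plus the
 * transfer length, rounded up to whole 1MB buckets. */
static uint64_t mapping_size(uint64_t phys_addr, uint64_t size)
{
    uint64_t address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
    uint64_t needed = address_offset + size;

    return (needed + MCACHE_BUCKET_SIZE - 1) & ~(MCACHE_BUCKET_SIZE - 1);
}
```

      For 700FFF80h and 512 bytes, address_offset is FFF80h. Offset plus
      length is 100180h, which rounds up to 200000h, i.e. 2MB.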
      
      The current implementation assumes that MapCacheEntries are unique for a
      given address_index and size pair, and that a single MapCacheEntry may be
      reused by multiple requests; in this case the 'lock' field will be larger
      than 1. On the other hand, each requested guest physical address (with
      the 'lock' flag) is described by its own MapCacheRev. So there may be
      multiple MapCacheRev entries corresponding to a single MapCacheEntry. The
      xen-mapcache code uses MapCacheRev entries to retrieve the address_index &
      size pair, which in turn is used to find the related MapCacheEntry. The
      'lock' field within a MapCacheEntry structure is actually a reference
      counter which shows the number of corresponding MapCacheRev entries.
      
      The bug lies in the guest's ability to indirectly manipulate the
      xen-mapcache MapCacheEntries list via a special sequence of DMA
      operations, typically for storage devices. In order to trigger the bug,
      the guest needs to issue DMA operations in a specific order and timing.
      Although xen-mapcache is protected by a mutex lock, this doesn't help
      in this case, as the bug is not due to a race condition.
      
      Suppose we have 3 DMA transfers, namely A, B and C, where
      - transfer A crosses 1MB border and thus uses a 2MB mapping
      - transfers B and C are normal transfers within 1MB range
      - and all 3 transfers belong to the same address_index
      
      In this case, if all these transfers are executed one-by-one
      (without overlaps), no special treatment is necessary: each transfer's
      mapping lock will be set and then cleared on unmap before the
      next transfer starts.
      The situation changes when DMA transfers overlap in time, e.g. like this:
      
        |===== transfer A (2MB) =====|
      
                    |===== transfer B (1MB) =====|
      
                                |===== transfer C (1MB) =====|
       time --->
      
      In this situation the following sequence of actions happens:
      
      1. transfer A creates a mapping to 2MB area (lock=1)
      2. transfer B (1MB) tries to find an available mapping but cannot find one,
         because transfer A is still in progress and its mapping has 2MB size +
         non-zero lock. So transfer B creates another mapping -- same
         address_index, but 1MB size.
      3. transfer A completes, making the 1st mapping entry available by setting
         its lock to 0
      4. transfer C starts, tries to find an available mapping entry and sees
         that the 1st entry has lock=0, so it uses this entry but remaps the
         mapping to a 1MB size
      5. transfer B completes and by this time
        - there are two locked entries in the MapCacheEntry list with the SAME
          values for both address_index and size
        - the entry for transfer B actually resides farther in list while
          transfer C's entry is first
      6. xen_ram_addr_from_mapcache() for transfer B gets correct address_index
         and size pair from corresponding MapCacheRev entry, but then it starts
         looking for MapCacheEntry with these values and finds the first entry
         -- which belongs to transfer C.
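      Steps 1-6 can be replayed with a toy model. An array stands in for the
      MapCacheEntry linked list, and find_entry() performs the first-match
      (address_index, size) lookup from step 6. Entry and both functions are
      hypothetical simplifications of the xen-mapcache structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a MapCacheEntry (hypothetical layout). */
typedef struct {
    uint64_t paddr_index;
    uint64_t size;
    int lock;
    char owner;   /* transfer currently using the entry, for illustration */
} Entry;

/* Step 6: first entry in list order whose (paddr_index, size) matches. */
static Entry *find_entry(Entry *list, int n, uint64_t index, uint64_t size)
{
    for (int i = 0; i < n; i++) {
        if (list[i].paddr_index == index && list[i].size == size) {
            return &list[i];
        }
    }
    return NULL;
}

/* Replays steps 1-6 and returns the owner of the entry that transfer B's
 * reverse lookup lands on. */
static char replay_overlap(void)
{
    const uint64_t MB = 1ULL << 20;
    Entry list[2];
    int n = 0;
    Entry *found;

    list[n++] = (Entry){ 0x700, 2 * MB, 1, 'A' }; /* 1. A maps 2MB, lock=1 */
    list[n++] = (Entry){ 0x700, 1 * MB, 1, 'B' }; /* 2. B cannot reuse A's locked entry */
    list[0].lock = 0;                             /* 3. A completes */
    list[0] = (Entry){ 0x700, 1 * MB, 1, 'C' };   /* 4. C takes entry 0, remapped to 1MB */

    /* 5./6. Two locked entries now share (0x700, 1MB); B's lookup hits C's. */
    found = find_entry(list, n, 0x700, 1 * MB);
    assert(found != NULL);
    return found->owner;
}
```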
      
      At this point the following (bad) consequences are possible:
      
      1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value
         in this statement:
      
         raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
             ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
      
      resulting in an incorrect raddr value being returned from the function. The
      (ptr - entry->vaddr_base) expression may produce both positive and negative
      numbers, and its actual value may differ greatly, since many map/unmap
      operations take place. If the value is beyond the guest RAM limits, a
      "Bad RAM offset" error will be triggered and logged, followed by exit()
      in QEMU.
      
      2. If the raddr value doesn't exceed guest RAM boundaries, the same sequence
      of actions will be performed for xen_invalidate_map_cache_entry() on DMA
      unmap, resulting in the wrong MapCacheEntry being unmapped while the DMA
      operation that uses it is still active. The above example must be
      extended by one more DMA transfer in order to allow unmapping, as the
      first mapping in the list is effectively resident.
      
      The patch modifies the way MapCacheEntries are added to the
      list, avoiding duplicates.
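      One allocation policy that keeps (address_index, size) pairs unique is
      sketched below. It is an illustration under assumptions (an array
      instead of the linked list, a hypothetical map_entry()), not the
      patch's diff. An exact match is reused and its lock incremented before
      any unlocked entry is recycled.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRIES 8

typedef struct {
    uint64_t paddr_index;
    uint64_t size;
    int lock;    /* reference count of users of this mapping */
    int valid;   /* entry currently holds a mapping */
} Entry;

/* Map (index, size) without creating a second entry for the same pair:
 * 1) reuse an existing entry with the same pair, locked or not;
 * 2) otherwise recycle the first unlocked entry;
 * 3) otherwise append. Returns the entry index, or -1 if the list is full. */
static int map_entry(Entry *list, int *n, uint64_t index, uint64_t size)
{
    int free_slot = -1;

    for (int i = 0; i < *n; i++) {
        if (list[i].valid && list[i].paddr_index == index && list[i].size == size) {
            list[i].lock++;
            return i;
        }
        if (free_slot < 0 && list[i].lock == 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0) {
        if (*n == MAX_ENTRIES) {
            return -1;
        }
        free_slot = (*n)++;
    }
    list[free_slot] = (Entry){ index, size, 1, 1 };
    return free_slot;
}
```

      Replaying A, B and C under this policy, C's request finds B's entry and
      raises its lock to 2. The list therefore never holds two entries for
      the same pair.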
      Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
      Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
    • 9e6bdb92
  19. 19 Jul 2017: 4 commits