1. 20 Dec 2018, 3 commits
    • virtio: Helper for registering virtio device types · a4ee4c8b
      Eduardo Habkost committed
      Introduce a helper for registering different flavours of virtio
      devices.  Convert code to use the helper, but keep only the
      existing generic types.  Transitional and non-transitional device
      types will be added by another patch.
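      A minimal sketch of what such a registration helper does. This is not QEMU's real API: every name in it (`VirtioTypeInfo`, `virtio_types_register()`, `register_type_name()` and the tiny registry) is hypothetical. One description of a device family drives the registration of each flavour, and flavours left as NULL are skipped. That matches this patch, which registers only the existing generic types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one virtio device family.
 * A NULL name means "do not register this flavour". */
typedef struct {
    const char *generic_name;
    const char *transitional_name;
    const char *non_transitional_name;
} VirtioTypeInfo;

#define MAX_TYPES 16

/* Hypothetical stand-in for a type registry. */
static const char *registered[MAX_TYPES];
static size_t n_registered;

static int register_type_name(const char *name)
{
    if (n_registered == MAX_TYPES) {
        return -1;
    }
    registered[n_registered++] = name;
    return 0;
}

/* The helper: one call registers every flavour the caller described. */
static int virtio_types_register(const VirtioTypeInfo *t)
{
    const char *names[] = {
        t->generic_name, t->transitional_name, t->non_transitional_name
    };
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (names[i] != NULL && register_type_name(names[i]) < 0) {
            return -1;
        }
    }
    return 0;
}

static int is_registered(const char *name)
{
    for (size_t i = 0; i < n_registered; i++) {
        if (strcmp(registered[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

      If the transitional and non-transitional names were filled in, the same call would register all three flavours. According to the message, that part comes in a later patch.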
      Acked-by: Andrea Bolognani <abologna@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • pc:piix4: Update smbus I/O space after a migration · 2b4e573c
      Corey Minyard committed
      Otherwise it won't be set up correctly and won't work after
      migration.
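      A hypothetical sketch of the general pattern behind this kind of fix. The guest-visible registers are migrated, but the host-side I/O mapping derived from them is not, so that mapping has to be recomputed once the registers are loaded on the destination. The struct, its fields, `smb_mapping_update()`, `pm_post_load()` and the 0xfff0 base-address mask are all illustrative assumptions, not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model: registers are migrated, the host-side
 * I/O mapping is derived state that must be rebuilt. */
typedef struct {
    uint32_t smb_base_reg;   /* migrated: SMBus base address register */
    bool     smb_enabled;    /* migrated: I/O space enable bit */
    uint16_t mapped_addr;    /* derived: where the I/O region is mapped */
    bool     mapped;         /* derived: whether it is mapped at all */
} PmState;

/* Recompute the derived mapping from the register values
 * (the 0xfff0 mask is an assumption for illustration). */
static void smb_mapping_update(PmState *s)
{
    s->mapped = s->smb_enabled;
    s->mapped_addr = (uint16_t)(s->smb_base_reg & 0xfff0u);
}

/* Called after the migrated registers have been loaded on the destination. */
static int pm_post_load(PmState *s)
{
    smb_mapping_update(s);
    return 0;
}
```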
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: qemu-stable@nongnu.org
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • pcie: set link state inactive/active after hot unplug/plug · 2f2b18f6
      Zheng Xiang committed
      When the VM boots the latest version of the Linux kernel, after
      hot-unplugging virtio-blk disks that were hot-plugged into a
      pcie-root-port, the VM's dmesg log shows:
      
      [  151.046242] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0001 from Slot Status
      [  151.046365] pciehp 0000:00:05.0:pcie004: Slot(0-3): Attention button pressed
      [  151.046369] pciehp 0000:00:05.0:pcie004: Slot(0-3): Powering off due to button press
      [  151.046420] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  151.046425] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
      [  151.046464] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  151.046468] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
      [  156.163421] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 2f1
      [  156.163427] pciehp 0000:00:05.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:06:00
      [  156.198736] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  156.198772] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
      [  157.224124] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0018 from Slot Status
      [  157.224194] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
      [  157.224220] pciehp 0000:00:05.0:pcie004: pciehp_check_link_active: lnk_status = 2011
      [  157.224223] pciehp 0000:00:05.0:pcie004: Slot(0-3): Link Up
      [  157.224233] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 7f1
      [  157.224281] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  157.224285] pciehp 0000:00:05.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
      [  157.224300] pciehp 0000:00:05.0:pcie004: __pciehp_link_set: lnk_ctrl = 0
      [  157.224336] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  157.224339] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
      [  159.739294] pci 0000:06:00.0 id reading try 50 times with interval 20 ms to get ffffffff
      [  159.739315] pciehp 0000:00:05.0:pcie004: pciehp_check_link_status: lnk_status = 2011
      [  159.739318] pciehp 0000:00:05.0:pcie004: Failed to check link status
      [  159.739371] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  159.739394] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
      [  160.771426] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  160.771452] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
      [  160.771495] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  160.771499] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
      [  160.771535] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
      [  160.771539] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
      
      After analyzing the log, it seems that QEMU doesn't change the
      Link Status from active to inactive after hot-unplug. This
      results in the abnormal log above since Linux kernel commit
      d331710ea78fea was merged.
      
      Furthermore, if I hot-plug the same virtio-blk disk again after
      hot-unplug, the virtio-blk device turns on and then back off.
      
      So this patch sets the Link Status to inactive after hot-unplug and
      to active after hot-plug.
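      The `lnk_status = 2011` values in the log above are the PCI Express Link Status register printed in hexadecimal. 0x2011 has bit 13 (Data Link Layer Link Active) set, with a current link speed code of 1 and a negotiated width of x1. That explains why the guest still reports "Link Up" after the unplug. Below is a minimal sketch of setting and clearing that bit. The helper names are hypothetical, not QEMU's functions; the mask values follow the PCIe specification as mirrored in Linux's `pci_regs.h`.

```c
#include <stdint.h>

/* Link Status register fields (PCIe spec; same values as Linux pci_regs.h). */
#define LNKSTA_CLS   0x000fu  /* Current Link Speed */
#define LNKSTA_NLW   0x03f0u  /* Negotiated Link Width */
#define LNKSTA_DLLLA 0x2000u  /* Data Link Layer Link Active */

/* Hypothetical helper: mark the link active on hot-plug ... */
static uint16_t link_set_active(uint16_t lnksta)
{
    return (uint16_t)(lnksta | LNKSTA_DLLLA);
}

/* ... and inactive on hot-unplug, leaving speed/width untouched. */
static uint16_t link_set_inactive(uint16_t lnksta)
{
    return (uint16_t)(lnksta & ~LNKSTA_DLLLA);
}

static int link_is_active(uint16_t lnksta)
{
    return (lnksta & LNKSTA_DLLLA) != 0;
}

/* Negotiated width lives in bits 4..9. */
static unsigned link_width(uint16_t lnksta)
{
    return (lnksta & LNKSTA_NLW) >> 4;
}
```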
      Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
      Signed-off-by: Zheng Xiang <xiang.zheng@linaro.org>
      Cc: Wang Haibin <wanghaibin.wang@huawei.com>
      Cc: qemu-stable@nongnu.org
      Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  2. 19 Dec 2018, 2 commits
    • Merge remote-tracking branch 'remotes/vivier2/tags/trivial-patches-pull-request' into staging · b72566a4
      Peter Maydell committed
      Trivial patches (2018-12-18)
      
      # gpg: Signature made Tue 18 Dec 2018 14:28:41 GMT
      # gpg:                using RSA key F30C38BD3F2FBE3C
      # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
      # gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
      # gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
      # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C
      
      * remotes/vivier2/tags/trivial-patches-pull-request:
        error: Remove NULL checks on error_propagate() calls
        vl: Use error_fatal to simplify obvious fatal errors (again)
        i386: hvf: drop debug printf in decode_sldtgroup
        docs/devel/build-system: fix 'softmu' typo
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2018-12-18' into staging · adf02c44
      Peter Maydell committed
      QAPI patches for 2018-12-18
      
      # gpg: Signature made Tue 18 Dec 2018 07:20:11 GMT
      # gpg:                using RSA key 3870B400EB918653
      # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
      # gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
      # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653
      
      * remotes/armbru/tags/pull-qapi-2018-12-18:
        qapi: fix flat union on uncovered branches conditionals
        qmp hmp: Make system_wakeup check wake-up support and run state
        qga: update guest-suspend-ram and guest-suspend-hybrid descriptions
        qmp: query-current-machine with wakeup-suspend-support
        qmp: Split ShutdownCause host-qmp into quit and system-reset
        qmp: Add reason to SHUTDOWN and RESET events
        qapi: Turn ShutdownCause into QAPI enum
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  3. 18 Dec 2018, 13 commits
  4. 17 Dec 2018, 22 commits
    • tests/bios-tables-test: Sanitize test verbose output · fe17cca6
      Philippe Mathieu-Daudé committed
      Remove the extraneous blank lines from the test output when running with V=1.
      
      Before:
      
          TEST: tests/bios-tables-test... (pid=25678)
            /i386/acpi/piix4:
          Looking for expected file 'tests/acpi-test-data/pc/DSDT'
      
          Using expected file 'tests/acpi-test-data/pc/DSDT'
      
          Looking for expected file 'tests/acpi-test-data/pc/FACP'
      
          Using expected file 'tests/acpi-test-data/pc/FACP'
      
          Looking for expected file 'tests/acpi-test-data/pc/APIC'
      
          Using expected file 'tests/acpi-test-data/pc/APIC'
      
          Looking for expected file 'tests/acpi-test-data/pc/HPET'
      
          Using expected file 'tests/acpi-test-data/pc/HPET'
          OK
      
      After:
      
          TEST: tests/bios-tables-test... (pid=667)
            /i386/acpi/piix4:
          Looking for expected file 'tests/acpi-test-data/pc/DSDT'
          Using expected file 'tests/acpi-test-data/pc/DSDT'
          Looking for expected file 'tests/acpi-test-data/pc/FACP'
          Using expected file 'tests/acpi-test-data/pc/FACP'
          Looking for expected file 'tests/acpi-test-data/pc/APIC'
          Using expected file 'tests/acpi-test-data/pc/APIC'
          Looking for expected file 'tests/acpi-test-data/pc/HPET'
          Using expected file 'tests/acpi-test-data/pc/HPET'
          OK
      Suggested-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests: acpi: remove not used ACPI_READ_GENERIC_ADDRESS macro · da15af64
      Igor Mammedov committed
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      [thuth: Fixed conflicts with additional "qts" parameter]
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests: Exit boot-serial-test loop if child dies · 21f80286
      Richard Henderson committed
      There's no point in waiting 5 full minutes when the child has died
      and there will be no more output.  Compute the timeout based on
      elapsed wall-clock time instead of N * delay, as the delay is only
      a minimum sleep time.
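      A sketch of the timing approach described above, assuming POSIX `clock_gettime()` and `nanosleep()`. `wait_for()` and its callbacks are hypothetical names, not the test's real code. Every sleep lasts at least the requested delay, so N iterations can take considerably longer than N * delay. The sketch therefore measures elapsed monotonic time instead, and it also stops as soon as the child is gone.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static int64_t elapsed_ms(const struct timespec *start, const struct timespec *now)
{
    return (int64_t)(now->tv_sec - start->tv_sec) * 1000
         + (now->tv_nsec - start->tv_nsec) / 1000000;
}

/* Poll until `done` succeeds, the child is gone, or `timeout_ms` of
 * wall-clock time has passed. Returns true only if `done` succeeded. */
static bool wait_for(bool (*done)(void *), bool (*child_alive)(void *),
                     void *opaque, int64_t timeout_ms)
{
    struct timespec start, now;
    const struct timespec delay = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (done(opaque)) {
            return true;
        }
        if (!child_alive(opaque)) {
            return false;            /* no more output will ever arrive */
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (elapsed_ms(&start, &now) >= timeout_ms) {
            return false;
        }
        nanosleep(&delay, NULL);     /* sleeps at least `delay`, maybe more */
    }
}
```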
      
      Cc: Thomas Huth <thuth@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
      [thuth: Replaced global_qtest with local qts variable]
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/pxe: Make test independent of global_qtest · 43497c43
      Thomas Huth committed
      global_qtest is not really required here, since boot_sector_test()
      is already independent of that global variable.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/prom-env: Make test independent of global_qtest · dc4c1587
      Thomas Huth committed
      global_qtest is only needed here for one readl(). Replace it with
      qtest_readl() so that the global_qtest variable can be removed here.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/machine-none: Make test independent of global_qtest · ed398a12
      Thomas Huth committed
      Apart from using qmp() in one spot, this test does not have any
      dependencies on the global_qtest variable, so we can simply get
      rid of it here by replacing the qmp() with qtest_qmp().
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/test-filter: Make tests independent of global_qtest · a2569b00
      Thomas Huth committed
      Apart from using qmp() in the qmp_discard_response() macro, these
      tests do not have any dependencies on the global_qtest variable,
      so we can simply get rid of it here by replacing the qmp() with
      qtest_qmp() in the macro.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/boot-serial: Get rid of global_qtest variable · e6426b74
      Thomas Huth committed
      The test does not use any of the functions that require global_qtest,
      so we can simply get rid of this global variable here.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/pvpanic: Make the pvpanic test independent of global_qtest · 791a289b
      Thomas Huth committed
      We want to get rid of global_qtest in the long run, so stop using
      wrappers like inb() and outb() here.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/vmgenid: Make test independent of global_qtest · ac16ab75
      Thomas Huth committed
      Most of the work was already done in the previous patch; we now
      only have to replace a few qmp() and readb() calls with the
      corresponding qtest_*() functions.
      Acked-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/acpi-utils: Drop dependence on global_qtest · 273e3d92
      Eric Blake committed
      As a general rule, we prefer avoiding implicit global state
      because it makes code harder to safely copy and paste without
      thinking about the global state.  Adjust the helper code to
      use explicit state instead, and update all callers.
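      A minimal sketch of the shape of such a conversion. `TestState`, `global_state`, `read_byte()` and `run_test()` are hypothetical stand-ins, not libqtest's API. The helper receives its state as a parameter. The test body then asserts that the implicit global stays NULL, in the spirit of the assertion described below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a connection to the QEMU under test. */
typedef struct {
    uint8_t mem[256];          /* fake guest memory */
} TestState;

/* The implicit global being phased out. Helpers used to read through it,
 * e.g. `return global_state->mem[addr];`. */
static TestState *global_state;

/* After the conversion: state is passed explicitly, so the helper can be
 * copied into any test without hidden setup. */
static uint8_t read_byte(TestState *s, uint8_t addr)
{
    return s->mem[addr];
}

/* A converted test body; the assert proves it never relied on the global. */
static uint8_t run_test(TestState *s)
{
    uint8_t v = read_byte(s, 4);
    assert(global_state == NULL);
    return v;
}
```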
      
      bios-tables-test no longer depends on global_qtest, now that it
      passes explicit state through the testsuite data; an assert
      proves this fact (although we will get rid of it later, once
      global_qtest is gone).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Acked-by: Igor Mammedov <imammedo@redhat.com>
      Tested-by: Igor Mammedov <imammedo@redhat.com>
      [thuth: adapted patch to current master branch]
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • ivshmem-test: Drop dependence on global_qtest · 24c01ffa
      Eric Blake committed
      Managing parallel connections to two different monitors via
      the implicit global_qtest makes it hard to copy-and-paste code
      to tests that are not aware of the implicit state.  Since we
      have already fixed qpci to avoid global_qtest, we can now
      simplify by not using global_qtest anywhere in ivshmem-test.
      
      We can assert that the conversion is correct by checking that
      global_qtest remains NULL throughout the test (a later patch
      that changes global_qtest to not be a public global variable
      will drop the assertions).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      [thuth: Dropped the changes to test_ivshmem_hotplug() - will be fixed later]
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • tests/libqos/pci: Make PCI access functions independent of global_qtest · d786f782
      Thomas Huth committed
      QPCIBus already tracks QTestState, so use that state instead of an
      implicit reliance on global_qtest.
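      The same idea at the bus level: if the bus object carries a pointer to the state it talks to, its access functions need no global at all. Two buses can then talk to two different QEMU instances at once, which is the situation ivshmem-test is in. `FakeQTestState`, `FakePCIBus` and the accessors are simplified, hypothetical types, not libqos's QPCIBus.

```c
#include <stdint.h>

/* Hypothetical, simplified types; the real libqos structures are richer. */
typedef struct {
    uint32_t cfg[64];          /* fake PCI config space, dword-indexed */
} FakeQTestState;

typedef struct {
    FakeQTestState *qts;       /* the bus remembers which QEMU it talks to */
} FakePCIBus;

/* Accessors go through bus->qts, never through a global. */
static uint32_t bus_config_readl(FakePCIBus *bus, uint8_t offset)
{
    return bus->qts->cfg[offset / 4];
}

static void bus_config_writel(FakePCIBus *bus, uint8_t offset, uint32_t val)
{
    bus->qts->cfg[offset / 4] = val;
}
```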
      
      Based on an earlier patch ("libqos: Use explicit QTestState for pci
      operations") from Eric Blake.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
    • Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20181216' into staging · f1634485
      Peter Maydell committed
      - Remove retranslation remnants
      - Return success from patch_reloc
      - Preserve 32-bit values as zero-extended on x86_64
      - Make bswap during memory ops optional
      - Cleanup xxhash
      - Revert constant pooling for tcg/sparc/
      
      # gpg: Signature made Mon 17 Dec 2018 03:25:21 GMT
      # gpg:                using RSA key 64DF38E8AF7E215F
      # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
      # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F
      
      * remotes/rth/tags/pull-tcg-20181216: (33 commits)
        xxhash: match output against the original xxhash32
        include: move exec/tb-hash-xx.h to qemu/xxhash.h
        exec: introduce qemu_xxhash{2,4,5,6,7}
        qht-bench: document -p flag
        tcg: Drop nargs from tcg_op_insert_{before,after}
        tcg/mips: Improve the add2/sub2 command to use TCG_TARGET_REG_BITS
        tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP
        tcg/optimize: Optimize bswap
        tcg: Clean up generic bswap64
        tcg: Clean up generic bswap32
        tcg/i386: Add setup_guest_base_seg for FreeBSD
        tcg/i386: Precompute all guest_base parameters
        tcg/i386: Assume 32-bit values are zero-extended
        tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests
        tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path
        tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
        tcg/s390x: Return false on failure from patch_reloc
        tcg/ppc: Return false on failure from patch_reloc
        tcg/arm: Return false on failure from patch_reloc
        tcg/aarch64: Return false on failure from patch_reloc
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • .shippable.yml: disable the win cross tests · 139108f6
      Alex Bennée committed
      The pkg.mxe.cc package repositories have been down for the last two
      weeks, causing the builds to fail when Shippable rebuilds the
      containers.
      
      This is really just a sticking plaster until we can get our own Docker
      Hub images properly set up, so we can avoid depending on external repos.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Acked-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-id: 20181214151718.5041-1-alex.bennee@linaro.org
      Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • hardfloat: implement float32/64 comparison · d9fe9db9
      Emilio G. Cota committed
      Performance results for fp-bench:
      
      Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      cmp-single: 110.98 MFlops
      cmp-double: 107.12 MFlops
      - after:
      cmp-single: 506.28 MFlops
      cmp-double: 524.77 MFlops
      
      Note that flattening both the eq and eq_signaling versions
      would give us extra performance (695 vs 506 and 615 vs 524 MFlops
      for single/double, respectively), but this would emit two
      essentially identical functions for each eq/signaling pair,
      which is a waste.
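      For context, here is a sketch of how a quiet comparison can be done on the host FPU. This is not the patch's code. C99's `isgreater()`, `isless()` and `isunordered()` macros compare without raising the invalid exception for quiet NaNs, which is what a quiet (non-signaling) compare needs. The `Relation` enum is modelled on softfloat's relation values but defined here as an assumption. A full implementation would hand unordered operands back to the soft-float path so that signaling NaNs still raise invalid.

```c
#include <math.h>

/* Assumed result encoding, modelled on softfloat's relation values. */
typedef enum {
    REL_LESS = -1, REL_EQUAL = 0, REL_GREATER = 1, REL_UNORDERED = 2
} Relation;

/* Quiet comparison on the host. Unlike plain < and >, these C99 macros
 * do not raise "invalid" for quiet NaN operands. */
static Relation host_compare_quiet(double a, double b)
{
    if (isgreater(a, b)) {
        return REL_GREATER;
    }
    if (isless(a, b)) {
        return REL_LESS;
    }
    if (isunordered(a, b)) {
        /* A full implementation would defer to soft-float here. */
        return REL_UNORDERED;
    }
    return REL_EQUAL;            /* includes +0 vs -0 */
}
```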
      
      Aggregate performance improvement for the last few patches:
      [ all charts in png: https://imgur.com/a/4yV8p ]
      
      1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      
      [Chart: qemu-aarch64 NBench score (higher is better) on an Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz host, for the FOURIER, NEURAL NET and LU DECOMPOSITION benchmarks and their gmean, before vs. after the hardfloat patches]
      
      [Chart: qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015 on an Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz host, per benchmark (410.bwaves to 482.sphinx3) plus geomean; error bars: 95% confidence interval]
      
      2. Host: ARM Aarch64 A57 @ 2.4GHz
      
      [Chart: qemu-aarch64 NBench score (higher is better) on an Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz host, for the FOURIER, NEURAL NET and LU DECOMPOSITION benchmarks and their gmean, before vs. after the hardfloat patches]
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 square root · f131bae8
      Emilio G. Cota committed
      Performance results for fp-bench:
      
      Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      sqrt-single: 42.30 MFlops
      sqrt-double: 22.97 MFlops
      - after:
      sqrt-single: 311.42 MFlops
      sqrt-double: 311.08 MFlops
      
      Here USE_FP makes a huge difference for f64's, with throughput
      going from ~200 MFlops to ~300 MFlops.
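      A minimal sketch of an input check that decides whether square root can take the host path. It assumes, without quoting the patch, that only +0 and positive normal inputs qualify. Negative inputs must raise invalid; subnormals, infinities and NaNs are left to soft-float. `sqrt_can_use_host()` is a hypothetical name.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical guard: may sqrt(a) be computed on the host FPU? */
static bool sqrt_can_use_host(double a)
{
    int c = fpclassify(a);
    /* Only +0 and positive normals. Negative inputs must raise invalid;
     * -0 is excluded for simplicity (an assumption of this sketch);
     * subnormals, infinities and NaNs go to soft-float. */
    return (c == FP_NORMAL || c == FP_ZERO) && !signbit(a);
}
```

      No output check is needed in this case. The square root of a positive normal double always lies between about 2^-511 and 2^512, so it can neither overflow nor underflow.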
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 fused multiply-add · ccf770ba
      Emilio G. Cota committed
      Performance results for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      fma-single: 74.73 MFlops
      fma-double: 74.54 MFlops
      - after:
      fma-single: 203.37 MFlops
      fma-double: 169.37 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      fma-single: 23.24 MFlops
      fma-double: 23.70 MFlops
      - after:
      fma-single: 66.14 MFlops
      fma-double: 63.10 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      fma-single: 37.26 MFlops
      fma-double: 37.29 MFlops
      - after:
      fma-single: 48.90 MFlops
      fma-double: 59.51 MFlops
      
      Here, setting 3F64_USE_FP to 1 pays off on x86_64:
      [1] 170.15 vs [0] 153.12 MFlops
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 division · 4a629561
      Emilio G. Cota committed
      Performance results for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      div-single: 34.84 MFlops
      div-double: 34.04 MFlops
      - after:
      div-single: 275.23 MFlops
      div-double: 216.38 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      div-single: 9.33 MFlops
      div-double: 9.30 MFlops
      - after:
      div-single: 51.55 MFlops
      div-double: 15.09 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      div-single: 25.65 MFlops
      div-double: 24.91 MFlops
      - after:
      div-single: 96.83 MFlops
      div-double: 31.01 MFlops
      
      Here, setting 2F64_USE_FP to 1 pays off on x86_64:
      [1] 215.97 vs [0] 62.15 MFlops
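      Division needs both an input check and an output check. The sketch below assumes these rules: the dividend must be zero or normal, and the divisor must be normal, so division by zero never reaches the host. Any overflow, or any tiny non-zero result that may have underflowed, goes back to soft-float so the guest sees the right flags. `try_host_div()` is a hypothetical name, not the patch's function.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

static bool is_zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_NORMAL || c == FP_ZERO;
}

/* Returns true and stores a / b in *out when the host result can be used
 * as-is; false means "fall back to soft-float". */
static bool try_host_div(float a, float b, float *out)
{
    /* Pre-check: no zero, infinite, NaN or subnormal divisor,
     * and a zero-or-normal dividend. */
    if (!is_zero_or_normal(a) || fpclassify(b) != FP_NORMAL) {
        return false;
    }
    float r = a / b;
    /* Post-check: overflow needs soft-float to raise the flags. */
    if (isinf(r)) {
        return false;
    }
    /* A tiny non-zero quotient may have underflowed. */
    if (r <= FLT_MIN && r >= -FLT_MIN && a != 0.0f) {
        return false;
    }
    *out = r;
    return true;
}
```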
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 multiplication · 2dfabc86
      Emilio G. Cota committed
      Performance results for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      mul-single: 126.91 MFlops
      mul-double: 118.28 MFlops
      - after:
      mul-single: 258.02 MFlops
      mul-double: 197.96 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      mul-single: 37.42 MFlops
      mul-double: 38.77 MFlops
      - after:
      mul-single: 73.41 MFlops
      mul-double: 76.93 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      mul-single: 58.40 MFlops
      mul-double: 59.33 MFlops
      - after:
      mul-single: 60.25 MFlops
      mul-double: 94.79 MFlops
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • hardfloat: implement float32/64 addition and subtraction · 1b615d48
      Emilio G. Cota committed
      Performance results (single and double precision) for fp-bench:
      
      1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
      - before:
      add-single: 135.07 MFlops
      add-double: 131.60 MFlops
      sub-single: 130.04 MFlops
      sub-double: 133.01 MFlops
      - after:
      add-single: 443.04 MFlops
      add-double: 301.95 MFlops
      sub-single: 411.36 MFlops
      sub-double: 293.15 MFlops
      
      2. ARM Aarch64 A57 @ 2.4GHz
      - before:
      add-single: 44.79 MFlops
      add-double: 49.20 MFlops
      sub-single: 44.55 MFlops
      sub-double: 49.06 MFlops
      - after:
      add-single: 93.28 MFlops
      add-double: 88.27 MFlops
      sub-single: 91.47 MFlops
      sub-double: 88.27 MFlops
      
      3. IBM POWER8E @ 2.1 GHz
      - before:
      add-single: 72.59 MFlops
      add-double: 72.27 MFlops
      sub-single: 75.33 MFlops
      sub-double: 70.54 MFlops
      - after:
      add-single: 112.95 MFlops
      add-double: 201.11 MFlops
      sub-single: 116.80 MFlops
      sub-double: 188.72 MFlops
      
      Note that the IBM and ARM machines benefit from having
      HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
      can suffer significantly:
      - IBM Power8:
      add-single: [1] 54.94 vs [0] 116.37 MFlops
      add-double: [1] 58.92 vs [0] 201.44 MFlops
      - Aarch64 A57:
      add-single: [1] 80.72 vs [0] 93.24 MFlops
      add-double: [1] 82.10 vs [0] 88.18 MFlops
      
      On the Intel machine, setting 2F64_USE_FP to 1 pays off, but
      setting 2F32_USE_FP to 1 does not:
      - Intel i7-6700K:
      add-single: [1] 285.79 vs [0] 426.70 MFlops
      add-double: [1] 302.15 vs [0] 278.82 MFlops
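      The `*_USE_FP` knobs discussed above choose how operands are classified before taking the fast path. The sketch below assumes that is the distinction and shows both variants. Variant 1 asks the host FPU via `fpclassify()`; variant 0 inspects the IEEE 754 binary32 bit pattern with integer operations. Both give the same answer, and as the numbers above show, which one is faster depends on the host. The function names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Variant 1 (USE_FP = 1): classify with the host FPU. */
static bool zero_or_normal_fp(float f)
{
    int c = fpclassify(f);
    return c == FP_NORMAL || c == FP_ZERO;
}

/* Variant 0: classify by inspecting the binary32 bit pattern.
 * Exponent all-zeros means zero/subnormal, all-ones means Inf/NaN. */
static bool zero_or_normal_int(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));       /* well-defined type punning */
    uint32_t exp  = (bits >> 23) & 0xffu;
    uint32_t frac = bits & 0x7fffffu;
    if (exp == 0xffu) {
        return false;                      /* infinity or NaN */
    }
    if (exp == 0) {
        return frac == 0;                  /* +/-0 yes, subnormal no */
    }
    return true;                           /* normal */
}
```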
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    • fpu: introduce hardfloat · a94b7839
      Emilio G. Cota committed
      The appended patch paves the way for leveraging the host FPU for a
      subset of guest FP operations. For most guest workloads (e.g. FP flags
      are never cleared, inexact occurs often and rounding is set to the
      default [to nearest]) this will yield sizable speedups.
      
      The approach followed here avoids checking the FP exception flags register.
      See the added comment for details.
      
      This assumes that QEMU is running on an IEEE754-compliant FPU and
      that the rounding is set to the default (to nearest). The
      implementation-dependent specifics of the FPU should not matter; things
      like tininess detection and snan representation are still dealt with in
      soft-fp. However, this approach will break on most hosts if we compile
      QEMU with flags that break IEEE compatibility. There is no way to detect
      all of these flags at compilation time, but at least we check for
      -ffast-math (which defines __FAST_MATH__) and disable hardfloat
      (plus emit a #warning) when it is set.
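      A sketch of such a compile-time guard. GCC and Clang define `__FAST_MATH__` under `-ffast-math`. `HARDFLOAT_ENABLED` and `hardfloat_enabled()` are hypothetical names, and `#warning` is a GCC/Clang extension (standardized only in C23).

```c
/* Refuse hardfloat when the compiler may ignore IEEE 754 semantics. */
#if defined(__FAST_MATH__)
# warning "-ffast-math breaks IEEE 754 semantics; disabling hardfloat"
# define HARDFLOAT_ENABLED 0
#else
# define HARDFLOAT_ENABLED 1
#endif

static int hardfloat_enabled(void)
{
    return HARDFLOAT_ENABLED;
}
```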
      
      This patch just adds common code. Some operations will be migrated
      to hardfloat in subsequent patches to ease bisection.
      
      Note: some architectures (at least PPC, there might be others) clear
      the status flags passed to softfloat before most FP operations. This
      precludes the use of hardfloat, so to avoid introducing a performance
      regression for those targets, we add a flag to disable hardfloat.
      In the long run though it would be good to fix the targets so that
      at least the inexact flag passed to softfloat is indeed sticky.
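      Putting the pieces together, here is a minimal sketch of the overall approach for a multiplication. It uses hypothetical types and names (`GuestFPStatus`, `FLAG_INEXACT`, `try_hard_mul()`) and is not the patch's code. The host FPU is used only when three conditions hold: the target has not opted out, rounding is to nearest, and the sticky inexact flag is already set. Under those conditions the host's own exception flags never need to be read. Inputs must be zero or normal, and any result that overflowed or may have underflowed is returned to soft-float.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical guest FP status; only what the sketch needs. */
enum { FLAG_INEXACT = 1 << 0 };

typedef struct {
    int  exception_flags;  /* sticky guest exception flags */
    bool round_nearest;    /* rounding mode is round-to-nearest-even */
    bool no_hardfloat;     /* per-target opt-out (e.g. targets that clear flags) */
} GuestFPStatus;

static bool can_use_host_fpu(const GuestFPStatus *s)
{
    /* With inexact already set, the host op cannot newly raise it; the
     * remaining flags are covered by the input/output checks below. */
    return !s->no_hardfloat && s->round_nearest &&
           (s->exception_flags & FLAG_INEXACT) != 0;
}

static bool zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_NORMAL || c == FP_ZERO;
}

/* Returns true and stores a * b in *out if the host result can be used;
 * false means "fall back to the soft-float implementation". */
static bool try_hard_mul(float a, float b, const GuestFPStatus *s, float *out)
{
    if (!can_use_host_fpu(s) || !zero_or_normal(a) || !zero_or_normal(b)) {
        return false;
    }
    float r = a * b;
    if (isinf(r)) {
        return false;              /* overflow: soft-float raises it */
    }
    if (r <= FLT_MIN && r >= -FLT_MIN && a != 0.0f && b != 0.0f) {
        return false;              /* tiny result: may have underflowed */
    }
    *out = r;
    return true;
}
```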
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>