1. 28 Oct, 2016 12 commits
    •
      sparc: Use the new common NVRAM functions for system and free space partition · 2024c014
      Thomas Huth committed
      The system and free space NVRAM partitions (for OpenBIOS) are created
      in exactly the same way as the Mac-style CHRP NVRAM partitions, so we
      can use the new common helper functions to do this job here, too.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    •
      nvram: Introduce helper functions for CHRP "system" and "free space" partitions · 55d9950a
      Thomas Huth committed
      The "system partition" and "free space" partition layouts are
      defined by the CHRP and LoPAPR specification, and used by
      OpenBIOS and SLOF. We can re-use this code for other machines
      that use OpenBIOS and SLOF, too. So let's make this code independent
      from the MAC NVRAM environment and put it into two proper helper
      functions.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
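The partition layout mentioned in the commit above can be illustrated with a minimal sketch. It is not the code from the commit. The 16-byte header layout, the signature values, and the wrap-around checksum are assumptions based on a reading of the CHRP NVRAM format and should be checked against the specification. All `Sketch*` and `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed signature bytes; verify against the CHRP/LoPAPR documents. */
#define SKETCH_SIG_SYSTEM 0x70 /* "system" partition */
#define SKETCH_SIG_FREE   0x7f /* "free space" partition */

/* Assumed 16-byte partition header (hypothetical type name). */
typedef struct {
    uint8_t signature;
    uint8_t checksum;
    uint8_t len_hi;   /* partition length in 16-byte units, high byte */
    uint8_t len_lo;   /* partition length in 16-byte units, low byte */
    char    name[12]; /* zero-padded partition name */
} SketchPartHdr;

/* Store the length and compute the checksum over every header byte
 * except the checksum byte itself, folding each carry back in. */
static void sketch_finish_partition(SketchPartHdr *hdr, uint32_t size)
{
    const uint8_t *bytes = (const uint8_t *)hdr;
    unsigned int sum, i;

    assert(size % 16 == 0 && size / 16 <= 0xffff);
    hdr->len_hi = (size >> 12) & 0xff;
    hdr->len_lo = (size >> 4) & 0xff;

    sum = bytes[0];
    for (i = 2; i < sizeof(*hdr); i++) {
        sum += bytes[i];
        sum = (sum + (sum >> 8)) & 0xff;
    }
    hdr->checksum = (uint8_t)sum;
}

/* Fill in a header for a partition of `size` bytes (header included). */
static void sketch_init_partition(SketchPartHdr *hdr, uint8_t signature,
                                  const char *name, uint32_t size)
{
    assert(strlen(name) <= sizeof(hdr->name));
    memset(hdr, 0, sizeof(*hdr));
    hdr->signature = signature;
    memcpy(hdr->name, name, strlen(name));
    sketch_finish_partition(hdr, size);
}
```

A caller would use one header per partition, for example `sketch_init_partition(&hdr, SKETCH_SIG_FREE, "free", remaining_bytes)` for the trailing free-space area. Because nothing here depends on a particular machine, the same two steps could serve any board whose firmware expects this layout.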
    •
      spapr_pci: advertise explicit numa IDs even when there's 1 node · 4bcfa56c
      Michael Roth committed
      With the addition of "numa_node" properties for PHBs we began
      advertising NUMA affinity in cases where nb_numa_nodes > 1.
      
      Since the default on the guest side is to make no assumptions about
      PHB NUMA affinity (defaulting to -1), there is still a valid use-case
      for explicitly defining a PHB's NUMA affinity even when there's just
      one node. In particular, some workloads make faulty assumptions about
      /sys/bus/pci/<devid>/numa_node being >= 0, warranting the use of
      this property as a workaround even if there's just 1 PHB or NUMA
      node.
      
      Enable this use-case by always advertising the PHB's NUMA affinity
      if "numa_node" has been explicitly set.
      
      We could achieve this by relaxing the check to simply be
      nb_numa_nodes > 0, but even safer would be to check
      numa_info[nodeid].present explicitly, and to fail at start time
      for cases where it does not exist.
      
      This has the additional effect of no longer advertising PHB NUMA
      affinity unconditionally if nb_numa_nodes > 1 and the "numa_node"
      property is unset/-1, but since the default value on the guest
      side for each PHB is also -1, the behavior should be the same for
      that situation. We could still retain the old behavior if desired,
      but the decision seems arbitrary, so we take the simpler route.
      
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Shivaprasad G. Bhat <shivapbh@in.ibm.com>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
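The decision rule described in the commit above can be sketched as follows. This is an illustration only, not the spapr_pci code: the node table, the enum, and every `Sketch*`/`sketch_*` name are hypothetical stand-ins, and -1 is used as the "unset" value as stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the NUMA node table. */
typedef struct {
    bool present;
} SketchNodeInfo;

typedef enum {
    SKETCH_PHB_NUMA_OMIT,      /* property unset: advertise nothing */
    SKETCH_PHB_NUMA_ADVERTISE, /* property set and node exists */
    SKETCH_PHB_NUMA_ERROR      /* property set but node missing: fail at start */
} SketchPhbNumaDecision;

/* Decide what to do with a PHB's "numa_node" value. The number of
 * configured nodes is deliberately not consulted, so a single-node
 * guest still gets an explicit affinity when one was requested. */
static SketchPhbNumaDecision
sketch_phb_numa_decision(const SketchNodeInfo *nodes, size_t max_nodes,
                         int64_t numa_node)
{
    assert(nodes != NULL);
    if (numa_node == -1) {
        return SKETCH_PHB_NUMA_OMIT;
    }
    if (numa_node < 0 || (uint64_t)numa_node >= max_nodes ||
        !nodes[numa_node].present) {
        return SKETCH_PHB_NUMA_ERROR;
    }
    return SKETCH_PHB_NUMA_ADVERTISE;
}
```

With a table in which only node 0 is present, a value of 0 is advertised, -1 is left to the guest default, and 1 is rejected at start time rather than silently ignored.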
    •
      tests: enable virtio tests on SPAPR · 30ca440e
      Laurent Vivier committed
      Enable the virtio tests on SPAPR, but disable the MSI-X tests there
      because we can't check the result (the memory region used on PC is
      not readable on SPAPR).
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    •
      tests: use qtest_pc_boot()/qtest_shutdown() in virtio tests · a980f7f2
      Laurent Vivier committed
      This patch replaces calls to qtest_start() and qtest_end() with
      calls to qtest_pc_boot() and qtest_shutdown().

      This initializes the memory allocator and the PCI interface
      functions. It will also make it easier to enable the virtio tests
      on other architectures, since only a specific qtest_XXX_boot()
      (like qtest_spapr_boot()) has to be added.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    •
      tests: rename target_big_endian() as qvirtio_is_big_endian() · 8b4b80c3
      Laurent Vivier committed
      Move the definition to libqos/virtio.h as it must be used
      only with virtio functions.

      Add a QVirtioDevice parameter, as it will be needed to
      know whether the virtio device is using the virtio 1.0 specification
      and thus is always little-endian (to do).
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    •
      tests: move QVirtioBus pointer into QVirtioDevice · 6b9cdf4c
      Laurent Vivier committed
      This avoids having to pass both the bus and the device to every virtio function.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      [dwg: Fix style nit]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
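The refactoring pattern in the commit above, storing the bus pointer inside the device so helpers need only one argument, might look roughly like this. The types below are toy stand-ins invented for illustration. They are not the libqos definitions of QVirtioBus or QVirtioDevice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchDevice SketchDevice;

/* Hypothetical bus: a table of operations that act on a device. */
typedef struct {
    uint8_t (*config_readb)(SketchDevice *d, uint64_t off);
} SketchBus;

/* The device carries its bus, so callers no longer pass both. */
struct SketchDevice {
    const SketchBus *bus;
    const uint8_t *config; /* toy configuration space */
    size_t config_len;
};

/* Helper with a single device argument; the bus is looked up from it. */
static uint8_t sketch_config_readb(SketchDevice *d, uint64_t off)
{
    assert(d != NULL && d->bus != NULL);
    return d->bus->config_readb(d, off);
}

/* Toy bus implementation that reads from the in-memory config array. */
static uint8_t sketch_toy_config_readb(SketchDevice *d, uint64_t off)
{
    assert(off < d->config_len);
    return d->config[off];
}

static const SketchBus sketch_toy_bus = {
    .config_readb = sketch_toy_config_readb,
};
```

The earlier shape of such a helper would have been `helper(bus, device, off)`. Keeping the pointer in the device also removes the possibility of pairing a device with the wrong bus at a call site.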
    •
      tests: don't check if qtest_spapr_boot() returns NULL · 458f3b2c
      Laurent Vivier committed
      qtest_spapr_boot()/qtest_pc_boot()/qtest_boot() call qtest_vboot()
      and qtest_vboot() calls g_malloc(),
      and g_malloc() never fails:
      if memory allocation fails, the application is terminated.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
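The reasoning in the commit above can be shown with a small stand-in that needs no GLib. `sketch_malloc()` below only imitates the abort-on-failure behaviour the message attributes to g_malloc(). It is a hypothetical helper, as are `SketchState` and `sketch_boot()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an allocator that terminates the program instead of
 * returning NULL. Unlike the real g_malloc(), which returns NULL for a
 * zero-byte request, this sketch rounds a size of 0 up to 1 byte. */
static void *sketch_malloc(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        abort();
    }
    return p;
}

/* Hypothetical state object returned by a boot helper. */
typedef struct {
    char name[16];
} SketchState;

/* Because sketch_malloc() cannot return NULL, neither can this
 * function, so a NULL check in the caller would be dead code. */
static SketchState *sketch_boot(const char *name)
{
    SketchState *s = sketch_malloc(sizeof(*s));

    assert(name != NULL);
    memset(s, 0, sizeof(*s));
    strncpy(s->name, name, sizeof(s->name) - 1);
    return s;
}
```

A test written against such a helper can use the returned pointer directly and release it with `free()` when done.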
    •
      tests: fix memory leak in virtio-scsi-test · f62e0bbb
      Laurent Vivier committed
      vs is allocated in qvirtio_scsi_pci_init() and never freed.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    •
      ppc/xics: Add xics to the monitor "info pic" command · b1fc72f0
      Benjamin Herrenschmidt committed
      Useful to debug interrupt problems.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [clg: - updated for qemu-2.7
            - added a test on ->irqs as it is not necessarily allocated
              (PHB3_MSI)
            - removed static variable g_xics and replaced it with a loop on all
              children to find the xics objects.
            - rebased on InterruptStatsProvider interface ]
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    •
      pseries: Update SLOF firmware image to 20161019 · f77d4ff8
      Alexey Kardashevskiy committed
      The main changes are:
      * virtio-serial
      * booting speed improvement
      * better PCI bridge support
      
      The complete changelog is:
        > virtio-serial: Fix compile error
        > scsi: Remove debug functions from scsi-loader.fs
        > scsi: Remove unused read-6 command
        > obp-tftp: Remove the ciregs-buffer
        > libnet: Simplify the net-load arguments passing
        > libnet: Simplify the Forth-to-C wrapper of ping()
        > Do not link libnet to net-snk anymore, and remove net-snk from board-qemu
        > Add a Forth-to-C wrapper for the ping command, too
        > Link libnet code to Paflof and add a wrapper for netboot()
        > Remember execution tokens of "write" and "read" for socket operations
        > Add virtio-serial device support
        > Generalize output banner write routine
        > Improve indentation in OF.fs
        > scsi: implement READ (16) command
        > rtas: Improve rtas-do-config-@ and rtas-do-config-! a little bit
        > libnet: Make netapps.h includable from .code files
        > libnet: Remove unused prototypes from netapps.h
        > libnet: Fix the printout of the ping command
        > libnet: Make sure to close sockets when we're done
        > scsi: implement read-capacity-16
        > pci: Fix secondary and subordinate PCI bus enumeration with board-qemu
        > pci-phb: Fix stack underflow in phb-pci-walk-bridge
        > paflof: Add a read() function to read keyboard input
        > paflof: Add socket(), send() and recv() functions to paflof
        > paflof: Provide get_timer() and set_timer() helper functions
        > paflof: Add a write_mm_log helper function
        > paflof: Copy sbrk code from net-snk
        > paflof: Use CFLAGS from make.rules instead of completely redefining them
        > Do not include the FCode evaluator by default anymore
        > Source code beautification of board-qemu/slof/pci-interrupts.fs
        > Allow PCI devices in PCI bridge slots greater than 4
        > Fix bad interrupt pin numbering in interrupt-map property of PCI bridges
        > Improve SLOF_alloc_mem_aligned()
        > instance: Fix set-my-args for empty arguments
        > Fix remaining compiler warnings in sloffs.c
        > Remove misleading padding fields from ROM header definition
        > Improve indentation in calculatecrc.h
        > Do not include calculatecrc.h from assembler files
        > Remove unused defines in calculatecrc.h
        > libnet: Re-initialize global variables at the beginning of tftp()
        > Remove dependency on cpu/@0 for booting
        > usb: Set XHCI slot speed according to port status
        > usb: Build correct route string for USB3 devices behind a hub
        > usb: Initialize USB3 devices on a hub and keep track of hub topology
        > usb: Increase amount of maximum slot IDs and add a sanity check
        > usb: Move XHCI port state arrays from header to .c file
        > tools: add copy functionality
        > tools: added support to sloffs to read from /dev/slof_flash
        > tools: added file append functionality
        > tools: use crc checking code from romfs/tools
        > tools: added initial version of sloffs
        > romfs: factored out crc code, to make it usable from other locations
        > tools: remove unused parts from the Makefile
        > usb-hid: Fix non-working comma key
        > fat-files: Fix access to FAT32 dir/files when cluster > 16-bits
        > virtio-net: fix ring handling in receive
        > net: Remove remainders of the MTFTP code
        > net: Move also files from clients/net-snk/app/netapps/ to lib/libnet/
        > net: Move files from clients/net-snk/app/netlib/ to lib/libnet/
        > net-snk: Get rid of netlib and netapps prefixes in include statements
        > usb-xhci: assign field4 before conditional
        > Improve F12 key handling in boot menu
        > Fix stack underflow that occurs with duplicated ESC in input
        > rtas-nvram: optimize erase
        > ipv6: Replace magic number 1500 with ETH_MTU_SIZE (i.e. 1518)
        > ipv6: Fix NULL pointer dereference in ip6addr_add()
        > ipv6: Fix memory leak in set_ipv6_address() / ip6_create_ll_address()
        > ipv6: Clear memory after malloc if necessary
        > ipv6: Fix possible NULL-pointer dereference in send_ipv6()
        > ping: use gateway address for routing
        > ping: add netmask in the ping argument
        > xhci: fix missing keys from keyboard
        > xhci: add memory barrier after filling the trb
        > loaders: Remove netflash command
        > boot: Remove legacy Forth words for network loading
        > base: Move cnt-bits and bcd-to-bin to board-js2x folder
        > base: Move huge-tftp-load variable to obp-tftp package
        > base: Remove unused IP address conversion functions
        > virtio: White space cleanup in virtio-9p.c
        > virtio: Add modern version 1.0 support to 9p driver
        > virtio: Set a proper name for virtio-9p device tree nodes
        > pci: Fix mistype in "unkown-bridge"
        > ipv6: Indent code with tabs, not with spaces
        > ipv6: send_ipv6() has to return after doing NDP
        > ipv6: Do not use unitialized MAC address array
        > ipv6: Add support for sending packets through a router
        > Remove unused sms code.
        > virtio-net: initialize to populate mac address
        > libbootmsg: Do not use '\b' characters when printing checkpoints
        > dev-null: The "read" function has to return 0 if nothing has been read
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    •
      Merge remote-tracking branch 'remotes/kraxel/tags/pull-audio-20161027-1' into staging · 835f3d24
      Peter Maydell committed
      audio: intel-hda: check stream entry count during transfer
      
      # gpg: Signature made Thu 27 Oct 2016 15:30:51 BST
      # gpg:                using RSA key 0x4CB6D8EED3E87138
      # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
      # gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
      # gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
      # Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138
      
      * remotes/kraxel/tags/pull-audio-20161027-1:
        audio: intel-hda: check stream entry count during transfer
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  2. 27 Oct, 2016 3 commits
    •
      Merge remote-tracking branch 'remotes/rth/tags/pull-atomic-20161026' into staging · 5929d7e8
      Peter Maydell committed
      cmpxchg emulation of atomics, v8
      
      # gpg: Signature made Wed 26 Oct 2016 16:30:03 BST
      # gpg:                using RSA key 0xAD1270CC4DD0279B
      # gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
      # gpg:                 aka "Richard Henderson <rth@redhat.com>"
      # gpg:                 aka "Richard Henderson <rth@twiddle.net>"
      # Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B
      
      * remotes/rth/tags/pull-atomic-20161026: (37 commits)
        target-alpha: Emulate LL/SC using cmpxchg helpers
        target-alpha: Introduce MMU_PHYS_IDX
        target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
        linux-user: remove handling of aarch64's EXCP_STREX
        linux-user: remove handling of ARM's EXCP_STREX
        target-arm: emulate aarch64's LL/SC using cmpxchg helpers
        target-arm: emulate SWP with atomic_xchg helper
        target-arm: emulate LL/SC using cmpxchg helpers
        target-arm: Rearrange aa32 load and store functions
        tests: add atomic_add-bench
        target-i386: remove helper_lock()
        target-i386: emulate XCHG using atomic helper
        target-i386: emulate LOCK'ed BTX ops using atomic helpers
        target-i386: emulate LOCK'ed XADD using atomic helper
        target-i386: emulate LOCK'ed NEG using cmpxchg helper
        target-i386: emulate LOCK'ed NOT using atomic helper
        target-i386: emulate LOCK'ed INC using atomic helper
        target-i386: emulate LOCK'ed OP instructions using atomic helpers
        target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
        tcg: Emit barriers with parallel_cpus
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    •
      Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging · 8f9d84df
      Peter Maydell committed
      # gpg: Signature made Wed 26 Oct 2016 03:19:06 BST
      # gpg:                using RSA key 0xEF04965B398D6211
      # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
      # gpg: WARNING: This key is not certified with sufficiently trusted signatures!
      # gpg:          It is not certain that the signature belongs to the owner.
      # Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211
      
      * remotes/jasowang/tags/net-pull-request:
        colo-proxy: fix memory leak
        net: rtl8139: limit processing of ring descriptors
        net: vmxnet: initialise local tx descriptor
        e1000e: Don't zero out buffer address in rx descriptor
        net: rocker: set limit to DMA buffer size
        net: eepro100: fix memory leak in device uninit
        tap-bsd: OpenBSD uses tap(4) now
        net: pcnet: fix source formatting and indentation
        net: pcnet: check rx/tx descriptor ring length
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    •
      Merge remote-tracking branch 'remotes/vivier/tags/m68k-part1-pull-request' into staging · 991a97ac
      Peter Maydell committed
      # gpg: Signature made Tue 25 Oct 2016 19:58:46 BST
      # gpg:                using RSA key 0xF30C38BD3F2FBE3C
      # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
      # gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
      # gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
      # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C
      
      * remotes/vivier/tags/m68k-part1-pull-request: (23 commits)
        target-m68k: Optimize gen_flush_flags
        target-m68k: Optimize some comparisons
        target-m68k: Use setcond for scc
        target-m68k: Introduce DisasCompare
        target-m68k: Reorg flags handling
        target-m68k: Remove incorrect clearing of cc_x
        target-m68k: Some fixes to SR and flags management
        target-m68k: Print flags properly
        target-m68k: update CPU flags management
        target-m68k: don't update cc_dest in helpers
        target-m68k: update move to/from ccr/sr
        target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()
        target-m68k: Replace helper_xflag_lt with setcond
        target-m68k: allow to update flags with operation on words and bytes
        target-m68k: REG() macro cleanup
        target-m68k: set PAGE_BITS to 12 for m68k
        target-m68k: define operand sizes
        target-m68k: set disassembler mode to 680x0 or coldfire
        target-m68k: introduce read_imXX() functions
        target-m68k: manage scaled index
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  3. 26 Oct, 2016 25 commits
    •
      target-alpha: Emulate LL/SC using cmpxchg helpers · ed283916
      Richard Henderson committed
      Emulating LL/SC with cmpxchg is not correct, since it can
      suffer from the ABA problem.  However, portable parallel
      code is written assuming only cmpxchg, which means that in
      practice this is a viable alternative.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
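The trade-off named in the commit above can be made concrete with a minimal C11 sketch. It is not the TCG helper code. `SketchMonitor`, `sketch_load_link()`, and `sketch_store_cond()` are hypothetical names, and the single-threaded demo only simulates a second writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record of the last load-link. */
typedef struct {
    _Atomic uint64_t *addr; /* address armed by the load-link */
    uint64_t val;           /* value observed by the load-link */
} SketchMonitor;

/* Emulated load-link: read the value and remember what was seen. */
static uint64_t sketch_load_link(SketchMonitor *m, _Atomic uint64_t *addr)
{
    assert(m != NULL && addr != NULL);
    m->addr = addr;
    m->val = atomic_load(addr);
    return m->val;
}

/* Emulated store-conditional: succeeds if memory still holds the
 * remembered value. The comparison is by value only, so intervening
 * writes that restore the old value go unnoticed. */
static bool sketch_store_cond(SketchMonitor *m, _Atomic uint64_t *addr,
                              uint64_t newval)
{
    uint64_t expected;

    assert(m != NULL && addr != NULL);
    if (m->addr != addr) {
        return false;
    }
    expected = m->val;
    m->addr = NULL; /* the monitor is cleared whether or not we succeed */
    return atomic_compare_exchange_strong(addr, &expected, newval);
}

/* ABA scenario: the value goes 1 -> 2 -> 1 between LL and SC. True
 * LL/SC hardware would fail the store; the emulation lets it through. */
static bool sketch_aba_demo(void)
{
    SketchMonitor mon = { NULL, 0 };
    _Atomic uint64_t cell;

    atomic_init(&cell, 1);
    (void)sketch_load_link(&mon, &cell); /* observes A = 1 */
    atomic_store(&cell, 2);              /* simulated other writer: A -> B */
    atomic_store(&cell, 1);              /* and back again: B -> A */
    return sketch_store_cond(&mon, &cell, 5);
}
```

Code that is already written against compare-and-swap has to tolerate this behaviour, which is why the commit message treats the emulation as acceptable in practice.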
    •
      target-alpha: Introduce MMU_PHYS_IDX · 6a73ecf5
      Richard Henderson committed
      Rather than using helpers for physical accesses, use an MMU index.
      The primary cleanup is with store-conditional on physical addresses.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    •
      target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} · 05188cc7
      Emilio G. Cota committed
      The exception is not emitted anymore; remove it and the associated
      TCG variables.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>
    •
      linux-user: remove handling of aarch64's EXCP_STREX · f4e6eb7f
      Emilio G. Cota committed
      The exception is not emitted anymore.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>
    •
      linux-user: remove handling of ARM's EXCP_STREX · b50b82fc
      Emilio G. Cota committed
      The exception is not emitted anymore.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>
    •
      target-arm: emulate aarch64's LL/SC using cmpxchg helpers · 1dd089d0
      Emilio G. Cota committed
      Emulating LL/SC with cmpxchg is not correct, since it can
      suffer from the ABA problem. Portable parallel code, however,
      is written assuming only cmpxchg--and not LL/SC--is available.
      This means that in practice emulating LL/SC with cmpxchg is
      a viable alternative.
      
      The appended patch emulates LL/SC pairs in aarch64 with cmpxchg helpers.
      This works in both user and system mode. In usermode, it avoids
      pausing all other CPUs to perform the LL/SC pair. The subsequent
      performance and scalability improvement is significant, as the
      plots below show. They plot the throughput of atomic_add-bench
      compiled for ARM and executed on a 64-core x86 machine.
      
      Hi-res plots: http://imgur.com/a/JVc8Y
      
                      atomic_add-bench: 1000000 ops/thread, [0,1] range
      
        18 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        16 ++master +-H--+                                                      ++
           ||                                                                    |
        14 ++                                                                   ++
           | |                                                                   |
        12 ++|                                                                  ++
           | |                                                                   |
        10 ++++                                                                 ++
         8 ++E                                                                  ++
           |+++                                                                  |
         6 ++ |                                                                 ++
           |  |                                                                  |
         4 ++ |                                                                 ++
           |   |                                                                 |
         2 +H++E+---                                                            ++
           + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
         0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 1000000 ops/thread, [0,2] range
      
        18 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        16 ++master +-H--+                                                      ++
           | |                                                                   |
        14 ++E                                                                  ++
           | |                                                                   |
        12 ++|                                                                  ++
           |+++                                                                  |
        10 ++ |                                                                 ++
         8 ++ |                                                                 ++
           |  |                                                                  |
         6 ++ |                                                                 ++
           |   |                                                                 |
         4 ++  |                                                                ++
           |  +E+---                                                             |
         2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
           +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
         0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                     atomic_add-bench: 1000000 ops/thread, [0,128] range
      
        70 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
           |                        +E+------E-------+E+---                      |
           |                     ---        +++                                  |
        50 ++              +++---                                               ++
           |              -+E+                                                   |
        40 ++      +++----                                                      ++
           |        E-                                                           |
           |      --|                                                            |
        30 ++   -- +++                                                          ++
           |  +E+                                                                |
        20 ++E+                                                                 ++
           |E+                                                                   |
           |                                                                     |
        10 ++                                                                   ++
           +          +          +         +          +          +          +    |
         0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                    atomic_add-bench: 1000000 ops/thread, [0,1024] range
      
        160 ++---------+---------+----------+---------+----------+----------+---++
            +cmpxchg +-E--+      +          +         +          +          +    |
        140 ++master +-H--+                                           +++      +++
            |                                                -+E+-----+E+-------E|
        120 ++                                       +++ ----                  +++
            |                                +++  ----E--                        |
        100 ++                              --E---   +++                        ++
            |                       +++ ---- +++                                 |
         80 ++                     --E--                                        ++
            |                  ---- +++                                          |
            |              -+E+                                                  |
         60 ++         ---- +++                                                 ++
            |      +E+-                                                          |
         40 ++   --                                                             ++
            |  +E+                                                               |
         20 +EE+                                                                ++
            +++        +         +          +         +          +          +    |
          0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
      [rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    •
      target-arm: emulate SWP with atomic_xchg helper · cf12bce0
      Emilio G. Cota committed
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    •
      target-arm: emulate LL/SC using cmpxchg helpers · 354161b3
      Emilio G. Cota committed
      Emulating LL/SC with cmpxchg is not correct, since it can
      suffer from the ABA problem. Portable parallel code, however,
      is written assuming only cmpxchg--and not LL/SC--is available.
      This means that in practice emulating LL/SC with cmpxchg is
      a viable alternative.
      
      The appended patch emulates LL/SC pairs in ARM with cmpxchg helpers.
      This works in both user and system mode. In usermode, it avoids
      pausing all other CPUs to perform the LL/SC pair. The subsequent
      performance and scalability improvement is significant, as the
      plots below show. They plot the throughput of atomic_add-bench
      compiled for ARM and executed on a 64-core x86 machine.
      
      Hi-res plots: http://imgur.com/a/aNQpB
      
                     atomic_add-bench: 1000000 ops/thread, [0,1] range
      
        9 ++---------+----------+----------+----------+----------+----------+---++
          +cmpxchg +-E--+       +          +          +          +          +    |
        8 +Emaster +-H--+                                                       ++
          | |                                                                    |
        7 ++E                                                                   ++
          | |                                                                    |
        6 ++++                                                                  ++
          |  |                                                                   |
        5 ++ |                                                                  ++
        4 ++ |                                                                  ++
          |  |                                                                   |
        3 ++ |                                                                  ++
          |   |                                                                  |
        2 ++  |                                                                 ++
          |H++E+---                                  +++  ---+E+------+E+------+E|
        1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
          ++H+       +    +++   +  +++     ++++       +          +          +    |
        0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
          0          10         20         30         40         50         60
                                     Number of threads
      
                      atomic_add-bench: 1000000 ops/thread, [0,2] range
      
        16 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        14 ++master +-H--+                                                      ++
           | |                                                                   |
        12 ++|                                                                  ++
           | E                                                                   |
        10 ++|                                                                  ++
           | |                                                                   |
         8 ++++                                                                 ++
           |E+|                                                                  |
           |  |                                                                  |
         6 ++ |                                                                 ++
           |   |                                                                 |
         4 ++  |                                                                ++
           |  +E+---       +++      +++              +++           ---+E+------+E|
         2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
           + |        +    +++   +         ++++       +          +          +    |
         0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                     atomic_add-bench: 1000000 ops/thread, [0,128] range
      
        70 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +       ++++          +    |
        60 ++master +-H--+                                 ----E------+E+-------++
           |                                        -+E+---   +++     +++      +E|
           |                                +++ ---- +++                       ++|
        50 ++                       +++  ---+E+-                                ++
           |                        -E---                                        |
        40 ++                    ---+++                                         ++
           |               +++---                                                |
           |              -+E+                                                   |
        30 ++      +++----                                                      ++
           |       +E+                                                           |
        20 ++ +++--                                                             ++
           |  +E+                                                                |
           |+E+                                                                  |
        10 +E+                                                                  ++
           +          +          +         +          +          +          +    |
         0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                    atomic_add-bench: 1000000 ops/thread, [0,1024] range
      
        120 ++---------+---------+----------+---------+----------+----------+---++
            +cmpxchg +-E--+      +          +         +          +          +    |
            | master +-H--+                                                    ++|
        100 ++                                                              ----E+
            |                                                 +++  ---+E+---   ++|
            |                                                --E---   +++        |
         80 ++                                           ---- +++               ++
            |                                     ---+E+-                        |
         60 ++                              -+E+--                              ++
            |                       +++ ---- +++                                 |
            |                      -+E+-                                         |
         40 ++              +++----                                             ++
            |      +++   ---+E+                                                  |
            |     -+E+---                                                        |
         20 ++ +E+                                                              ++
            |+E+++                                                               |
            +E+        +         +          +         +          +          +    |
          0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
      [rth: Enforce alignment for ldrexd.]
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      354161b3
    • R
      target-arm: Rearrange aa32 load and store functions · 7f5616f5
      Richard Henderson committed
      Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
      a temp and expand with tcg_gen_extu_i32_tl.  Split out gen_aa32_addr,
      gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      7f5616f5
    • E
      tests: add atomic_add-bench · 070e3edc
      Emilio G. Cota committed
      With this microbenchmark we can measure the overhead of emulating atomic
      instructions with a configurable degree of contention.
      
      The benchmark spawns $n threads, each performing $o atomic ops (additions)
      in a loop. Each atomic operation is performed on a different cache line
      (assuming lines are 64 bytes long) that is randomly selected from the range [0, $r).
      
      [ Note: each $foo corresponds to a -foo flag ]
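The loop described above can be sketched roughly as follows. This is a minimal illustration using pthreads and C11 atomics, not the code added by the commit: the names `run_bench`, `struct slot` and `xorshift32` are hypothetical, the 64-byte line size is the same assumption the message makes, and timing and flag parsing are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64            /* assumed cache line size, in bytes */

/* One counter per cache line, so two slots never share a line. */
struct slot {
    _Alignas(CACHE_LINE) atomic_ulong count;
};

struct bench {
    struct slot *slots;          /* $r counters */
    unsigned int range;          /* $r */
    unsigned long n_ops;         /* $o: additions per thread */
};

struct worker_arg {
    const struct bench *b;
    uint32_t seed;
};

/* xorshift32: a small PRNG so every thread picks slots independently. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

static void *worker(void *opaque)
{
    struct worker_arg *arg = opaque;
    const struct bench *b = arg->b;

    for (unsigned long i = 0; i < b->n_ops; i++) {
        unsigned int idx = xorshift32(&arg->seed) % b->range;
        atomic_fetch_add(&b->slots[idx].count, 1);
    }
    return NULL;
}

/* Run $n threads and return the sum of all counters afterwards. */
static unsigned long run_bench(unsigned int n_threads, unsigned long n_ops,
                               unsigned int range)
{
    struct bench b = { .range = range, .n_ops = n_ops };
    pthread_t *threads = calloc(n_threads, sizeof(*threads));
    struct worker_arg *args = calloc(n_threads, sizeof(*args));
    unsigned long total = 0;

    b.slots = aligned_alloc(CACHE_LINE, (size_t)range * sizeof(struct slot));
    assert(threads && args && b.slots);
    for (unsigned int i = 0; i < range; i++) {
        atomic_init(&b.slots[i].count, 0);
    }
    for (unsigned int i = 0; i < n_threads; i++) {
        args[i].b = &b;
        args[i].seed = i + 1;    /* xorshift needs a nonzero seed */
        int rc = pthread_create(&threads[i], NULL, worker, &args[i]);
        assert(rc == 0);
    }
    for (unsigned int i = 0; i < n_threads; i++) {
        pthread_join(threads[i], NULL);
    }
    for (unsigned int i = 0; i < range; i++) {
        total += atomic_load(&b.slots[i].count);
    }
    free(b.slots);
    free(args);
    free(threads);
    return total;
}
```

Because every addition is atomic, the returned total should equal the number of threads times the number of operations per thread, whatever the range. A smaller range concentrates the additions on fewer cache lines and so raises contention.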
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
      070e3edc
    • E
      target-i386: remove helper_lock() · 37b995f6
      Emilio G. Cota committed
      It's been superseded by the atomic helpers.
      
      The atomic helpers provide a significant performance and scalability
      improvement. Below are the results of running the atomic_add-bench
      microbenchmark with:
       $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
      where $n is the number of threads and $r is the allowed range for the additions.
      
      The scenarios measured are:
      - atomic: implements x86's ADDL with the atomic_add helper (i.e. this patchset)
      - cmpxchg: implements x86's ADDL with a TCG loop using the cmpxchg helper
      - master: before this patchset
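The difference between the "atomic" and "cmpxchg" scenarios can be illustrated outside of TCG with C11 atomics. This is only a sketch of the two strategies, with hypothetical function names; it is not the generated code that was measured.

```c
#include <stdatomic.h>
#include <stdint.h>

/* "atomic" strategy: a single atomic read-modify-write. */
static uint32_t add_with_fetch_add(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add(p, v) + v;   /* return the new value */
}

/* "cmpxchg" strategy: retry until no other thread modified *p in between. */
static uint32_t add_with_cmpxchg(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load(p);

    while (!atomic_compare_exchange_weak(p, &old, old + v)) {
        /* On failure, 'old' now holds the current value; just try again. */
    }
    return old + v;
}
```

Both functions leave the same value in memory. Under contention the compare-and-swap loop may have to retry, which is one plausible reason for the cmpxchg curves sitting below the atomic ones in the plots.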
      
      Results are sorted by ascending range, i.e. descending degree of contention.
      The Y axis is throughput in Mops/s. Tests are run on an AMD machine with 64
      Opteron 6376 cores.
      
                      atomic_add-bench: 5000000 ops/thread, [0,1] range
      
        25 ++---------+----------+---------+----------+----------+----------+---++
           + atomic +-E--+       +         +          +          +          +    |
           |cmpxchg +-H--+                                                       |
        20 +Emaster +-N--+                                                      ++
           ||                                                                    |
           |++                                                                   |
           ||                                                                    |
        15 +++                                                                  ++
           |N|                                                                   |
           |+|                                                                   |
        10 ++|                                                                  ++
           |+|+                                                                  |
           | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
           |+E+E+- +++     +E+------+E+--                                        |
         5 ++|+                                                                 ++
           |+N+H+---                                 +++                         |
           ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
         0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 5000000 ops/thread, [0,2] range
      
        25 ++---------+----------+---------+----------+----------+----------+---++
           ++atomic +-E--+       +         +          +          +          +    |
           |cmpxchg +-H--+                                                       |
        20 ++master +-N--+                                                      ++
           |E|                                                                   |
           |++                                                                   |
           ||E                                                                   |
        15 ++|                                                                  ++
           |N||                                                                  |
           |+||                                   ---+E+------+E+-----+E+------+E|
        10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
           ||H+E+--+E+--                                                         |
           |+++++                                                                |
           | ||                                                                  |
         5 ++|+H+--                                  +++                        ++
           |+N+    -                              ---+H+------+H+------          |
           +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
         0 ++---------+----------+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 5000000 ops/thread, [0,8] range
      
        40 ++---------+----------+---------+----------+----------+----------+---++
           ++atomic +-E--+       +         +          +          +          +    |
        35 +cmpxchg +-H--+                                                      ++
           | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
        30 ++|                   ---+E+--   +++                                 ++
           | |            -+E+---                                                |
        25 ++E        ---- +++                                                  ++
           |+++++ -+E+                                                           |
        20 +E+ E-- +++                                                          ++
           |H|+++                                                                |
           |+|                                       +H+-------                  |
        15 ++H+                                   ---+++      +H+------         ++
           |N++H+--                         +++---                    +H+------++|
        10 ++ +++  -       +++           ---+H+                       +++      +H+
           | |     +H+-----+H+------+H+--                                        |
         5 ++|                      +++                                         ++
           ++N+N+--+N++          +         +          +          +          +    |
         0 ++---------+----------+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                     atomic_add-bench: 5000000 ops/thread, [0,128] range
      
        160 ++---------+---------+----------+---------+----------+----------+---++
            + atomic +-E--+      +          +         +          +          +    |
        140 +cmpxchg +-H--+                          +++      +++               ++
            | master +-N--+                           E--------E------+E+------++|
        120 ++                                      --|        |      +++       E+
            |                                     -- +++      +++              ++|
        100 ++                                   -                              ++
            |                                +++-                     +++      ++|
         80 ++                              -+E+    -+H+------+H+------H--------++
            |                           ----    ----                  +++       H|
            |            ---+E+-----+E+-  ---+H+                               ++|
         60 ++     +E+---   +++  ---+H+---                                      ++
            |    --+++   ---+H+--                                                |
         40 ++ +E+-+H+---                                                       ++
            |  +H+                                                               |
         20 +EE+                                                                ++
            +N+        +         +          +         +          +          +    |
          0 ++N-N---N--+---------+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
                    atomic_add-bench: 5000000 ops/thread, [0,1024] range
      
        350 ++---------+---------+----------+---------+----------+----------+---++
            + atomic +-E--+      +          +         +          +          +    |
        300 +cmpxchg +-H--+                                                    +++
            | master +-N--+                                           +++       ||
            |                                                 +++      |    ----E|
        250 ++                                                 |   ----E----    ++
            |                                              ----E---    |    ---+H|
        200 ++                                      -+E+---   +++  ---+H+---    ++
            |                                   ----         -+H+--              |
            |                                +E+     +++ ---- +++                |
        150 ++                            ---+++  ---+H+-                       ++
            |                          ---  -+H+--                               |
        100 ++                   ---+E+ ---- +++                                ++
            |      +++   ---+E+-----+H+-                                         |
            |     -+E+------+H+--                                                |
         50 ++ +E+                                                              ++
            +EE+       +         +          +         +          +          +    |
          0 ++N-N---N--+---------+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
        hi-res: http://imgur.com/a/fMRmq
      
      I stopped measuring master after 8 threads, because there is little
      point in measuring the well-known performance collapse of a contended lock.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      37b995f6
    • E
      target-i386: emulate XCHG using atomic helper · ea97ebe8
      Emilio G. Cota committed
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      ea97ebe8
    • E
      target-i386: emulate LOCK'ed BTX ops using atomic helpers · cfe819d3
      Emilio G. Cota committed
      [rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
      incorrect zero-extension of address in register-offset case.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      cfe819d3
    • E
      target-i386: emulate LOCK'ed XADD using atomic helper · f53b0181
      Emilio G. Cota committed
      [rth: Move load of reg value to common location.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      f53b0181
    • E
      target-i386: emulate LOCK'ed NEG using cmpxchg helper · 8eb8c738
      Emilio G. Cota committed
      [rth: Move redundant qemu_load out of cmpxchg loop.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      8eb8c738
    • E
      target-i386: emulate LOCK'ed NOT using atomic helper · 2a5fe8ae
      Emilio G. Cota committed
      [rth: Avoid qemu_load that's redundant with the atomic op.]
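One way a locked NOT can be expressed as a single atomic operation is an atomic XOR with an all-ones mask, since `x ^ 0xffffffff == ~x`. The sketch below shows that idea with C11 atomics; the function name is hypothetical, and whether the commit uses exactly this mapping in TCG is an assumption.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically replace *p with ~*p and return the new value. */
static uint32_t atomic_not32(_Atomic uint32_t *p)
{
    /* fetch_xor returns the old value; flip it again to get the new one. */
    return atomic_fetch_xor(p, UINT32_MAX) ^ UINT32_MAX;
}
```

Because the XOR already reads the old value as part of the atomic operation, no separate load of the operand is needed beforehand.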
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      2a5fe8ae
    • E
      target-i386: emulate LOCK'ed INC using atomic helper · 60e57346
      Emilio G. Cota committed
      [rth: Merge gen_inc_locked back into gen_inc to share cc update.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      60e57346
    • E
      target-i386: emulate LOCK'ed OP instructions using atomic helpers · a7cee522
      Emilio G. Cota committed
      [rth: Eliminate some unnecessary temporaries.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      a7cee522
    • E
      target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers · ae03f8de
      Emilio G. Cota committed
      The diff here is uglier than necessary. All this does is to turn
      
      FOO
      
      into:
      
      if (s->prefix & PREFIX_LOCK) {
        BAR
      } else {
        FOO
      }
      
      where FOO is the original implementation of an unlocked cmpxchg.
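As a rough illustration of that shape, the sketch below models a 32-bit CMPXCHG in plain C with the GCC/Clang `__atomic` builtins rather than TCG ops. The struct and function names are hypothetical; `cmpxchg_unlocked` plays the role of FOO and `cmpxchg_locked` the role of BAR.

```c
#include <stdbool.h>
#include <stdint.h>

struct cmpxchg_result {
    uint32_t acc;   /* accumulator (EAX) after the instruction */
    bool zf;        /* zero flag: set when the exchange happened */
};

/* FOO: unlocked form, a plain load, compare and store. */
static struct cmpxchg_result cmpxchg_unlocked(uint32_t *mem, uint32_t acc,
                                              uint32_t src)
{
    uint32_t old = *mem;
    bool equal = (old == acc);

    *mem = equal ? src : old;
    return (struct cmpxchg_result){ .acc = equal ? acc : old, .zf = equal };
}

/* BAR: locked form, one atomic compare-and-swap. */
static struct cmpxchg_result cmpxchg_locked(uint32_t *mem, uint32_t acc,
                                            uint32_t src)
{
    uint32_t expected = acc;
    /* On failure the builtin writes the current memory value to 'expected'. */
    bool equal = __atomic_compare_exchange_n(mem, &expected, src, false,
                                             __ATOMIC_SEQ_CST,
                                             __ATOMIC_SEQ_CST);

    return (struct cmpxchg_result){ .acc = expected, .zf = equal };
}

/* The dispatch described in the commit message. */
static struct cmpxchg_result emulate_cmpxchg(bool lock_prefix, uint32_t *mem,
                                             uint32_t acc, uint32_t src)
{
    if (lock_prefix) {
        return cmpxchg_locked(mem, acc, src);
    } else {
        return cmpxchg_unlocked(mem, acc, src);
    }
}
```

Both paths give the same result when only one thread touches the memory. They differ only in whether another thread can change the value between the compare and the store.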
      
      [rth: Adjust unlocked cmpxchg to use movcond instead of branches.
      Adjust helpers to use atomic helpers.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      ae03f8de
    • R
      91682118
    • R
      tcg: Add CONFIG_ATOMIC64 · df79b996
      Richard Henderson committed
      Allow qemu to build on 32-bit hosts without 64-bit atomic ops.
      
      Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
      guests, we still need some way to handle the 32-bit guest using a
      64-bit atomic operation.  Do so by dropping back to single-step.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      df79b996
    • R
      tcg: Add atomic128 helpers · 7ebee43e
      Richard Henderson committed
      Force the use of cmpxchg16b on x86_64.
      
      Wikipedia suggests that only very old AMD64 (circa 2004) did not have
      this instruction.  Further, it's required by Windows 8 so no new cpus
      will ever omit it.
      
      If we truly care about these, then we could check this at startup time
      and then avoid executing paths that use it.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      7ebee43e
    • R
      tcg: Add atomic helpers · c482cb11
      Richard Henderson committed
      Add all of cmpxchg, op_fetch, fetch_op, and xchg.
      Handle both endian-ness, and sizes up to 8.
      Handle expanding non-atomically, when emulating in serial.
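The naming of these operation families mirrors the GCC/Clang `__atomic` builtins: fetch_op returns the value from before the operation, op_fetch returns the value after it, and xchg returns the old value while storing a new one. A small host-side sketch with hypothetical wrapper names, not the TCG helpers themselves:

```c
#include <stdint.h>

/* fetch_op: returns the value *before* the addition. */
static uint32_t demo_fetch_add(uint32_t *p, uint32_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* op_fetch: returns the value *after* the addition. */
static uint32_t demo_add_fetch(uint32_t *p, uint32_t v)
{
    return __atomic_add_fetch(p, v, __ATOMIC_SEQ_CST);
}

/* xchg: stores v and returns the previous value. */
static uint32_t demo_xchg(uint32_t *p, uint32_t v)
{
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}
```

Starting from 10, `demo_fetch_add(&x, 5)` returns 10 and leaves 15 in memory, while a following `demo_add_fetch(&x, 5)` returns 20.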
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      c482cb11
    • R
      cputlb: Tidy some macros · c86c6e4c
      Richard Henderson committed
      TGT_LE and TGT_BE are not size dependent and do not need to be
      redefined.  The others are no longer used at all.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      c86c6e4c
    • R
      cputlb: Move most of iotlb code out of line · 82a45b96
      Richard Henderson committed
      Saves 2k code size off of a cold path.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      82a45b96