提交 · 4bcfa56ca9f6970974255dd5621b08e5aa9e5af5 · openeuler / qemu

28 10月, 2016 10 次提交

spapr_pci: advertise explicit numa IDs even when there's 1 node · 4bcfa56c

由 Michael Roth 提交于 10月 18, 2016

With the addition of "numa_node" properties for PHBs we began
advertising NUMA affinity in cases where nb_numa_nodes > 1.

Since the default on the guest side is to make no assumptions about
PHB NUMA affinity (defaulting to -1), there is still a valid use-case
for explicitly defining a PHB's NUMA affinity even when there's just
one node. In particular, some workloads make faulty assumptions about
/sys/bus/pci/<devid>/numa_node being >= 0, warranting the use of
this property as a workaround even if there's just 1 PHB or NUMA
node.

Enable this use-case by always advertising the PHB's NUMA affinity
if "numa_node" has been explicitly set.

We could achieve this by relaxing the check to simply be
nb_numa_nodes > 0, but even safer would be to check
numa_info[nodeid].present explicitly, and to fail at start time
for cases where it does not exist.

This has an additional affect of no longer advertising PHB NUMA
affinity unconditionally if nb_numa_nodes > 1 and "numa_node"
property is unset/-1, but since the default value on the guest
side for each PHB is also -1, the behavior should be the same for
that situation. We could still retain the old behavior if desired,
but the decision seems arbitrary, so we take the simpler route.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Shivaprasad G. Bhat <shivapbh@in.ibm.com>
Signed-off-by: NMichael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

4bcfa56c

tests: enable virtio tests on SPAPR · 30ca440e

由 Laurent Vivier 提交于 10月 17, 2016

but disable MSI-X tests on SPAPR as we can't check the result
(the memory region used on PC is not readable on SPAPR).
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

30ca440e

tests: use qtest_pc_boot()/qtest_shutdown() in virtio tests · a980f7f2

由 Laurent Vivier 提交于 10月 17, 2016

This patch replaces calls to qtest_start() and qtest_end() by
calls to qtest_pc_boot() and qtest_shutdown().

This allows to initialize memory allocator and PCI interface
functions. This will ease to enable virtio tests on other
architectures by only adding a specific qtest_XXX_boot() (like
qtest_spapr_boot()).
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

a980f7f2

tests: rename target_big_endian() as qvirtio_is_big_endian() · 8b4b80c3

由 Laurent Vivier 提交于 10月 17, 2016

Move the definition to libqos/virtio.h as it must be used
only with virtio functions.

Add a QVirtioDevice parameter as it will be needed to
know if the virtio device is using virtio 1.0 specification
and thus is always little-endian (to do)
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

8b4b80c3

tests: move QVirtioBus pointer into QVirtioDevice · 6b9cdf4c

由 Laurent Vivier 提交于 10月 17, 2016

This allows to not have to pass bus and device for every virtio functions.
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Reviewed-by: NThomas Huth <thuth@redhat.com>
[dwg: Fix style nit]
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

6b9cdf4c

tests: don't check if qtest_spapr_boot() returns NULL · 458f3b2c

由 Laurent Vivier 提交于 10月 17, 2016

qtest_spapr_boot()/qtest_pc_boot()/qtest_boot() call qtest_vboot()
and qtest_vboot() calls g_malloc(),
and g_malloc() never fails:
if memory allocation fails, the application is terminated.
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

458f3b2c

tests: fix memory leak in virtio-scsi-test · f62e0bbb

由 Laurent Vivier 提交于 10月 17, 2016

vs is allocated in qvirtio_scsi_pci_init() and never freed.
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

f62e0bbb

ppc/xics: Add xics to the monitor "info pic" command · b1fc72f0

由 Benjamin Herrenschmidt 提交于 10月 17, 2016

Useful to debug interrupt problems.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.7
      - added a test on ->irqs as it is not necessarily allocated
        (PHB3_MSI)
      - removed static variable g_xics and replace with a loop on all
        children to find the xics objects.
      - rebased on InterruptStatsProvider interface ]
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

b1fc72f0

pseries: Update SLOF firmware image to 20161019 · f77d4ff8

由 Alexey Kardashevskiy 提交于 10月 19, 2016

The main changes are:
* virtio-serial
* booting speed imrovement
* better PCI bridge support

The complete changelog is:
  > virtio-serial: Fix compile error
  > scsi: Remove debug functions from scsi-loader.fs
  > scsi: Remove unused read-6 command
  > obp-tftp: Remove the ciregs-buffer
  > libnet: Simplify the net-load arguments passing
  > libnet: Simplify the Forth-to-C wrapper of ping()
  > Do not link libnet to net-snk anymore, and remove net-snk from board-qemu
  > Add a Forth-to-C wrapper for the ping command, too
  > Link libnet code to Paflof and add a wrapper for netboot()
  > Remember execution tokens of "write" and "read" for socket operations
  > Add virtio-serial device support
  > Generalize output banner write routine
  > Improve indentation in OF.fs
  > scsi: implement READ (16) command
  > rtas: Improve rtas-do-config-@ and rtas-do-config-! a little bit
  > libnet: Make netapps.h includable from .code files
  > libnet: Remove unused prototypes from netapps.h
  > libnet: Fix the printout of the ping command
  > libnet: Make sure to close sockets when we're done
  > scsi: implement read-capacity-16
  > pci: Fix secondary and subordinate PCI bus enumeration with board-qemu
  > pci-phb: Fix stack underflow in phb-pci-walk-bridge
  > paflof: Add a read() function to read keyboard input
  > paflof: Add socket(), send() and recv() functions to paflof
  > paflof: Provide get_timer() and set_timer() helper functions
  > paflof: Add a write_mm_log helper function
  > paflof: Copy sbrk code from net-snk
  > paflof: Use CFLAGS from make.rules instead of completely redefining them
  > Do not include the FCode evaluator by default anymore
  > Source code beautification of board-qemu/slof/pci-interrupts.fs
  > Allow PCI devices in PCI bridge slots greater than 4
  > Fix bad interrupt pin numbering in interrupt-map property of PCI bridges
  > Improve SLOF_alloc_mem_aligned()
  > instance: Fix set-my-args for empty arguments
  > Fix remaining compiler warnings in sloffs.c
  > Remove misleading padding fields from ROM header definition
  > Improve indentation in calculatecrc.h
  > Do not include calculatecrc.h from assembler files
  > Remove unused defines in calculatecrc.h
  > libnet: Re-initialize global variables at the beginning of tftp()
  > Remove dependency on cpu/@0 for booting
  > usb: Set XHCI slot speed according to port status
  > usb: Build correct route string for USB3 devices behind a hub
  > usb: Initialize USB3 devices on a hub and keep track of hub topology
  > usb: Increase amount of maximum slot IDs and add a sanity check
  > usb: Move XHCI port state arrays from header to .c file
  > tools: add copy functionality
  > tools: added support to sloffs to read from /dev/slof_flash
  > tools: added file append functionality
  > tools: use crc checking code from romfs/tools
  > tools: added initial version of sloffs
  > romfs: factored out crc code, to make it usable from other locations
  > tools: remove unused parts from the Makefile
  > usb-hid: Fix non-working comma key
  > fat-files: Fix access to FAT32 dir/files when cluster > 16-bits
  > virtio-net: fix ring handling in receive
  > net: Remove remainders of the MTFTP code
  > net: Move also files from clients/net-snk/app/netapps/ to lib/libnet/
  > net: Move files from clients/net-snk/app/netlib/ to lib/libnet/
  > net-snk: Get rid of netlib and netapps prefixes in include statements
  > usb-xhci: assign field4 before conditional
  > Improve F12 key handling in boot menu
  > Fix stack underflow that occurs with duplicated ESC in input
  > rtas-nvram: optimize erase
  > ipv6: Replace magic number 1500 with ETH_MTU_SIZE (i.e. 1518)
  > ipv6: Fix NULL pointer dereference in ip6addr_add()
  > ipv6: Fix memory leak in set_ipv6_address() / ip6_create_ll_address()
  > ipv6: Clear memory after malloc if necessary
  > ipv6: Fix possible NULL-pointer dereference in send_ipv6()
  > ping: use gateway address for routing
  > ping: add netmask in the ping argument
  > xhci: fix missing keys from keyboard
  > xhci: add memory barrier after filling the trb
  > loaders: Remove netflash command
  > boot: Remove legacy Forth words for network loading
  > base: Move cnt-bits and bcd-to-bin to board-js2x folder
  > base: Move huge-tftp-load variable to obp-tftp package
  > base: Remove unused IP address conversion functions
  > virtio: White space cleanup in virtio-9p.c
  > virtio: Add modern version 1.0 support to 9p driver
  > virtio: Set a proper name for virtio-9p device tree nodes
  > pci: Fix mistype in "unkown-bridge"
  > ipv6: Indent code with tabs, not with spaces
  > ipv6: send_ipv6() has to return after doing NDP
  > ipv6: Do not use unitialized MAC address array
  > ipv6: Add support for sending packets through a router
  > Remove unused sms code.
  > virtio-net: initialize to populate mac address
  > libbootmsg: Do not use '\b' characters when printing checkpoints
  > dev-null: The "read" function has to return 0 if nothing has been read
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>

f77d4ff8

Merge remote-tracking branch 'remotes/kraxel/tags/pull-audio-20161027-1' into staging · 835f3d24

由 Peter Maydell 提交于 10月 27, 2016

audio: intel-hda: check stream entry count during transfer

# gpg: Signature made Thu 27 Oct 2016 15:30:51 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-audio-20161027-1:
  audio: intel-hda: check stream entry count during transfer
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

835f3d24

27 10月, 2016 3 次提交

Merge remote-tracking branch 'remotes/rth/tags/pull-atomic-20161026' into staging · 5929d7e8

由 Peter Maydell 提交于 10月 27, 2016

cmpxchg emulation of atomics, v8

# gpg: Signature made Wed 26 Oct 2016 16:30:03 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-atomic-20161026: (37 commits)
  target-alpha: Emulate LL/SC using cmpxchg helpers
  target-alpha: Introduce MMU_PHYS_IDX
  target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
  linux-user: remove handling of aarch64's EXCP_STREX
  linux-user: remove handling of ARM's EXCP_STREX
  target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  target-arm: emulate SWP with atomic_xchg helper
  target-arm: emulate LL/SC using cmpxchg helpers
  target-arm: Rearrange aa32 load and store functions
  tests: add atomic_add-bench
  target-i386: remove helper_lock()
  target-i386: emulate XCHG using atomic helper
  target-i386: emulate LOCK'ed BTX ops using atomic helpers
  target-i386: emulate LOCK'ed XADD using atomic helper
  target-i386: emulate LOCK'ed NEG using cmpxchg helper
  target-i386: emulate LOCK'ed NOT using atomic helper
  target-i386: emulate LOCK'ed INC using atomic helper
  target-i386: emulate LOCK'ed OP instructions using atomic helpers
  target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  tcg: Emit barriers with parallel_cpus
  ...
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

5929d7e8

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging · 8f9d84df

由 Peter Maydell 提交于 10月 27, 2016

# gpg: Signature made Wed 26 Oct 2016 03:19:06 BST
# gpg:                using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  colo-proxy: fix memory leak
  net: rtl8139: limit processing of ring descriptors
  net: vmxnet: initialise local tx descriptor
  e1000e: Don't zero out buffer address in rx descriptor
  net: rocker: set limit to DMA buffer size
  net: eepro100: fix memory leak in device uninit
  tap-bsd: OpenBSD uses tap(4) now
  net: pcnet: fix source formatting and indentation
  net: pcnet: check rx/tx descriptor ring length
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

8f9d84df

Merge remote-tracking branch 'remotes/vivier/tags/m68k-part1-pull-request' into staging · 991a97ac

由 Peter Maydell 提交于 10月 27, 2016

# gpg: Signature made Tue 25 Oct 2016 19:58:46 BST
# gpg:                using RSA key 0xF30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-part1-pull-request: (23 commits)
  target-m68k: Optimize gen_flush_flags
  target-m68k: Optimize some comparisons
  target-m68k: Use setcond for scc
  target-m68k: Introduce DisasCompare
  target-m68k: Reorg flags handling
  target-m68k: Remove incorrect clearing of cc_x
  target-m68k: Some fixes to SR and flags management
  target-m68k: Print flags properly
  target-m68k: update CPU flags management
  target-m68k: don't update cc_dest in helpers
  target-m68k: update move to/from ccr/sr
  target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()
  target-m68k: Replace helper_xflag_lt with setcond
  target-m68k: allow to update flags with operation on words and bytes
  target-m68k: REG() macro cleanup
  target-m68k: set PAGE_BITS to 12 for m68k
  target-m68k: define operand sizes
  target-m68k: set disassembler mode to 680x0 or coldfire
  target-m68k: introduce read_imXX() functions
  target-m68k: manage scaled index
  ...
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

991a97ac

26 10月, 2016 27 次提交

target-alpha: Emulate LL/SC using cmpxchg helpers · ed283916

由 Richard Henderson 提交于 9月 02, 2016

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem.  However, portable parallel
code is written assuming only cmpxchg which means that in
practice this is a viable alternative.
Signed-off-by: NRichard Henderson <rth@twiddle.net>

ed283916

target-alpha: Introduce MMU_PHYS_IDX · 6a73ecf5

由 Richard Henderson 提交于 9月 03, 2016

Rather than using helpers for physical accesses, use a mmu index.
The primary cleanup is with store-conditional on physical addresses.
Signed-off-by: NRichard Henderson <rth@twiddle.net>

6a73ecf5

target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} · 05188cc7

由 Emilio G. Cota 提交于 6月 27, 2016

The exception is not emitted anymore; remove it and the associated
TCG variables.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>

05188cc7

linux-user: remove handling of aarch64's EXCP_STREX · f4e6eb7f

由 Emilio G. Cota 提交于 6月 27, 2016

The exception is not emitted anymore.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>

f4e6eb7f

linux-user: remove handling of ARM's EXCP_STREX · b50b82fc

由 Emilio G. Cota 提交于 6月 27, 2016

The exception is not emitted anymore.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twidle.net>
Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>

b50b82fc

target-arm: emulate aarch64's LL/SC using cmpxchg helpers · 1dd089d0

由 Emilio G. Cota 提交于 6月 27, 2016

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/JVc8Y

                atomic_add-bench: 1000000 ops/thread, [0,1] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     ||                                                                    |
  14 ++                                                                   ++
     | |                                                                   |
  12 ++|                                                                  ++
     | |                                                                   |
  10 ++++                                                                 ++
   8 ++E                                                                  ++
     |+++                                                                  |
   6 ++ |                                                                 ++
     |  |                                                                  |
   4 ++ |                                                                 ++
     |   |                                                                 |
   2 +H++E+---                                                            ++
     + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     | |                                                                   |
  14 ++E                                                                  ++
     | |                                                                   |
  12 ++|                                                                  ++
     |+++                                                                  |
  10 ++ |                                                                 ++
   8 ++ |                                                                 ++
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---                                                             |
   2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
     +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
     |                        +E+------E-------+E+---                      |
     |                     ---        +++                                  |
  50 ++              +++---                                               ++
     |              -+E+                                                   |
  40 ++      +++----                                                      ++
     |        E-                                                           |
     |      --|                                                            |
  30 ++   -- +++                                                          ++
     |  +E+                                                                |
  20 ++E+                                                                 ++
     |E+                                                                   |
     |                                                                     |
  10 ++                                                                   ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
  140 ++master +-H--+                                           +++      +++
      |                                                -+E+-----+E+-------E|
  120 ++                                       +++ ----                  +++
      |                                +++  ----E--                        |
  100 ++                              --E---   +++                        ++
      |                       +++ ---- +++                                 |
   80 ++                     --E--                                        ++
      |                  ---- +++                                          |
      |              -+E+                                                  |
   60 ++         ---- +++                                                 ++
      |      +E+-                                                          |
   40 ++   --                                                             ++
      |  +E+                                                               |
   20 +EE+                                                                ++
      +++        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

1dd089d0

target-arm: emulate SWP with atomic_xchg helper · cf12bce0

由 Emilio G. Cota 提交于 6月 27, 2016

Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

cf12bce0

target-arm: emulate LL/SC using cmpxchg helpers · 354161b3

由 Emilio G. Cota 提交于 6月 27, 2016

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

               atomic_add-bench: 1000000 ops/thread, [0,1] range

  9 ++---------+----------+----------+----------+----------+----------+---++
    +cmpxchg +-E--+       +          +          +          +          +    |
  8 +Emaster +-H--+                                                       ++
    | |                                                                    |
  7 ++E                                                                   ++
    | |                                                                    |
  6 ++++                                                                  ++
    |  |                                                                   |
  5 ++ |                                                                  ++
  4 ++ |                                                                  ++
    |  |                                                                   |
  3 ++ |                                                                  ++
    |   |                                                                  |
  2 ++  |                                                                 ++
    |H++E+---                                  +++  ---+E+------+E+------+E|
  1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
    ++H+       +    +++   +  +++     ++++       +          +          +    |
  0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
    0          10         20         30         40         50         60
                               Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  16 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  14 ++master +-H--+                                                      ++
     | |                                                                   |
  12 ++|                                                                  ++
     | E                                                                   |
  10 ++|                                                                  ++
     | |                                                                   |
   8 ++++                                                                 ++
     |E+|                                                                  |
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---       +++      +++              +++           ---+E+------+E|
   2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
     + |        +    +++   +         ++++       +          +          +    |
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +       ++++          +    |
  60 ++master +-H--+                                 ----E------+E+-------++
     |                                        -+E+---   +++     +++      +E|
     |                                +++ ---- +++                       ++|
  50 ++                       +++  ---+E+-                                ++
     |                        -E---                                        |
  40 ++                    ---+++                                         ++
     |               +++---                                                |
     |              -+E+                                                   |
  30 ++      +++----                                                      ++
     |       +E+                                                           |
  20 ++ +++--                                                             ++
     |  +E+                                                                |
     |+E+                                                                  |
  10 +E+                                                                  ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  120 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
      | master +-H--+                                                    ++|
  100 ++                                                              ----E+
      |                                                 +++  ---+E+---   ++|
      |                                                --E---   +++        |
   80 ++                                           ---- +++               ++
      |                                     ---+E+-                        |
   60 ++                              -+E+--                              ++
      |                       +++ ---- +++                                 |
      |                      -+E+-                                         |
   40 ++              +++----                                             ++
      |      +++   ---+E+                                                  |
      |     -+E+---                                                        |
   20 ++ +E+                                                              ++
      |+E+++                                                               |
      +E+        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Enforce alignment for ldrexd.]
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

354161b3

target-arm: Rearrange aa32 load and store functions · 7f5616f5

由 Richard Henderson 提交于 6月 30, 2016

Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl. Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

7f5616f5

tests: add atomic_add-bench · 070e3edc

由 Emilio G. Cota 提交于 6月 27, 2016

With this microbenchmark we can measure the overhead of emulating atomic
instructions with a configurable degree of contention.

The benchmark spawns $n threads, each performing $o atomic ops (additions)
in a loop. Each atomic operation is performed on a different cache line
(assuming lines are 64b long) that is randomly selected from a range [0, $r).

[ Note: each $foo corresponds to a -foo flag ]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>

070e3edc

target-i386: remove helper_lock() · 37b995f6

由 Emilio G. Cota 提交于 6月 27, 2016

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

37b995f6

target-i386: emulate XCHG using atomic helper · ea97ebe8

由 Emilio G. Cota 提交于 6月 27, 2016

Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

ea97ebe8

target-i386: emulate LOCK'ed BTX ops using atomic helpers · cfe819d3

由 Emilio G. Cota 提交于 6月 27, 2016

[rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
incorrect zero-extension of address in register-offset case.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

cfe819d3

target-i386: emulate LOCK'ed XADD using atomic helper · f53b0181

由 Emilio G. Cota 提交于 6月 27, 2016

[rth: Move load of reg value to common location.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

f53b0181

target-i386: emulate LOCK'ed NEG using cmpxchg helper · 8eb8c738

由 Emilio G. Cota 提交于 6月 27, 2016

[rth: Move redundant qemu_load out of cmpxchg loop.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

8eb8c738

target-i386: emulate LOCK'ed NOT using atomic helper · 2a5fe8ae

由 Emilio G. Cota 提交于 6月 27, 2016

[rth: Avoid qemu_load that's redundant with the atomic op.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

2a5fe8ae

target-i386: emulate LOCK'ed INC using atomic helper · 60e57346

由 Emilio G. Cota 提交于 6月 27, 2016

[rth: Merge gen_inc_locked back into gen_inc to share cc update.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

60e57346

target-i386: emulate LOCK'ed OP instructions using atomic helpers · a7cee522

由 Emilio G. Cota 提交于 6月 27, 2016

[rth: Eliminate some unnecessary temporaries.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

a7cee522

target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers · ae03f8de

由 Emilio G. Cota 提交于 6月 27, 2016

The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
  BAR
} else {
  FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

ae03f8de

tcg: Emit barriers with parallel_cpus · 91682118

由 Richard Henderson 提交于 9月 16, 2016

Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

91682118

tcg: Add CONFIG_ATOMIC64 · df79b996

由 Richard Henderson 提交于 9月 02, 2016

Allow qemu to build on 32-bit hosts without 64-bit atomic ops.

Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation.  Do so by dropping back to single-step.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

df79b996

tcg: Add atomic128 helpers · 7ebee43e

由 Richard Henderson 提交于 6月 29, 2016

Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction.  Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

7ebee43e

tcg: Add atomic helpers · c482cb11

由 Richard Henderson 提交于 6月 28, 2016

Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

c482cb11

cputlb: Tidy some macros · c86c6e4c

由 Richard Henderson 提交于 7月 08, 2016

TGT_LE and TGT_BE are not size dependent and do not need to be
redefined.  The others are no longer used at all.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

c86c6e4c

cputlb: Move most of iotlb code out of line · 82a45b96

由 Richard Henderson 提交于 7月 08, 2016

Saves 2k code size off of a cold path.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

82a45b96

cputlb: Remove includes from softmmu_template.h · 40978428

由 Richard Henderson 提交于 7月 08, 2016

We already include exec/address-spaces.h and exec/memory.h in
cputlb.c; the include of qemu/timer.h appears to be a fossil.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

40978428

cputlb: Move probe_write out of softmmu_template.h · 3b08f0a9

由 Richard Henderson 提交于 7月 08, 2016

Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

3b08f0a9