1. 28 Oct, 2016 10 commits
    • spapr_pci: advertise explicit numa IDs even when there's 1 node · 4bcfa56c
      Michael Roth committed
      With the addition of "numa_node" properties for PHBs we began
      advertising NUMA affinity in cases where nb_numa_nodes > 1.
      
      Since the default on the guest side is to make no assumptions about
      PHB NUMA affinity (defaulting to -1), there is still a valid use-case
      for explicitly defining a PHB's NUMA affinity even when there's just
      one node. In particular, some workloads make faulty assumptions about
      /sys/bus/pci/<devid>/numa_node being >= 0, warranting the use of
      this property as a workaround even if there's just 1 PHB or NUMA
      node.
      
      Enable this use-case by always advertising the PHB's NUMA affinity
      if "numa_node" has been explicitly set.
      
      We could achieve this by relaxing the check to simply be
      nb_numa_nodes > 0, but even safer would be to check
      numa_info[nodeid].present explicitly, and to fail at start time
      for cases where it does not exist.
      
      This has the additional effect of no longer advertising PHB NUMA
      affinity unconditionally if nb_numa_nodes > 1 and "numa_node"
      property is unset/-1, but since the default value on the guest
      side for each PHB is also -1, the behavior should be the same for
      that situation. We could still retain the old behavior if desired,
      but the decision seems arbitrary, so we take the simpler route.
      
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Shivaprasad G. Bhat <shivapbh@in.ibm.com>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      4bcfa56c
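The decision described in this commit message can be sketched in C as follows. This is a simplified model under stated assumptions, not the code from the patch: `struct node_info`, `phb_check_numa_node()`, `MAX_NODES` and `NODE_UNSET` are hypothetical names, and only the `present` flag mentioned in the message is modelled.

```c
#include <stdbool.h>

#define MAX_NODES  8      /* hypothetical limit for this sketch */
#define NODE_UNSET (-1)   /* "numa_node" property left at its default */

/* Simplified stand-in for per-node NUMA state; only "present" matters here. */
struct node_info {
    bool present;
};

enum phb_numa_result {
    PHB_NUMA_SKIP,        /* property unset: advertise nothing, guest keeps -1 */
    PHB_NUMA_ADVERTISE,   /* property set and the node exists */
    PHB_NUMA_ERROR        /* property set but the node is absent: fail at start */
};

/* Decide what to do with a PHB's explicitly configured NUMA node. */
static enum phb_numa_result phb_check_numa_node(const struct node_info *nodes,
                                                int max_nodes, int node_id)
{
    if (node_id == NODE_UNSET) {
        return PHB_NUMA_SKIP;
    }
    if (node_id < 0 || node_id >= max_nodes || !nodes[node_id].present) {
        return PHB_NUMA_ERROR;
    }
    return PHB_NUMA_ADVERTISE;
}
```

With a single present node 0, an unset property is skipped, node 0 is advertised, and node 1 is rejected, which matches the behaviour the message argues for regardless of how many nodes exist.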
    • tests: enable virtio tests on SPAPR · 30ca440e
      Laurent Vivier committed
      but disable MSI-X tests on SPAPR as we can't check the result
      (the memory region used on PC is not readable on SPAPR).
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      30ca440e
    • tests: use qtest_pc_boot()/qtest_shutdown() in virtio tests · a980f7f2
      Laurent Vivier committed
      This patch replaces calls to qtest_start() and qtest_end() with
      calls to qtest_pc_boot() and qtest_shutdown().
      
      This initializes the memory allocator and the PCI interface
      functions. It will make it easier to enable virtio tests on other
      architectures, since only a specific qtest_XXX_boot() (like
      qtest_spapr_boot()) needs to be added.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      a980f7f2
    • tests: rename target_big_endian() as qvirtio_is_big_endian() · 8b4b80c3
      Laurent Vivier committed
      Move the definition to libqos/virtio.h as it must be used
      only with virtio functions.
      
      Add a QVirtioDevice parameter as it will be needed to
      know if the virtio device is using the virtio 1.0 specification
      and thus is always little-endian (to do).
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      8b4b80c3
    • tests: move QVirtioBus pointer into QVirtioDevice · 6b9cdf4c
      Laurent Vivier committed
      This avoids having to pass both bus and device to every virtio function.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      [dwg: Fix style nit]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      6b9cdf4c
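The refactoring idea in this commit, storing the bus pointer inside the device so callers pass one argument instead of two, might look roughly like the sketch below. `DemoBus`, `DemoDevice` and the helper names are hypothetical stand-ins chosen for illustration; they are not the libqos definitions.

```c
#include <stdint.h>

typedef struct DemoDevice DemoDevice;

/* Hypothetical bus operations table; each callback takes only the device. */
typedef struct DemoBus {
    uint8_t (*get_status)(DemoDevice *d);
    void (*set_status)(DemoDevice *d, uint8_t status);
} DemoBus;

/* The device carries a pointer to its bus, so helpers need one argument. */
struct DemoDevice {
    const DemoBus *bus;
    uint8_t status;
};

static uint8_t demo_pci_get_status(DemoDevice *d)
{
    return d->status;
}

static void demo_pci_set_status(DemoDevice *d, uint8_t status)
{
    d->status = status;
}

static const DemoBus demo_pci_bus = {
    .get_status = demo_pci_get_status,
    .set_status = demo_pci_set_status,
};

/* Helper that previously would have needed (bus, device); now device only. */
static void demo_add_status(DemoDevice *d, uint8_t bits)
{
    d->bus->set_status(d, (uint8_t)(d->bus->get_status(d) | bits));
}
```

A caller sets `dev.bus = &demo_pci_bus` once at initialization and afterwards calls `demo_add_status(&dev, bits)` without naming the bus again.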
    • tests: don't check if qtest_spapr_boot() returns NULL · 458f3b2c
      Laurent Vivier committed
      qtest_spapr_boot()/qtest_pc_boot()/qtest_boot() call qtest_vboot()
      and qtest_vboot() calls g_malloc(),
      and g_malloc() never fails:
      if memory allocation fails, the application is terminated.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      458f3b2c
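The reasoning above (an allocator that terminates on failure makes a NULL check on its result dead code) can be illustrated with a minimal sketch. `alloc_or_abort()`, `struct boot_state` and `demo_boot()` are hypothetical names used only for this example; the sketch uses plain `malloc()` rather than GLib so that it stands alone.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an allocator that ends the process on failure. */
static void *alloc_or_abort(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        abort();
    }
    return p;
}

/* Hypothetical state returned by a boot helper. */
struct boot_state {
    int booted;
};

/* Because alloc_or_abort() cannot return NULL, neither can demo_boot(). */
static struct boot_state *demo_boot(void)
{
    struct boot_state *s = alloc_or_abort(sizeof(*s));
    memset(s, 0, sizeof(*s));
    s->booted = 1;
    return s;
}
```

A caller of `demo_boot()` can use the result directly; a check such as `if (s == NULL)` would never be taken, which is the situation the commit cleans up.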
    • tests: fix memory leak in virtio-scsi-test · f62e0bbb
      Laurent Vivier committed
      vs is allocated in qvirtio_scsi_pci_init() and never freed.
      Signed-off-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      f62e0bbb
    • ppc/xics: Add xics to the monitor "info pic" command · b1fc72f0
      Benjamin Herrenschmidt committed
      Useful to debug interrupt problems.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [clg: - updated for qemu-2.7
            - added a test on ->irqs as it is not necessarily allocated
              (PHB3_MSI)
            - removed static variable g_xics and replaced it with a loop on
              all children to find the xics objects.
            - rebased on InterruptStatsProvider interface ]
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      b1fc72f0
    • pseries: Update SLOF firmware image to 20161019 · f77d4ff8
      Alexey Kardashevskiy committed
      The main changes are:
      * virtio-serial
      * booting speed improvement
      * better PCI bridge support
      
      The complete changelog is:
        > virtio-serial: Fix compile error
        > scsi: Remove debug functions from scsi-loader.fs
        > scsi: Remove unused read-6 command
        > obp-tftp: Remove the ciregs-buffer
        > libnet: Simplify the net-load arguments passing
        > libnet: Simplify the Forth-to-C wrapper of ping()
        > Do not link libnet to net-snk anymore, and remove net-snk from board-qemu
        > Add a Forth-to-C wrapper for the ping command, too
        > Link libnet code to Paflof and add a wrapper for netboot()
        > Remember execution tokens of "write" and "read" for socket operations
        > Add virtio-serial device support
        > Generalize output banner write routine
        > Improve indentation in OF.fs
        > scsi: implement READ (16) command
        > rtas: Improve rtas-do-config-@ and rtas-do-config-! a little bit
        > libnet: Make netapps.h includable from .code files
        > libnet: Remove unused prototypes from netapps.h
        > libnet: Fix the printout of the ping command
        > libnet: Make sure to close sockets when we're done
        > scsi: implement read-capacity-16
        > pci: Fix secondary and subordinate PCI bus enumeration with board-qemu
        > pci-phb: Fix stack underflow in phb-pci-walk-bridge
        > paflof: Add a read() function to read keyboard input
        > paflof: Add socket(), send() and recv() functions to paflof
        > paflof: Provide get_timer() and set_timer() helper functions
        > paflof: Add a write_mm_log helper function
        > paflof: Copy sbrk code from net-snk
        > paflof: Use CFLAGS from make.rules instead of completely redefining them
        > Do not include the FCode evaluator by default anymore
        > Source code beautification of board-qemu/slof/pci-interrupts.fs
        > Allow PCI devices in PCI bridge slots greater than 4
        > Fix bad interrupt pin numbering in interrupt-map property of PCI bridges
        > Improve SLOF_alloc_mem_aligned()
        > instance: Fix set-my-args for empty arguments
        > Fix remaining compiler warnings in sloffs.c
        > Remove misleading padding fields from ROM header definition
        > Improve indentation in calculatecrc.h
        > Do not include calculatecrc.h from assembler files
        > Remove unused defines in calculatecrc.h
        > libnet: Re-initialize global variables at the beginning of tftp()
        > Remove dependency on cpu/@0 for booting
        > usb: Set XHCI slot speed according to port status
        > usb: Build correct route string for USB3 devices behind a hub
        > usb: Initialize USB3 devices on a hub and keep track of hub topology
        > usb: Increase amount of maximum slot IDs and add a sanity check
        > usb: Move XHCI port state arrays from header to .c file
        > tools: add copy functionality
        > tools: added support to sloffs to read from /dev/slof_flash
        > tools: added file append functionality
        > tools: use crc checking code from romfs/tools
        > tools: added initial version of sloffs
        > romfs: factored out crc code, to make it usable from other locations
        > tools: remove unused parts from the Makefile
        > usb-hid: Fix non-working comma key
        > fat-files: Fix access to FAT32 dir/files when cluster > 16-bits
        > virtio-net: fix ring handling in receive
        > net: Remove remainders of the MTFTP code
        > net: Move also files from clients/net-snk/app/netapps/ to lib/libnet/
        > net: Move files from clients/net-snk/app/netlib/ to lib/libnet/
        > net-snk: Get rid of netlib and netapps prefixes in include statements
        > usb-xhci: assign field4 before conditional
        > Improve F12 key handling in boot menu
        > Fix stack underflow that occurs with duplicated ESC in input
        > rtas-nvram: optimize erase
        > ipv6: Replace magic number 1500 with ETH_MTU_SIZE (i.e. 1518)
        > ipv6: Fix NULL pointer dereference in ip6addr_add()
        > ipv6: Fix memory leak in set_ipv6_address() / ip6_create_ll_address()
        > ipv6: Clear memory after malloc if necessary
        > ipv6: Fix possible NULL-pointer dereference in send_ipv6()
        > ping: use gateway address for routing
        > ping: add netmask in the ping argument
        > xhci: fix missing keys from keyboard
        > xhci: add memory barrier after filling the trb
        > loaders: Remove netflash command
        > boot: Remove legacy Forth words for network loading
        > base: Move cnt-bits and bcd-to-bin to board-js2x folder
        > base: Move huge-tftp-load variable to obp-tftp package
        > base: Remove unused IP address conversion functions
        > virtio: White space cleanup in virtio-9p.c
        > virtio: Add modern version 1.0 support to 9p driver
        > virtio: Set a proper name for virtio-9p device tree nodes
        > pci: Fix mistype in "unkown-bridge"
        > ipv6: Indent code with tabs, not with spaces
        > ipv6: send_ipv6() has to return after doing NDP
        > ipv6: Do not use unitialized MAC address array
        > ipv6: Add support for sending packets through a router
        > Remove unused sms code.
        > virtio-net: initialize to populate mac address
        > libbootmsg: Do not use '\b' characters when printing checkpoints
        > dev-null: The "read" function has to return 0 if nothing has been read
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      f77d4ff8
    • Merge remote-tracking branch 'remotes/kraxel/tags/pull-audio-20161027-1' into staging · 835f3d24
      Peter Maydell committed
      audio: intel-hda: check stream entry count during transfer
      
      # gpg: Signature made Thu 27 Oct 2016 15:30:51 BST
      # gpg:                using RSA key 0x4CB6D8EED3E87138
      # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
      # gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
      # gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
      # Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138
      
      * remotes/kraxel/tags/pull-audio-20161027-1:
        audio: intel-hda: check stream entry count during transfer
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      835f3d24
  2. 27 Oct, 2016 3 commits
    • Merge remote-tracking branch 'remotes/rth/tags/pull-atomic-20161026' into staging · 5929d7e8
      Peter Maydell committed
      cmpxchg emulation of atomics, v8
      
      # gpg: Signature made Wed 26 Oct 2016 16:30:03 BST
      # gpg:                using RSA key 0xAD1270CC4DD0279B
      # gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
      # gpg:                 aka "Richard Henderson <rth@redhat.com>"
      # gpg:                 aka "Richard Henderson <rth@twiddle.net>"
      # Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B
      
      * remotes/rth/tags/pull-atomic-20161026: (37 commits)
        target-alpha: Emulate LL/SC using cmpxchg helpers
        target-alpha: Introduce MMU_PHYS_IDX
        target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
        linux-user: remove handling of aarch64's EXCP_STREX
        linux-user: remove handling of ARM's EXCP_STREX
        target-arm: emulate aarch64's LL/SC using cmpxchg helpers
        target-arm: emulate SWP with atomic_xchg helper
        target-arm: emulate LL/SC using cmpxchg helpers
        target-arm: Rearrange aa32 load and store functions
        tests: add atomic_add-bench
        target-i386: remove helper_lock()
        target-i386: emulate XCHG using atomic helper
        target-i386: emulate LOCK'ed BTX ops using atomic helpers
        target-i386: emulate LOCK'ed XADD using atomic helper
        target-i386: emulate LOCK'ed NEG using cmpxchg helper
        target-i386: emulate LOCK'ed NOT using atomic helper
        target-i386: emulate LOCK'ed INC using atomic helper
        target-i386: emulate LOCK'ed OP instructions using atomic helpers
        target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
        tcg: Emit barriers with parallel_cpus
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      5929d7e8
    • Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging · 8f9d84df
      Peter Maydell committed
      # gpg: Signature made Wed 26 Oct 2016 03:19:06 BST
      # gpg:                using RSA key 0xEF04965B398D6211
      # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
      # gpg: WARNING: This key is not certified with sufficiently trusted signatures!
      # gpg:          It is not certain that the signature belongs to the owner.
      # Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211
      
      * remotes/jasowang/tags/net-pull-request:
        colo-proxy: fix memory leak
        net: rtl8139: limit processing of ring descriptors
        net: vmxnet: initialise local tx descriptor
        e1000e: Don't zero out buffer address in rx descriptor
        net: rocker: set limit to DMA buffer size
        net: eepro100: fix memory leak in device uninit
        tap-bsd: OpenBSD uses tap(4) now
        net: pcnet: fix source formatting and indentation
        net: pcnet: check rx/tx descriptor ring length
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      8f9d84df
    • Merge remote-tracking branch 'remotes/vivier/tags/m68k-part1-pull-request' into staging · 991a97ac
      Peter Maydell committed
      # gpg: Signature made Tue 25 Oct 2016 19:58:46 BST
      # gpg:                using RSA key 0xF30C38BD3F2FBE3C
      # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
      # gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
      # gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
      # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C
      
      * remotes/vivier/tags/m68k-part1-pull-request: (23 commits)
        target-m68k: Optimize gen_flush_flags
        target-m68k: Optimize some comparisons
        target-m68k: Use setcond for scc
        target-m68k: Introduce DisasCompare
        target-m68k: Reorg flags handling
        target-m68k: Remove incorrect clearing of cc_x
        target-m68k: Some fixes to SR and flags management
        target-m68k: Print flags properly
        target-m68k: update CPU flags management
        target-m68k: don't update cc_dest in helpers
        target-m68k: update move to/from ccr/sr
        target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()
        target-m68k: Replace helper_xflag_lt with setcond
        target-m68k: allow to update flags with operation on words and bytes
        target-m68k: REG() macro cleanup
        target-m68k: set PAGE_BITS to 12 for m68k
        target-m68k: define operand sizes
        target-m68k: set disassembler mode to 680x0 or coldfire
        target-m68k: introduce read_imXX() functions
        target-m68k: manage scaled index
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      991a97ac
  3. 26 Oct, 2016 27 commits
    • target-alpha: Emulate LL/SC using cmpxchg helpers · ed283916
      Richard Henderson committed
      Emulating LL/SC with cmpxchg is not correct, since it can
      suffer from the ABA problem.  However, portable parallel
      code is written assuming only cmpxchg, which means that in
      practice this is a viable alternative.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      ed283916
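The trade-off described here can be sketched with C11 atomics. This is an illustration of the general technique, not the TCG helper code from the patch: `struct llsc_state`, `emu_load_linked()`, `emu_store_conditional()` and `demo_aba()` are hypothetical names. The load-linked records the value it saw, and the store-conditional succeeds whenever a compare-and-swap against that recorded value succeeds.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record of the most recent load-linked access. */
struct llsc_state {
    _Atomic uint32_t *addr;   /* address of the reservation, NULL if none */
    uint32_t value;           /* value observed by the load-linked */
};

/* Emulated load-linked: read the word and remember what was seen. */
static uint32_t emu_load_linked(struct llsc_state *st, _Atomic uint32_t *addr)
{
    st->addr = addr;
    st->value = atomic_load(addr);
    return st->value;
}

/* Emulated store-conditional: succeeds if the word still holds the
 * remembered value, which is weaker than a real reservation. */
static bool emu_store_conditional(struct llsc_state *st,
                                  _Atomic uint32_t *addr, uint32_t newval)
{
    if (st->addr != addr) {
        return false;                 /* no matching load-linked */
    }
    uint32_t expected = st->value;
    st->addr = NULL;                  /* the reservation is consumed */
    return atomic_compare_exchange_strong(addr, &expected, newval);
}

/* ABA demonstration: the word changes 5 -> 7 -> 5 between LL and SC.
 * Real LL/SC hardware would fail the store; the emulation lets it pass. */
static bool demo_aba(void)
{
    _Atomic uint32_t word = 5;
    struct llsc_state st = { NULL, 0 };

    emu_load_linked(&st, &word);
    atomic_store(&word, 7);           /* another CPU writes B */
    atomic_store(&word, 5);           /* and then restores A */
    return emu_store_conditional(&st, &word, 6);
}
```

`demo_aba()` returns true, showing the case the message calls "not correct". Code that was written against cmpxchg already tolerates this, which is why the emulation is described as viable in practice.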
    • target-alpha: Introduce MMU_PHYS_IDX · 6a73ecf5
      Richard Henderson committed
      Rather than using helpers for physical accesses, use a mmu index.
      The primary cleanup is with store-conditional on physical addresses.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      6a73ecf5
    • target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} · 05188cc7
      Emilio G. Cota committed
      The exception is not emitted anymore; remove it and the associated
      TCG variables.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>
      05188cc7
    • linux-user: remove handling of aarch64's EXCP_STREX · f4e6eb7f
      Emilio G. Cota committed
      The exception is not emitted anymore.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>
      f4e6eb7f
    • linux-user: remove handling of ARM's EXCP_STREX · b50b82fc
      Emilio G. Cota committed
      The exception is not emitted anymore.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>
      b50b82fc
    • target-arm: emulate aarch64's LL/SC using cmpxchg helpers · 1dd089d0
      Emilio G. Cota committed
      Emulating LL/SC with cmpxchg is not correct, since it can
      suffer from the ABA problem. Portable parallel code, however,
      is written assuming only cmpxchg--and not LL/SC--is available.
      This means that in practice emulating LL/SC with cmpxchg is
      a viable alternative.
      
      The appended patch emulates LL/SC pairs in aarch64 with cmpxchg helpers.
      This works in both user and system mode. In usermode, it avoids
      pausing all other CPUs to perform the LL/SC pair. The subsequent
      performance and scalability improvement is significant, as the
      plots below show. They plot the throughput of atomic_add-bench
      compiled for ARM and executed on a 64-core x86 machine.
      
      Hi-res plots: http://imgur.com/a/JVc8Y
      
                      atomic_add-bench: 1000000 ops/thread, [0,1] range
      
        18 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        16 ++master +-H--+                                                      ++
           ||                                                                    |
        14 ++                                                                   ++
           | |                                                                   |
        12 ++|                                                                  ++
           | |                                                                   |
        10 ++++                                                                 ++
         8 ++E                                                                  ++
           |+++                                                                  |
         6 ++ |                                                                 ++
           |  |                                                                  |
         4 ++ |                                                                 ++
           |   |                                                                 |
         2 +H++E+---                                                            ++
           + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
         0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 1000000 ops/thread, [0,2] range
      
        18 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        16 ++master +-H--+                                                      ++
           | |                                                                   |
        14 ++E                                                                  ++
           | |                                                                   |
        12 ++|                                                                  ++
           |+++                                                                  |
        10 ++ |                                                                 ++
         8 ++ |                                                                 ++
           |  |                                                                  |
         6 ++ |                                                                 ++
           |   |                                                                 |
         4 ++  |                                                                ++
           |  +E+---                                                             |
         2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
           +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
         0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                     atomic_add-bench: 1000000 ops/thread, [0,128] range
      
        70 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
           |                        +E+------E-------+E+---                      |
           |                     ---        +++                                  |
        50 ++              +++---                                               ++
           |              -+E+                                                   |
        40 ++      +++----                                                      ++
           |        E-                                                           |
           |      --|                                                            |
        30 ++   -- +++                                                          ++
           |  +E+                                                                |
        20 ++E+                                                                 ++
           |E+                                                                   |
           |                                                                     |
        10 ++                                                                   ++
           +          +          +         +          +          +          +    |
         0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                    atomic_add-bench: 1000000 ops/thread, [0,1024] range
      
        160 ++---------+---------+----------+---------+----------+----------+---++
            +cmpxchg +-E--+      +          +         +          +          +    |
        140 ++master +-H--+                                           +++      +++
            |                                                -+E+-----+E+-------E|
        120 ++                                       +++ ----                  +++
            |                                +++  ----E--                        |
        100 ++                              --E---   +++                        ++
            |                       +++ ---- +++                                 |
         80 ++                     --E--                                        ++
            |                  ---- +++                                          |
            |              -+E+                                                  |
         60 ++         ---- +++                                                 ++
            |      +E+-                                                          |
         40 ++   --                                                             ++
            |  +E+                                                               |
         20 +EE+                                                                ++
            +++        +         +          +         +          +          +    |
          0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
      [rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      1dd089d0
    • target-arm: emulate SWP with atomic_xchg helper · cf12bce0
      Emilio G. Cota committed
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      cf12bce0
    • target-arm: emulate LL/SC using cmpxchg helpers · 354161b3
      Emilio G. Cota committed
      Emulating LL/SC with cmpxchg is not correct, since it can
      suffer from the ABA problem. Portable parallel code, however,
      is written assuming only cmpxchg--and not LL/SC--is available.
      This means that in practice emulating LL/SC with cmpxchg is
      a viable alternative.
      
      The appended patch emulates LL/SC pairs in ARM with cmpxchg helpers.
      This works in both user and system mode. In usermode, it avoids
      pausing all other CPUs to perform the LL/SC pair. The subsequent
      performance and scalability improvement is significant, as the
      plots below show. They plot the throughput of atomic_add-bench
      compiled for ARM and executed on a 64-core x86 machine.
      
      Hi-res plots: http://imgur.com/a/aNQpB
      
                     atomic_add-bench: 1000000 ops/thread, [0,1] range
      
        9 ++---------+----------+----------+----------+----------+----------+---++
          +cmpxchg +-E--+       +          +          +          +          +    |
        8 +Emaster +-H--+                                                       ++
          | |                                                                    |
        7 ++E                                                                   ++
          | |                                                                    |
        6 ++++                                                                  ++
          |  |                                                                   |
        5 ++ |                                                                  ++
        4 ++ |                                                                  ++
          |  |                                                                   |
        3 ++ |                                                                  ++
          |   |                                                                  |
        2 ++  |                                                                 ++
          |H++E+---                                  +++  ---+E+------+E+------+E|
        1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
          ++H+       +    +++   +  +++     ++++       +          +          +    |
        0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
          0          10         20         30         40         50         60
                                     Number of threads
      
                      atomic_add-bench: 1000000 ops/thread, [0,2] range
      
        16 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +          +          +    |
        14 ++master +-H--+                                                      ++
           | |                                                                   |
        12 ++|                                                                  ++
           | E                                                                   |
        10 ++|                                                                  ++
           | |                                                                   |
         8 ++++                                                                 ++
           |E+|                                                                  |
           |  |                                                                  |
         6 ++ |                                                                 ++
           |   |                                                                 |
         4 ++  |                                                                ++
           |  +E+---       +++      +++              +++           ---+E+------+E|
         2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
           + |        +    +++   +         ++++       +          +          +    |
         0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                     atomic_add-bench: 1000000 ops/thread, [0,128] range
      
        70 ++---------+----------+---------+----------+----------+----------+---++
           +cmpxchg +-E--+       +         +          +       ++++          +    |
        60 ++master +-H--+                                 ----E------+E+-------++
           |                                        -+E+---   +++     +++      +E|
           |                                +++ ---- +++                       ++|
        50 ++                       +++  ---+E+-                                ++
           |                        -E---                                        |
        40 ++                    ---+++                                         ++
           |               +++---                                                |
           |              -+E+                                                   |
        30 ++      +++----                                                      ++
           |       +E+                                                           |
        20 ++ +++--                                                             ++
           |  +E+                                                                |
           |+E+                                                                  |
        10 +E+                                                                  ++
           +          +          +         +          +          +          +    |
         0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                    atomic_add-bench: 1000000 ops/thread, [0,1024] range
      
        120 ++---------+---------+----------+---------+----------+----------+---++
            +cmpxchg +-E--+      +          +         +          +          +    |
            | master +-H--+                                                    ++|
        100 ++                                                              ----E+
            |                                                 +++  ---+E+---   ++|
            |                                                --E---   +++        |
         80 ++                                           ---- +++               ++
            |                                     ---+E+-                        |
         60 ++                              -+E+--                              ++
            |                       +++ ---- +++                                 |
            |                      -+E+-                                         |
         40 ++              +++----                                             ++
            |      +++   ---+E+                                                  |
            |     -+E+---                                                        |
         20 ++ +E+                                                              ++
            |+E+++                                                               |
            +E+        +         +          +         +          +          +    |
          0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
      [rth: Enforce alignment for ldrexd.]
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      354161b3
    • R
      target-arm: Rearrange aa32 load and store functions · 7f5616f5
      Authored by Richard Henderson
      Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
      a temp and expand with tcg_gen_extu_i32_tl.  Split out gen_aa32_addr,
      gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      7f5616f5
    • E
      tests: add atomic_add-bench · 070e3edc
      Authored by Emilio G. Cota
      With this microbenchmark we can measure the overhead of emulating atomic
      instructions with a configurable degree of contention.
      
      The benchmark spawns $n threads, each performing $o atomic ops (additions)
      in a loop. Each atomic operation is performed on a different cache line
      (assuming lines are 64 bytes long) that is randomly selected from a range [0, $r).
      
      [ Note: each $foo corresponds to a -foo flag ]
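
      The loop described above can be sketched as follows. This is a minimal
      illustration, not the actual tests/atomic_add-bench source: the names
      run_bench, worker_fn and next_random, the xorshift PRNG, and the use of
      pthreads with the GCC/Clang __atomic builtins are all assumptions made
      for the sketch. Timing is left out; it only checks that no update is lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed line size in bytes */

/* Counters sit 64 bytes apart, so two of them never share a line. */
struct slot {
    uint64_t count;
    char pad[CACHE_LINE - sizeof(uint64_t)];
};

struct worker {
    struct slot *slots;  /* shared counters */
    unsigned n_slots;    /* the range, $r */
    unsigned long n_ops; /* ops per thread, $o */
    uint32_t seed;       /* per-thread PRNG state, never zero */
};

/* xorshift32: a tiny PRNG; any per-thread generator would do. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

static void *worker_fn(void *arg)
{
    struct worker *w = arg;
    for (unsigned long i = 0; i < w->n_ops; i++) {
        unsigned idx = next_random(&w->seed) % w->n_slots;
        __atomic_fetch_add(&w->slots[idx].count, 1, __ATOMIC_SEQ_CST);
    }
    return NULL;
}

/* Run n_threads workers ($n) and return the sum of all counters. */
static uint64_t run_bench(unsigned n_threads, unsigned long n_ops,
                          unsigned n_slots)
{
    struct slot *slots = calloc(n_slots, sizeof(*slots));
    struct worker *ws = calloc(n_threads, sizeof(*ws));
    pthread_t *tids = calloc(n_threads, sizeof(*tids));
    uint64_t total = 0;

    assert(n_threads > 0 && n_slots > 0 && slots && ws && tids);
    for (unsigned t = 0; t < n_threads; t++) {
        ws[t] = (struct worker){ slots, n_slots, n_ops, t + 1 };
        int rc = pthread_create(&tids[t], NULL, worker_fn, &ws[t]);
        assert(rc == 0);
    }
    for (unsigned t = 0; t < n_threads; t++) {
        pthread_join(tids[t], NULL);
    }
    for (unsigned i = 0; i < n_slots; i++) {
        total += slots[i].count;
    }
    free(tids);
    free(ws);
    free(slots);
    return total;
}
```

      If every addition is atomic, run_bench(n, o, r) returns n * o regardless
      of r; a smaller r only raises contention. Throughput would be n * o
      divided by the elapsed wall-clock time.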
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
      070e3edc
    • E
      target-i386: remove helper_lock() · 37b995f6
      Authored by Emilio G. Cota
      It's been superseded by the atomic helpers.
      
      The use of the atomic helpers provides a significant performance and scalability
      improvement. Below are the results of running the atomic_add-bench microbenchmark with:
       $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
      where $n is the number of threads and $r is the allowed range for the additions.
      
      The scenarios measured are:
      - atomic: implements x86's ADDL with the atomic_add helper (i.e. this patchset)
      - cmpxchg: implements x86's ADDL with a TCG loop using the cmpxchg helper
      - master: before this patchset
      
      Results are sorted by ascending range, i.e. descending degree of contention.
      The Y axis is throughput in Mops/s. Tests were run on an AMD machine with 64
      Opteron 6376 cores.
      
                      atomic_add-bench: 5000000 ops/thread, [0,1] range
      
        25 ++---------+----------+---------+----------+----------+----------+---++
           + atomic +-E--+       +         +          +          +          +    |
           |cmpxchg +-H--+                                                       |
        20 +Emaster +-N--+                                                      ++
           ||                                                                    |
           |++                                                                   |
           ||                                                                    |
        15 +++                                                                  ++
           |N|                                                                   |
           |+|                                                                   |
        10 ++|                                                                  ++
           |+|+                                                                  |
           | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
           |+E+E+- +++     +E+------+E+--                                        |
         5 ++|+                                                                 ++
           |+N+H+---                                 +++                         |
           ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
         0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 5000000 ops/thread, [0,2] range
      
        25 ++---------+----------+---------+----------+----------+----------+---++
           ++atomic +-E--+       +         +          +          +          +    |
           |cmpxchg +-H--+                                                       |
        20 ++master +-N--+                                                      ++
           |E|                                                                   |
           |++                                                                   |
           ||E                                                                   |
        15 ++|                                                                  ++
           |N||                                                                  |
           |+||                                   ---+E+------+E+-----+E+------+E|
        10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
           ||H+E+--+E+--                                                         |
           |+++++                                                                |
           | ||                                                                  |
         5 ++|+H+--                                  +++                        ++
           |+N+    -                              ---+H+------+H+------          |
           +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
         0 ++---------+----------+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 5000000 ops/thread, [0,8] range
      
        40 ++---------+----------+---------+----------+----------+----------+---++
           ++atomic +-E--+       +         +          +          +          +    |
        35 +cmpxchg +-H--+                                                      ++
           | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
        30 ++|                   ---+E+--   +++                                 ++
           | |            -+E+---                                                |
        25 ++E        ---- +++                                                  ++
           |+++++ -+E+                                                           |
        20 +E+ E-- +++                                                          ++
           |H|+++                                                                |
           |+|                                       +H+-------                  |
        15 ++H+                                   ---+++      +H+------         ++
           |N++H+--                         +++---                    +H+------++|
        10 ++ +++  -       +++           ---+H+                       +++      +H+
           | |     +H+-----+H+------+H+--                                        |
         5 ++|                      +++                                         ++
           ++N+N+--+N++          +         +          +          +          +    |
         0 ++---------+----------+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                     atomic_add-bench: 5000000 ops/thread, [0,128] range
      
        160 ++---------+---------+----------+---------+----------+----------+---++
            + atomic +-E--+      +          +         +          +          +    |
        140 +cmpxchg +-H--+                          +++      +++               ++
            | master +-N--+                           E--------E------+E+------++|
        120 ++                                      --|        |      +++       E+
            |                                     -- +++      +++              ++|
        100 ++                                   -                              ++
            |                                +++-                     +++      ++|
         80 ++                              -+E+    -+H+------+H+------H--------++
            |                           ----    ----                  +++       H|
            |            ---+E+-----+E+-  ---+H+                               ++|
         60 ++     +E+---   +++  ---+H+---                                      ++
            |    --+++   ---+H+--                                                |
         40 ++ +E+-+H+---                                                       ++
            |  +H+                                                               |
         20 +EE+                                                                ++
            +N+        +         +          +         +          +          +    |
          0 ++N-N---N--+---------+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
                    atomic_add-bench: 5000000 ops/thread, [0,1024] range
      
        350 ++---------+---------+----------+---------+----------+----------+---++
            + atomic +-E--+      +          +         +          +          +    |
        300 +cmpxchg +-H--+                                                    +++
            | master +-N--+                                           +++       ||
            |                                                 +++      |    ----E|
        250 ++                                                 |   ----E----    ++
            |                                              ----E---    |    ---+H|
        200 ++                                      -+E+---   +++  ---+H+---    ++
            |                                   ----         -+H+--              |
            |                                +E+     +++ ---- +++                |
        150 ++                            ---+++  ---+H+-                       ++
            |                          ---  -+H+--                               |
        100 ++                   ---+E+ ---- +++                                ++
            |      +++   ---+E+-----+H+-                                         |
            |     -+E+------+H+--                                                |
         50 ++ +E+                                                              ++
            +EE+       +         +          +         +          +          +    |
          0 ++N-N---N--+---------+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
        hi-res: http://imgur.com/a/fMRmq
      
      For master, I stopped measuring after 8 threads, because there is little
      point in measuring the well-known performance collapse of a contended lock.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      37b995f6
    • E
      target-i386: emulate XCHG using atomic helper · ea97ebe8
      Authored by Emilio G. Cota
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      ea97ebe8
    • E
      target-i386: emulate LOCK'ed BTX ops using atomic helpers · cfe819d3
      Authored by Emilio G. Cota
      [rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
      incorrect zero-extension of address in register-offset case.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      cfe819d3
    • E
      target-i386: emulate LOCK'ed XADD using atomic helper · f53b0181
      Authored by Emilio G. Cota
      [rth: Move load of reg value to common location.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      f53b0181
    • E
      target-i386: emulate LOCK'ed NEG using cmpxchg helper · 8eb8c738
      Authored by Emilio G. Cota
      [rth: Move redundant qemu_load out of cmpxchg loop.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      8eb8c738
    • E
      target-i386: emulate LOCK'ed NOT using atomic helper · 2a5fe8ae
      Authored by Emilio G. Cota
      [rth: Avoid qemu_load that's redundant with the atomic op.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      2a5fe8ae
    • E
      target-i386: emulate LOCK'ed INC using atomic helper · 60e57346
      Authored by Emilio G. Cota
      [rth: Merge gen_inc_locked back into gen_inc to share cc update.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      60e57346
    • E
      target-i386: emulate LOCK'ed OP instructions using atomic helpers · a7cee522
      Authored by Emilio G. Cota
      [rth: Eliminate some unnecessary temporaries.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      a7cee522
    • E
      target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers · ae03f8de
      Authored by Emilio G. Cota
      The diff here is uglier than necessary. All this does is to turn
      
      FOO
      
      into:
      
      if (s->prefix & PREFIX_LOCK) {
        BAR
      } else {
        FOO
      }
      
      where FOO is the original implementation of an unlocked cmpxchg.
      
      [rth: Adjust unlocked cmpxchg to use movcond instead of branches.
      Adjust helpers to use atomic helpers.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      ae03f8de
    • R
      91682118
    • R
      tcg: Add CONFIG_ATOMIC64 · df79b996
      Authored by Richard Henderson
      Allow qemu to build on 32-bit hosts without 64-bit atomic ops.
      
      Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
      guests, we still need some way to handle the 32-bit guest using a
      64-bit atomic operation.  Do so by dropping back to single-step.
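
      The fallback idea can be sketched as below, under loud assumptions:
      HAVE_ATOMIC64 is a stand-in for the real build-time switch, derived here
      from a compiler-predefined macro, and the function names are
      hypothetical. The sketch does not model how the other vCPU threads are
      stopped; it only shows the two paths.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the build-time switch; the real detection may differ. */
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8)
#define HAVE_ATOMIC64 1
#else
#define HAVE_ATOMIC64 0
#endif

/* Try a host-atomic 64-bit add; returns false if the host lacks one. */
static bool try_atomic_add64(uint64_t *p, uint64_t v, uint64_t *old)
{
#if HAVE_ATOMIC64
    *old = __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
    return true;
#else
    (void)p; (void)v; (void)old;
    return false;
#endif
}

/* Plain read-modify-write; only safe while no other thread is running. */
static uint64_t add64_serial(uint64_t *p, uint64_t v)
{
    uint64_t old = *p;
    *p = old + v;
    return old;
}

/* Returns the old value, taking the serial path when atomics are missing. */
static uint64_t guest_add64(uint64_t *p, uint64_t v)
{
    uint64_t old;
    if (try_atomic_add64(p, v, &old)) {
        return old;
    }
    /* Assumption: the caller has already serialised execution here. */
    return add64_serial(p, v);
}
```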
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      df79b996
    • R
      tcg: Add atomic128 helpers · 7ebee43e
      Authored by Richard Henderson
      Force the use of cmpxchg16b on x86_64.
      
      Wikipedia suggests that only very old AMD64 (circa 2004) did not have
      this instruction.  Further, it's required by Windows 8 so no new cpus
      will ever omit it.
      
      If we truly care about these, then we could check this at startup time
      and then avoid executing paths that use it.
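
      A startup check of the kind suggested above might look like this. It is
      a sketch, not code from the commit: host_has_cmpxchg16b is a
      hypothetical name, and it relies on the <cpuid.h> header shipped with
      GCC and Clang. CPUID leaf 1 reports CMPXCHG16B in ECX bit 13.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.01H:ECX bit 13 signals CMPXCHG16B support. */
#define CPUID_1_ECX_CX16 (1u << 13)

/* Returns true if the host CPU reports CMPXCHG16B; false on non-x86 hosts. */
static bool host_has_cmpxchg16b(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false; /* leaf 1 not available */
    }
    return (ecx & CPUID_1_ECX_CX16) != 0;
#else
    return false;
#endif
}
```

      A caller could evaluate this once at startup and route 128-bit atomic
      operations to a slower serialised path when it returns false.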
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      7ebee43e
    • R
      tcg: Add atomic helpers · c482cb11
      Authored by Richard Henderson
      Add all of cmpxchg, op_fetch, fetch_op, and xchg.
      Handle both endiannesses, and sizes up to 8 bytes.
      Handle expanding non-atomically, when emulating in serial.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      c482cb11
    • R
      cputlb: Tidy some macros · c86c6e4c
      Authored by Richard Henderson
      TGT_LE and TGT_BE are not size dependent and do not need to be
      redefined.  The others are no longer used at all.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      c86c6e4c
    • R
      cputlb: Move most of iotlb code out of line · 82a45b96
      Authored by Richard Henderson
      Saves 2k code size off of a cold path.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      82a45b96
    • R
      cputlb: Remove includes from softmmu_template.h · 40978428
      Authored by Richard Henderson
      We already include exec/address-spaces.h and exec/memory.h in
      cputlb.c; the include of qemu/timer.h appears to be a fossil.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      40978428
    • R