- 31 October 2016 (11 commits)
-
Committed by Alex Bennée
In preparation for multi-threaded TCG we remove tcg_exec_all and move all the CPU cycling into the main thread function. When MTTCG is enabled we shall use a separate thread function which only handles one vCPU.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-13-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
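As a rough illustration of the split described here, a standalone pthreads sketch where each vCPU gets its own thread function handling only that vCPU. The names (vcpu_t, mttcg_cpu_thread_fn's body) are hypothetical stand-ins, not QEMU's actual code.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        int index;
        bool stop;
    } vcpu_t;

    /* One thread per vCPU: the loop only ever services its own CPU. */
    static void *mttcg_cpu_thread_fn(void *arg)
    {
        vcpu_t *cpu = arg;
        while (!cpu->stop) {
            printf("vCPU %d: execute one slice of translated code\n",
                   cpu->index);
            cpu->stop = true;         /* demo: one iteration then exit */
        }
        return NULL;
    }

    int main(void)
    {
        vcpu_t cpus[2] = { { 0, false }, { 1, false } };
        pthread_t th[2];
        for (int i = 0; i < 2; i++) {
            pthread_create(&th[i], NULL, mttcg_cpu_thread_fn, &cpus[i]);
        }
        for (int i = 0; i < 2; i++) {
            pthread_join(th[i], NULL);
        }
        return 0;
    }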
-
Committed by Alex Bennée
This is a purely mechanical change in preparation for upcoming refactoring. Instead of a forward declaration for tcg_exec_all, it and the associated helper functions are moved in front of the call from qemu_tcg_cpu_thread_fn.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-12-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Alex Bennée
Testing with Alexander's bare-metal synchronisation tests fails under MTTCG, leaving one CPU spinning forever waiting for the second CPU to wake up. We simply need to kick the vCPU once we have processed the PSCI power-on call. As the power control API, like the qemu_kick_cpu function, is for system emulation only, we also ensure we only build arm-powerctl for SoftMMU builds.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
CC: Alexander Spyridakis <a.spyridakis@virtualopensystems.com>
Message-Id: <1439220437-23957-20-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20161027151030.20863-11-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by KONRAD Frederic
This protects all translation-related work with tb_lock() to ensure thread safety. This effectively serialises all code generation. In addition to code generation we also take the lock for TB invalidation. This has the knock-on effect that tb_lock() is held while non-self threads modify the SoftMMU TLB, which will be used in later patches.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: moved into tree, clean-up history]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-10-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
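A minimal sketch of the serialisation pattern this commit describes, assuming a single global mutex backs tb_lock(): both code generation and TB invalidation take the same lock. Plain pthreads and printf bodies stand in for QEMU's primitives and real translation logic.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t tb_mutex = PTHREAD_MUTEX_INITIALIZER;

    static void tb_lock(void)   { pthread_mutex_lock(&tb_mutex); }
    static void tb_unlock(void) { pthread_mutex_unlock(&tb_mutex); }

    static void generate_code(int pc)
    {
        tb_lock();                 /* all code generation is serialised */
        printf("translate block at pc=%d\n", pc);
        tb_unlock();
    }

    static void invalidate_tb(int pc)
    {
        tb_lock();                 /* invalidation takes the same lock */
        printf("invalidate block at pc=%d\n", pc);
        tb_unlock();
    }

    int main(void)
    {
        generate_code(0x1000);
        invalidate_tb(0x1000);
        return 0;
    }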
-
Committed by Alex Bennée
This adds calls to assert_(memory|tb)_lock for all public APIs which are documented as needing them held in linux-user mode. The asserts are NOPs for system mode, although these will be converted when MTTCG is enabled.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-9-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Alex Bennée
Future patches will enforce the holding of mmap_lock() when we are manipulating internal memory structures. Technically it doesn't matter in the case of elfload, as we haven't started executing yet. However, it is easier to grab the lock when required than to special-case the translate-all API.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-8-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
softmmu requires more functions to be thread-safe, because translation blocks can be invalidated from e.g. notdirty callbacks. Probably the same holds for user-mode emulation; it's just that no one has ever tried to produce a coherent locking scheme there. This patch will guide the introduction of more tb_lock and tb_unlock calls for system emulation. Note that after this patch some (most) of the mentioned functions are still called outside tb_lock/tb_unlock. The next one will rectify this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-7-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Alex Bennée
Even more important when debugging MTTCG is seeing which vCPU is currently executing.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-5-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Alex Bennée
This adds asserts to check the locking on the various translation engine structures. There are two sets of structures that are protected by locks. The first is the l1map and PageDesc structures used to track which translation blocks are associated with which physical addresses; in user mode this is covered by the mmap_lock. The second is the TB-context-related structures, which are protected by tb_lock, which is also user-mode only. Currently the asserts do nothing in SoftMMU mode, but this will change for MTTCG.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-4-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
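A standalone sketch of the lock-ownership assertion pattern, assuming a thread-local flag records ownership; QEMU's real assert_memory_lock()/assert_tb_lock() differ in detail, and the SoftMMU no-op behaviour is only approximated by NDEBUG compiling the assert away.

    #include <assert.h>
    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
    static _Thread_local bool mmap_lock_held;

    static void mmap_lock(void)
    {
        pthread_mutex_lock(&mmap_mutex);
        mmap_lock_held = true;        /* record ownership for asserts */
    }

    static void mmap_unlock(void)
    {
        mmap_lock_held = false;
        pthread_mutex_unlock(&mmap_mutex);
    }

    /* Fires if a "public API" is entered without the lock held. */
    #define assert_memory_lock() assert(mmap_lock_held)

    int main(void)
    {
        mmap_lock();
        assert_memory_lock();         /* fine: we hold it */
        mmap_unlock();
        return 0;
    }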
-
Committed by Alex Bennée
Make the debug define consistent with the others. The flush operation is all about invalidating TranslationBlocks on flush events. Also fix up the commenting on the other DEBUG for the benefit of checkpatch.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-3-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Alex Bennée
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-2-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 October 2016 (29 commits)
-
Committed by Richard Henderson
Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. However, portable parallel code is written assuming only cmpxchg is available, which means that in practice this is a viable alternative.

Signed-off-by: Richard Henderson <rth@twiddle.net>
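To make the ABA hazard concrete, a small self-contained C11 sketch: a store-conditional emulated with compare-exchange "succeeds" even though the location was modified (A to B and back to A) after the load-link, where real LL/SC semantics would fail. The interleaving is simulated single-threaded for determinism.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    int main(void)
    {
        _Atomic int mem = 42;               /* A */
        int ll_val = atomic_load(&mem);     /* emulated LL: remember A */

        /* Another CPU intervenes: A -> B -> A. Real SC would now fail. */
        atomic_store(&mem, 99);             /* B */
        atomic_store(&mem, 42);             /* back to A */

        /* Emulated SC via cmpxchg compares values only, so it succeeds. */
        int expected = ll_val;
        bool ok = atomic_compare_exchange_strong(&mem, &expected, 77);
        printf("SC %s (LL/SC semantics would have failed)\n",
               ok ? "succeeded" : "failed");
        return 0;
    }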
-
Committed by Richard Henderson
Rather than using helpers for physical accesses, use an mmu index. The primary cleanup is with store-conditional on physical addresses.

Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
The exception is not emitted anymore; remove it and the associated TCG variables.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>
-
Committed by Emilio G. Cota
The exception is not emitted anymore.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>
-
Committed by Emilio G. Cota
The exception is not emitted anymore.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>
-
Committed by Emilio G. Cota
Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. Portable parallel code, however, is written assuming that only cmpxchg, not LL/SC, is available. This means that in practice emulating LL/SC with cmpxchg is a viable alternative. The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers. This works in both user and system mode. In user mode, it avoids pausing all other CPUs to perform the LL/SC pair. The subsequent performance and scalability improvement is significant, as the plots referenced below show. They plot the throughput of atomic_add-bench compiled for ARM and executed on a 64-core x86 machine.

[ASCII plots elided: atomic_add-bench throughput vs. number of threads (0 to 60) at 1000000 ops/thread, for ranges [0,1], [0,2], [0,128] and [0,1024], comparing cmpxchg against master. Hi-res plots: http://imgur.com/a/JVc8Y]

[rth: Rearrange 128-bit cmpxchg helper. Enforce alignment on LL.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. Portable parallel code, however, is written assuming that only cmpxchg, not LL/SC, is available. This means that in practice emulating LL/SC with cmpxchg is a viable alternative. The appended emulates LL/SC pairs in ARM with cmpxchg helpers. This works in both user and system mode. In user mode, it avoids pausing all other CPUs to perform the LL/SC pair. The subsequent performance and scalability improvement is significant, as the plots referenced below show. They plot the throughput of atomic_add-bench compiled for ARM and executed on a 64-core x86 machine.

[ASCII plots elided: atomic_add-bench throughput vs. number of threads (0 to 60) at 1000000 ops/thread, for ranges [0,1], [0,2], [0,128] and [0,1024], comparing cmpxchg against master. Hi-res plots: http://imgur.com/a/aNQpB]

[rth: Enforce alignment for ldrexd.]
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Richard Henderson
Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate a temp and expand with tcg_gen_extu_i32_tl. Split out gen_aa32_addr, gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
With this microbenchmark we can measure the overhead of emulating atomic instructions with a configurable degree of contention. The benchmark spawns $n threads, each performing $o atomic ops (additions) in a loop. Each atomic operation is performed on a different cache line (assuming lines are 64 bytes long) that is randomly selected from a range [0, $r). [Note: each $foo corresponds to a -foo flag.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
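A rough standalone approximation of the benchmark loop described above, under the stated assumptions (64-byte lines, additions on randomly selected lines in [0, $r)); the -n/-o/-r flags become hard-coded constants here, and this is not the actual tests/atomic_add-bench source.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N_THREADS 4       /* -n */
    #define N_OPS     100000  /* -o */
    #define RANGE     128     /* -r */
    #define LINE      64      /* assumed cache-line size in bytes */

    static _Atomic long *counters;   /* RANGE counters, one per line */

    static void *worker(void *arg)
    {
        unsigned seed = (unsigned)(uintptr_t)arg;
        for (int i = 0; i < N_OPS; i++) {
            int idx = rand_r(&seed) % RANGE;
            /* stride in longs so each counter sits on its own line */
            atomic_fetch_add(&counters[idx * (LINE / sizeof(long))], 1);
        }
        return NULL;
    }

    int main(void)
    {
        counters = calloc(RANGE * (LINE / sizeof(long)), sizeof(*counters));
        pthread_t th[N_THREADS];
        for (long i = 0; i < N_THREADS; i++) {
            pthread_create(&th[i], NULL, worker, (void *)(uintptr_t)(i + 1));
        }
        for (int i = 0; i < N_THREADS; i++) {
            pthread_join(th[i], NULL);
        }
        long total = 0;
        for (int i = 0; i < RANGE; i++) {
            total += counters[i * (LINE / sizeof(long))];
        }
        printf("total adds: %ld (expected %d)\n", total, N_THREADS * N_OPS);
        free((void *)counters);
        return 0;
    }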
-
Committed by Emilio G. Cota
It's been superseded by the atomic helpers. The use of the atomic helpers provides a significant performance and scalability improvement. Below is the result of running the atomic_add-bench microbenchmark with:

    $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n

where $n is the number of threads and $r is the allowed range for the additions. The scenarios measured are:
- atomic: implements x86's ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implements x86's ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset
Results are sorted in ascending range, i.e. descending degree of contention. The Y axis is throughput in Mops/s. Tests are run on an AMD machine with 64 Opteron 6376 cores.

[ASCII plots elided: atomic_add-bench throughput vs. number of threads (0 to 60) at 5000000 ops/thread, for ranges [0,1], [0,2], [0,8], [0,128] and [0,1024], comparing atomic, cmpxchg and master. Hi-res: http://imgur.com/a/fMRmq]

I stopped measuring master after 8 threads, because there is little point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
[rth: Avoid redundant qemu_ld in locked case. Fix previously unnoticed incorrect zero-extension of address in register-offset case.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
[rth: Move load of reg value to common location.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
[rth: Move redundant qemu_load out of cmpxchg loop.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
[rth: Avoid qemu_load that's redundant with the atomic op.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
[rth: Merge gen_inc_locked back into gen_inc to share cc update.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
[rth: Eliminate some unnecessary temporaries.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Emilio G. Cota
The diff here is uglier than necessary. All this does is turn FOO into:

    if (s->prefix & PREFIX_LOCK) { BAR } else { FOO }

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches. Adjust helpers to use atomic helpers.]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
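For illustration, a hedged C sketch of the two expansions: the locked path as a true atomic compare-and-swap, and the unlocked path computed with plain conditional selects, which is roughly what the movcond-based TCG expansion amounts to. Simplified to 32-bit operands; this is not QEMU's generated code.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Locked path: a real atomic compare-and-swap. */
    static uint32_t cmpxchg_locked(_Atomic uint32_t *mem, uint32_t *eax,
                                   uint32_t newv)
    {
        uint32_t expected = *eax;
        if (!atomic_compare_exchange_strong(mem, &expected, newv)) {
            *eax = expected;          /* on failure, EAX gets old value */
        }
        return expected;
    }

    /* Unlocked path: same architectural result via conditional selects,
     * computed non-atomically (movcond-style, no branches). */
    static uint32_t cmpxchg_unlocked(uint32_t *mem, uint32_t *eax,
                                     uint32_t newv)
    {
        uint32_t oldv = *mem;
        bool eq = (oldv == *eax);
        *mem = eq ? newv : oldv;      /* unconditional store, like movcond */
        *eax = eq ? *eax : oldv;
        return oldv;
    }

    int main(void)
    {
        _Atomic uint32_t am = 5;
        uint32_t eax = 5;
        cmpxchg_locked(&am, &eax, 9);     /* equal: am becomes 9 */

        uint32_t m = 9;
        eax = 1;
        cmpxchg_unlocked(&m, &eax, 3);    /* not equal: eax becomes 9 */
        printf("am=%u m=%u eax=%u\n", (unsigned)atomic_load(&am), m, eax);
        return 0;
    }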
-
Committed by Richard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Richard Henderson
Allow qemu to build on 32-bit hosts without 64-bit atomic ops. Even if we only allow 32-bit hosts to multi-thread emulate 32-bit guests, we still need some way to handle a 32-bit guest using a 64-bit atomic operation. Do so by dropping back to single-stepping.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
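A small sketch of the capability split this commit implies, assuming C11 atomics: probe whether 64-bit atomics are lock-free on the host and take a fallback path otherwise. QEMU's actual fallback stops the world and single-steps the guest instruction; here it is merely reported.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        _Atomic uint64_t v = 0;
        if (atomic_is_lock_free(&v)) {
            atomic_fetch_add(&v, 1);   /* fast path: native 64-bit atomic */
            printf("64-bit atomics are lock-free, v=%llu\n",
                   (unsigned long long)atomic_load(&v));
        } else {
            printf("no lock-free 64-bit atomics: would stop all vCPUs "
                   "and single-step\n");
        }
        return 0;
    }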
-
Committed by Richard Henderson
Force the use of cmpxchg16b on x86_64. Wikipedia suggests that only very old AMD64 (circa 2004) did not have this instruction. Further, it's required by Windows 8, so no new CPUs will ever omit it. If we truly care about these, then we could check this at startup time and then avoid executing paths that use it.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
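The startup check the commit alludes to could look roughly like this on an x86 host, using GCC's <cpuid.h> (CPUID leaf 1, ECX bit 13 reports CMPXCHG16B); treat this as a sketch, not QEMU's configure logic, and note it only compiles on x86 hosts.

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned a, b, c, d;
        /* Query CPUID leaf 1; bit_CMPXCHG16B is ECX bit 13. */
        if (__get_cpuid(1, &a, &b, &c, &d) && (c & bit_CMPXCHG16B)) {
            printf("cmpxchg16b available\n");
        } else {
            printf("cmpxchg16b missing: avoid 16-byte atomic paths\n");
        }
        return 0;
    }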
-
Committed by Richard Henderson
Add all of cmpxchg, op_fetch, fetch_op, and xchg. Handle both endiannesses, and sizes up to 8 bytes. Handle expanding non-atomically, when emulating in serial.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
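In C11 terms the four helper families map onto the standard atomics as sketched below; this is an analogy for what the generated helpers do per size and endianness, not their actual implementation. The serial, non-atomic expansion the commit mentions would be the same operations done with plain loads and stores.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        _Atomic uint32_t m = 10;

        uint32_t old = atomic_fetch_add(&m, 5);       /* fetch_op: returns 10 */
        uint32_t newv = atomic_fetch_add(&m, 5) + 5;  /* op_fetch: returns 20 */
        uint32_t swapped = atomic_exchange(&m, 1);    /* xchg: returns 20 */

        uint32_t expect = 1;                          /* cmpxchg */
        atomic_compare_exchange_strong(&m, &expect, 2);

        printf("old=%u newv=%u swapped=%u m=%u\n",
               (unsigned)old, (unsigned)newv, (unsigned)swapped,
               (unsigned)atomic_load(&m));
        return 0;
    }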
-
Committed by Richard Henderson
TGT_LE and TGT_BE are not size-dependent and do not need to be redefined. The others are no longer used at all.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Richard Henderson
Saves 2k of code size on a cold path.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Richard Henderson
We already include exec/address-spaces.h and exec/memory.h in cputlb.c; the include of qemu/timer.h appears to be a fossil.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Richard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Richard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
-
Committed by Alex Bennée
The variable parallel_cpus controls the generation of thread-aware atomic code. We only need to set it once we clone our first thread, at which point any existing translations need to be thrown away.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
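A toy sketch of the flip described above, with hypothetical names (flush_translations stands in for discarding cached TBs, which QEMU does via tb_flush): the first clone atomically sets the flag and throws away code generated in serial mode so it is regenerated thread-aware.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_bool parallel_cpus = false;

    static void flush_translations(void)
    {
        printf("flushing serial-mode translations\n");
    }

    static void *guest_thread(void *arg) { (void)arg; return NULL; }

    static void clone_guest_thread(pthread_t *th)
    {
        /* Only the first clone flips the flag and flushes. */
        if (!atomic_exchange(&parallel_cpus, true)) {
            flush_translations();
        }
        pthread_create(th, NULL, guest_thread, NULL);
    }

    int main(void)
    {
        pthread_t th;
        clone_guest_thread(&th);
        pthread_join(th, NULL);
        printf("parallel_cpus=%d\n", (int)atomic_load(&parallel_cpus));
        return 0;
    }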
-