提交 · ca7d8e1c9ccb95257a9c9288617b270727ec3e79 · openeuler / qemu

17 7月, 2016 1 次提交

cpu-exec: Move down some declarations in cpu_exec() · ca7d8e1c

由 Sergey Fedorov 提交于 7月 15, 2016

This will fix a compiler warning with -Wclobbered:

http://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg03347.htmlReported-by: NStefan Weil <sw@weilnetz.de>
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715193123.28113-1-sergey.fedorov@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ca7d8e1c

12 6月, 2016 2 次提交

tb hash: track translated blocks with qht · 909eaac9

由 Emilio G. Cota 提交于 6月 08, 2016

Having a fixed-size hash table for keeping track of all translation blocks
is suboptimal: some workloads are just too big or too small to get maximum
performance from the hash table. The MRU promotion policy helps improve
performance when the hash table is a little undersized, but it cannot
make up for severely undersized hash tables.

Furthermore, frequent MRU promotions result in writes that are a scalability
bottleneck. For scalability, lookups should only perform reads, not writes.
This is not a big deal for now, but it will become one once MTTCG matures.

The appended fixes these issues by using qht as the implementation of
the TB hash table. This solution is superior to other alternatives considered,
namely:

- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
              MRU is implemented here by adding an intermediate struct
              that contains the u32 hash and a pointer to the TB; this
              allows us, on an MRU promotion, to copy said struct (that is not
              at the head), and put this new copy at the head. After a grace
              period, the original non-head struct can be eliminated, and
              after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
                   no MRU for lookups; MRU for inserts.
The appended solution is the following:
- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
                 no MRU for lookups; MRU for inserts.

The plots below compare the considered solutions. The Y axis shows the
boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
sweeps the number of buckets (or initial number of buckets for qht-autoresize).
The plots in PNG format (and with errorbars) can be seen here:
  http://imgur.com/a/Awgnq

Each test runs 5 times, and the entire QEMU process is pinned to a
single core for repeatability of results.

                            Host: Intel Xeon E5-2690

  28 ++------------+-------------+-------------+-------------+------------++
     A*****        +             +             +             master **A*** +
  27 ++    *                                                 xxhash ##B###++
     |      A******A******                               xxhash-rcu $$C$$$ |
  26 C$$                  A******A******            qht-fixed-nomru*%%D%%%++
     D%%$$                              A******A******A*qht-dyn-mru A*E****A
  25 ++ %%$$                                          qht-dyn-nomru &&F&&&++
     B#####%                                                               |
  24 ++    #C$$$$$                                                        ++
     |      B###  $                                                        |
     |          ## C$$$$$$                                                 |
  23 ++           #       C$$$$$$                                         ++
     |             B######       C$$$$$$                                %%%D
  22 ++                  %B######       C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
     |                    D%%%%%%B######      @E@@@@@@    %%%D%%%@@@E@@@@@@E
  21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B
     +             E@@@   F&&&   +      E@     +      F&&&   +             +
  20 ++------------+-------------+-------------+-------------+------------++
     14            16            18            20            22            24
                             log2 number of buckets

                                 Host: Intel i7-4790K

  14.5 ++------------+------------+-------------+------------+------------++
       A**           +            +             +            master **A*** +
    14 ++ **                                                 xxhash ##B###++
  13.5 ++   **                                           xxhash-rcu $$C$$$++
       |                                            qht-fixed-nomru %%D%%% |
    13 ++     A******                                   qht-dyn-mru @@E@@@++
       |             A*****A******A******             qht-dyn-nomru &&F&&& |
  12.5 C$$                               A******A******A*****A******    ***A
    12 ++ $$                                                        A***  ++
       D%%% $$                                                             |
  11.5 ++  %%                                                             ++
       B###  %C$$$$$$                                                      |
    11 ++  ## D%%%%% C$$$$$                                               ++
       |     #      %      C$$$$$$                                         |
  10.5 F&&&&&&B######D%%%%%       C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$    $$$C
    10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B
       +             F&&          D%%%%%%B######B######B#####B###@@@D%%%   +
   9.5 ++------------+------------+-------------+------------+------------++
       14            16           18            20           22            24
                              log2 number of buckets

Note that the original point before this patch series is X=15 for "master";
the little sensitivity to the increased number of buckets is due to the
poor hashing function in master.

xxhash-rcu has significant overhead due to the constant churn of allocating
and deallocating intermediate structs for implementing MRU. An alternative
would be do consider failed lookups as "maybe not there", and then
acquire the external lock (tb_lock in this case) to really confirm that
there was indeed a failed lookup. This, however, would not be enough
to implement dynamic resizing--this is more complex: see
"Resizable, Scalable, Concurrent Hash Tables via Relativistic
Programming" by Triplett, McKenney and Walpole. This solution was
discarded due to the very coarse RCU read critical sections that we have
in MTTCG; resizing requires waiting for readers after every pointer update,
and resizes require many pointer updates, so this would quickly become
prohibitive.

qht-fixed-nomru shows that MRU promotion is advisable for undersized
hash tables.

However, qht-dyn-mru shows that MRU promotion is not important if the
hash table is properly sized: there is virtually no difference in
performance between qht-dyn-nomru and qht-dyn-mru.

Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
can achieve with optimum sizing of the hash table, while keeping the hash
table scalable for readers.

The improvement we get before and after this patch for booting debian jessie
with arm-softmmu is:

- Intel Xeon E5-2690: 10.5% less time
- Intel i7-4790K: 5.2% less time

We could get this same improvement _for this particular workload_ by
statically increasing the size of the hash table. But this would hurt
workloads that do not need a large hash table. The dynamic (upward)
resizing allows us to start small and enlarge the hash table as needed.

A quick note on downsizing: the table is resized back to 2**15 buckets
on every tb_flush; this makes sense because it is not guaranteed that the
table will reach the same number of TBs later on (e.g. most bootup code is
thrown away after boot); it makes sense to grow the hash table as
more code blocks are translated. This also avoids the complication of
having to build downsizing hysteresis logic into qht.
Reviewed-by: NSergey Fedorov <serge.fedorov@linaro.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

909eaac9

tb hash: hash phys_pc, pc, and flags with xxhash · 42bd3228

由 Emilio G. Cota 提交于 6月 08, 2016

For some workloads such as arm bootup, tb_phys_hash is performance-critical.
The is due to the high frequency of accesses to the hash table, originated
by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
More info:
  https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html

To dig further into this I modified an arm image booting debian jessie to
immediately shut down after boot. Analysis revealed that quite a bit of time
is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
results in very uneven loading of chains in the hash table's buckets;
the longest observed chain had ~550 elements.

The appended addresses this with two changes:

1) Use xxhash as the hash table's hash function. xxhash is a fast,
   high-quality hashing function.

2) Feed the hashing function with not just tb_phys, but also pc and flags.

This improves performance over using just tb_phys for hashing, since that
resulted in some hash buckets having many TB's, while others getting very few;
with these changes, the longest observed chain on a single hash bucket is
brought down from ~550 to ~40.

Tests show that the other element checked for in tb_find_physical,
cs_base, is always a match when tb_phys+pc+flags are a match,
so hashing cs_base is wasteful. It could be that this is an ARM-only
thing, though. UPDATE:
On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote:
> The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB
> consisting of only a delay slot).
> It may well still turn out to be reasonable to ignore cs_base for hashing.

BTW, after this change the hash table should not be called "tb_hash_phys"
anymore; this is addressed later in this series.

This change gives consistent bootup time improvements. I tested two
host machines:
- Intel Xeon E5-2690: 11.6% less time
- Intel i7-4790K: 19.2% less time

Increasing the number of hash buckets yields further improvements. However,
using a larger, fixed number of buckets can degrade performance for other
workloads that do not translate as many blocks (600K+ for debian-jessie arm
bootup). This is dealt with later in this series.
Reviewed-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

42bd3228

26 5月, 2016 1 次提交

cpu-exec: Fix direct jump to TB spanning page · c88c67e5

由 Sergey Fedorov 提交于 5月 16, 2016

It is not safe to make a direct jump to a TB spanning two pages in
system emulation because the mapping for the second page can get changed
but we don't take care of direct jumps in this case.

However in user mode emulation, this is not the case because there's
only static address translation and TBs are always invalidated properly.

Fixes: 5b053a4a ("tcg: Clean up direct block chaining safety checks")
Reported-by: NMax Filippov <jcmvbkbc@gmail.com>
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Tested-by: NMax Filippov <jcmvbkbc@gmail.com>
Message-id: 1463404380-29302-1-git-send-email-sergey.fedorov@linaro.org
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

c88c67e5

19 5月, 2016 1 次提交

cpu: move exec-all.h inclusion out of cpu.h · 63c91552

由 Paolo Bonzini 提交于 3月 15, 2016

exec-all.h contains TCG-specific definitions.  It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.

One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

63c91552

13 5月, 2016 15 次提交

cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt() · 8b1fe3f4

由 Sergey Fedorov 提交于 5月 12, 2016

Suggested-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1463071937-26607-1-git-send-email-sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

8b1fe3f4

cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec() · ba048a4a

由 Sergey Fedorov 提交于 5月 11, 2016

Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-6-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

ba048a4a

cpu-exec: Move TB execution stuff out of cpu_exec() · 928de9ee

由 Sergey Fedorov 提交于 5月 11, 2016

Simplify cpu_exec() by extracting TB execution code outside of
cpu_exec() into a new static inline function cpu_loop_exec_tb().
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-5-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

928de9ee

cpu-exec: Move interrupt handling out of cpu_exec() · c385e6e4

由 Sergey Fedorov 提交于 5月 11, 2016

Simplify cpu_exec() by extracting interrupt handling code outside of
cpu_exec() into a new static inline function cpu_handle_interrupt().
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson  <rth@twiddle.net>
Message-Id: <1462962111-32237-4-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

c385e6e4

cpu-exec: Move exception handling out of cpu_exec() · ea284766

由 Sergey Fedorov 提交于 5月 11, 2016

Simplify cpu_exec() by extracting exception handling code out of
cpu_exec() into a new static inline function cpu_handle_exception().
Also make cpu_handle_debug_exception() inline as it is used only once.
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-3-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

ea284766

cpu-exec: Move halt handling out of cpu_exec() · 8b2d34e9

由 Sergey Fedorov 提交于 5月 11, 2016

Simplify cpu_exec() by extracting CPU halt state handling code out of
cpu_exec() into a new static inline function cpu_handle_halt().
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-2-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

8b2d34e9

cpu-exec: Remove relic orphaned comment · c6f0d9f8

由 Sergey Fedorov 提交于 5月 03, 2016

This comment should have been deleted by commit 0ac087f1 ("removed
unused code") but somehow it is still here. There's no point to keep it.
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1462286050-21778-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

c6f0d9f8

tcg: Remove needless CPUState::current_tb · 3213525f

由 Sergey Fedorov 提交于 5月 03, 2016

This field was used for telling cpu_interrupt() to unlink a chain of TBs
being executed when it worked that way. Now, cpu_interrupt() don't do
this anymore. So we don't need this field anymore.
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1462273462-14036-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

3213525f

cpu-exec: Move TB chaining into tb_find_fast() · a0522c7a

由 Sergey Fedorov 提交于 4月 25, 2016

Move tb_add_jump() call and surrounding code from cpu_exec() into
tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
chaining optimization details into tb_find_fast(). It also allows to
move tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
to tb_find_slow() which also manipulates the lock.
Suggested-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>
[rth: Fixed rebase typo in nochain test.]

a0522c7a

tcg: Rework tb_invalidated_flag · 6f789be5

由 Sergey Fedorov 提交于 4月 13, 2016

'tb_invalidated_flag' was meant to catch two events:
 * some TB has been invalidated by tb_phys_invalidate();
 * the whole translation buffer has been flushed by tb_flush().

Then it was checked:
 * in cpu_exec() to ensure that the last executed TB can be safely
   linked to directly call the next one;
 * in cpu_exec_nocache() to decide if the original TB should be provided
   for further possible invalidation along with the temporarily
   generated TB.

It is always safe to patch an invalidated TB since it is not going to be
used anyway. It is also safe to call tb_phys_invalidate() for an already
invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
simply unnecessary. Moreover, it can prevent from pretty proper linking
of TBs, if any arbitrary TB has been invalidated. So just don't touch it
in tb_phys_invalidate().

If this flag is only used to catch whether tb_flush() has been called
then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
only 'true' and 'false' to set its value. Also, instead of setting it in
tb_gen_code(), just after tb_flush() has been called, do it right inside
of tb_flush().

In cpu_exec(), this flag is used to track if tb_flush() has been called
and have made 'next_tb' (a reference to the last executed TB) invalid
for linking it to directly call the next TB. tb_flush() can be called
during the CPU execution loop from tb_gen_code(), during TB execution or
by another thread while 'tb_lock' is released. Catch for translation
buffer flush reliably by resetting this flag once before first TB lookup
and each time we find it set before trying to add a direct jump. Don't
touch in in tb_find_physical().

Each vCPU has its own execution loop in multithreaded mode and thus
should have its own copy of the flag to be able to reset it with its own
'next_tb' and don't affect any other vCPU execution thread. So make this
flag per-vCPU and move it to CPUState.

In cpu_exec_nocache(), we only need to check if tb_flush() has been
called from tb_gen_code() called by cpu_exec_nocache() itself. To do
this reliably, preserve the old value of the flag, reset it before
calling tb_gen_code(), check afterwards, and combine the saved value
back to the flag.

This patch is based on the patch "tcg: move tb_invalidated_flag to
CPUState" from Paolo Bonzini <pbonzini@redhat.com>.
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

6f789be5

tcg: Clean up from 'next_tb' · 819af24b

由 Sergey Fedorov 提交于 4月 21, 2016

The value returned from tcg_qemu_tb_exec() is the value passed to the
corresponding tcg_gen_exit_tb() at translation time of the last TB
attempted to execute. It is a little confusing to store it in a variable
named 'next_tb'. In fact, it is a combination of 4-byte aligned pointer
and additional information in its two least significant bits. Break it
down right away into two variables named 'last_tb' and 'tb_exit' which
are a pointer to the last TB attempted to execute and the TB exit
reason, correspondingly. This simplifies the code and improves its
readability.

Correct a misleading documentation comment for tcg_qemu_tb_exec() and
fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in
another couple of places.
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

819af24b

cpu-exec: elide more icount code if CONFIG_USER_ONLY · 7687bf52

由 Paolo Bonzini 提交于 8月 11, 2015

Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[Alex Bennée: #ifndef replay code to match elided functions]
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

7687bf52

tcg: reorganize tb_find_physical loop · 1279f323

由 Alex Bennée 提交于 3月 22, 2016

Put some comments and improve code structure. This should help reading
the code.
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov: provide commit message; bring back resetting of
tb_invalidated_flag]
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NRichard Henderson  <rth@twiddle.net>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

1279f323

tcg: Clean up direct block chaining safety checks · 5b053a4a

由 Sergey Fedorov 提交于 4月 08, 2016

We don't take care of direct jumps when address mapping changes. Thus we
must be sure to generate direct jumps so that they always keep valid
even if address mapping changes. Luckily, we can only allow to execute a
TB if it was generated from the pages which match with current mapping.

Document tcg_gen_goto_tb() declaration and note the reason for
destination PC limitations.

Some targets with variable length instructions allow TB to straddle a
page boundary. However, we make sure that both of TB pages match the
current address mapping when looking up TBs. So it is safe to do direct
jumps into the both pages. Correct the checks for some of those targets.

Given that, we can safely patch a TB which spans two pages. Remove the
unnecessary check in cpu_exec() and allow such TBs to be patched.
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: NSergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

5b053a4a

tb: consistently use uint32_t for tb->flags · 89fee74a

由 Emilio G. Cota 提交于 4月 07, 2016

We are inconsistent with the type of tb->flags: usage varies loosely
between int and uint64_t. Settle to uint32_t everywhere, which is
superior to both: at least one target (aarch64) uses the most significant
bit in the u32, and uint64_t is wasteful.

Compile-tested for all targets.
Suggested-by: NLaurent Desnogues <laurent.desnogues@gmail.com>
Suggested-by: NRichard Henderson <rth@twiddle.net>
Tested-by: NEdgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: NEdgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: NLaurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1460049562-23517-1-git-send-email-cota@braap.org>

89fee74a

23 3月, 2016 2 次提交

qemu-log: dfilter-ise exec, out_asm, op and opt_op · d977e1c2

由 Alex Bennée 提交于 3月 15, 2016

This ensures the code generation debug code will honour -dfilter if set.
For the "exec" tracing I've added a new inline macro for efficiency's
sake.
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
Reviewed-by: NAurelien Jarno <aurelien@aureL32.net>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1458052224-9316-8-git-send-email-alex.bennee@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d977e1c2

qemu-log: Improve the "exec" TB execution logging · 1a830635

由 Peter Maydell 提交于 3月 15, 2016

Improve the TB execution logging so that it is easier to identify
what is happening from trace logs:
 * move the "Trace" logging of executed TBs into cpu_tb_exec()
   so that it is emitted if and only if we actually execute a TB,
   and for consistency for the CPU state logging
 * log when we link two TBs together via tb_add_jump()
 * log when cpu_tb_exec() returns early from a chain of TBs

The new style logging looks like this:

Trace 0x7fb7cc822ca0 [ffffffc0000dce00]
Linking TBs 0x7fb7cc822ca0 [ffffffc0000dce00] index 0 -> 0x7fb7cc823110 [ffffffc0000dce10]
Trace 0x7fb7cc823110 [ffffffc0000dce10]
Trace 0x7fb7cc823420 [ffffffc000302688]
Trace 0x7fb7cc8234a0 [ffffffc000302698]
Trace 0x7fb7cc823520 [ffffffc0003026a4]
Trace 0x7fb7cc823560 [ffffffc0000dce44]
Linking TBs 0x7fb7cc823560 [ffffffc0000dce44] index 1 -> 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc8235d0 [ffffffc0000dce70]
Stopped execution of TB chain before 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc822fd0 [ffffffc0000dd52c]
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NAlex Bennée <alex.bennee@linaro.org>
[AJB: reword patch title, Abandoned->Stopped]
Reviewed-by: NAurelien Jarno <aurelien@aurel32.net>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Message-Id: <1458052224-9316-6-git-send-email-alex.bennee@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1a830635

03 2月, 2016 1 次提交

log: do not unnecessarily include qom/cpu.h · 508127e2

由 Paolo Bonzini 提交于 1月 07, 2016

Split the bits that require it to exec/log.h.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1452174932-28657-8-git-send-email-den@openvz.org
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

508127e2

29 1月, 2016 1 次提交

exec: Clean up includes · 7b31bbc2

由 Peter Maydell 提交于 1月 26, 2016

Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-4-git-send-email-peter.maydell@linaro.org

7b31bbc2

06 11月, 2015 1 次提交

replay: interrupts and exceptions · 6f060969

由 Pavel Dovgalyuk 提交于 9月 17, 2015

This patch includes modifications of common cpu files. All interrupts and
exceptions occured during recording are written into the replay log.
These events allow correct replaying the execution by kicking cpu thread
when one of these events is found in the log.
Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162416.8676.57647.stgit@PASHA-ISP.def.inno>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6f060969

05 11月, 2015 1 次提交

cpu-exec: allow temporary disabling icount · 56c0269a

由 Pavel Dovgalyuk 提交于 9月 17, 2015

This patch is required for deterministic replay to generate an exception
by trying executing an instruction without changing icount.
It adds new flag to TB for disabling icount while translating it.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162359.8676.77011.stgit@PASHA-ISP.def.inno>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56c0269a

04 11月, 2015 1 次提交

cpu-exec: Fix compiler warning (-Werror=clobbered) · 0448f5f8

由 Stefan Weil 提交于 9月 26, 2015

Reloading of local variables after sigsetjmp is only needed for some
buggy compilers.

The code which should reload these variables causes compiler warnings
with gcc 4.7 when compiler optimizations are enabled:

cpu-exec.c:204:15: error:
 variable ‘cpu’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
cpu-exec.c:207:15: error:
 variable ‘cc’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
cpu-exec.c:202:28: error:
 argument ‘env’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]

Now this code is only used for compilers which need it
(and gcc 4.5.x, x > 0 which does not need it but won't give warnings).

There were bug reports for clang and gcc 4.5.0, while gcc 4.5.1
was reported to work fine without the reload code. For clang it
is not clear which versions are affected, so simply keep the status quo
for all clang compilations. This can be improved later.
Signed-off-by: NStefan Weil <sw@weilnetz.de>
Message-Id: <1443266606-21400-1-git-send-email-sw@weilnetz.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0448f5f8

20 10月, 2015 1 次提交

cpu-exec: Add "nochain" debug flag · 89a82cd4

由 Richard Henderson 提交于 9月 16, 2015

Respect it to avoid linking TBs together.
Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

89a82cd4

25 9月, 2015 1 次提交

i386: partial revert of interrupt poll fix · 6220e900

由 Pavel Dovgalyuk 提交于 9月 17, 2015

Processing CPU_INTERRUPT_POLL requests in cpu_has_work functions
break the determinism of cpu_exec. This patch is required to make
interrupts processing deterministic.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162331.8676.15286.stgit@PASHA-ISP.def.inno>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6220e900

16 9月, 2015 1 次提交

cpu-exec: Migrate some generic fns to cpu-exec-common · 5abf9495

由 Peter Crosthwaite 提交于 9月 10, 2015

The goal is to split the functions such that cpu-exec is CPU specific
content, while cpus-exec-common.c is generic code only. The function
interface to cpu-exec needs to be virtualised to prepare support for
multi-arch and moving these definitions out saves bloating the QOM
interface. So move these definitions out of cpu-exec to a new module,
cpu-exec-common.
Signed-off-by: NPeter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <3cefeb3fbbb33031670951a0e74de2778529da3f.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5abf9495

11 9月, 2015 1 次提交

cpu-exec: introduce loop exit with restore function · 1c3c8af1

由 Pavel Dovgalyuk 提交于 7月 10, 2015

This patch introduces loop exit function, which also
restores guest CPU state according to the value of host
program counter.
Reviewed-by: NAurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150710095702.13280.97477.stgit@PASHA-ISP>
Signed-off-by: NRichard Henderson <rth@twiddle.net>

1c3c8af1

09 9月, 2015 7 次提交

cpu-exec: fix lock hierarchy for user-mode emulation · 9fd1a948

由 Paolo Bonzini 提交于 8月 11, 2015

tb_lock has to be taken inside the mmap_lock (example:
tb_invalidate_phys_range is called by target_mmap), but
tb_link_page is taking the mmap_lock and it is called
with the tb_lock held.

To fix this, take the mmap_lock in tb_find_slow, not
in tb_link_page.
Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9fd1a948

replace spinlock by QemuMutex. · 677ef623

由 KONRAD Frederic 提交于 8月 10, 2015

spinlock is only used in two cases:
  * cpu-exec.c: to protect TranslationBlock
  * mem_helper.c: for lock helper in target-i386 (which seems broken).

It's a pthread_mutex_t in user-mode, so we can use QemuMutex directly,
with an #ifdef.  The #ifdef will be removed when multithreaded TCG
will need the mutex as well.
Signed-off-by: NKONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-5-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
[Merge Emilio G. Cota's patch to remove volatile. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

677ef623

tcg: signal-free qemu_cpu_kick · e0c38211

由 Paolo Bonzini 提交于 8月 26, 2015

Signals are slow and do not exist on Win32.  The previous patches
have done most of the legwork to introduce memory barriers (some
of them were even there already for the sake of Windows!) and
we can now set the flags directly in the iothread.

qemu_cpu_kick_thread is not used anymore on TCG, since the TCG thread is
never outside usermode while the CPU is running (not halted).  Instead run
the content of the signal handler (now in qemu_cpu_kick_no_halt) directly.
qemu_cpu_kick_no_halt is also used in qemu_mutex_lock_iothread to avoid
the overhead of qemu_cond_broadcast.
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e0c38211

tcg: synchronize exit_request and tcg_current_cpu accesses · aed807c8

由 Paolo Bonzini 提交于 8月 18, 2015

Synchronize the remaining pair of accesses in cpu_signal.  These should
be necessary on Windows as well, at least in theory.  Probably
SuspendProcess and ResumeProcess introduce some implicit memory
barrier.
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

aed807c8

P
tcg: synchronize cpu->exit_request and cpu->tcg_exit_req accesses · ab096a75
由 Paolo Bonzini 提交于 8月 18, 2015
```
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
ab096a75

tcg: assign cpu->current_tb in a simpler place · b0a46fa7

由 Paolo Bonzini 提交于 8月 18, 2015

TCG has not been reading cpu->current_tb from signal handlers for years.
The code that synchronized cpu_exec with the signal handler is not
needed anymore.
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b0a46fa7

tcg: introduce tcg_current_cpu · 9373e632

由 Paolo Bonzini 提交于 8月 18, 2015

This is already useful on Windows in order to remove tls.h, because
accesses to current_cpu are done from a different thread on that
platform.  It will be used on POSIX platforms as soon TCG stops using
signals to interrupt the execution of translated code.
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9373e632

15 8月, 2015 1 次提交

exec: drop cpu_can_do_io, just read cpu->can_do_io · 414b15c9

由 Paolo Bonzini 提交于 6月 24, 2015

After commit 626cf8f4 (icount: set can_do_io outside TB execution,
2014-12-08), can_do_io is set to 1 if not executing code.  It is
no longer necessary to make this assumption in cpu_can_do_io.

It is also possible to remove the use_icount test, simply by
never setting cpu->can_do_io to 0 unless use_icount is true.

With these changes cpu_can_do_io boils down to a read of
cpu->can_do_io.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

414b15c9

06 8月, 2015 1 次提交

cpu-exec: Do not invalidate original TB in cpu_exec_nocache() · 02d57ea1

由 Sergey Fedorov 提交于 6月 30, 2015

Instead of invalidating an original TB in cpu_exec_nocache()
prematurely, just save a link to it in the temporary generated TB. If
cpu_io_recompile() is raised subsequently from the temporary TB,
invalidate the original one as well. That allows reusing the original TB
each time cpu_exec_nocache() is called to handle expired instruction
counter in icount mode.
Signed-off-by: NSergey Fedorov <serge.fdrv@gmail.com>
Message-Id: <1435656909-29116-1-git-send-email-serge.fdrv@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

02d57ea1