提交 · eae3eb3e185028d6e862db747e3b7397600d6762 · openeuler / qemu

11 1月, 2019 1 次提交

qemu/queue.h: simplify reverse access to QTAILQ · eae3eb3e

由 Paolo Bonzini 提交于 12月 06, 2018

The new definition of QTAILQ does not require passing the headname,
remove it.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eae3eb3e

26 12月, 2018 15 次提交

tcg: Improve call argument loading · 4250da10

由 Richard Henderson 提交于 12月 11, 2018

Free the argument register only after we have verified that the
temporary is not already in that register. This case is likely
now that we are back propagating the preferred register.
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

4250da10

tcg: Record register preferences during liveness · 25f49c5f

由 Richard Henderson 提交于 11月 27, 2018

With these preferences, we can arrange for function call arguments to
be computed into the proper registers instead of requiring extra moves.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

25f49c5f

tcg: Add TCG_OPF_BB_EXIT · ae36a246

由 Richard Henderson 提交于 11月 27, 2018

Use this to notice the opcodes that exit the TB, which implies
that local temps are really dead and need not be synced.

Previously we so marked the true end of the TB, but that was
immediately overwritten by the la_bb_end invoked by any
TCG_OPF_BB_END opcode, like exit_tb.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

ae36a246

tcg: Split out more subroutines from liveness_pass_1 · f65a061c

由 Richard Henderson 提交于 11月 27, 2018

Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

f65a061c

tcg: Rename and adjust liveness_pass_1 helpers · 2616c808

由 Richard Henderson 提交于 11月 27, 2018

No need for a "tcg_" prefix for a static function; we already
have another "la_" prefix for indicating liveness analysis.
Pass in nb_globals and nb_temps, as we will already have them
in registers for other loops within the parent function.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

2616c808

tcg: Reindent parts of liveness_pass_1 · 152c35aa

由 Richard Henderson 提交于 11月 27, 2018

There are two blocks of the form

    if (foo) {
        stuff1;
        goto bar;
    } else {
    baz:
        stuff2;
    }

which have unnecessary and confusing indentation.
Remove the else and unindent stuff2.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

152c35aa

tcg: Dump register preference info with liveness · 1894f69a

由 Richard Henderson 提交于 11月 27, 2018

Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

1894f69a

tcg: Improve register allocation for matching constraints · d62816f2

由 Richard Henderson 提交于 11月 27, 2018

Try harder to honor the output_pref.  When we're forced to allocate
a second register for the input, it does not need to use the input
constraint; that will be honored by the register we allocate for the
output and a move is already required.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

d62816f2

tcg: Add output_pref to TCGOp · 69e3706d

由 Richard Henderson 提交于 11月 27, 2018

Allocate storage for, but do not yet fill in, per-opcode
preferences for the output operands.  Pass it in to the
register allocation routines for output operands.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

69e3706d

tcg: Add preferred_reg argument to tcg_reg_alloc_do_movi · ba87719c

由 Richard Henderson 提交于 11月 27, 2018

Pass this through to temp_sync.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

ba87719c

tcg: Add preferred_reg argument to temp_sync · 98b4e186

由 Richard Henderson 提交于 11月 27, 2018

Pass this through to tcg_reg_alloc.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

98b4e186

tcg: Add preferred_reg argument to temp_load · b722452a

由 Richard Henderson 提交于 11月 27, 2018

Pass this through to tcg_reg_alloc.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

b722452a

tcg: Add preferred_reg argument to tcg_reg_alloc · b016486e

由 Richard Henderson 提交于 11月 27, 2018

This new argument will aid register allocation by indicating how
the temporary will be used in future.  If the preference cannot
be satisfied, fall back to the constraints of the current insn.

Short circuit the preference when it cannot be satisfied or if
it does not further constrain the operation.

With an eye toward optimizing function call sequences, optimize
for the preferred_reg set containing a single register.

For the moment, all users pass 0 for preference.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

b016486e

tcg: Add reachable_code_pass · b4fc67c7

由 Richard Henderson 提交于 11月 26, 2018

Delete trivially dead code that follows unconditional branches and
noreturn helpers.  These can occur either via optimization or via
the structure of a target's translator following an exception.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

b4fc67c7

tcg: Reference count labels · d88a117e

由 Richard Henderson 提交于 11月 26, 2018

Increment when adding branches, and decrement when removing them.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

d88a117e

17 12月, 2018 2 次提交

tcg: Drop nargs from tcg_op_insert_{before,after} · ac1043f6

由 Emilio G. Cota 提交于 12月 09, 2018

It's unused since 75e8b9b7.
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <20181209193749.12277-9-cota@braap.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

ac1043f6

tcg: Return success from patch_reloc · 6ac17786

由 Richard Henderson 提交于 11月 30, 2018

This will move the assert for success from within (subroutines of)
patch_reloc into the callers.  It will also let new code do something
different when a relocation is out of range.

For the moment, all backends are trivially converted to return true.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

6ac17786

19 10月, 2018 2 次提交

tcg: distribute tcg_time into TCG contexts · 72fd2efb

由 Emilio G. Cota 提交于 10月 10, 2018

When we implemented per-vCPU TCG contexts, we forgot to also
distribute the tcg_time counter, which has remained as a global
accessed without any serialization, leading to potentially missed
counts.

Fix it by distributing the field over the TCG contexts, embedding
it into TCGProfile with a field called "cpu_exec_time", which is more
descriptive than "tcg_time". Add a function to query this value
directly, and for completeness, fill in the field in
tcg_profile_snapshot, even though its callers do not use it.
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <20181010144853.13005-5-cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

72fd2efb

tcg: fix use of uninitialized variable under CONFIG_PROFILER · c1f543b7

由 Emilio G. Cota 提交于 10月 10, 2018

We forgot to initialize n in commit 15fa08f8 ("tcg: Dynamically
allocate TCGOps", 2017-12-29).
Reviewed-by: NPhilippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Message-Id: <20181010144853.13005-3-cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

c1f543b7

16 6月, 2018 3 次提交

tcg: Reduce max TB opcode count · 9f754620

由 Richard Henderson 提交于 6月 14, 2018

Also, assert that we don't overflow any of two different offsets into
the TB. Both unwind and goto_tb both record a uint16_t for later use.

This fixes an arm-softmmu test case utilizing NEON in which there is
a TB generated that runs to 7800 opcodes, and compiles to 96k on an
x86_64 host.  This overflows the 16-bit offset in which we record the
goto_tb reset offset.  Because of that overflow, we install a jump
destination that goes to neverland.  Boom.

With this reduced op count, the same TB compiles to about 48k for
aarch64, ppc64le, and x86_64 hosts, and neither assertion fires.

Cc: qemu-stable@nongnu.org
Reported-by: N"Jason A. Donenfeld" <Jason@zx2c4.com>
Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

9f754620

tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx · 128ed227

由 Emilio G. Cota 提交于 8月 01, 2017

Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

128ed227

tcg: track TBs with per-region BST's · be2cdc5e

由 Emilio G. Cota 提交于 7月 26, 2017

This paves the way for enabling scalable parallel generation of TCG code.

Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.

The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.

Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.

Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().

As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

be2cdc5e

09 5月, 2018 1 次提交

tcg: Limit the number of ops in a TB · abebf925

由 Richard Henderson 提交于 5月 08, 2018

In 6001f772 we partially attempt to address the branch
displacement overflow caused by 15fa08f8.

However, gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c
is a testcase that contains a TB so large as to overflow anyway.
The limit here of 8000 ops produces a maximum output TB size of
24112 bytes on a ppc64le host with that test case.  This is still
much less than the maximum forward branch distance of 32764 bytes.

Cc: qemu-stable@nongnu.org
Fixes: 15fa08f8 ("tcg: Dynamically allocate TCGOps")
Reviewed-by: NLaurent Vivier <laurent@vivier.eu>
Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

abebf925

02 5月, 2018 2 次提交

tcg: workaround branch instruction overflow in tcg_out_qemu_ld/st · 6001f772

由 Laurent Vivier 提交于 4月 30, 2018

ppc64 uses a BC instruction to call the tcg_out_qemu_ld/st
slow path. BC instruction uses a relative address encoded
on 14 bits.

The slow path functions are added at the end of the generated
instructions buffer, in the reverse order of the callers.
So more we have slow path functions more the distance between
the caller (BC) and the function increases.

This patch changes the behavior to generate the functions in
the same order of the callers.

Cc: qemu-stable@nongnu.org
Fixes: 15fa08f8 ("tcg: Dynamically allocate TCGOps")
Signed-off-by: NLaurent Vivier <lvivier@redhat.com>
Message-Id: <20180429235840.16659-1-lvivier@redhat.com>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

6001f772

tcg: Improve TCGv_ptr support · 5bfa8034

由 Richard Henderson 提交于 2月 22, 2018

Drop TCGV_PTR_TO_NAT and TCGV_NAT_TO_PTR internal macros.

Add tcg_temp_local_new_ptr, tcg_gen_brcondi_ptr, tcg_gen_ext_i32_ptr,
tcg_gen_trunc_i64_ptr, tcg_gen_extu_ptr_i64, tcg_gen_trunc_ptr_i32.

Use inlines instead of macros where possible.
Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

5bfa8034

08 2月, 2018 5 次提交

tcg: Add generic vector ops for multiplication · 3774030a

由 Richard Henderson 提交于 11月 21, 2017

Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

3774030a

tcg: Add generic vector ops for comparisons · 212be173

由 Richard Henderson 提交于 11月 17, 2017

Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

212be173

tcg: Add generic vector ops for constant shifts · d0ec9796

由 Richard Henderson 提交于 11月 17, 2017

Opcodes are added for scalar and vector shifts, but considering the
varied semantics of these do not expose them to the front ends. Do
go ahead and provide them in case they are needed for backend expansion.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

d0ec9796

tcg: Add generic vector expanders · db432672

由 Richard Henderson 提交于 9月 15, 2017

Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

db432672

tcg: Add types and basic operations for host vectors · d2fd745f

由 Richard Henderson 提交于 9月 14, 2017

Nothing uses or enables them yet.
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

d2fd745f

30 12月, 2017 3 次提交

tcg: Generalize TCGOp parameters · cd9090aa

由 Richard Henderson 提交于 11月 14, 2017

We had two fields specific to INDEX_op_call. Rename these and
add some macros so that the fields may be reused for other opcodes.
Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

cd9090aa

tcg: Dynamically allocate TCGOps · 15fa08f8

由 Richard Henderson 提交于 11月 02, 2017

With no fixed array allocation, we can't overflow a buffer.
This will be important as optimizations related to host vectors
may expand the number of ops used.

Use QTAILQ to link the ops together.
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

15fa08f8

tcg: Remove TCGV_UNUSED* and TCGV_IS_UNUSED* · f764718d

由 Richard Henderson 提交于 11月 02, 2017

These are now trivial sets and tests against NULL.  Unwrap.
Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

f764718d

03 11月, 2017 1 次提交

tcg: Allow constant pool entries in the prologue · 5b38ee31

由 Richard Henderson 提交于 10月 25, 2017

Both ARMv6 and AArch64 currently may drop complex guest_base values
into the constant pool.  But generic code wasn't expecting that, and
the pool is not emitted.  Correct that.
Tested-by: NEmilio G. Cota <cota@braap.org>
Tested-by: NLaurent Desnogues <laurent.desnogues@gmail.com>
Reported-by: NLaurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

5b38ee31

25 10月, 2017 5 次提交

tcg: Initialize cpu_env generically · 1c2adb95

由 Richard Henderson 提交于 10月 10, 2017

This is identical for each target.  So, move the initialization to
common code.  Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.

This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.
Reviewed-by: NEmilio G. Cota <cota@braap.org>
Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

1c2adb95

tcg: enable multiple TCG contexts in softmmu · 3468b59e

由 Emilio G. Cota 提交于 7月 19, 2017

This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.

In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.

Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.

TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable and tcg_ctxs; they are there just to play nice with
analysis tools such as thread sanitizer.

Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.

Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
        have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
        use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
         use_vis3_instructions

Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

3468b59e

tcg: introduce regions to split code_gen_buffer · e8feb96f

由 Emilio G. Cota 提交于 7月 07, 2017

This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:

    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

e8feb96f

tcg: distribute profiling counters across TCGContext's · c3fac113

由 Emilio G. Cota 提交于 7月 05, 2017

This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

c3fac113

tcg: introduce **tcg_ctxs to keep track of all TCGContext's · df2cce29

由 Emilio G. Cota 提交于 7月 12, 2017

Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Signed-off-by: NEmilio G. Cota <cota@braap.org>
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>

df2cce29