Commits · 3468b59e18b179bc63c7ce934de912dfa9596122 · openeuler / qemu

25 Oct, 2017 18 commits

tcg: enable multiple TCG contexts in softmmu · 3468b59e

Authored by Emilio G. Cota on Jul 19, 2017

This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.

In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.

Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.

TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable and tcg_ctxs; they are there just to play nice with
analysis tools such as thread sanitizer.

Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.
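
The slot-claiming scheme described above can be sketched roughly as follows. This is a minimal illustration using C11 atomics, not QEMU's actual code: DemoCtx, DEMO_MAX_CTXS and every demo_* name are hypothetical stand-ins, and the fixed upper bound is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for TCGContext; the real struct is far larger. */
typedef struct DemoCtx {
    int nb_globals;
} DemoCtx;

#define DEMO_MAX_CTXS 8  /* assumed upper bound, e.g. the number of vCPUs */

static DemoCtx demo_init_ctx;                      /* set up once by the target */
static DemoCtx *_Atomic demo_ctxs[DEMO_MAX_CTXS];  /* pointers, not structs */
static atomic_uint demo_n_ctxs;

/* Called once by each translating thread, after target init has finished. */
static DemoCtx *demo_register_thread(void)
{
    DemoCtx *s = malloc(sizeof(*s));
    assert(s != NULL);
    memcpy(s, &demo_init_ctx, sizeof(*s));  /* private clone of the init context */

    /* fetch_add returns the previous value, which becomes this thread's slot. */
    unsigned n = atomic_fetch_add(&demo_n_ctxs, 1);
    assert(n < DEMO_MAX_CTXS);
    atomic_store(&demo_ctxs[n], s);
    return s;
}
```

Because only pointers are stored in the array, the contexts themselves can be allocated lazily, once the number of translating threads is actually known.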

Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
        have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
        use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
         use_vis3_instructions

Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: introduce regions to split code_gen_buffer · e8feb96f

Authored by Emilio G. Cota on Jul 07, 2017

This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.
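
A lock-protected region allocator along these lines might look like the sketch below. It is an illustration under stated assumptions, not the code added by this commit: the DemoRegions struct, its fields and the demo_* functions are hypothetical, and a pthread mutex stands in for whatever lock type the real code uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical region bookkeeping; names are illustrative, not QEMU's. */
typedef struct DemoRegions {
    pthread_mutex_t lock;
    char *start;         /* base of the code buffer */
    size_t region_size;  /* bytes per region */
    size_t n;            /* total number of regions */
    size_t current;      /* index of the next region to hand out */
    size_t full_bytes;   /* bytes used in regions already given back */
} DemoRegions;

static void demo_regions_init(DemoRegions *r, char *buf, size_t size, size_t n)
{
    assert(n > 0);
    pthread_mutex_init(&r->lock, NULL);
    r->start = buf;
    r->region_size = size / n;
    r->n = n;
    r->current = 0;
    r->full_bytes = 0;
}

/*
 * Hand out the next free region. used_in_old is how much of its previous
 * region the caller consumed. Returns NULL when no region is left, at which
 * point the caller would have to flush.
 */
static char *demo_region_alloc(DemoRegions *r, size_t used_in_old)
{
    char *ret = NULL;

    pthread_mutex_lock(&r->lock);
    r->full_bytes += used_in_old;
    if (r->current < r->n) {
        ret = r->start + r->current * r->region_size;
        r->current++;
    }
    pthread_mutex_unlock(&r->lock);
    return ret;
}

/* Statistics share the allocator's lock, so the total is always consistent. */
static size_t demo_code_size(DemoRegions *r)
{
    pthread_mutex_lock(&r->lock);
    size_t total = r->full_bytes;
    pthread_mutex_unlock(&r->lock);
    return total;
}
```

With a bare atomic counter the allocation itself would be simpler, but the used-bytes total would have to be reconciled separately; taking one lock for both keeps the two in step.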

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:

    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: distribute profiling counters across TCGContext's · c3fac113

Authored by Emilio G. Cota on Jul 05, 2017

This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.
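
The two changes can be pictured with the following sketch, which assumes C11 atomics in place of QEMU's atomic_read/atomic_set macros. DemoProfile, DemoCtx and the demo_* functions are hypothetical names, and tb_count is used here only as an example counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-context profile; the real TCGProfile has many fields. */
typedef struct DemoProfile {
    _Atomic int64_t tb_count;
} DemoProfile;

typedef struct DemoCtx {
    DemoProfile prof;
} DemoCtx;

/*
 * Writer side: each thread updates only its own context, so a relaxed
 * load followed by a relaxed store is enough (there is a single writer).
 */
static void demo_count_tb(DemoCtx *s)
{
    int64_t v = atomic_load_explicit(&s->prof.tb_count, memory_order_relaxed);
    atomic_store_explicit(&s->prof.tb_count, v + 1, memory_order_relaxed);
}

/* Reader side: walk every context and add up the per-context counters. */
static int64_t demo_total_tb_count(DemoCtx *const *ctxs, unsigned n)
{
    int64_t total = 0;

    for (unsigned i = 0; i < n; i++) {
        total += atomic_load_explicit(&ctxs[i]->prof.tb_count,
                                      memory_order_relaxed);
    }
    return total;
}
```

The total is only a snapshot, since writers may keep counting while the reader iterates; for profiling output that is normally acceptable.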

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: introduce **tcg_ctxs to keep track of all TCGContext's · df2cce29

Authored by Emilio G. Cota on Jul 12, 2017

Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: define tcg_init_ctx and make tcg_ctx a pointer · b1311c4a

Authored by Emilio G. Cota on Jul 12, 2017

Groundwork for supporting multiple TCG contexts.

The core of this patch is this change to tcg/tcg.h:

> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;

Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Remove GET_TCGV_* and MAKE_TCGV_* · dc41aa7d

Authored by Richard Henderson on Oct 20, 2017

The GET and MAKE functions weren't really specific enough.
We now have a full complement of functions that convert exactly
between temporaries, arguments, tcgv pointers, and indices.

The target/sparc change is also a bug fix, which would have affected
a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Introduce temp_tcgv_{i32,i64,ptr} · 085272b3

Authored by Richard Henderson on Oct 20, 2017

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Introduce tcgv_{i32,i64,ptr}_{arg,temp} · ae8b75dc

Authored by Richard Henderson on Oct 15, 2017

Transform TCGv_* to an "argument" or a temporary.
For now, an argument is simply the temporary index.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Push tcg_ctx into tcg_gen_callN · 960c50e0

Authored by Richard Henderson on Oct 15, 2017

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Change temp_allocate_frame arg to TCGTemp · 2272e4a7

Authored by Richard Henderson on Nov 09, 2016

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Avoid loops against variable bounds · ac3b8891

Authored by Richard Henderson on Nov 02, 2016

Copy s->nb_globals or s->nb_temps to a local variable for the purposes
of iteration.  This should allow the compiler to use low-overhead
looping constructs on some hosts.
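
The pattern is small enough to show in a few lines. In this sketch DemoState and demo_touch are hypothetical; the point is only that the bound is read once into a local, so the compiler does not have to assume the callee might change it between iterations.

```c
#include <assert.h>

#define DEMO_MAX_TEMPS 16

/* Hypothetical stand-in for the state that holds nb_temps. */
typedef struct DemoState {
    int nb_temps;
    int vals[DEMO_MAX_TEMPS];
} DemoState;

/* Some per-temp work that writes through the state pointer. */
static void demo_touch(DemoState *s, int i)
{
    s->vals[i] = i;
}

static int demo_fill(DemoState *s)
{
    /* Read the bound once; the loop condition no longer reloads s->nb_temps. */
    int n = s->nb_temps;

    assert(n >= 0 && n <= DEMO_MAX_TEMPS);
    for (int i = 0; i < n; i++) {
        demo_touch(s, i);
    }
    return n;
}
```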

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Use per-temp state data in liveness · b83eabea

Authored by Richard Henderson on Nov 01, 2016

This avoids having to allocate external memory for each temporary.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Introduce temp_arg, export temp_idx · 1807f4c4

Authored by Richard Henderson on Jun 20, 2017

At the same time, drop the TCGContext argument and use tcg_ctx instead.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Add temp_global bit to TCGTemp · fa477d25

Authored by Richard Henderson on Nov 02, 2016

This avoids needing to test the index of a temp against nb_globals.
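
As a rough illustration of the difference, the sketch below contrasts an index-based test with a flag stored in the temp itself. DemoTemp, DemoCtx and the demo_* helpers are hypothetical; only the temp_global bit name comes from the commit title, and the assumption that globals occupy the lowest indices is made for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_TEMPS 8

typedef struct DemoTemp {
    unsigned int temp_global : 1;  /* set once when the temp is created */
} DemoTemp;

typedef struct DemoCtx {
    int nb_globals;
    int nb_temps;
    DemoTemp temps[DEMO_MAX_TEMPS];
} DemoCtx;

static DemoTemp *demo_temp_new(DemoCtx *s, bool global)
{
    assert(s->nb_temps < DEMO_MAX_TEMPS);
    /* Assumption: globals are created first, so they fill the low indices. */
    assert(!global || s->nb_globals == s->nb_temps);

    DemoTemp *ts = &s->temps[s->nb_temps++];
    ts->temp_global = global;
    if (global) {
        s->nb_globals++;
    }
    return ts;
}

/* Index-based test: needs the context to compare against nb_globals. */
static bool demo_is_global_by_index(const DemoCtx *s, const DemoTemp *ts)
{
    return (ts - s->temps) < s->nb_globals;
}

/* Flag-based test: needs only the temp itself. */
static bool demo_is_global(const DemoTemp *ts)
{
    return ts->temp_global;
}
```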

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Introduce arg_temp · 43439139

Authored by Richard Henderson on Jun 19, 2017

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Propagate TCGOp down to allocators · dd186292

Authored by Richard Henderson on Dec 08, 2016

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Propagate args to op->args in tcg.c · efee3746

Authored by Richard Henderson on Dec 08, 2016

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Merge opcode arguments into TCGOp · 75e8b9b7

Authored by Richard Henderson on Dec 08, 2016

Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries.  The result is actually a bit
smaller and should have slightly more cache locality.
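
The layout change can be sketched as below. Both structs are simplified, hypothetical stand-ins (the real TCGOp carries more fields), so the sketch says nothing about actual sizes; it only shows how argument access moves from an index into a shared buffer to an array embedded in the op.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_OP_ARGS 10

typedef uintptr_t DemoArg;

/* Before: the op records where its arguments start in a separate buffer. */
typedef struct DemoOpOld {
    uint8_t opc;
    uint16_t args_idx;  /* index into a shared argument buffer */
} DemoOpOld;

/* After: the op carries its own fixed-size argument array. */
typedef struct DemoOpNew {
    uint8_t opc;
    DemoArg args[DEMO_MAX_OP_ARGS];
} DemoOpNew;

/* Old access path: two objects are touched, the op and the shared buffer. */
static DemoArg demo_old_arg(const DemoOpOld *op, const DemoArg *arg_buf, int i)
{
    assert(i >= 0 && i < DEMO_MAX_OP_ARGS);
    return arg_buf[op->args_idx + i];
}

/* New access path: the arguments sit next to the opcode in memory. */
static DemoArg demo_new_arg(const DemoOpNew *op, int i)
{
    assert(i >= 0 && i < DEMO_MAX_OP_ARGS);
    return op->args[i];
}
```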

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


11 Oct, 2017 1 commit

tcg: define TCG_HIGHWATER · a505785c

Authored by Emilio G. Cota on Jul 07, 2017

Will come in handy very soon.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


10 Oct, 2017 2 commits

tcg: take .helpers out of TCGContext · 619205fd

Authored by Emilio G. Cota on Jul 05, 2017

Groundwork for supporting multiple TCG contexts.

The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


exec-all: extract tb->tc_* into a separate struct tc_tb · e7e168f4

Authored by Emilio G. Cota on Jul 12, 2017

In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


17 Sep, 2017 4 commits

tcg: Remove tcg_regset_{or,and,andnot,not} · 07ddf036

Authored by Richard Henderson on Sep 11, 2017

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Remove tcg_regset_set · d21369f5

Authored by Richard Henderson on Sep 11, 2017

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Remove tcg_regset_clear · ccb1bb66

Authored by Richard Henderson on Sep 11, 2017

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


tcg: Add tcg_op_supported · be0f34b5

Authored by Richard Henderson on Aug 17, 2017

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


08 Sep, 2017 2 commits

tcg: Infrastructure for managing constant pools · 57a26946

Authored by Richard Henderson on Jul 30, 2017

A new shared header tcg-pool.inc.c adds new_pool_label,
for registering a tcg_target_ulong to be emitted after
the generated code, plus relocation data to install a
pointer to the data.

A new pointer is added to the TCGContext, so that we
dump the constant pool as data, not code.

Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Rearrange ldst label tracking · 659ef5cb

Authored by Richard Henderson on Jul 30, 2017

Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>


20 Jun, 2017 1 commit

tcg: allocate TB structs before the corresponding translated code · 6e3b2bfd

Authored by Emilio G. Cota on Jun 06, 2017

Allocating an arbitrarily-sized array of tbs results in either
(a) a lot of memory wasted or (b) unnecessary flushes of the code
cache when we run out of TB structs in the array.

An obvious solution would be to just malloc a TB struct when needed,
and keep the TB array as an array of pointers (recall that tb_find_pc()
needs the TB array to run in O(log n)).

Perhaps a better solution, which is implemented in this patch, is to
allocate TB's right before the translated code they describe. This
results in some memory waste due to padding to have code and TBs in
separate cache lines--for instance, I measured 4.7% of padding in the
used portion of code_gen_buffer when booting aarch64 Linux on a
host with 64-byte cache lines. However, it can allow for optimizations
in some host architectures, since TCG backends could safely assume that
the TB and the corresponding translated code are very close to each
other in memory. See this message by rth for a detailed explanation:

  https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
  Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
  Message-ID: <1e67644b-4b30-887e-d329-1848e94c9484@twiddle.net>
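
The allocation scheme described in this message can be sketched as a bump allocator that places each descriptor on its own cache line, directly ahead of its code. This is an illustration only: DemoTB, DEMO_CACHE_LINE and the demo_* functions are hypothetical, and the 64-byte line size is an assumption matching the host mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64  /* assumed host cache-line size */

/* Hypothetical, much reduced stand-in for a TB descriptor. */
typedef struct DemoTB {
    void *code;   /* start of the translated code for this TB */
    size_t size;  /* filled in once translation is done */
} DemoTB;

static uintptr_t demo_align_up(uintptr_t p, uintptr_t align)
{
    return (p + align - 1) & ~(align - 1);  /* align must be a power of two */
}

/*
 * Carve a descriptor out of the code buffer itself. The code that follows
 * starts on the next cache line, so descriptor and code never share a line.
 * Returns NULL when the buffer is exhausted.
 */
static DemoTB *demo_tb_alloc(uintptr_t *code_ptr, uintptr_t end)
{
    uintptr_t tb = demo_align_up(*code_ptr, DEMO_CACHE_LINE);
    uintptr_t next = demo_align_up(tb + sizeof(DemoTB), DEMO_CACHE_LINE);

    if (next > end) {
        return NULL;
    }
    *code_ptr = next;

    DemoTB *ret = (DemoTB *)tb;
    ret->code = (void *)next;
    ret->size = 0;
    return ret;
}
```

The bytes between the end of the descriptor and the next cache line are the padding cost the message refers to; in exchange, a descriptor is always a short, fixed distance from its code.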

Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1496790745-314-3-git-send-email-cota@braap.org>
[rth: Simplify the arithmetic in tcg_tb_alloc]
Signed-off-by: Richard Henderson <rth@twiddle.net>


06 Jun, 2017 1 commit

tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr · cedbcb01

Authored by Emilio G. Cota on Apr 26, 2017

Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org>
[rth: Squashed 4 related commits.]
Signed-off-by: Richard Henderson <rth@twiddle.net>


11 Jan, 2017 4 commits

tcg: Allow an operand to be matching or a constant · 17280ff4

Authored by Richard Henderson on Nov 18, 2016

This allows an output operand to match an input operand
only when the input operand needs a register.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Pass the opcode width to target_parse_constraint · 069ea736

Authored by Richard Henderson on Nov 18, 2016

This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit, which in turn will
let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Transition flat op_defs array to a target callback · f69d277e

Authored by Richard Henderson on Nov 18, 2016

This will allow the target to tailor the constraints to the
auto-detected ISA extensions.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Add markup for output requires new register · 82790a87

Authored by Richard Henderson on Nov 18, 2016

This is the same concept as, and same markup as, the
early clobber markup in gcc.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>


02 Nov, 2016 1 commit

log: Add locking to large logging blocks · 1ee73216

Authored by Richard Henderson on Sep 22, 2016

Reuse the existing locking provided by stdio to keep in_asm, cpu,
op, op_opt, op_ind, and out_asm as contiguous blocks.

While it isn't possible to interleave e.g. in_asm or op_opt logs
because of the TB lock protecting all code generation, it is
possible to interleave cpu logs, or to interleave a cpu dump with
an out_asm dump.

For mingw32, we appear to have no viable solution for this.  The locking
functions are not properly exported from the system runtime library.
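
The stdio locking referred to here is presumably the POSIX flockfile()/funlockfile() pair. A minimal sketch of keeping a multi-line dump contiguous with it follows; demo_log_block is a hypothetical helper, not a QEMU function.

```c
#define _POSIX_C_SOURCE 200809L  /* for flockfile/funlockfile */
#include <assert.h>
#include <stdio.h>

/*
 * Write n lines to f as one contiguous block. Other threads using stdio
 * on the same stream block until funlockfile() is called, so their output
 * cannot land in the middle of the block. Returns the number of lines.
 */
static int demo_log_block(FILE *f, const char *const *lines, int n)
{
    assert(f != NULL);

    flockfile(f);  /* take the stream's own (recursive) lock */
    for (int i = 0; i < n; i++) {
        fputs(lines[i], f);
        fputc('\n', f);
    }
    funlockfile(f);
    return n;
}
```

The individual fputs() calls still take the stream lock internally; since that lock is recursive for the owning thread, nesting them inside the outer flockfile() is safe.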

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>


24 Oct, 2016 1 commit

tcg: try sti when moving a constant into a dead memory temp · 0fe4fca4

Authored by Paolo Bonzini on Sep 15, 2016

This comes for free from unifying tcg_reg_alloc_mov and
tcg_reg_alloc_movi's handling of TEMP_VAL_CONST.  It triggers
often on moves to cc_dst, such as the following translation
of "sub $0x3c,%esp":

  before:                          after:
  subl   $0x3c,%ebp                subl   $0x3c,%ebp
  movl   %ebp,0x10(%r14)           movl   %ebp,0x10(%r14)
  movl   $0x3c,%ebx                movl   $0x3c,0x2c(%r14)
  movl   %ebx,0x2c(%r14)

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1473945360-13663-1-git-send-email-pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


15 Sep, 2016 1 commit

tcg: Remove duplicate header includes · 347519eb

Authored by Thomas Huth on Sep 13, 2016

host-utils.h and timer.h are included twice in tcg.c.
One time should be enough.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>


06 Aug, 2016 4 commits

tcg: Lower indirect registers in a separate pass · 5a18407f

Authored by Richard Henderson on Jun 23, 2016

Rather than rely on recursion during the middle of register allocation,
lower indirect registers to loads and stores off the indirect base into
plain temps.

For an x86_64 host, with sufficient registers, this results in identical
code, modulo the actual register assignments.

For an i686 host, with insufficient registers, this means that temps can
be (temporarily) spilled to the stack in order to satisfy an allocation.
This is as opposed to the possibility of being unable to spill a temp in
order to allocate a register for the indirect base, which is itself needed
to perform a spill.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Require liveness analysis · c0ef05b5

Authored by Richard Henderson on Jun 22, 2016

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Include liveness info in the dumps · bdfb460e

Authored by Richard Henderson on Jun 23, 2016

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>


tcg: Compress dead_temps and mem_temps into a single array · c70fbf0a

Authored by Richard Henderson on Jun 23, 2016

We only need two bits per temporary.  Fold the two bytes into one,
and reduce the memory and cachelines required during compilation.
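
A sketch of the folded representation, with one byte per temporary holding both flags, is shown below. The DEMO_TS_* constants and demo_* helpers are hypothetical names chosen for the example; only the idea of two bits per temporary comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two independent flags per temporary, stored in a single byte. */
#define DEMO_TS_DEAD 0x1u  /* the "dead" flag */
#define DEMO_TS_MEM  0x2u  /* the "mem" flag */

static void demo_mark(uint8_t *state, int idx, unsigned flags)
{
    state[idx] |= (uint8_t)flags;
}

static void demo_clear(uint8_t *state, int idx, unsigned flags)
{
    state[idx] &= (uint8_t)~flags;
}

static bool demo_is_dead(const uint8_t *state, int idx)
{
    return (state[idx] & DEMO_TS_DEAD) != 0;
}

static bool demo_has_mem(const uint8_t *state, int idx)
{
    return (state[idx] & DEMO_TS_MEM) != 0;
}
```

Compared with two parallel byte arrays, a single array halves the memory walked during the analysis, and both flags for a temp are fetched with one load.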

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
