Commits · 75478279a0c1eafc7b69d5382356da138f58f1bd · openeuler / qemu

17 Dec, 2018 5 commits

tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests · 75478279

This preserves the invariant that all TCG_TYPE_I32 values are
zero-extended in the 64-bit host register.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
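
The invariant described above can be illustrated with a minimal sketch. The helper names below are hypothetical and only model a 64-bit host register that holds a 32-bit value; they are not the backend's actual code.

```c
#include <stdint.h>

/*
 * Hypothetical model: a uint64_t stands in for a 64-bit host register.
 * A TCG_TYPE_I32 value is expected to have bits 63..32 cleared.
 */

/* Low half: keep bits 31..0 and clear the upper half. */
static inline uint64_t extrl_i64_i32(uint64_t src)
{
    return (uint32_t)src;
}

/* High half: a logical right shift by 32 leaves the upper half zero. */
static inline uint64_t extrh_i64_i32(uint64_t src)
{
    return src >> 32;
}
```

Either helper yields a value whose upper 32 bits are zero, which is the property the commit message says must be preserved.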

tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path · 3dbc8c61

Richard Henderson authored 6 years ago

This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct · 1d21d95b

Richard Henderson authored 6 years ago

This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Return false on failure from patch_reloc · bec3afd5

Richard Henderson authored 6 years ago

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Return success from patch_reloc · 6ac17786

Richard Henderson authored 6 years ago

This will move the assert for success from within (subroutines of)
patch_reloc into the callers.  It will also let new code do something
different when a relocation is out of range.

For the moment, all backends are trivially converted to return true.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
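
As a rough illustration of a relocation routine that reports success instead of asserting, consider the sketch below. The function `patch_rel32` and its parameters are hypothetical and are not the signature of QEMU's patch_reloc; the 32-bit PC-relative field is only an assumed example of a relocation kind.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: patch a 32-bit PC-relative field.
 * `field` points at the 4 bytes to patch, `field_addr` is the address
 * those bytes will have at run time, `target` is the destination.
 * Returns false when the displacement does not fit, leaving the
 * decision about what to do next to the caller.
 */
static bool patch_rel32(uint8_t *field, int64_t field_addr, int64_t target)
{
    /* The displacement is relative to the end of the 4-byte field. */
    int64_t disp = target - (field_addr + 4);

    if (disp != (int32_t)disp) {
        return false;               /* out of range: nothing is written */
    }
    int32_t d = (int32_t)disp;
    memcpy(field, &d, sizeof(d));   /* store in host byte order */
    return true;
}
```

A caller that cannot handle failure can still assert on the return value, while newer code is free to fall back to another strategy.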

27 Sep, 2018 1 commit

tcg/i386: fix vector operations on 32-bit hosts · 93bf9a42

Roman Kapl authored 6 years ago

The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
This was defined as no-op for 32-bit x86, with the assumption that we have
eight registers anyway. This assumption is not true once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indices were wrong in opcodes
and overflowed into other opcode fields, wreaking havoc.

To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".

Fixes: 770c2fc7 ("Add vector operations")
Signed-off-by: Roman Kapl <rka@sysgo.com>
Message-Id: <20180824131734.18557-1-rka@sysgo.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
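
The effect of an unmasked register number can be sketched with a ModRM byte. This is an illustration only: the helper names are hypothetical, and the numbering in which xmm0 is register 16 is an assumption made for the example.

```c
#include <stdint.h>

/* Assumption for illustration: xmm registers are numbered from 16 up. */
enum { REG_XMM0 = 16 };

/* Keep only the low 3 bits of a register number. */
#define LOWREGMASK(x) ((x) & 7)

/* ModRM layout: mod in bits 7..6, reg in bits 5..3, rm in bits 2..0. */
static inline uint8_t modrm_masked(int mod, int reg, int rm)
{
    return (uint8_t)((mod << 6) | (LOWREGMASK(reg) << 3) | LOWREGMASK(rm));
}

/* Same computation with the mask treated as a no-op. */
static inline uint8_t modrm_unmasked(int mod, int reg, int rm)
{
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}
```

With both operands set to `REG_XMM0`, the masked form gives 0xC0 (reg field 0), while the unmasked form gives 0xD0, whose reg field is 2 because bit 4 of the rm operand leaks into it. Under the stated numbering assumption, that is in line with the stray %xmm2 reported above.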

24 Jul, 2018 1 commit

tcg/i386: Mark xmm registers call-clobbered · 672189cd

Richard Henderson authored 6 years ago

When host vector registers and operations were introduced, I failed
to mark the registers call-clobbered as required by the ABI.

Fixes: 770c2fc7
Cc: qemu-stable@nongnu.org
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

16 Jun, 2018 2 commits

tcg: Reduce max TB opcode count · 9f754620

Richard Henderson authored 6 years ago

Also, assert that we don't overflow either of two different offsets into
the TB. Unwind and goto_tb both record a uint16_t for later use.

This fixes an arm-softmmu test case utilizing NEON in which there is
a TB generated that runs to 7800 opcodes, and compiles to 96k on an
x86_64 host.  This overflows the 16-bit offset in which we record the
goto_tb reset offset.  Because of that overflow, we install a jump
destination that goes to neverland.  Boom.

With this reduced op count, the same TB compiles to about 48k for
aarch64, ppc64le, and x86_64 hosts, and neither assertion fires.

Cc: qemu-stable@nongnu.org
Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
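
The overflow can be reproduced with plain integer arithmetic. The sketch below uses hypothetical helper names and simply shows what happens when a code offset is stored in a uint16_t, with "96k" and "48k" taken as 96 * 1024 and 48 * 1024 bytes for the sake of the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when an offset into the generated code fits a 16-bit field. */
static inline bool offset_fits_u16(size_t off)
{
    return off <= UINT16_MAX;
}

/* Storing without a check silently keeps only the low 16 bits. */
static inline uint16_t record_offset_unchecked(size_t off)
{
    return (uint16_t)off;
}
```

An offset of 96 * 1024 does not fit and would be recorded as 32768, which points somewhere else inside the block; an offset of 48 * 1024 fits.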

tcg/i386: Use byte form of xgetbv instruction · 1019242a

John Arbuckle authored 6 years ago

The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction.  To work around this problem, the raw
encoding of the instruction is used instead.
Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Message-Id: <20180604215102.11002-1-programmingkidx@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
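
A sketch of the byte-form approach follows. It assumes GCC or Clang inline assembly and the compiler-provided <cpuid.h>; the wrapper name `read_xcr0` is hypothetical and this is not the code from the commit. The bytes 0f 01 d0 are the encoding of xgetbv.

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/*
 * Hypothetical wrapper: returns 1 and stores XCR0 in *out when the
 * instruction is usable, 0 otherwise (non-x86 host or OSXSAVE not set).
 */
static int read_xcr0(uint64_t *out)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned a, b, c, d;
    unsigned lo, hi;

    /* CPUID leaf 1, ECX bit 27 (OSXSAVE) says whether xgetbv may be run. */
    if (!__get_cpuid(1, &a, &b, &c, &d) || !(c & (1u << 27))) {
        return 0;
    }
    /* Emit the raw bytes so old assemblers need not know the mnemonic. */
    __asm__ volatile(".byte 0x0f, 0x01, 0xd0"
                     : "=a"(lo), "=d"(hi)
                     : "c"(0));
    *out = ((uint64_t)hi << 32) | lo;
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

The result is delivered in EDX:EAX with ECX selecting the register, so the constraints are the same ones a mnemonic-based version would use.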

09 May, 2018 1 commit

tcg/i386: Fix dup_vec in non-AVX2 codepath · 7eb30ef0

Peter Maydell authored 6 years ago

The VPUNPCKLD* instructions are all "non-destructive source",
indicated by "NDS" in the encoding string in the x86 ISA manual.
This means that they take two source operands, one of which is
encoded in the VEX.vvvv field. We were incorrectly treating them
as if they were destructive-source and passing 0 as the 'v'
argument of tcg_out_vex_modrm(). This meant we were always
using %xmm0 as one of the source operands, causing incorrect
results if the register allocator happened to want to use
something else. For instance the input AArch64 insn:
 DUP v26.16b, w21
which becomes TCG IR ops:
 dup_vec v128,e8,tmp2,x21
 st_vec v128,e8,tmp2,env,$0xa40
was assembled to:
0x607c568c:  c4 c1 7a 7e 86 e8 00 00  vmovq    0xe8(%r14), %xmm0
0x607c5694:  00
0x607c5695:  c5 f9 60 c8              vpunpcklbw %xmm0, %xmm0, %xmm1
0x607c5699:  c5 f9 61 c9              vpunpcklwd %xmm1, %xmm0, %xmm1
0x607c569d:  c5 f9 70 c9 00           vpshufd  $0, %xmm1, %xmm1
0x607c56a2:  c4 c1 7a 7f 8e 40 0a 00  vmovdqu  %xmm1, 0xa40(%r14)
0x607c56aa:  00

when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
This resulted in our incorrectly setting the output vector to
q26=0000320000003200:0000320000003200
when given an input of x21 == 0000000002803200
rather than the expected all-zeroes.

Pass the correct source register number to tcg_out_vex_modrm()
for these insns.

Fixes: 770c2fc7
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180504153431.5169-1-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
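
The role of the VEX.vvvv field can be checked against the bytes in the dump above. The sketch below builds the second byte of a two-byte (0xC5) VEX prefix; the function name and parameter order are hypothetical and unrelated to tcg_out_vex_modrm().

```c
#include <stdint.h>

/*
 * Second byte of a two-byte VEX prefix (the byte after 0xC5).
 *   r    : register extension bit, stored inverted in bit 7
 *   vvvv : extra source register, stored inverted in bits 6..3
 *   l    : vector length bit (0 = 128-bit), bit 2
 *   pp   : implied legacy prefix (1 = 0x66), bits 1..0
 */
static inline uint8_t vex2_byte1(int r, int vvvv, int l, int pp)
{
    return (uint8_t)(((~r & 1) << 7) |
                     ((~vvvv & 0xF) << 3) |
                     ((l & 1) << 2) |
                     (pp & 3));
}
```

With vvvv = 0 the byte is 0xF9, matching the "c5 f9" prefix in the faulty output; with vvvv = 1 (that is, %xmm1 as the extra source) the byte would be 0xF1.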

16 Mar, 2018 1 commit

tcg/i386: Support INDEX_op_dup2_vec for -m32 · 7f34ed4b

Richard Henderson authored 6 years ago

It is unknown why -m32 was passing with gcc but not clang; it should have
failed for both.  This would be used for tcg_gen_dup_i64_vec, and is
visible with the right TB and an aarch64 guest.
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

08 Feb, 2018 1 commit

tcg/i386: Add vector operations · 770c2fc7

Richard Henderson authored 7 years ago

The x86 vector instruction set is extremely irregular.  With newer
editions, Intel has filled in some of the blanks.  However, we don't
get many 64-bit operations until SSE4.2, introduced in 2009.

The subsequent edition was for AVX1, introduced in 2011, which added
three-operand addressing, and adjusted how all instructions should be
encoded.

Given the relatively narrow 2-year window between possible to support
and desirable to support, and to vastly simplify code maintenance,
I am only planning to support AVX1 and later cpus.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

10 Oct, 2017 1 commit

tcg/i386: constify tcg_target_callee_save_regs · e268f4c0

Emilio G. Cota authored 7 years ago

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

17 Sep, 2017 2 commits

tcg: Remove tcg_regset_set32 · f46934df

Richard Henderson authored 7 years ago

It's not even clear what the interface REG and VAL32 were supposed to mean.
All uses had REG = 0 and VAL32 was the bitset assigned to the destination.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Remove tcg_regset_clear · ccb1bb66

Richard Henderson authored 7 years ago

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

08 Sep, 2017 2 commits

tcg/i386: Store out-of-range call targets in constant pool · 4e45f239

Richard Henderson authored 7 years ago

Already it saves 2 bytes per call, but also the constant pool
entry may well be shared across multiple calls.
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg: Rearrange ldst label tracking · 659ef5cb

Richard Henderson authored 7 years ago

Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>

24 Jul, 2017 1 commit

util: Introduce include/qemu/cpuid.h · 5dd89908

Richard Henderson authored 7 years ago

Clang 3.9 passes the CONFIG_AVX2_OPT configure test.  However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.

Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170719044018.18063-1-rth@twiddle.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

06 Jun, 2017 1 commit

tcg/i386: implement goto_ptr · 5cb4ef80

Emilio G. Cota authored 7 years ago

Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-6-git-send-email-cota@braap.org>
[rth: Reuse goto_ptr epilogue for exit_tb 0.]
Signed-off-by: Richard Henderson <rth@twiddle.net>

18 Jan, 2017 2 commits

tcg/i386: Always use TZCNT when available · 39f099ec

Richard Henderson authored 8 years ago

I think this is cleaner than sometimes using BSF.
Signed-off-by: Richard Henderson <rth@twiddle.net>

Revert "tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR" · 9bf38308

Richard Henderson authored 8 years ago

This reverts commit 4ac76910.

This fixes
  http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03062.html

While I think we could get away with relying on the undocumented
behaviour, the tcg constraint system isn't powerful enough to
properly describe the required (non-)overlap conditions.
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>

11 Jan, 2017 8 commits

tcg/i386: Handle ctpop opcode · 993508e4

Richard Henderson authored 8 years ago

Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR · 4ac76910

Richard Henderson authored 8 years ago

The ISA manual documents the output is undefined if the input was zero.

However, we document in target-i386 that the behavior of real silicon
is to preserve the contents of the output register.  We also mention
that there are real applications that depend on this.  That this is
baked into silicon is mentioned as a potential cause for some false
sharing behaviour wrt lzcnt/tzcnt.

Taking advantage of this allows us to save 2 insns in the normal case,
and 4 insns for i686 emulating a 64-bit clz.
Signed-off-by: Richard Henderson <rth@twiddle.net>
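
The zero-input case is the crux here, and a portable model may make it easier to see. The sketch below is not backend code; the function names are hypothetical. It models TZCNT, which is defined to return the operand width for a zero input, and a count-trailing-zeros operation that takes an explicit fallback value for zero.

```c
#include <stdint.h>

/* Model of 32-bit TZCNT: defined for every input, 32 when x is zero. */
static inline uint32_t tzcnt32_model(uint32_t x)
{
    uint32_t n = 0;

    if (x == 0) {
        return 32;
    }
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros with a caller-chosen result for a zero input. */
static inline uint32_t ctz32_or(uint32_t x, uint32_t if_zero)
{
    return x ? tzcnt32_model(x) : if_zero;
}
```

With BSF the zero case is documented as undefined, so code relying only on documented behaviour needs an explicit test such as the one in `ctz32_or`; the commit above aimed to skip that test by depending on the output register being preserved.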

tcg/i386: Handle ctz and clz opcodes · bbf25f90

Richard Henderson authored 8 years ago

Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg/i386: Allow bmi2 shiftx to have non-matching operands · 6a5aed4b

Richard Henderson authored 8 years ago

Previously we could not have different constraints for different ISA levels,
which prevented us from eliding the matching constraint for shifts.

We do now have to make sure that the operands match for constant shifts.
We can also handle some small left shifts via lea.
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg/i386: Hoist common arguments in tcg_out_op · 42d5b514

Richard Henderson authored 8 years ago

Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg/i386: Fully convert tcg_target_op_def · cd26449a

Richard Henderson authored 8 years ago

Use a switch instead of searching a table.  Share constraints between
32-bit and 64-bit, when at all possible.
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg: Pass the opcode width to target_parse_constraint · 069ea736

Richard Henderson authored 8 years ago

This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit, which will
let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg: Transition flat op_defs array to a target callback · f69d277e

Richard Henderson authored 8 years ago

This will allow the target to tailor the constraints to the
auto-detected ISA extensions.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

10 Jan, 2017 1 commit

tcg/i386: Implement field extraction opcodes · 78fdbfb9

Richard Henderson authored 8 years ago

Signed-off-by: Richard Henderson <rth@twiddle.net>

21 Sep, 2016 1 commit

tcg/i386: Extend TARGET_PAGE_MASK to the proper type · ebb90a00

Richard Henderson authored 8 years ago

TARGET_PAGE_MASK, as defined, has type "int".  We need to extend
that to the proper target width before oring in an "unsigned".
Signed-off-by: Richard Henderson <rth@twiddle.net>
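
The type problem can be shown in isolation. In the sketch below, `PAGE_MASK_INT` is a hypothetical stand-in for a page mask of type int (4 KiB pages assumed), and the two helpers differ only in when the widening happens. A 32-bit int and a 64-bit target width are assumed.

```c
#include <stdint.h>

/* Hypothetical stand-in: type int, value -4096. */
#define PAGE_MASK_INT (~0xFFF)

/*
 * int | unsigned is computed as unsigned (32 bits), so the sign bits
 * are gone before the result is widened to 64 bits.
 */
static inline uint64_t mask_or_flag_wrong(unsigned flag)
{
    return PAGE_MASK_INT | flag;
}

/* Sign-extend the mask to the full width first, then OR in the flag. */
static inline uint64_t mask_or_flag_right(unsigned flag)
{
    return (uint64_t)(int64_t)PAGE_MASK_INT | flag;
}
```

For a flag of 3, the first helper yields 0x00000000FFFFF003 and the second 0xFFFFFFFFFFFFF003; only the second keeps the upper address bits in the mask.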

16 Sep, 2016 2 commits

tcg/i386: Add support for fence · a7d00d4e

Pranith Kumar authored 8 years ago

Generate a 'lock orl $0,0(%esp)' instruction for ordering instead of
mfence; the locked instruction has similar ordering semantics.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-3-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg: Support arbitrary size + alignment · 85aa8081

Richard Henderson authored 8 years ago

Previously we allowed fully unaligned operations, but not operations
that are aligned but with less alignment than the operation size.

In addition, arm32, ia64, mips, and sparc had been omitted from the
previous overalignment patch, which would have led to that alignment
being enforced.
Signed-off-by: Richard Henderson <rth@twiddle.net>

06 Jul, 2016 2 commits

tcg: Improve the alignment check infrastructure · 1f00b27f

Sergey Sorokin authored 8 years ago

Some architectures (e.g. ARMv8) need the address to be aligned
to a size larger than the size of the memory access.
The current costless alignment check implementation in QEMU is
enough to support such a check, but we need to support
specifying an alignment size.
Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-Id: <1466705806-679898-1-git-send-email-afarallax@yandex.ru>
Signed-off-by: Richard Henderson <rth@twiddle.net>
[rth: Assert in tcg_canonicalize_memop.  Leave get_alignment_bits
available for, though unused by, user-mode.  Retain logging difference
based on ALIGNED_ONLY.]
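
An alignment requirement that is independent of the access size can be expressed as a number of low address bits that must be zero. The helper below is a hypothetical sketch of that idea and is not QEMU's get_alignment_bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * a_bits is log2 of the required alignment. It may be larger than
 * log2 of the access size, e.g. an 8-byte access that must be
 * 16-byte aligned uses a_bits = 4.
 */
static inline bool is_aligned(uint64_t addr, unsigned a_bits)
{
    uint64_t mask = ((uint64_t)1 << a_bits) - 1;

    return (addr & mask) == 0;
}
```

For example, address 0x1008 passes an 8-byte check (a_bits = 3) but fails a 16-byte check (a_bits = 4).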

tcg: Optimize spills of constants · 59d7c14e

Richard Henderson authored 8 years ago

While we can store constants via constraints on INDEX_op_st_i32 et al,
we weren't able to spill constants to backing store.

Add a new backend interface, tcg_out_sti, which may store the constant
(and is allowed to fail).  Rearrange the temp_* helpers so that we only
attempt to directly store a constant when the temp is becoming dead/free.
Signed-off-by: Richard Henderson <rth@twiddle.net>

13 May, 2016 2 commits

tcg: Clean up direct block chaining data fields · f309101c

Sergey Fedorov authored 8 years ago

Briefly describe in a comment how direct block chaining is done. It
should help in understanding of the following data fields.

Rename some fields in TranslationBlock and TCGContext structures to
better reflect their purpose (dropping excessive 'tb_' prefix in
TranslationBlock but keeping it in TCGContext):
   tb_next_offset  =>  jmp_reset_offset
   tb_jmp_offset   =>  jmp_insn_offset
   tb_next         =>  jmp_target_addr
   jmp_next        =>  jmp_list_next
   jmp_first       =>  jmp_list_first

Avoid using a magic constant as an invalid offset which is used to
indicate that there's no n-th jump generated.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg/i386: Make direct jump patching thread-safe · 0d07abf0

Sergey Fedorov authored 8 years ago

Ensure direct jump patching in i386 is atomic by:
 * naturally aligning a location of direct jump address;
 * using atomic_read()/atomic_set() for code patching.

tcg_out_nopn() implementation:
Suggested-by: Richard Henderson <rth@twiddle.net>.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-6-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
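
The two points in the list above can be sketched as follows. The helper names are hypothetical, a one-byte jump opcode followed by a 4-byte displacement is assumed, and the GCC/Clang builtin `__atomic_store_n` stands in for QEMU's atomic_set().

```c
#include <stdint.h>

/*
 * Number of nop bytes to emit before a jump so that its 4-byte
 * displacement field (one byte after the opcode) is 4-byte aligned.
 */
static inline unsigned jmp_nop_padding(uintptr_t code_ptr)
{
    uintptr_t field = code_ptr + 1;                   /* skip the opcode */
    uintptr_t aligned = (field + 3) & ~(uintptr_t)3;  /* round up to 4 */

    return (unsigned)(aligned - field);
}

/* Patch a naturally aligned displacement with a single atomic store. */
static inline void patch_jmp_disp(uint32_t *field, uint32_t disp)
{
    __atomic_store_n(field, disp, __ATOMIC_RELAXED);
}
```

Because the field is naturally aligned, a thread executing the jump concurrently observes either the old or the new displacement, never a mix of the two.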

21 Apr, 2016 2 commits

tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG · 8d8fdbae

Aurelien Jarno authored 8 years ago

Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-2-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tcg: use tcg_debug_assert instead of assert (fix performance regression) · eabb7b91

Aurelien Jarno authored 8 years ago

The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why there are asserts that can be enabled
with the --enable-debug-tcg configure option.

This used to work the following way:

| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>

Since commit 757e725b (tcg: Clean up includes) "config.h" has been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.

tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and is already
used in some places. This patch replaces all the calls to assert with
calls to tcg_debug_assert.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
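
A simplified sketch of an assertion macro keyed on CONFIG_DEBUG_TCG rather than NDEBUG is shown below. It is an approximation for illustration, not the definition used in QEMU, and `checked_index` is a hypothetical caller.

```c
#include <assert.h>

/*
 * Simplified sketch: the check is compiled in only when
 * CONFIG_DEBUG_TCG is defined, no matter where or when <assert.h>
 * was included and whether NDEBUG was set at that point.
 */
#ifdef CONFIG_DEBUG_TCG
# define tcg_debug_assert(X) do { assert(X); } while (0)
#else
# define tcg_debug_assert(X) do { (void)sizeof(X); } while (0)
#endif

/* Hypothetical hot-path helper that wants a debug-only bounds check. */
static int checked_index(int idx, int n)
{
    tcg_debug_assert(idx >= 0 && idx < n);
    return idx;
}
```

In the non-debug branch the expression is only type-checked via sizeof and never evaluated, so the check adds no run-time cost.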

24 Feb, 2016 1 commit

tcg: Remove unnecessary osdep.h includes from tcg-target.inc.c · c3b7f668

Peter Maydell authored 8 years ago

Commit 757e725b added a number of #include "qemu/osdep.h"
files to the tcg-target.c files (as they were named at the time).
These are unnecessary because these files are not standalone C
files, and the tcg/tcg.c file which includes them will have
already included osdep.h on their behalf. Remove the unneeded
include directives.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1456238983-10160-4-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>