1. 19 Oct 2018, 2 commits
  2. 27 Sep 2018, 1 commit
    • tcg/i386: fix vector operations on 32-bit hosts · 93bf9a42
      Roman Kapl authored
      The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
      This was defined as no-op for 32-bit x86, with the assumption that we have
      eight registers anyway. This assumption is not true once we have xmm regs.
      
      Since LOWREGMASK was a no-op, xmm register indices were wrong in the
      opcodes and overflowed into other opcode fields, wreaking havoc.
      
      To trigger these problems, you can try running the "movi d8, #0x0" AArch64
      instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
      but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
      
      Fixes: 770c2fc7 ("Add vector operations")
      Signed-off-by: Roman Kapl <rka@sysgo.com>
      Message-Id: <20180824131734.18557-1-rka@sysgo.com>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
  3. 06 Aug 2018, 1 commit
  4. 24 Jul 2018, 1 commit
  5. 20 Jul 2018, 1 commit
  6. 09 Jul 2018, 1 commit
  7. 16 Jun 2018, 5 commits
    • tcg: Reduce max TB opcode count · 9f754620
      Richard Henderson authored
      Also, assert that we don't overflow either of two different offsets into
      the TB: both unwind and goto_tb record a uint16_t for later use.
      
      This fixes an arm-softmmu test case utilizing NEON in which there is
      a TB generated that runs to 7800 opcodes, and compiles to 96k on an
      x86_64 host.  This overflows the 16-bit offset in which we record the
      goto_tb reset offset.  Because of that overflow, we install a jump
      destination that goes to neverland.  Boom.
      
      With this reduced op count, the same TB compiles to about 48k for
      aarch64, ppc64le, and x86_64 hosts, and neither assertion fires.
      
      Cc: qemu-stable@nongnu.org
      Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg: remove tb_lock · 0ac20318
      Emilio G. Cota authored
      Use mmap_lock in user-mode to protect TCG state and the page descriptors.
      In !user-mode, each vCPU has its own TCG state, so no locks needed.
      Per-page locks are used to protect the page descriptors.
      
      Per-TB locks are used in both modes to protect TB jumps.
      
      Some notes:
      
      - tb_lock is removed from notdirty_mem_write by passing a
        locked page_collection to tb_invalidate_phys_page_fast.
      
      - tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
        so there is no need to further serialize access to them.
      
      - do_tb_flush is run in a safe async context, meaning no other
        vCPU threads are running. Therefore acquiring mmap_lock there
        is just to please tools such as thread sanitizer.
      
      - Not visible in the diff, but tb_invalidate_phys_page already
        has an assert_memory_lock.
      
      - cpu_io_recompile is !user-only, so no mmap_lock there.
      
      - Added mmap_unlock()'s before all siglongjmp's that could
        be called in user-mode while mmap_lock is held.
        + Added an assert for !have_mmap_lock() after returning from
          the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.
      
      Performance numbers before/after:
      
      Host: AMD Opteron(tm) Processor 6376
      
                       ubuntu 17.04 ppc64 bootup+shutdown time
      
        700 +-+--+----+------+------------+-----------+------------*--+-+
            |    +    +      +            +           +           *B    |
            |         before ***B***                            ** *    |
            |tb lock removal ###D###                         ***        |
        600 +-+                                           ***         +-+
            |                                           **         #    |
            |                                        *B*          #D    |
            |                                     *** *         ##      |
        500 +-+                                ***           ###      +-+
            |                             * ***           ###           |
            |                            *B*          # ##              |
            |                          ** *          #D#                |
        400 +-+                      **            ##                 +-+
            |                      **           ###                     |
            |                    **           ##                        |
            |                  **         # ##                          |
        300 +-+  *           B*          #D#                          +-+
            |    B         ***        ###                               |
            |    *       **       ####                                  |
            |     *   ***      ###                                      |
        200 +-+   B  *B     #D#                                       +-+
            |     #B* *   ## #                                          |
            |     #*    ##                                              |
            |    + D##D#     +            +           +            +    |
        100 +-+--+----+------+------------+-----------+------------+--+-+
                 1    8      16      Guest CPUs       48           64
        png: https://imgur.com/HwmBHXe
      
                    debian jessie aarch64 bootup+shutdown time
      
        90 +-+--+-----+-----+------------+------------+------------+--+-+
           |    +     +     +            +            +            +    |
           |         before ***B***                                B    |
        80 +tb lock removal ###D###                              **D  +-+
           |                                                   **###    |
           |                                                 **##       |
        70 +-+                                             ** #       +-+
           |                                             ** ##          |
           |                                           **  #            |
        60 +-+                                       *B  ##           +-+
           |                                       **  ##               |
           |                                    ***  #D                 |
        50 +-+                               ***   ##                 +-+
           |                             * **   ###                     |
           |                           **B*  ###                        |
        40 +-+                     ****  # ##                         +-+
           |                   ****     #D#                             |
           |             ***B**      ###                                |
        30 +-+    B***B**        ####                                 +-+
           |    B *   *     # ###                                       |
           |     B       ###D#                                          |
        20 +-+   D  ##D##                                             +-+
           |      D#                                                    |
           |    +     +     +            +            +            +    |
        10 +-+--+-----+-----+------------+------------+------------+--+-+
                1     8     16      Guest CPUs        48           64
        png: https://imgur.com/iGpGFtv
      
      The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
      lock contention significantly hurts scalability.
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx · 128ed227
      Emilio G. Cota authored
      Thereby making it per-TCGContext. Once we remove tb_lock, this will
      avoid an atomic increment every time a TB is invalidated.
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg: track TBs with per-region BST's · be2cdc5e
      Emilio G. Cota authored
      This paves the way for enabling scalable parallel generation of TCG code.
      
      Instead of tracking TBs with a single binary search tree (BST), use a
      BST for each TCG region, protecting it with a lock. This is as scalable
      as it gets, since each TCG thread operates on a separate region.
      
      The core of this change is the introduction of struct tcg_region_tree,
      which contains a pointer to a GTree and an associated lock to serialize
      accesses to it. We then allocate an array of tcg_region_tree's, adding
      the appropriate padding to avoid false sharing based on
      qemu_dcache_linesize.
      
      Given a tc_ptr, we first find the corresponding region_tree. This
      is done by special-casing the first and last regions first, since they
      might be of size != region.size; otherwise we just divide the offset
      by region.stride. I was worried about this division (several dozen
      cycles of latency), but profiling shows that this is not a fast path.
      Note that region.stride is not required to be a power of two; it
      is only required to be a multiple of the host's page size.
      
      Note that with this design we can also provide consistent snapshots
      about all region trees at once; for instance, tcg_tb_foreach
      acquires/releases all region_tree locks before/after iterating over them.
      For this reason we now drop tb_lock in dump_exec_info().
      
      As an alternative I considered implementing a concurrent BST, but this
      can be tricky to get right, offers no consistent snapshots of the BST,
      and performance and scalability-wise I don't think it could ever beat
      having separate GTrees, given that our workload is insert-mostly (all
      concurrent BST designs I've seen focus, understandably, on making
      lookups fast, which comes at the expense of convoluted, non-wait-free
      insertions/removals).
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg/i386: Use byte form of xgetbv instruction · 1019242a
      John Arbuckle authored
      The assembler in most versions of Mac OS X is pretty old and does not
      support the xgetbv instruction.  To go around this problem, the raw
      encoding of the instruction is used instead.
      Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
      Message-Id: <20180604215102.11002-1-programmingkidx@gmail.com>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
  8. 02 Jun 2018, 1 commit
  9. 01 Jun 2018, 1 commit
  10. 20 May 2018, 1 commit
  11. 11 May 2018, 2 commits
  12. 09 May 2018, 2 commits
    • tcg: Limit the number of ops in a TB · abebf925
      Richard Henderson authored
      In 6001f772 we partially attempted to address the branch
      displacement overflow caused by 15fa08f8.
      
      However, gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c
      is a testcase that contains a TB so large as to overflow anyway.
      The limit here of 8000 ops produces a maximum output TB size of
      24112 bytes on a ppc64le host with that test case.  This is still
      much less than the maximum forward branch distance of 32764 bytes.
      
      Cc: qemu-stable@nongnu.org
      Fixes: 15fa08f8 ("tcg: Dynamically allocate TCGOps")
      Reviewed-by: Laurent Vivier <laurent@vivier.eu>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg/i386: Fix dup_vec in non-AVX2 codepath · 7eb30ef0
      Peter Maydell authored
      The VPUNPCKLD* instructions are all "non-destructive source",
      indicated by "NDS" in the encoding string in the x86 ISA manual.
      This means that they take two source operands, one of which is
      encoded in the VEX.vvvv field. We were incorrectly treating them
      as if they were destructive-source and passing 0 as the 'v'
      argument of tcg_out_vex_modrm(). This meant we were always
      using %xmm0 as one of the source operands, causing incorrect
      results if the register allocator happened to want to use
      something else. For instance the input AArch64 insn:
       DUP v26.16b, w21
      which becomes TCG IR ops:
       dup_vec v128,e8,tmp2,x21
       st_vec v128,e8,tmp2,env,$0xa40
      was assembled to:
      0x607c568c:  c4 c1 7a 7e 86 e8 00 00  vmovq    0xe8(%r14), %xmm0
      0x607c5694:  00
      0x607c5695:  c5 f9 60 c8              vpunpcklbw %xmm0, %xmm0, %xmm1
      0x607c5699:  c5 f9 61 c9              vpunpcklwd %xmm1, %xmm0, %xmm1
      0x607c569d:  c5 f9 70 c9 00           vpshufd  $0, %xmm1, %xmm1
      0x607c56a2:  c4 c1 7a 7f 8e 40 0a 00  vmovdqu  %xmm1, 0xa40(%r14)
      0x607c56aa:  00
      
      when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
      This resulted in our incorrectly setting the output vector to
      q26=0000320000003200:0000320000003200
      when given an input of x21 == 0000000002803200
      rather than the expected all-zeroes.
      
      Pass the correct source register number to tcg_out_vex_modrm()
      for these insns.
      
      Fixes: 770c2fc7
      Cc: qemu-stable@nongnu.org
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Message-Id: <20180504153431.5169-1-peter.maydell@linaro.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
  13. 02 May 2018, 5 commits
  14. 16 Apr 2018, 1 commit
    • tcg/mips: Handle large offsets from target env to tlb_table · 161dfd1e
      Peter Maydell authored
      The MIPS TCG target makes the assumption that the offset from the
      target env pointer to the tlb_table is less than about 64K. This
      used to be true, but gradual addition of features to the Arm
      target means that it's no longer true there. This results in
      the build-time assertion failing:
      
      In file included from /home/pm215/qemu/include/qemu/osdep.h:36:0,
                       from /home/pm215/qemu/tcg/tcg.c:28:
      /home/pm215/qemu/tcg/mips/tcg-target.inc.c: In function ‘tcg_out_tlb_load’:
      /home/pm215/qemu/include/qemu/compiler.h:90:36: error: static assertion failed: "not expecting: offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1]) > 0x7ff0 + 0x7fff"
       #define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)
                                          ^
      /home/pm215/qemu/include/qemu/compiler.h:98:30: note: in expansion of macro ‘QEMU_BUILD_BUG_MSG’
       #define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x)
                                    ^
      /home/pm215/qemu/tcg/mips/tcg-target.inc.c:1236:9: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
               QEMU_BUILD_BUG_ON(offsetof(CPUArchState,
               ^
      /home/pm215/qemu/rules.mak:66: recipe for target 'tcg/tcg.o' failed
      
      An ideal long term approach would be to rearrange the CPU state
      so that the tlb_table was not so far along it, but this is tricky
      because it would move it from the "not cleared on CPU reset" part
      of the struct to the "cleared on CPU reset" part. As a simple fix
      for the 2.12 release, make the MIPS TCG target handle an arbitrary
      offset by emitting more add instructions. This will mean an extra
      instruction in the fastpath for TCG loads and stores for the
      affected guests (currently just aarch64-softmmu).
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Message-id: 20180413142336.32163-1-peter.maydell@linaro.org
  15. 10 Apr 2018, 1 commit
  16. 28 Mar 2018, 1 commit
  17. 16 Mar 2018, 3 commits
  18. 08 Feb 2018, 10 commits