提交 · 16e0d8234ef9291747332d2c431e46808a060472 · openeuler / qemu

16 8月, 2019 1 次提交

target/arm: Introduce add_reg_for_lit · 16e0d823

由 Richard Henderson 提交于 8月 15, 2019

Provide a common routine for the places that require ALIGN(PC, 4)
as the base address as opposed to plain PC.  The two are always
the same for A32, but the difference is meaningful for thumb mode.
Signed-off-by: NRichard Henderson <richard.henderson@linaro.org>
Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NPhilippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190807045335.1361-5-richard.henderson@linaro.org
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

16e0d823

05 7月, 2019 1 次提交

target/arm: Correct VMOV_imm_dp handling of short vectors · 89a11ff7

由 Peter Maydell 提交于 7月 04, 2019

Coverity points out (CID 1402195) that the loop in trans_VMOV_imm_dp()
that iterates over the destination registers in a short-vector VMOV
accidentally throws away the returned updated register number
from vfp_advance_dreg(). Add the missing assignment. (We got this
correct in trans_VMOV_imm_sp().)

Fixes: 18cf951aSigned-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Message-id: 20190702105115.9465-1-peter.maydell@linaro.org

89a11ff7

18 6月, 2019 1 次提交

target/arm: Check for dp support for dp VFM, not sp · 34bea4ed

由 Peter Maydell 提交于 6月 17, 2019

In commit 1120827f we accidentally put the
"UNDEF unless FPU has double-precision support" check in
the single-precision VFM function. Put it in the dp
function where it belongs.

Fixes: 1120827fSigned-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Message-id: 20190617160130.3207-1-peter.maydell@linaro.org

34bea4ed

17 6月, 2019 4 次提交

target/arm: Only implement doubles if the FPU supports them · 1120827f

由 Peter Maydell 提交于 6月 14, 2019

The architecture permits FPUs which have only single-precision
support, not double-precision; Cortex-M4 and Cortex-M33 are
both like that. Add the necessary checks on the MVFR0 FPDP
field so that we UNDEF any double-precision instructions on
CPUs like this.

Note that even if FPDP==0 the insns like VMOV-to/from-gpreg,
VLDM/VSTM, VLDR/VSTR which take double precision registers
still exist.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Message-id: 20190614104457.24703-3-peter.maydell@linaro.org

1120827f

target/arm: Fix typos in trans function prototypes · 83655223

由 Peter Maydell 提交于 6月 14, 2019

In several places cut and paste errors meant we were using the wrong
type for the 'arg' struct in trans_ functions called by the
decodetree decoder, because we were using the _sp version of the
struct in the _dp function. These were harmless, because the two
structs were identical and so decodetree made them typedefs of the
same underlying structure (and we'd have had a compile error if they
were not harmless), but we should clean them up anyway.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NPhilippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190614104457.24703-2-peter.maydell@linaro.org

83655223

target/arm: Use vfp_expand_imm() for AArch32 VFP VMOV_imm · 9bee50b4

由 Peter Maydell 提交于 6月 13, 2019

The AArch32 VMOV (immediate) instruction uses the same VFP encoded
immediate format we already handle in vfp_expand_imm().  Use that
function rather than hand-decoding it.
Suggested-by: NRichard Henderson <richard.henderson@linaro.org>
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Tested-by: NPhilippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190613163917.28589-3-peter.maydell@linaro.org

9bee50b4

target/arm: Move vfp_expand_imm() to translate.[ch] · d6a092d4

由 Peter Maydell 提交于 6月 13, 2019

We want to use vfp_expand_imm() in the AArch32 VFP decode;
move it from the a64-only header/source file to the
AArch32 one (which is always compiled even for AArch64).
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
Tested-by: NPhilippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190613163917.28589-2-peter.maydell@linaro.org

d6a092d4

13 6月, 2019 33 次提交

target/arm: Fix short-vector increment behaviour · 18cf951a

由 Peter Maydell 提交于 6月 11, 2019

For VFP short vectors, the VFP registers are divided into a
series of banks: for single-precision these are s0-s7, s8-s15,
s16-s23 and s24-s31; for double-precision they are d0-d3,
d4-d7, ... d28-d31. Some banks are "scalar" meaning that
use of a register within them triggers a pure-scalar or
mixed vector-scalar operation rather than a full vector
operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
When using a bank as part of a vector operation, we
iterate through it, increasing the register number by
the specified stride each time, and wrapping around to
the beginning of the bank.

Unfortunately our calculation of the "increment" part of this
was incorrect:
 vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
will only do the intended thing if bank_mask has exactly
one set high bit. For instance for doubles (bank_mask = 0xc),
if we start with vd = 6 and delta_d = 2 then vd is updated
to 12 rather than the intended 4.

This only causes problems in the unlikely case that the
starting register is not the first in its bank: if the
register number doesn't have to wrap around then the
expression happens to give the right answer.

Fix this bug by abstracting out the "check whether register
is in a scalar bank" and "advance register within bank"
operations to utility functions which use the right
bit masking operations.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

18cf951a

target/arm: Convert float-to-integer VCVT insns to decodetree · 3111bfc2

由 Peter Maydell 提交于 6月 11, 2019

Convert the float-to-integer VCVT instructions to decodetree.
Since these are the last unconverted instructions, we can
delete the old decoder structure entirely now.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

3111bfc2

target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree · e3d6f429

由 Peter Maydell 提交于 6月 11, 2019

Convert the VCVT (between floating-point and fixed-point) instructions
to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

e3d6f429

target/arm: Convert VJCVT to decodetree · 92073e94

由 Peter Maydell 提交于 6月 11, 2019

Convert the VJCVT instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

92073e94

target/arm: Convert integer-to-float insns to decodetree · 8fc9d891

由 Peter Maydell 提交于 6月 11, 2019

Convert the VCVT integer-to-float instructions to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

8fc9d891

target/arm: Convert double-single precision conversion insns to decodetree · 6ed7e49c

由 Peter Maydell 提交于 6月 11, 2019

Convert the VCVT double/single precision conversion insns to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

6ed7e49c

target/arm: Convert VFP round insns to decodetree · e25155f5

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.

These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check this. The old decoder's implementation
was incorrectly providing them even for v7A CPUs.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

e25155f5

target/arm: Convert the VCVT-to-f16 insns to decodetree · cdfd14e8

由 Peter Maydell 提交于 6月 11, 2019

Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
store of the right half of the input single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

cdfd14e8

target/arm: Convert the VCVT-from-f16 insns to decodetree · b623d803

由 Peter Maydell 提交于 6月 11, 2019

Convert the VCVTT, VCVTB instructions that deal with conversion
from half-precision floats to f32 or 64 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

b623d803

target/arm: Convert VFP comparison insns to decodetree · 386bba23

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP comparison instructions to decodetree.

Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations. This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT. (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

386bba23

target/arm: Convert VMOV (register) to decodetree · 17552b97

由 Peter Maydell 提交于 6月 11, 2019

Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

17552b97

target/arm: Convert VSQRT to decodetree · b8474540

由 Peter Maydell 提交于 6月 11, 2019

Convert the VSQRT instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

b8474540

target/arm: Convert VNEG to decodetree · 1882651a

由 Peter Maydell 提交于 6月 11, 2019

Convert the VNEG instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

1882651a

target/arm: Convert VABS to decodetree · 90287e22

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP VABS instruction to decodetree.

Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
VFPGen2OpDPFn because none of the operations which use this format
and support short vectors will need it.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

90287e22

target/arm: Convert VMOV (imm) to decodetree · b518c753

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP VMOV (immediate) instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

b518c753

target/arm: Convert VFP fused multiply-add insns to decodetree · d4893b01

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.

Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

d4893b01

target/arm: Convert VDIV to decodetree · 519ee7ae

由 Peter Maydell 提交于 6月 11, 2019

Convert the VDIV instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

519ee7ae

target/arm: Convert VSUB to decodetree · 8fec9a11

由 Peter Maydell 提交于 6月 11, 2019

Convert the VSUB instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

8fec9a11

target/arm: Convert VADD to decodetree · ce28b303

由 Peter Maydell 提交于 6月 11, 2019

Convert the VADD instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

ce28b303

target/arm: Convert VNMUL to decodetree · 43c4be12

由 Peter Maydell 提交于 6月 11, 2019

Convert the VNMUL instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

43c4be12

target/arm: Convert VMUL to decodetree · 88c5188c

由 Peter Maydell 提交于 6月 11, 2019

Convert the VMUL instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

88c5188c

target/arm: Convert VFP VNMLA to decodetree · 8a483533

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP VNMLA instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

8a483533

target/arm: Convert VFP VNMLS to decodetree · c54a416c

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP VNMLS instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

c54a416c

target/arm: Convert VFP VMLS to decodetree · e7258280

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP VMLS instruction to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

e7258280

target/arm: Convert VFP VMLA to decodetree · 266bd25c

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
but the second operand is in a scalar bank. For example
  vmla.f64 d10, d1, d16   with length 2 stride 2
is equivalent to the pair of scalar operations
  vmla.f64 d10, d1, d16
  vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

266bd25c

target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d · 3993d040

由 Peter Maydell 提交于 6月 11, 2019

Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
functions which perform the memory accesses by going via the TCG
globals cpu_F0s and cpu_F0d, to use local TCG temps instead.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

3993d040

target/arm: Convert the VFP load/store multiple insns to decodetree · fa288de2

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15 : they now UNDEF for any access
to D16-D31, not merely when the smallest register in the
transfer list is in D16-D31.

This conversion does not try to share code between the single
precision and the double precision versions; this looks a bit
duplicative of code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }"
conditionalisation.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

fa288de2

target/arm: Convert VFP VLDR and VSTR to decodetree · 79b02a3b

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP single load/store insns VLDR and VSTR to decodetree.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

79b02a3b

target/arm: Convert VFP two-register transfer insns to decodetree · 81f68110

由 Peter Maydell 提交于 6月 11, 2019

Convert the VFP two-register transfer instructions to decodetree
(in the v8 Arm ARM these are the "Advanced SIMD and floating-point
64-bit move" encoding group).

Again, we expand out the sequences involving gen_vfp_msr() and
gen_msr_vfp().
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

81f68110

target/arm: Convert "single-precision" register moves to decodetree · a9ab5001

由 Peter Maydell 提交于 6月 11, 2019

Convert the "single-precision" register moves to decodetree:
 * VMSR
 * VMRS
 * VMOV between general purpose register and single precision

Note that the VMSR/VMRS conversions make our handling of
the "should this UNDEF?" checks consistent between the two
instructions:
 * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
   (previously was a nop)
 * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
   (previously was a nop)
 * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
   (previously would write to the register, which had no
   guest-visible effect because we always UNDEF reads)

We also tighten up the decode: we were previously underdecoding
some SBZ or SBO bits.

The conversion of VMOV_single includes the expansion out of the
gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
sequences into the simpler direct load/store of the TCG temp via
neon_{load,store}_reg32(): we know in the new function that we're
always single-precision, we don't need to use the old-and-deprecated
cpu_F0* TCG globals, and we don't happen to have the declaration of
gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
new function is.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

a9ab5001

target/arm: Convert "double-precision" register moves to decodetree · 9851ed92

由 Peter Maydell 提交于 6月 11, 2019

Convert the "double-precision" register moves to decodetree:
this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.

Note that the conversion process has tightened up a few of the
UNDEF encoding checks: we now correctly forbid:
 * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
 * VMOV-from-gpr with opc1:opc2 == 0x10
 * VDUP with B:E == 11
 * VDUP with Q == 1 and Vn<0> == 1
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
---
The accesses of elements < 32 bits could be improved by doing
direct ld/st of the right size rather than 32-bit read-and-shift
or read-modify-write, but we leave this for later cleanup,
since this series is generally trying to stick to fixing
the decode.
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

9851ed92

target/arm: Add helpers for VFP register loads and stores · 160f3b64

由 Peter Maydell 提交于 6月 11, 2019

The current VFP code has two different idioms for
loading and storing from the VFP register file:
 1 using the gen_mov_F0_vreg() and similar functions,
   which load and store to a fixed set of TCG globals
   cpu_F0s, CPU_F0d, etc
 2 by direct calls to tcg_gen_ld_f64() and friends

We want to phase out idiom 1 (because the use of the
fixed globals is a relic of a much older version of TCG),
but idiom 2 is quite longwinded:
 tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg))
requires us to specify the 64-bitness twice, once in
the function name and once by passing 'true' to
vfp_reg_offset(). There's no guard against accidentally
passing the wrong flag.

Instead, let's move to a convention of accessing 64-bit
registers via the existing neon_load_reg64() and
neon_store_reg64(), and provide new neon_load_reg32()
and neon_store_reg32() for the 32-bit equivalents.

Implement the new functions and use them in the code in
translate-vfp.inc.c. We will convert the rest of the VFP
code as we do the decodetree conversion in subsequent
commits.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

160f3b64

target/arm: Move the VFP trans_* functions to translate-vfp.inc.c · f7bbb8f3

由 Peter Maydell 提交于 6月 11, 2019

Move the trans_*() functions we've just created from translate.c
to translate-vfp.inc.c. This is pure code motion with no textual
changes (this can be checked with 'git show --color-moved').
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>

f7bbb8f3