- 09 6月, 2015 3 次提交
-
-
由 Aurelien Jarno 提交于
Each call to tcg_opt_gen_mov is preceeded by a test to check if the source and destination temps are copies. Fold that into the tcg_opt_gen_mov function. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Aurelien Jarno 提交于
We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Aurelien Jarno 提交于
We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 15 5月, 2015 1 次提交
-
-
由 Richard Henderson 提交于
At the tcg opcode level, not at the tcg-op.h generator level. This requires minor changes through all of the tcg backends, but none of the cpu translators. Reviewed-by: NPeter Maydell <peter.maydell@linaro.org> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 16 3月, 2015 1 次提交
-
-
由 Richard Henderson 提交于
As seen with ubuntu-5.10-live-powerpc.iso. Reported-by: NMark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Tested-by: NMark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: NBastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 13 2月, 2015 3 次提交
-
-
由 Richard Henderson 提交于
Rather reserving space in the op stream for optimization, let the optimizer add ops as necessary. Reviewed-by: NBastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
With the linked list scheme we need not leave nops in the stream that we need to process later. Reviewed-by: NBastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
The previous setup required ops and args to be completely sequential, and was error prone when it came to both iteration and optimization. Reviewed-by: NBastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 19 6月, 2014 1 次提交
-
-
由 Richard Henderson 提交于
With the "old" ldst ops we didn't know the real width of the result of the load, but with the "new" ldst ops we do. Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 05 6月, 2014 1 次提交
-
-
由 Richard Henderson 提交于
Since all backends have been converted, remove the compatibility code. Acked-by: NClaudio Fontana <claudio.fontana@huawei.com> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 29 5月, 2014 3 次提交
-
-
由 Richard Henderson 提交于
For a 64-bit host, the high bits of a register after a 32-bit operation are undefined. Adjust the temps mask for all 32-bit ops to reflect that. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
No functional change, just reduce a bit of redundancy. Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
If either the high or low pair can be resolved, we can simplify to either a constant or to a 32-bit comparison. Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 13 5月, 2014 1 次提交
-
-
由 Richard Henderson 提交于
Avoid allocating a tcg temporary to hold the constant address, and instead place it directly into the op_call arguments. At the same time, convert to the newly introduced tcg_out_call backend function, rather than invoking tcg_out_op for the call. Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 29 4月, 2014 1 次提交
-
-
由 Richard Henderson 提交于
Let the backend do something special for truncation. Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 19 4月, 2014 2 次提交
-
-
由 Richard Henderson 提交于
By inspection, for a deposit(x, y, 0, 64), we'd have a shift of (1<<64) and everything else falls apart. But we can reuse the existing deposit logic to get this right. Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
The TCG result would be undefined, but we can at least produce one plausible result and avoid triggering the wrath of analysis tools. Reviewed-by: NPeter Maydell <peter.maydell@linaro.org> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 18 2月, 2014 8 次提交
-
-
由 Richard Henderson 提交于
Recognize 0 operand to andc, and -1 operands to and, orc, eqv. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
Like we already do for SUB and XOR. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
Given, of course, an appropriate constant. These could be generated from the "canonical" operation for inversion on the guest, or via other optimizations. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Aurelien Jarno 提交于
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Aurelien Jarno 提交于
The shl_i32 op might set some bits of the unused 32 high bits of the mask. Fix that by clearing the unused 32 high bits for all 32-bit ops except load/store which operate on tl values. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Aurelien Jarno 提交于
Known-zero bits optimization is a great idea that helps to generate more optimized code. However the current implementation only works in very few cases as the computed mask is not saved. Fix this to make it really working. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Aurelien Jarno 提交于
32-bit versions of sar and shr ops should not propagate known-zero bits from the unused 32 high bits. For sar it could even lead to wrong code being generated. Cc: qemu-stable@nongnu.org Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 26 9月, 2013 1 次提交
-
-
由 Stefan Weil 提交于
Signed-off-by: NStefan Weil <sw@weilnetz.de>
-
- 03 9月, 2013 2 次提交
-
-
由 Richard Henderson 提交于
Reviewed-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
由 Richard Henderson 提交于
Use them in places where mulu2 and muls2 are used. Optimize mulx2 with dead low part to mulxh. Reviewed-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRichard Henderson <rth@twiddle.net>
-
- 09 5月, 2013 1 次提交
-
-
由 Aurelien Jarno 提交于
When setcond2 is rewritten into setcond, the state of the destination temp should be reset, so that a copy of the previous value is not used instead of the result. Reported-by: NMichael Tokarev <mjt@tls.msk.ru> Reviewed-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
-
- 23 3月, 2013 1 次提交
-
-
由 Richard Henderson 提交于
Cc: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NBlue Swirl <blauwirbel@gmail.com>
-
- 24 2月, 2013 2 次提交
-
-
由 Richard Henderson 提交于
Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NBlue Swirl <blauwirbel@gmail.com>
-
由 Richard Henderson 提交于
Matching the 32-bit multiword arithmetic that we already have. Signed-off-by: NBlue Swirl <blauwirbel@gmail.com>
-
- 19 1月, 2013 3 次提交
-
-
由 Paolo Bonzini 提交于
This adds two optimizations using the non-zero bit mask. In some cases involving shifts or ANDs the value can become zero, and can thus be optimized to a move of zero. Second, useless zero-extension or an AND with constant can be detected that would only zero bits that are already zero. The main advantage of this optimization is that it turns zero-extensions into moves, thus enabling much better copy propagation (around 1% code reduction). Here is for example a "test $0xff0000,%ecx + je" before optimization: mov_i64 tmp0,rcx movi_i64 tmp1,$0xff0000 discard cc_src and_i64 cc_dst,tmp0,tmp1 movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 and after (without patch on the left, with on the right): movi_i64 tmp1,$0xff0000 movi_i64 tmp1,$0xff0000 discard cc_src discard cc_src and_i64 cc_dst,rcx,tmp1 and_i64 cc_dst,rcx,tmp1 movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit (after optimization, without patch on the left, with on the right): discard cc_src discard cc_src mov_i64 cc_dst,rax mov_i64 cc_dst,rax movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,ne,$0x0 brcond_i64 rax,tmp12,ne,$0x0 "test $0x1, %dl + je": movi_i64 tmp1,$0x1 movi_i64 tmp1,$0x1 discard cc_src discard cc_src and_i64 cc_dst,rdx,tmp1 and_i64 cc_dst,rdx,tmp1 movi_i32 cc_op,$0x1a movi_i32 cc_op,$0x1a ext8u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 In some cases TCG even outsmarts GCC. :) Here the input code has "and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer, thanks to copy propagation, does the following: movi_i64 tmp12,$0x2 movi_i64 tmp12,$0x2 and_i64 rax,rax,tmp12 and_i64 rax,rax,tmp12 mov_i64 cc_dst,rax mov_i64 cc_dst,rax ext32s_i64 tmp0,rax -> nop mov_i64 rbx,tmp0 -> mov_i64 rbx,cc_dst and_i64 cc_dst,rbx,rbx -> nop Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NBlue Swirl <blauwirbel@gmail.com>
-
由 Paolo Bonzini 提交于
Add a "mask" field to the tcg_temp_info struct. A bit that is zero in "mask" will always be zero in the corresponding temporary. Zero bits in the mask can be produced from moves of immediates, zero-extensions, ANDs with constants, shifts; they can then be be propagated by logical operations, shifts, sign-extensions, negations, deposit operations, and conditional moves. Other operations will just reset the mask to all-ones, i.e. unknown. [rth: s/target_ulong/tcg_target_ulong/] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NBlue Swirl <blauwirbel@gmail.com>
-
由 Paolo Bonzini 提交于
The next patch will add to the TCG optimizer a field that should be non-zero in the default case. Thus, replace the memset of the temps array with a loop. Only the state field has to be up-to-date, because others are not used except if the state is TCG_TEMP_COPY or TCG_TEMP_CONST. [rth: Extracted the loop to a function.] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NBlue Swirl <blauwirbel@gmail.com>
-
- 17 11月, 2012 1 次提交
-
-
由 Evgeny Voevodin 提交于
Signed-off-by: NEvgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NBlue Swirl <blauwirbel@gmail.com>
-
- 28 10月, 2012 1 次提交
-
-
由 Aurelien Jarno 提交于
The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be confusing and doesn't provide enough granularity for some helpers (FP helpers for example). This patch changes them into the following helpers flags: - TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals, either directly or via an exception. They will not be saved to their canonical location before calling the helper. - TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals. They will only be saved to their canonical locations before calling helpers, but they won't be reloaded afterwise. - TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if the return value is not used. It provides convenience flags, to avoid helper definitions longer than 80 characters. It also provides compatibility flags, and updates the documentation. Reviewed-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
-
- 17 10月, 2012 3 次提交
-
-
由 Richard Henderson 提交于
Like add2, do operand ordering, constant folding, and dead operand elimination. The latter happens about 15% of all mulu2 during an x86_64 bios boot. Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
-
由 Richard Henderson 提交于
Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
-
由 Richard Henderson 提交于
Signed-off-by: NRichard Henderson <rth@twiddle.net> Signed-off-by: NAurelien Jarno <aurelien@aurel32.net>
-