1. 24 Oct 2016 (1 commit)
  2. 27 Sep 2016 (1 commit)
  3. 16 Sep 2016 (1 commit)
    • tcg: Merge GETPC and GETRA · 01ecaf43
      Authored by Richard Henderson
      The return address argument to the softmmu template helpers was
      confused.  In the legacy case, we wanted to indicate that there
      is no return address, and so passed in NULL.  However, we then
      immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
      value, indicating the presence of an (invalid) return address.
      
      Push the GETPC_ADJ subtraction down to the only point it's required:
      immediately before use within cpu_restore_state_from_tb, after all
      NULL pointer checks have been completed.
      
      This makes GETPC and GETRA identical.  Remove GETRA as the
      lesser-used macro, replacing all uses with GETPC.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
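      [Editor's note: a minimal C sketch of the shape of this change. The
      GETPC_ADJ value and the restore function's signature are simplified
      stand-ins, not verbatim QEMU code.]

      #include <stdint.h>

      /* GETPC_ADJ compensates for the call instruction: the raw return
       * address points just past the call, but we want a PC value that
       * falls inside the TB. (2 is an assumed value for illustration.) */
      #define GETPC_ADJ 2

      /* After this commit, GETPC returns the unadjusted return address,
       * so a 0 "no return address" sentinel survives intact; GETRA, which
       * was this exact expression, is removed. */
      #define GETPC() ((uintptr_t)__builtin_return_address(0))

      static void cpu_restore_state_from_tb(uintptr_t retaddr)
      {
          if (retaddr == 0) {
              return;              /* legacy "no return address" case */
          }
          retaddr -= GETPC_ADJ;    /* adjust only here, after the 0 check */
          /* ... map retaddr back to a guest PC within the TB ... */
      }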
  4. 14 Sep 2016 (4 commits)
  5. 30 Aug 2016 (1 commit)
  6. 02 Aug 2016 (1 commit)
    • qht: do not segfault when gathering stats from an uninitialized qht · 7266ae91
      Authored by Emilio G. Cota
      So far, QHT functions assume that the passed qht has previously been
      initialized--otherwise they segfault.
      
      This patch makes an exception for qht_statistics_init, with the goal
      of simplifying calling code. For instance, qht_statistics_init is
      called from the 'info jit' dump, and given that under KVM the TB qht
      is never initialized, we get a segfault. Thus, instead of complicating
      the 'info jit' code with additional checks, let's allow passing an
      uninitialized qht to qht_statistics_init.
      
      While at it, add a test for this to test-qht.
      
      Before the patch (for $ qemu -enable-kvm [...]):
      (qemu) info jit
      [...]
      direct jump count   0 (0%) (2 jumps=0 0%)
      Program received signal SIGSEGV, Segmentation fault.
      
      After the patch, the "TB hash buckets", "TB hash occupancy"
      and "TB hash avg chain" lines are printed with empty (NaN/null)
      values instead of crashing:
      (qemu) info jit
      [...]
      direct jump count   0 (0%) (2 jumps=0 0%)
      TB hash buckets     0/0 (-nan% head buckets used)
      TB hash occupancy   nan% avg chain occ. Histogram: (null)
      TB hash avg chain   nan buckets. Histogram: (null)
      [...]
      
      Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1469205390-14369-1-git-send-email-cota@braap.org>
      [Extract printing statistics to an entirely separate function. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
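      [Editor's note: a sketch of the guard this commit adds, using
      simplified stand-ins for the qht types; the field names are
      illustrative, not QEMU's exact definitions.]

      #include <stddef.h>

      struct qht_map;                       /* opaque here */
      struct qht { struct qht_map *map; };
      struct qht_stats {
          size_t head_buckets;
          size_t used_head_buckets;
          size_t entries;
      };

      void qht_statistics_init(struct qht *ht, struct qht_stats *stats)
      {
          struct qht_map *map = ht->map;    /* QEMU reads this via atomic_rcu_read() */

          stats->head_buckets = 0;
          stats->used_head_buckets = 0;
          stats->entries = 0;
          /* The fix: an uninitialized qht (e.g. the TB qht under KVM) has
           * no map yet, so report empty stats instead of dereferencing NULL. */
          if (map == NULL) {
              return;
          }
          /* ... otherwise walk the buckets and accumulate statistics ... */
      }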
  7. 09 Jul 2016 (1 commit)
    • translate-all: Fix user-mode self-modifying code in 2 page long TB · 7399a337
      Authored by Stanislav Shmarov
      In user-mode emulation a Translation Block (TB) can span 2 guest pages.
      In that case QEMU also mprotects the 2 host pages that back the guest
      memory containing the instructions. QEMU detects self-modifying code
      by handling the resulting SEGFAULT signal.
      
      If an instruction in the 1st page modifies memory in the 2nd
      page (or vice versa), QEMU will mark the 2nd page with PAGE_WRITE,
      invalidate the TB, generate a new TB containing 1 guest instruction,
      and exit to the CPU loop. QEMU won't call mprotect, so the new TB will
      cause the same SEGFAULT. The page will have both PAGE_WRITE_ORG and
      PAGE_WRITE flags, so QEMU will treat the signal as a problem in the
      guest binary and exit with a guest SEGFAULT.
      
      The solution is the following: even if the current TB was invalidated,
      continue to invalidate TBs from the remaining guest pages and mark the
      pages as PAGE_WRITE. After that, disable host page protection with
      mprotect. Only then, if the current TB was invalidated, longjmp to the
      main loop. That is more efficient, since we won't get a SEGFAULT when
      executing the new TB. (See the sketch after this entry.)
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Signed-off-by: Stanislav Shmarov <snarpix@gmail.com>
      Message-Id: <1467880392-1043630-1-git-send-email-snarpix@gmail.com>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
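      [Editor's note: a hedged sketch of the recovery path described above;
      the function names and signatures below are simplified stand-ins for
      QEMU's page_unprotect()/tb_invalidate_phys_page() machinery.]

      #include <setjmp.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <sys/mman.h>

      extern sigjmp_buf cpu_loop_jmp;                /* stand-in for the CPU loop */
      bool tb_invalidate_phys_page(uintptr_t addr);  /* true if the TB currently
                                                        executing was invalidated */

      /* Invoked from the SIGSEGV handler when a write hits a page that was
       * protected to catch self-modifying code. */
      void handle_smc_write(void *host_start, size_t host_len,
                            uintptr_t guest_page1, uintptr_t guest_page2)
      {
          bool current_tb_invalidated = false;

          /* Keep going even if the current TB is hit: invalidate TBs on
           * *both* guest pages backing the TB and mark them PAGE_WRITE... */
          current_tb_invalidated |= tb_invalidate_phys_page(guest_page1);
          current_tb_invalidated |= tb_invalidate_phys_page(guest_page2);

          /* ...then drop the host-side write protection exactly once. */
          mprotect(host_start, host_len, PROT_READ | PROT_WRITE | PROT_EXEC);

          /* If the TB being executed is gone we cannot return into it:
           * longjmp back to the CPU loop to pick up a fresh TB, avoiding
           * a second SEGFAULT. */
          if (current_tb_invalidated) {
              siglongjmp(cpu_loop_jmp, 1);
          }
      }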
  8. 20 Jun 2016 (1 commit)
  9. 17 Jun 2016 (1 commit)
  10. 12 Jun 2016 (3 commits)
    • translate-all: add tb hash bucket info to 'info jit' dump · 329844d4
      Authored by Emilio G. Cota
      Examples:
      
      - Good hashing, i.e. tb_hash_func5(phys_pc, pc, flags):
      TB count            715135/2684354
      [...]
      TB hash buckets     388775/524288 (74.15% head buckets used)
      TB hash occupancy   33.04% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
      TB hash avg chain   1.017 buckets. Histogram: 1|█▁▁|3
      
      - Not-so-good hashing, i.e. tb_hash_func5(phys_pc, pc, 0):
      TB count            712636/2684354
      [...]
      TB hash buckets     344924/524288 (65.79% head buckets used)
      TB hash occupancy   31.64% avg chain occ. Histogram: [0,10)%|█ ▆  ▅▁▃▁▂|[90,100]%
      TB hash avg chain   1.047 buckets. Histogram: 1|█▁▁▁|4
      
      - Bad hashing, i.e. tb_hash_func5(phys_pc, 0, 0):
      TB count            702818/2684354
      [...]
      TB hash buckets     112741/524288 (21.50% head buckets used)
      TB hash occupancy   10.15% avg chain occ. Histogram: [0,10)%|█ ▁  ▁▁▁▁▁|[90,100]%
      TB hash avg chain   2.107 buckets. Histogram: [1.0,10.2)|█▁▁▁▁▁▁▁▁▁|[83.8,93.0]
      
      - Good hashing, but no auto-resize:
      TB count            715634/2684354
      TB hash buckets     8192/8192 (100.00% head buckets used)
      TB hash occupancy   98.30% avg chain occ. Histogram: [95.3,95.8)%|▁▁▃▄▃▄▁▇▁█|[99.5,100.0]%
      TB hash avg chain   22.070 buckets. Histogram: [15.0,16.7)|▁▂▅▄█▅▁▁▁▁|[30.3,32.0]
      Acked-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Suggested-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-16-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
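      [Editor's note: a sketch of how the three stats lines above might be
      produced; the struct fields are simplified stand-ins, whereas QEMU's
      real code fills a qht_stats via qht_statistics_init() and renders the
      histograms with qdist helpers.]

      #include <stdio.h>
      #include <stddef.h>

      struct tb_hash_stats {           /* illustrative stand-in */
          size_t head_buckets;         /* total head buckets in the table */
          size_t used_head_buckets;    /* head buckets with >= 1 entry */
          double avg_chain_occupancy;  /* mean fill of used chains, 0..1 */
          double avg_chain_length;     /* mean buckets per used chain */
      };

      static void dump_tb_hash_info(const struct tb_hash_stats *s)
      {
          printf("TB hash buckets     %zu/%zu (%0.2f%% head buckets used)\n",
                 s->used_head_buckets, s->head_buckets,
                 s->head_buckets
                     ? 100.0 * s->used_head_buckets / s->head_buckets : 0.0);
          printf("TB hash occupancy   %0.2f%% avg chain occ.\n",
                 100.0 * s->avg_chain_occupancy);
          printf("TB hash avg chain   %0.3f buckets.\n", s->avg_chain_length);
      }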
    • tb hash: track translated blocks with qht · 909eaac9
      Authored by Emilio G. Cota
      Having a fixed-size hash table for keeping track of all translation blocks
      is suboptimal: some workloads are just too big or too small to get maximum
      performance from the hash table. The MRU promotion policy helps improve
      performance when the hash table is a little undersized, but it cannot
      make up for severely undersized hash tables.
      
      Furthermore, frequent MRU promotions result in writes that are a scalability
      bottleneck. For scalability, lookups should only perform reads, not writes.
      This is not a big deal for now, but it will become one once MTTCG matures.
      
      The appended patch fixes these issues by using qht as the implementation
      of the TB hash table. This solution is superior to the other alternatives
      considered, namely:
      
      - master: implementation in QEMU before this patchset
      - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
      - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
                    MRU is implemented here by adding an intermediate struct
                    that contains the u32 hash and a pointer to the TB; this
                    allows us, on an MRU promotion, to copy said struct (that is not
                    at the head), and put this new copy at the head. After a grace
                    period, the original non-head struct can be eliminated, and
                    after another grace period, freed.
      - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
                         no MRU for lookups; MRU for inserts.
      The appended solution is the following:
      - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
                       no MRU for lookups; MRU for inserts.
      
      The plots below compare the considered solutions. The Y axis shows the
      boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
      sweeps the number of buckets (or initial number of buckets for qht-autoresize).
      The plots in PNG format (and with error bars) can be seen here:
        http://imgur.com/a/Awgnq
      
      Each test runs 5 times, and the entire QEMU process is pinned to a
      single core for repeatability of results.
      
                                  Host: Intel Xeon E5-2690
      
        28 ++------------+-------------+-------------+-------------+------------++
           A*****        +             +             +             master **A*** +
        27 ++    *                                                 xxhash ##B###++
           |      A******A******                               xxhash-rcu $$C$$$ |
        26 C$$                  A******A******            qht-fixed-nomru*%%D%%%++
           D%%$$                              A******A******A*qht-dyn-mru A*E****A
        25 ++ %%$$                                          qht-dyn-nomru &&F&&&++
           B#####%                                                               |
        24 ++    #C$$$$$                                                        ++
           |      B###  $                                                        |
           |          ## C$$$$$$                                                 |
        23 ++           #       C$$$$$$                                         ++
           |             B######       C$$$$$$                                %%%D
        22 ++                  %B######       C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
           |                    D%%%%%%B######      @E@@@@@@    %%%D%%%@@@E@@@@@@E
        21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B
           +             E@@@   F&&&   +      E@     +      F&&&   +             +
        20 ++------------+-------------+-------------+-------------+------------++
           14            16            18            20            22            24
                                   log2 number of buckets
      
                                       Host: Intel i7-4790K
      
        14.5 ++------------+------------+-------------+------------+------------++
             A**           +            +             +            master **A*** +
          14 ++ **                                                 xxhash ##B###++
        13.5 ++   **                                           xxhash-rcu $$C$$$++
             |                                            qht-fixed-nomru %%D%%% |
          13 ++     A******                                   qht-dyn-mru @@E@@@++
             |             A*****A******A******             qht-dyn-nomru &&F&&& |
        12.5 C$$                               A******A******A*****A******    ***A
          12 ++ $$                                                        A***  ++
             D%%% $$                                                             |
        11.5 ++  %%                                                             ++
             B###  %C$$$$$$                                                      |
          11 ++  ## D%%%%% C$$$$$                                               ++
             |     #      %      C$$$$$$                                         |
        10.5 F&&&&&&B######D%%%%%       C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$    $$$C
          10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B
             +             F&&          D%%%%%%B######B######B#####B###@@@D%%%   +
         9.5 ++------------+------------+-------------+------------+------------++
             14            16           18            20           22            24
                                    log2 number of buckets
      
      Note that the original point before this patch series is X=15 for "master";
      master's low sensitivity to the increased number of buckets is due to its
      poor hashing function.
      
      xxhash-rcu has significant overhead due to the constant churn of allocating
      and deallocating intermediate structs for implementing MRU. An alternative
      would be to consider failed lookups as "maybe not there", and then
      acquire the external lock (tb_lock in this case) to really confirm that
      there was indeed a failed lookup. This, however, would not be enough
      to implement dynamic resizing--this is more complex: see
      "Resizable, Scalable, Concurrent Hash Tables via Relativistic
      Programming" by Triplett, McKenney and Walpole. This solution was
      discarded due to the very coarse RCU read critical sections that we have
      in MTTCG; resizing requires waiting for readers after every pointer update,
      and resizes require many pointer updates, so this would quickly become
      prohibitive.
      
      qht-fixed-nomru shows that MRU promotion is advisable for undersized
      hash tables.
      
      However, qht-dyn-mru shows that MRU promotion is not important if the
      hash table is properly sized: there is virtually no difference in
      performance between qht-dyn-nomru and qht-dyn-mru.
      
      Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
      X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
      can achieve with optimum sizing of the hash table, while keeping the hash
      table scalable for readers.
      
      The improvement we get before and after this patch for booting debian jessie
      with arm-softmmu is:
      
      - Intel Xeon E5-2690: 10.5% less time
      - Intel i7-4790K: 5.2% less time
      
      We could get this same improvement _for this particular workload_ by
      statically increasing the size of the hash table. But this would hurt
      workloads that do not need a large hash table. The dynamic (upward)
      resizing allows us to start small and enlarge the hash table as needed.
      
      A quick note on downsizing: the table is resized back to 2**15 buckets
      on every tb_flush; this makes sense because it is not guaranteed that the
      table will reach the same number of TBs later on (e.g. most bootup code is
      thrown away after boot); it makes sense to grow the hash table as
      more code blocks are translated. This also avoids the complication of
      having to build downsizing hysteresis logic into qht.
      Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
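      [Editor's note: a sketch of the read-only lookup pattern this commit
      enables; the qht declarations below are simplified stand-ins for
      include/qemu/qht.h, and TranslationBlock/tb_hash_func are reduced to
      just the fields relevant here.]

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      struct qht;
      typedef bool (*qht_lookup_func_t)(const void *obj, const void *userp);
      void *qht_lookup(struct qht *ht, qht_lookup_func_t func,
                       const void *userp, uint32_t hash);

      typedef struct TranslationBlock {
          uint64_t phys_pc;
          uint64_t pc;
          uint32_t flags;
      } TranslationBlock;

      struct tb_desc { uint64_t phys_pc, pc; uint32_t flags; };

      uint32_t tb_hash_func(uint64_t phys_pc, uint64_t pc, uint32_t flags);

      /* A lookup only *reads* the table (no MRU promotion), which is what
       * keeps concurrent readers scalable; inserts and resizes happen
       * under an external lock (tb_lock in this series). */
      static bool tb_cmp(const void *obj, const void *userp)
      {
          const TranslationBlock *tb = obj;
          const struct tb_desc *d = userp;
          return tb->phys_pc == d->phys_pc && tb->pc == d->pc &&
                 tb->flags == d->flags;
      }

      TranslationBlock *tb_find_qht(struct qht *htable, uint64_t phys_pc,
                                    uint64_t pc, uint32_t flags)
      {
          struct tb_desc desc = { .phys_pc = phys_pc, .pc = pc, .flags = flags };
          return qht_lookup(htable, tb_cmp, &desc,
                            tb_hash_func(phys_pc, pc, flags));
      }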
    • tb hash: hash phys_pc, pc, and flags with xxhash · 42bd3228
      Authored by Emilio G. Cota
      For some workloads, such as arm bootup, tb_phys_hash is performance-critical.
      This is due to the high frequency of accesses to the hash table, caused
      by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
      More info:
        https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html
      
      To dig further into this I modified an arm image booting debian jessie to
      immediately shut down after boot. Analysis revealed that quite a bit of time
      is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
      results in very uneven loading of chains in the hash table's buckets;
      the longest observed chain had ~550 elements.
      
      The appended patch addresses this with two changes:
      
      1) Use xxhash as the hash table's hash function. xxhash is a fast,
         high-quality hashing function.
      
      2) Feed the hashing function with not just tb_phys, but also pc and flags.
      
      This improves performance over using just tb_phys for hashing, since that
      resulted in some hash buckets having many TBs while others got very few;
      with these changes, the longest observed chain on a single hash bucket is
      brought down from ~550 to ~40.
      
      Tests show that the other element checked for in tb_find_physical,
      cs_base, is always a match when tb_phys+pc+flags are a match,
      so hashing cs_base is wasteful. It could be that this is an ARM-only
      thing, though. UPDATE:
      On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote:
      > The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB
      > consisting of only a delay slot).
      > It may well still turn out to be reasonable to ignore cs_base for hashing.
      
      BTW, after this change the hash table should not be called "tb_phys_hash"
      anymore; this is addressed later in this series.
      
      This change gives consistent bootup time improvements. I tested two
      host machines:
      - Intel Xeon E5-2690: 11.6% less time
      - Intel i7-4790K: 19.2% less time
      
      Increasing the number of hash buckets yields further improvements. However,
      using a larger, fixed number of buckets can degrade performance for other
      workloads that do not translate as many blocks (600K+ for debian-jessie arm
      bootup). This is dealt with later in this series.
      Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
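      [Editor's note: a self-contained sketch in the spirit of the fixed-size
      xxh32 variant this series adds; the constants are xxhash's published
      32-bit primes, while the seed and the order in which the 64-bit halves
      are fed in are illustrative assumptions.]

      #include <stdint.h>

      #define PRIME32_1 0x9E3779B1u
      #define PRIME32_2 0x85EBCA77u
      #define PRIME32_3 0xC2B2AE3Du
      #define PRIME32_4 0x27D4EB2Fu
      #define PRIME32_5 0x165667B1u
      #define TB_HASH_SEED 1u          /* assumed seed */

      static inline uint32_t rotl32(uint32_t x, int r)
      {
          return (x << r) | (x >> (32 - r));
      }

      /* xxh32 specialized for exactly five 32-bit words (20 bytes):
       * one full 16-byte stripe plus a single 4-byte tail word. */
      static uint32_t tb_hash5(uint32_t a, uint32_t b, uint32_t c,
                               uint32_t d, uint32_t e)
      {
          uint32_t v1 = TB_HASH_SEED + PRIME32_1 + PRIME32_2;
          uint32_t v2 = TB_HASH_SEED + PRIME32_2;
          uint32_t v3 = TB_HASH_SEED;
          uint32_t v4 = TB_HASH_SEED - PRIME32_1;
          uint32_t h;

          v1 += a * PRIME32_2; v1 = rotl32(v1, 13); v1 *= PRIME32_1;
          v2 += b * PRIME32_2; v2 = rotl32(v2, 13); v2 *= PRIME32_1;
          v3 += c * PRIME32_2; v3 = rotl32(v3, 13); v3 *= PRIME32_1;
          v4 += d * PRIME32_2; v4 = rotl32(v4, 13); v4 *= PRIME32_1;

          h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
          h += 20;                              /* total input length, bytes */

          h += e * PRIME32_3;                   /* 4-byte tail word */
          h = rotl32(h, 17) * PRIME32_4;

          h ^= h >> 15;                         /* final avalanche */
          h *= PRIME32_2;
          h ^= h >> 13;
          h *= PRIME32_3;
          h ^= h >> 16;
          return h;
      }

      /* Feed phys_pc, pc and flags -- but not cs_base -- into the hash. */
      static uint32_t tb_hash_func(uint64_t phys_pc, uint64_t pc, uint32_t flags)
      {
          return tb_hash5((uint32_t)phys_pc, (uint32_t)(phys_pc >> 32),
                          (uint32_t)pc, (uint32_t)(pc >> 32), flags);
      }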
  11. 09 Jun 2016 (3 commits)
  12. 23 May 2016 (1 commit)
  13. 19 May 2016 (1 commit)
  14. 13 May 2016 (15 commits)
  15. 08 Apr 2016 (1 commit)
  16. 23 Mar 2016 (2 commits)
  17. 03 Feb 2016 (1 commit)
  18. 29 Jan 2016 (1 commit)
    • exec: Clean up includes · 7b31bbc2
      Authored by Peter Maydell
      Clean up includes so that osdep.h is included first and headers
      which it implies are not included manually.
      
      This commit was created with scripts/clean-includes.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Message-id: 1453832250-766-4-git-send-email-peter.maydell@linaro.org
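      [Editor's note: by way of illustration, the convention the script
      enforces looks like this in a hypothetical .c file.]

      /* qemu/osdep.h must come first in every .c file... */
      #include "qemu/osdep.h"
      /* ...followed by the file's own subsystem headers. */
      #include "cpu.h"
      /* <stdio.h>, <stdint.h>, <stdbool.h> etc. are NOT included manually,
       * because osdep.h already pulls them in. */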