1. 11 April 2018, 1 commit
    • icount: fix cpu_restore_state_from_tb for non-tb-exit cases · afd46fca
      Authored by Pavel Dovgalyuk
      In icount mode, instructions that access I/O memory in the middle of
      a translation block trigger TB recompilation.  After recompilation,
      such an instruction becomes the last one in the TB and is allowed to
      access I/O memory.
      
      When the code includes an instruction like the i386 'xchg eax, 0xffffd080',
      which accesses the APIC, QEMU goes into an infinite recompilation loop.
      
      This instruction performs two memory accesses - one read and one write.
      After the first access, the APIC calls cpu_report_tpr_access, which
      restores the CPU state to get the current eip.  But
      cpu_restore_state_from_tb resets the cpu->can_do_io flag, which makes
      the second memory access invalid and forces another recompilation of
      the block.  These operations then repeat endlessly.
      
      This patch moves the resetting of the cpu->can_do_io flag from
      cpu_restore_state_from_tb to the cpu_loop_exit* functions.

      It also adds a parameter to cpu_restore_state which controls whether
      icount is restored.  There is no need to restore icount when we only
      query the CPU state without breaking the TB; restoring it in such
      cases distorts the flow of virtual time.

      In most cases the new parameter is true (icount should be
      recalculated).  But there are two cases, in i386 and openrisc, where
      the CPU state is only queried without the need to break the TB.
      This patch fixes both of these cases.
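
      A minimal sketch of the reworked interfaces, assuming the parameter
      name will_exit and this placement of the flag reset (both are
      illustrative, not taken verbatim from the patch):

          /* true when the TB will be broken; false for pure state queries,
           * where icount must not be recalculated */
          bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit);

          void cpu_loop_exit(CPUState *cpu)
          {
              /* can_do_io is re-enabled here, unconditionally, rather than
               * inside cpu_restore_state_from_tb */
              cpu->can_do_io = 1;
              siglongjmp(cpu->jmp_env, 1);
          }
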
      Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
      Message-Id: <20180409091320.12504.35329.stgit@pasha-VirtualBox>
      [rth: Make can_do_io setting unconditional; move from cpu_exec;
      make cpu_loop_exit_{noexc,restore} call cpu_loop_exit.]
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      afd46fca
  2. 25 January 2018, 1 commit
    • accel/tcg: add size parameter in tlb_fill() · 98670d47
      Authored by Laurent Vivier
      The MC68040 MMU provides the size of the access that
      triggers the page fault.
      
      This size is set in the Special Status Word, which is written
      into the stack frame of the access fault exception.
      
      So we need the size in m68k_cpu_unassigned_access() and
      m68k_cpu_handle_mmu_fault().
      
      To make that possible, this patch modifies the prototypes of the
      handle_mmu_fault handler, tlb_fill() and probe_write();
      do_unassigned_access() already takes a size parameter.

      It also updates the handle_mmu_fault handlers and tlb_fill() of
      all targets (parameter only, no code change).
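
      For reference, a sketch of the post-patch prototype (matching the
      description above; the exact parameter order is an assumption):

          /* the new size argument carries the access width in bytes */
          void tlb_fill(CPUState *cs, target_ulong addr, int size,
                        MMUAccessType access_type, int mmu_idx, uintptr_t retaddr);
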
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Message-Id: <20180118193846.24953-2-laurent@vivier.eu>
      98670d47
  3. 21 December 2017, 1 commit
  4. 13 November 2017, 1 commit
  5. 25 October 2017, 6 commits
    • exec-all: rename tb_free to tb_remove · be1e0117
      Authored by Emilio G. Cota
      We don't really free anything in this function anymore; we just remove
      the TB from the binary search tree.
      Suggested-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      be1e0117
    • translate-all: use a binary search tree to track TBs in TBContext · 2ac01d6d
      Authored by Emilio G. Cota
      This is a prerequisite for supporting multiple TCG contexts, since
      we will have threads generating code in separate regions of
      code_gen_buffer.
      
      For this we need a new field (.size) in struct tb_tc to keep
      track of the size of the translated code. This field uses a size_t
      to avoid adding a hole to the struct, although really an unsigned
      int would have been enough.
      
      The comparison function we use is optimized for the common case:
      insertions.  Profiling shows that upon booting debian-arm, 98% of
      comparisons are between existing TBs (i.e. a->size and b->size are
      both non-zero), which happens during insertions (and removals, but
      those are rare).  The remaining cases are lookups.  From reading the
      glib sources we see that the first key is always the lookup key;
      the code does not rely on this, since the behaviour is not
      guaranteed by the glib docs, but we do embed the knowledge as a
      branch hint for the compiler.
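
      A compact sketch of that comparison logic (field names follow the
      commit message; the lookup-range details are an assumption):

          #include <glib.h>

          struct tb_tc {
              void *ptr;    /* start of the translated code */
              size_t size;  /* size_t avoids a hole in the struct */
          };

          /* Optimized for the common case: both keys are existing TBs. */
          static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp)
          {
              const struct tb_tc *a = ap;
              const struct tb_tc *b = bp;

              if (G_LIKELY(a->size && b->size)) {   /* insertion/removal */
                  if (a->ptr == b->ptr) {
                      return 0;
                  }
                  return a->ptr < b->ptr ? -1 : 1;
              }
              /* Lookup: the size == 0 key holds a host PC; match if it
               * falls inside the other entry's [ptr, ptr + size) range.
               * At most one key has size == 0, per the analysis above. */
              if (a->size == 0) {
                  const char *p = a->ptr, *start = b->ptr;
                  return (p < start) ? -1 : (p >= start + b->size) ? 1 : 0;
              }
              return -tb_tc_cmp(bp, ap);
          }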
      
      Note that tb_free does not free space in the code_gen_buffer anymore,
      since we cannot easily know whether the tb is the last one inserted
      in code_gen_buffer. The next patch in this series renames tb_free
      to tb_remove to reflect this.
      
      Performance-wise, lookups in tb_find_pc are the same as before:
      O(log n). However, insertions are O(log n) instead of O(1), which
      results in a small slowdown when booting debian-arm:
      
      Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
      	-machine type=virt -nographic -smp 1 -m 4096 \
      	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
      	-device virtio-net-device,netdev=unet \
      	-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
      	-device virtio-blk-device,drive=myblock \
      	-kernel img/arm/aarch32-current-linux-kernel-only.img \
      	-append console=ttyAMA0 root=/dev/vda1 \
      	-name arm,debug-threads=on -smp 1' (10 runs):
      
      - Before:
      
             8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
                  16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                       0      cpu-migrations            #    0.000 K/sec
                  10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
          35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
          65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
          10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
             192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )
      
             8.640869419 seconds time elapsed                                          ( +-  0.57% )
      
      - After:
             8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
                  17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                       0      cpu-migrations            #    0.000 K/sec
                  18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
          35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
          65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
          10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
             195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )
      
             8.828660235 seconds time elapsed                                          ( +-  0.38% )
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      2ac01d6d
    • tcg: Remove CF_IGNORE_ICOUNT · 416986d3
      Authored by Richard Henderson
      Now that we have curr_cflags, we can include CF_USE_ICOUNT
      early and then remove it as necessary.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      416986d3
    • tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK · 0cf8a44c
      Authored by Richard Henderson
      These flags are used by target/*/translate.c,
      and affect code generation.
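
      As a sketch, the resulting mask (the exact bit set is inferred from
      this series' commit titles and is an assumption):

          /* cflags bits that affect codegen and so must enter the TB hash */
          #define CF_HASH_MASK \
              (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL)
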
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      0cf8a44c
    • tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK · 4e2ca83e
      Authored by Emilio G. Cota
      This will enable us to decouple code translation from the value
      of parallel_cpus at any given time. It will also help us minimize
      TB flushes when generating code via EXCP_ATOMIC.
      
      Note that the declaration of parallel_cpus is moved to exec-all.h
      so that the curr_cflags inline can be defined there.
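
      A sketch of that inline, assuming the simplest possible form:

          extern bool parallel_cpus;   /* now declared in exec-all.h */

          static inline uint32_t curr_cflags(void)
          {
              /* TBs compiled for parallel execution hash differently */
              return parallel_cpus ? CF_PARALLEL : 0;
          }
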
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      4e2ca83e
  6. 10 October 2017, 4 commits
  7. 08 September 2017, 1 commit
    • tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h · a8583393
      Authored by Richard Henderson
      Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
      boolean test.  Replace the tb_set_jmp_target1 ifdef with an unconditional
      function tb_target_set_jmp_target.
      
      While we're touching all backends, add a parameter for tb->tc_ptr;
      we're going to need it shortly for some backends.
      
      Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.
      
      This opens the possibility for TCG_TARGET_HAS_direct_jump to be
      a runtime decision -- based on host cpu capabilities, the size of
      code_gen_buffer, or a future debugging switch.
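
      A sketch of the new per-backend hook (parameter names are
      illustrative; tc_ptr is the addition mentioned above):

          /* In tcg/<cpu>/tcg-target.h: compile-time for now,
           * possibly a runtime decision later. */
          #define TCG_TARGET_HAS_direct_jump  1

          /* The backend patches the jump at jmp_addr to point to addr;
           * tc_ptr locates the start of the TB's translated code. */
          void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
                                        uintptr_t addr);
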
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      a8583393
  8. 06 September 2017, 2 commits
  9. 20 July 2017, 1 commit
  10. 17 July 2017, 2 commits
  11. 14 July 2017, 1 commit
  12. 05 July 2017, 1 commit
  13. 04 July 2017, 1 commit
  14. 20 June 2017, 1 commit
  15. 06 June 2017, 1 commit
    • tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr · cedbcb01
      Authored by Emilio G. Cota
      Instead of exporting goto_ptr directly to TCG frontends, export
      tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
      returned by the lookup_tb_ptr() helper. This is the only use case
      we have for goto_ptr and lookup_tb_ptr, so having this function is
      very convenient. Furthermore, it trivially allows us to avoid calling
      the lookup helper if goto_ptr is not implemented by the backend.
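
      A sketch of the shape this takes, assuming the era's TCG API (the
      helper plumbing shown here is illustrative):

          void tcg_gen_lookup_and_goto_ptr(TCGv addr)
          {
              if (TCG_TARGET_HAS_goto_ptr) {
                  /* ask the helper for the host address of the next TB... */
                  TCGv_ptr ptr = tcg_temp_new_ptr();
                  gen_helper_lookup_tb_ptr(ptr, tcg_ctx.tcg_env, addr);
                  /* ...and jump straight to it */
                  tcg_gen_op1i(INDEX_op_goto_ptr, GET_TCGV_PTR(ptr));
                  tcg_temp_free_ptr(ptr);
              } else {
                  /* backend lacks goto_ptr: skip the lookup entirely */
                  tcg_gen_exit_tb(0);
              }
          }
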
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org>
      Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org>
      Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org>
      Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org>
      [rth: Squashed 4 related commits.]
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      cedbcb01
  16. 24 February 2017, 5 commits
    • cputlb: introduce tlb_flush_*_all_cpus[_synced] · c3b9a07a
      Authored by Alex Bennée
      This introduces support in the cputlb API for flushing all CPUs'
      TLBs with one call.  This avoids the need for target helpers to
      iterate over the vCPUs themselves.

      An additional variant of the API (_synced) causes the source vCPU's
      work to be scheduled as "safe work".  The result is that all flush
      operations will have completed by the time the originating vCPU
      executes its safe work.  The calling implementation can either end
      the TB straight away (which will then pick up cpu->exit_request on
      entering the next block) or defer the exit until the architectural
      sync point (usually a barrier instruction).
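
      A sketch of the new entry points (names from the commit title;
      the signatures are an assumption based on the per-CPU variants):

          /* flush every vCPU's TLB, fanning out from src_cpu */
          void tlb_flush_all_cpus(CPUState *src_cpu);
          /* as above, but complete before src_cpu's queued safe work runs */
          void tlb_flush_all_cpus_synced(CPUState *src_cpu);
          void tlb_flush_page_all_cpus(CPUState *src_cpu, target_ulong addr);
          void tlb_flush_page_all_cpus_synced(CPUState *src_cpu,
                                              target_ulong addr);
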
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      c3b9a07a
    • cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap · 0336cbf8
      Authored by Alex Bennée
      While the varargs approach was flexible, the original MTTCG ended up
      having to munge the bits into a bitmap so the data could be used in
      deferred work helpers.  Instead of hiding that in cputlb, we change
      the API to take a bitmap of MMU indexes instead.

      For ARM, some of the resulting flushes end up quite long, so to aid
      readability I've tended to move the index shifting to a new line so
      that all the bits being OR-ed together line up nicely, for example:
      
          tlb_flush_page_by_mmuidx(other_cs, pageaddr,
                                   (1 << ARMMMUIdx_S1SE1) |
                                   (1 << ARMMMUIdx_S1SE0));
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      [AT: SPARC parts only]
      Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      [PM: ARM parts only]
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      0336cbf8
    • cputlb: introduce tlb_flush_* async work. · e3b9ca81
      Authored by KONRAD Frederic
      Some architectures allow flushing the TLB of other vCPUs.  This is
      not a problem when we have only one thread for all vCPUs, but it
      definitely needs to be asynchronous work when we are truly
      multithreaded.

      We take tb_lock() when doing this to avoid racing with other threads
      that may be invalidating TBs at the same time.  The alternative would
      be to use proper atomic primitives to clear the TLB entries en masse.

      This patch doesn't do anything to protect other cputlb functions
      being called in MTTCG mode and making cross-vCPU changes.
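
      A sketch of the deferral pattern, assuming QEMU's generic
      async_run_on_cpu() machinery (the wrapper functions here are
      hypothetical, not the patch's code):

          /* runs on the target vCPU's own thread at a safe point */
          static void flush_tlb_work(CPUState *cpu, run_on_cpu_data data)
          {
              tlb_flush(cpu);
          }

          static void flush_other_cpu_tlb(CPUState *target)
          {
              if (qemu_cpu_is_self(target)) {
                  tlb_flush(target);    /* safe to do synchronously */
              } else {
                  /* defer to the target vCPU instead of touching its TLB */
                  async_run_on_cpu(target, flush_tlb_work, RUN_ON_CPU_NULL);
              }
          }
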
      Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
      [AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      e3b9ca81
    • tcg: remove global exit_request · e5143e30
      Authored by Alex Bennée
      There are now only two uses of the global exit_request left.
      
      The first ensures we exit the run_loop when we first start to process
      pending work and in the kick handler. This is just as easily done by
      setting the first_cpu->exit_request flag.
      
      The second use is in the round-robin kick routine.  The global
      exit_request ensured every vCPU would set its local exit_request and
      cause a full exit of the loop.  Now that the iothread isn't held
      while running, we can rely on the kick handler to push us out as
      intended.

      We lightly refactor the main vCPU thread to ensure that
      cpu->exit_request causes us to exit the main loop and process any IO
      requests that might come along.  As a cpu->exit_request may
      legitimately get squashed while processing the EXCP_INTERRUPT
      exception, we also check cpu->queued_work_first to ensure queued work
      is expedited as soon as possible.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      e5143e30
    • tcg: rename tcg_current_cpu to tcg_current_rr_cpu · 791158d9
      Authored by Alex Bennée
      ..and make the definition local to cpus.c.  In preparation for MTTCG
      the concept of a global tcg_current_cpu will no longer make sense.
      However, we still need to keep track of it in the single-threaded
      case to be able to exit quickly when required.

      qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
      emphasise its use case.  qemu_cpu_kick now kicks the relevant cpu and
      also calls qemu_cpu_kick_rr_cpu(), which will become a no-op in MTTCG.

      For the time being, the setting of the global exit_request remains.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
      791158d9
  17. 16 February 2017, 1 commit
  18. 13 January 2017, 1 commit
  19. 31 October 2016, 2 commits
  20. 26 October 2016, 1 commit
  21. 25 October 2016, 1 commit
  22. 27 September 2016, 1 commit
  23. 16 September 2016, 1 commit
    • tcg: Merge GETPC and GETRA · 01ecaf43
      Authored by Richard Henderson
      The return address argument to the softmmu template helpers was
      confused.  In the legacy case, we wanted to indicate that there
      is no return address, and so passed in NULL.  However, we then
      immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
      value, indicating the presence of an (invalid) return address.
      
      Push the GETPC_ADJ subtraction down to the only point it's required:
      immediately before use within cpu_restore_state_from_tb, after all
      NULL pointer checks have been completed.
      
      This makes GETPC and GETRA identical.  Remove GETRA as the
      lesser-used macro, replacing all uses with GETPC.
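
      For context, GETPC as QEMU has long defined it, plus a hypothetical
      helper showing the usual call pattern (the helper name and store
      size are illustrative):

          #define GETPC() \
              ((uintptr_t)__builtin_extract_return_addr(__builtin_return_address(0)))

          /* hypothetical softmmu-style helper */
          void helper_example_store(CPUArchState *env, target_ulong addr,
                                    uint32_t val)
          {
              /* must be captured in the outermost helper */
              uintptr_t ra = GETPC();
              cpu_stl_data_ra(env, addr, val, ra);
          }
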
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      01ecaf43
  24. 14 September 2016, 1 commit
  25. 27 July 2016, 1 commit