1. 17 Feb 2015, 2 commits
  2. 02 Feb 2015, 2 commits
    • cpu-exec: simplify init_delay_params · 2e91cc62
      Committed by Paolo Bonzini
      With the introduction of QEMU_CLOCK_VIRTUAL_RT, the computation of
      sc->diff_clk can be simplified nicely:
      
              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
              qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
              cpu_get_clock_offset()
      
           =  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
              (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cpu_get_clock_offset())
      
           =  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
              (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timers_state.cpu_clock_offset)
      
           =  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT)
      
      Cc: Sebastian Tanase <sebastian.tanase@openwide.fr>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
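The identity above can be checked numerically. Below is a minimal sketch, not QEMU code: the three clock readings are plain parameters, and the relation cpu_get_clock_offset() == -timers_state.cpu_clock_offset is taken from the derivation above; QEMU_CLOCK_VIRTUAL_RT is, by definition, REALTIME plus timers_state.cpu_clock_offset.

```c
#include <stdint.h>

/* Old computation: VIRTUAL - REALTIME + cpu_get_clock_offset(), where the
 * derivation above implies cpu_get_clock_offset() returns
 * -timers_state.cpu_clock_offset. */
static int64_t diff_clk_old(int64_t virt, int64_t rt, int64_t cpu_clock_offset)
{
    return virt - rt + (-cpu_clock_offset);
}

/* New computation: VIRTUAL - VIRTUAL_RT, with VIRTUAL_RT modeled as
 * REALTIME + timers_state.cpu_clock_offset. */
static int64_t diff_clk_new(int64_t virt, int64_t rt, int64_t cpu_clock_offset)
{
    int64_t virt_rt = rt + cpu_clock_offset;
    return virt - virt_rt;
}
```

Both forms agree for any clock readings, which is why the second, single-subtraction form can replace the first.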
    • cpu-exec: simplify align_clocks · a498d0ef
      Committed by Paolo Bonzini
      sc->diff_clk is already equal to sleep_delay (split into a seconds part
      and a nanoseconds part), so subtracting (sleep_delay - rem_delay) from it
      leaves exactly rem_delay.
      
      Cc: Sebastian Tanase <sebastian.tanase@openwide.fr>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
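The arithmetic behind this simplification can be sketched with plain timespec values. This is an illustration of the identity, not the actual align_clocks code; the helper names are invented.

```c
#include <stdint.h>
#include <time.h>

/* Convert a delay split into seconds and nanoseconds to plain nanoseconds. */
static int64_t ts_to_ns(struct timespec ts)
{
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Old update: subtract the portion of the delay actually slept,
 * i.e. (sleep_delay - rem_delay), from diff_clk. */
static int64_t update_old(int64_t diff_clk, struct timespec sleep_delay,
                          struct timespec rem_delay)
{
    return diff_clk - (ts_to_ns(sleep_delay) - ts_to_ns(rem_delay));
}

/* New update: since diff_clk == sleep_delay on entry, the result of the
 * subtraction above is exactly the remaining delay. */
static int64_t update_new(struct timespec rem_delay)
{
    return ts_to_ns(rem_delay);
}
```

Whenever diff_clk equals sleep_delay on entry (which the commit message states is always the case), the two updates produce the same value.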
  3. 15 Dec 2014, 4 commits
  4. 26 Sep 2014, 23 commits
  5. 12 Sep 2014, 2 commits
  6. 01 Sep 2014, 1 commit
  7. 12 Aug 2014, 1 commit
    • trace: add some tcg tracing support · 6db8b538
      Committed by Alex Bennée
      This adds a couple of TCG-specific trace events which are useful for
      tracing execution through TCG-generated blocks. It has been tested with
      LTTng user-space tracing but is generic enough for all systems. The TCG
      events are:
      
        * translate_block - when a subject block is translated
        * exec_tb - when a translated block is entered
        * exec_tb_exit - when we exit the translated code
        * exec_tb_nocache - special case translations
      
      Of course we can only trace the entrance to the first block of a chain
      as each block will jump directly to the next when it can. See the -d
      nochain patch to allow more complete tracing at the expense of
      performance.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
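The chaining caveat above can be illustrated with a toy model. This is a sketch only: the trace_* functions here are stand-in counters, not QEMU's generated tracepoints, and the tb structure is invented for the example.

```c
#include <stdint.h>

/* Counters standing in for real tracepoints. */
static int n_translate, n_enter, n_exit;

static void trace_translate_block(void *tb, uint64_t pc) { (void)tb; (void)pc; n_translate++; }
static void trace_exec_tb(void *tb, uint64_t pc)         { (void)tb; (void)pc; n_enter++; }
static void trace_exec_tb_exit(void *last_tb, int flags) { (void)last_tb; (void)flags; n_exit++; }

struct tb {
    uint64_t pc;
    struct tb *chained_next;  /* direct-jump target once blocks are chained */
    int translated;
};

/* Run a chain of blocks: translation (once) and entry fire events on the
 * head only. Chained blocks jump directly to each other, so no event fires
 * for them, and only the final exit is observed. Translation of the chained
 * blocks themselves is elided in this toy model. */
static void run_chain(struct tb *head)
{
    if (!head->translated) {
        trace_translate_block(head, head->pc);
        head->translated = 1;
    }
    trace_exec_tb(head, head->pc);

    struct tb *t = head;
    while (t->chained_next) {
        t = t->chained_next;  /* direct jump: invisible to tracing */
    }
    trace_exec_tb_exit(t, 0);
}
```

Running a three-block chain produces one entry and one exit event, which is exactly the limitation the commit message describes (and what -d nochain lifts, at a performance cost).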
  8. 07 Aug 2014, 1 commit
  9. 06 Aug 2014, 2 commits
    • cpu-exec: Print to console if the guest is late · 7f7bc144
      Committed by Sebastian Tanase
      If the align option is enabled, we print a message whenever the guest
      clock is behind the host clock, so that the user has a hint about the
      actual performance. Messages are printed at most every 2 s, and their
      total number is limited to 100. If desired, these limits can be changed
      in cpu-exec.c.
      Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
      Tested-by: Camille Bégué <camille.begue@openwide.fr>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
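The rate limiting described above (at most one message every 2 s, capped at 100 messages) can be sketched as follows. The constants mirror the commit message, but the function name and state layout are invented for illustration and do not match cpu-exec.c line for line.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_DELAY_PRINT_RATE 2000000000LL  /* 2 s, in ns */
#define MAX_NR_PRINTS        100

static int64_t last_print_ns;
static int nr_prints;

/* diff_clk_ns < 0 means the guest is behind the host.
 * Returns 1 if a warning was emitted, 0 if it was suppressed. */
static int maybe_warn_guest_late(int64_t now_ns, int64_t diff_clk_ns)
{
    if (diff_clk_ns >= 0 ||                                /* guest not late */
        nr_prints >= MAX_NR_PRINTS ||                      /* lifetime cap   */
        now_ns - last_print_ns < MAX_DELAY_PRINT_RATE) {   /* too soon       */
        return 0;
    }
    last_print_ns = now_ns;
    nr_prints++;
    printf("Guest is %" PRId64 " ns behind the host\n", -diff_clk_ns);
    return 1;
}
```

Calling this on every loop iteration keeps console output bounded even when the guest is persistently late.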
    • cpu-exec: Add sleeping algorithm · c2aa5f81
      Committed by Sebastian Tanase
      The goal is to make QEMU sleep whenever the guest clock is ahead of the
      host clock (we use the monotonic clocks). The amount of time to sleep
      is calculated in the execution loop in cpu_exec.
      
      At first, we tried to approximate, on each iteration of the for loop, the
      real time elapsed while searching for a TB (generating it or retrieving
      it from the cache) and executing it. We would then approximate the
      virtual time corresponding to the number of virtual instructions
      executed. The difference between these two values tells us whether the
      guest is ahead of or behind the host. However, the function used for
      measuring real time (qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to
      be very expensive: it added an overhead of 13% of the total run time.
      
      Therefore, we modified the algorithm to take the difference between the
      two clocks into account only at the beginning of the cpu_exec function.
      During the for loop we try to reduce the guest's advance only by
      computing the virtual time elapsed and sleeping if necessary. The
      overhead is thus reduced to 3%. Even though this method still has a
      noticeable overhead, it is no longer a bottleneck in trying to achieve a
      better guest frequency for which the guest clock is faster than the
      host one.
      
      As for the alignment of the two clocks, with the first algorithm the
      guest clock was oscillating between -1 and +1 ms compared to the host
      clock. Using the second algorithm, we notice that the guest is 5 ms
      behind the host, which is still acceptable for our use case.
      
      The tests were conducted using fio and stress. The host machine is an i5
      CPU at 3.10 GHz running Debian Jessie (kernel 3.12). The guest machine
      is an ARM versatile-pb built with buildroot.
      
      Currently, on our test machine, the lowest icount we can achieve that is
      suitable for aligning the two clocks is 6. However, we observe that the
      IO tests (using fio) are slower than the CPU tests (using stress).
      Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
      Tested-by: Camille Bégué <camille.begue@openwide.fr>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
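The second algorithm described above can be sketched as a toy model: the clock difference is sampled once on entry, and inside the loop only the guest's virtual progress is added, with any resulting advance slept off. All names are illustrative; the real code in cpu-exec.c also splits the delay into seconds and nanoseconds for nanosleep, which is elided here.

```c
#include <stdint.h>

typedef struct {
    int64_t diff_clk;  /* guest advance over host, in ns */
} SyncClocks;

/* Sampled once at the beginning of the execution loop: a single pair of
 * clock reads, instead of one per iteration. */
static void init_delay(SyncClocks *sc, int64_t guest_ns, int64_t host_ns)
{
    sc->diff_clk = guest_ns - host_ns;
}

/* Called per iteration: account for the virtual time just executed and
 * return how long the host thread should sleep to let real time catch up
 * (0 if the guest is not ahead). The full sleep is assumed to complete. */
static int64_t align_clocks(SyncClocks *sc, int64_t executed_virtual_ns)
{
    sc->diff_clk += executed_virtual_ns;
    if (sc->diff_clk > 0) {
        int64_t sleep_ns = sc->diff_clk;
        sc->diff_clk = 0;
        return sleep_ns;
    }
    return 0;
}
```

Because the expensive real-time clock read happens only in init_delay, the per-iteration cost is a single addition and comparison, which is what brings the overhead down from 13% to 3%.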
  10. 13 May 2014, 1 commit
  11. 05 Apr 2014, 1 commit