1. 24 Feb, 2017 7 commits
    • cputlb: atomically update tlb fields used by tlb_reset_dirty · b0706b71
      Alex Bennée committed
      The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
      in TLB entries to force the slow-path on writes. This is used to mark
      page ranges containing code which has been translated so it can be
      invalidated if written to. To do this safely we need to ensure the TLB
      entries in question for all vCPUs are updated before we attempt to run
      the code, otherwise a race could be introduced.
      
      To achieve this we atomically set the flag in tlb_reset_dirty_range and
      take care when setting it when the TLB entry is filled.
      
      On 32 bit systems attempting to emulate 64 bit guests we don't even
      bother as we might not have the atomic primitives available. MTTCG is
      disabled in this case and can't be forced on. The copy_tlb_helper
      function helps keep the atomic semantics in one place to avoid
      confusion.
      
      The dirty helper function is made static as it isn't used outside of
      cputlb.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
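The atomic flag update described in this commit can be illustrated with a minimal sketch. It uses C11 atomics rather than QEMU's own atomic helpers, and every name in it (SketchTLBEntry, SKETCH_TLB_NOTDIRTY, sketch_reset_dirty_range) as well as the flag and mask values are assumptions made for illustration, not QEMU's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit and page mask; the real QEMU values may differ. */
#define SKETCH_TLB_NOTDIRTY ((uintptr_t)1 << 4)
#define SKETCH_PAGE_MASK    (~(uintptr_t)0xfff)

/* Simplified stand-in for a TLB entry: only the write address field. */
typedef struct {
    _Atomic uintptr_t addr_write;
} SketchTLBEntry;

/*
 * Set the not-dirty flag on an entry whose page lies in
 * [start, start + length). The compare-and-swap loop ensures that a
 * concurrent refill of the entry by its owning vCPU is never overwritten
 * with a stale value: if the field changed, the checks are repeated.
 * Returns true if this call set the flag.
 */
static bool sketch_reset_dirty_range(SketchTLBEntry *e,
                                     uintptr_t start, uintptr_t length)
{
    uintptr_t old = atomic_load(&e->addr_write);

    for (;;) {
        if (old & SKETCH_TLB_NOTDIRTY) {
            return false;               /* already forced onto the slow path */
        }
        if ((old & SKETCH_PAGE_MASK) - start >= length) {
            return false;               /* page is outside the range */
        }
        if (atomic_compare_exchange_weak(&e->addr_write, &old,
                                         old | SKETCH_TLB_NOTDIRTY)) {
            return true;
        }
        /* The failed exchange reloaded 'old'; loop and check again. */
    }
}
```

The unsigned subtraction in the range check wraps for pages below `start`, so a single comparison covers both bounds.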
    • cputlb: add tlb_flush_by_mmuidx async routines · e7218445
      Alex Bennée committed
      This converts the remaining TLB flush routines to use async work when
      detecting a cross-vCPU flush. The only minor complication is having to
      serialise the var_list of MMU indexes into a form that can be punted
      to an asynchronous job.
      
      The pending_tlb_flush field on QOM's CPU structure also becomes a
      bitfield rather than a boolean.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
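One plausible reason for turning a boolean pending flag into a bitfield is that repeated cross-vCPU requests can then be coalesced per MMU index. The sketch below shows that idea only; the structure, the counter standing in for an async work queue, and all function names are hypothetical and do not reflect QEMU's actual implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint16_t pending_tlb_flush; /* one bit per MMU index awaiting a flush */
    _Atomic unsigned jobs_queued;       /* stand-in for a real async work queue */
} SketchCPU;

/*
 * Called from another vCPU's thread: request a flush of the MMU indexes
 * in idxmap. A job is only queued for indexes that are not already
 * pending. Returns the newly requested bits.
 */
static uint16_t sketch_request_flush(SketchCPU *target, uint16_t idxmap)
{
    uint16_t old = atomic_fetch_or(&target->pending_tlb_flush, idxmap);
    uint16_t fresh = (uint16_t)(idxmap & ~old);

    if (fresh) {
        atomic_fetch_add(&target->jobs_queued, 1);
    }
    return fresh;
}

/*
 * Run by the target vCPU itself: take every pending index in one step
 * and return the bitmap that has to be flushed.
 */
static uint16_t sketch_run_flush(SketchCPU *self)
{
    return atomic_exchange(&self->pending_tlb_flush, 0);
}
```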
    • cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap · 0336cbf8
      Alex Bennée committed
      While the vargs approach was flexible, the original MTTCG ended up
      having to munge the bits into a bitmap so the data could be used in
      deferred work helpers. Instead of hiding that in cputlb we push the
      change to the API to make it take a bitmap of MMU indexes instead.
      
      For ARM some of the resulting flushes end up being quite long, so to aid
      readability I've tended to move the index shifting to a new line so
      all the bits being or-ed together line up nicely, for example:
      
          tlb_flush_page_by_mmuidx(other_cs, pageaddr,
                                   (1 << ARMMMUIdx_S1SE1) |
                                   (1 << ARMMMUIdx_S1SE0));
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      [AT: SPARC parts only]
      Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      [PM: ARM parts only]
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
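On the receiving side, a bitmap of MMU indexes is consumed by testing one bit per index. The following is a rough sketch of that loop; the mode count, the enum values, and the counter standing in for an actual flush are invented for illustration.

```c
#include <stdint.h>

#define SKETCH_NB_MMU_MODES 8   /* hypothetical number of MMU indexes */

/* Hypothetical index values, not the real ARMMMUIdx_* numbers. */
enum { SKETCH_IDX_S1SE0 = 3, SKETCH_IDX_S1SE1 = 4 };

typedef struct {
    unsigned flush_count[SKETCH_NB_MMU_MODES]; /* stand-in for per-index TLBs */
} SketchCPU;

/* Flush every MMU index whose bit is set in idxmap. */
static void sketch_flush_by_mmuidx(SketchCPU *cpu, uint16_t idxmap)
{
    for (int idx = 0; idx < SKETCH_NB_MMU_MODES; idx++) {
        if (idxmap & (1u << idx)) {
            cpu->flush_count[idx]++;    /* a real flush would clear the TLB here */
        }
    }
}
```

Because the argument is a plain integer, it can be stored in a deferred work item as-is, which a variadic argument list cannot.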
    • cputlb: introduce tlb_flush_* async work. · e3b9ca81
      KONRAD Frederic committed
      Some architectures allow flushing the TLB of other vCPUs. This is not a problem
      when we have only one thread for all vCPUs, but it definitely needs to be
      asynchronous work when we are in true multithreaded mode.
      
      We take the tb_lock() when doing this to avoid racing with other threads
      which may be invalidating TBs at the same time. The alternative would
      be to use proper atomic primitives to clear the TLB entries en masse.
      
      This patch doesn't do anything to protect other cputlb functions being
      called in MTTCG mode making cross-vCPU changes.
      Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
      [AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
    • cputlb: tweak qemu_ram_addr_from_host_nofail reporting · 857baec1
      Alex Bennée committed
      This moves the helper function closer to where it is called and updates
      the error message to report via error_report instead of the deprecated
      fprintf.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
    • cputlb: add assert_cpu_is_self checks · f0aff0f1
      Alex Bennée committed
      For SoftMMU the TLB flushes are an example of a task that can be
      triggered on one vCPU by another. To deal with this properly we need to
      use safe work to ensure these changes are done safely. The new assert
      can be enabled while debugging to catch these cases.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
    • tcg: drop global lock during TCG code execution · 8d04fb55
      Jan Kiszka committed
      This finally allows TCG to benefit from the iothread introduction: Drop
      the global mutex while running pure TCG CPU code. Reacquire the lock
      when entering MMIO or PIO emulation, or when leaving the TCG loop.
      
      We have to revert a few optimizations for the current TCG threading
      model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
      kicking it in qemu_cpu_kick. We also need to disable RAM block
      reordering until we have a more efficient locking mechanism at hand.
      
      Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
      These numbers demonstrate where we gain something:
      
      20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
      20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
      
      The guest CPU was fully loaded, but the iothread could still run mostly
      independently on a second core. Without the patch we don't get beyond
      
      32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
      32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
      
      We don't benefit significantly, though, when the guest is not fully
      loading a host CPU.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
      [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
      Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
      [EGC: fixed iothread lock for cpu-exec IRQ handling]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      [AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
      [PM: target-arm changes]
      Acked-by: Peter Maydell <peter.maydell@linaro.org>
  2. 13 Jan, 2017 1 commit
  3. 28 Oct, 2016 1 commit
  4. 26 Oct, 2016 7 commits
  5. 16 Sep, 2016 1 commit
    • tcg: Merge GETPC and GETRA · 01ecaf43
      Richard Henderson committed
      The return address argument to the softmmu template helpers was
      confused.  In the legacy case, we wanted to indicate that there
      is no return address, and so passed in NULL.  However, we then
      immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
      value, indicating the presence of an (invalid) return address.
      
      Push the GETPC_ADJ subtraction down to the only point it's required:
      immediately before use within cpu_restore_state_from_tb, after all
      NULL pointer checks have been completed.
      
      This makes GETPC and GETRA identical.  Remove GETRA as the lesser
      used macro, replacing all uses with GETPC.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
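The confusion this commit describes comes down to unsigned pointer arithmetic: subtracting a small constant from a NULL return address wraps to a large non-zero value. A small sketch of the before and after, with an assumed adjustment constant and hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GETPC_ADJ 2   /* assumed adjustment value, for illustration */

/*
 * Old pattern: the adjustment is applied before the value is tested.
 * A "no return address" input of 0 wraps around and looks valid.
 */
static bool sketch_old_has_retaddr(uintptr_t caller_ra)
{
    uintptr_t ra = caller_ra - SKETCH_GETPC_ADJ;
    return ra != 0;
}

/*
 * New pattern: test for "no return address" first and apply the
 * adjustment only at the point of use. Returns 0 when there is none.
 */
static uintptr_t sketch_new_search_pc(uintptr_t ra)
{
    if (ra == 0) {
        return 0;
    }
    return ra - SKETCH_GETPC_ADJ;
}
```

With the subtraction moved to the point of use, a single macro can serve both the "has a return address" and "no return address" callers.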
  6. 09 Jul, 2016 2 commits
  7. 29 Jun, 2016 1 commit
    • cputlb: don't cpu_abort() if guest tries to execute outside RAM or ROM · d7f30403
      Peter Maydell committed
      In get_page_addr_code(), if the guest program counter turns out not to
      be in ROM or RAM, we can't handle executing from it, and we call
      cpu_abort(). This results in the message
        qemu: fatal: Trying to execute code outside RAM or ROM at 0x08000000
      followed by a guest register dump, and then QEMU dumps core.
      
      This situation happens in one of two cases:
       (1) a guest kernel bug, where it jumped off into nowhere
       (2) a user command line mistake, where they tried to run an image for
           board A on a QEMU model of board B, or where they didn't provide
           an image at all, and QEMU executed through a ROM or RAM full of
           NOP instructions and then fell off the end
      
      In either case, a core dump of QEMU itself is entirely useless, and
      only confuses users into thinking that this is a bug in QEMU rather
      than a bug in the guest or a problem with their command line. (This
      is a variation on the general idea that we shouldn't assert() on
      something the user can accidentally provoke.)
      
      Replace the cpu_abort() with something that explains the situation
      a bit better and exits QEMU without dumping core.
      
      (See LP:1062220 for several examples of confused users.)
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Message-id: 1466442425-11885-1-git-send-email-peter.maydell@linaro.org
  8. 29 May, 2016 1 commit
  9. 19 May, 2016 1 commit
  10. 13 May, 2016 1 commit
  11. 23 Mar, 2016 1 commit
    • cputlb: modernise the debug support · 8526e1f4
      Alex Bennée committed
      To avoid cluttering the code with #ifdef legs we wrap up the print
      statements into a tlb_debug() macro. As access to the virtual TLB can
      get quite heavy, defining DEBUG_TLB_LOG will ensure all the logs go to
      the qemu_log target of CPU_LOG_MMU instead of stderr. This remains
      compile time optional as these debug statements haven't been considered
      for usefulness for user visible logging.
      
      I've also removed DEBUG_TLB_CHECK which wasn't used.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <1458052224-9316-11-git-send-email-alex.bennee@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
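The pattern of replacing scattered #ifdef blocks with a single gated macro might look roughly like the sketch below. The macro and gate names are hypothetical, and the routing to a separate log target that the commit describes is left out; output here always goes to stderr.

```c
#include <stdio.h>

/* Define SKETCH_DEBUG_TLB at build time to enable the messages. */
#ifdef SKETCH_DEBUG_TLB
#define SKETCH_DEBUG_TLB_GATE 1
#else
#define SKETCH_DEBUG_TLB_GATE 0
#endif

/*
 * The condition is a compile-time constant, so a disabled build drops
 * the call entirely, yet the format string and arguments are still
 * type-checked. '##__VA_ARGS__' is a GNU extension supported by GCC
 * and Clang.
 */
#define sketch_tlb_debug(fmt, ...)                                     \
    do {                                                               \
        if (SKETCH_DEBUG_TLB_GATE) {                                   \
            fprintf(stderr, "%s: " fmt, __func__, ##__VA_ARGS__);      \
        }                                                              \
    } while (0)
```

A call such as `sketch_tlb_debug("page %d\n", 1);` then needs no surrounding #ifdef at the call site.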
  12. 07 Mar, 2016 1 commit
  13. 29 Jan, 2016 1 commit
    • exec: Clean up includes · 7b31bbc2
      Peter Maydell committed
      Clean up includes so that osdep.h is included first and headers
      which it implies are not included manually.
      
      This commit was created with scripts/clean-includes.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Message-id: 1453832250-766-4-git-send-email-peter.maydell@linaro.org
  14. 21 Jan, 2016 2 commits
  15. 16 Sep, 2015 2 commits
  16. 11 Sep, 2015 1 commit
  17. 25 Aug, 2015 1 commit
  18. 05 Jun, 2015 2 commits
  19. 26 Apr, 2015 2 commits
  20. 17 Feb, 2015 2 commits
  21. 17 Dec, 2014 1 commit
    • qemu-log: add log category for MMU info · 339aaf5b
      Antony Pavlov committed
      Running barebox on qemu-system-mips* with '-d unimp' floods
      stderr with a great many mips_cpu_handle_mmu_fault() messages:
      
        mips_cpu_handle_mmu_fault address=b80003fd ret 0 physical 00000000180003fd prot 3
        mips_cpu_handle_mmu_fault address=a0800884 ret 0 physical 0000000000800884 prot 3
        mips_cpu_handle_mmu_fault pc a080cd80 ad b80003fd rw 0 mmu_idx 0
      
      So it's very difficult to find the LOG_UNIMP messages.
      
      The mips_cpu_handle_mmu_fault() messages appear when ANY
      logging is enabled, which is not very handy.
      
      Adding a separate log category for *_cpu_handle_mmu_fault()
      logging fixes the problem.
      Signed-off-by: Antony Pavlov <antonynpavlov@gmail.com>
      Acked-by: Alexander Graf <agraf@suse.de>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Message-id: 1418489298-1184-1-git-send-email-antonynpavlov@gmail.com
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  22. 02 Sep, 2014 1 commit
    • implementing victim TLB for QEMU system emulated TLB · 88e89a57
      Xin Tong committed
      QEMU system mode page table walks are expensive. Measured by running
      qemu-system-x86_64 in system mode under Intel PIN, a TLB miss followed
      by a walk of the 4-level page tables in a guest Linux OS takes ~450 x86
      instructions on average.
      
      The QEMU system mode TLB is implemented as a direct-mapped hash table.
      This structure suffers from conflict misses. Increasing the
      associativity of the TLB may not be the solution to conflict misses, as
      all the ways may have to be walked serially.
      
      A victim TLB is a TLB used to hold translations evicted from the
      primary TLB upon replacement. The victim TLB lies between the main TLB
      and its refill path. The victim TLB has greater associativity (fully
      associative in this patch). It takes longer to look up the victim TLB,
      but it is likely still cheaper than a full page table walk. The memory
      translation path is changed as follows:
      
      Before Victim TLB:
      1. Inline TLB lookup
      2. Exit code cache on TLB miss.
      3. Check for unaligned, IO accesses
      4. TLB refill.
      5. Do the memory access.
      6. Return to code cache.
      
      After Victim TLB:
      1. Inline TLB lookup
      2. Exit code cache on TLB miss.
      3. Check for unaligned, IO accesses
      4. Victim TLB lookup.
      5. If victim TLB misses, TLB refill
      6. Do the memory access.
      7. Return to code cache
      
      The advantage is that a victim TLB can offer more associativity to a
      direct-mapped TLB, and thus potentially fewer page table walks, while
      still keeping the time taken to flush within reasonable limits.
      However, placing a victim TLB before the refill path lengthens the TLB
      refill path, as the victim TLB is consulted before the TLB refill. The
      performance results demonstrate that the pros outweigh the cons.
      
      Some performance results, taken on the SPECINT2006 train
      datasets, a kernel boot and the qemu configure script on an
      Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine, are shown in the
      Google Doc linked below.
      
      https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing
      
      In summary, the victim TLB improves the performance of qemu-system-x86_64 by
      11% on average on SPECINT2006, kernel boot and the qemu configure script, with
      the highest improvement of 26% in 456.hmmer. The victim TLB does not result
      in any performance degradation in any of the measured benchmarks. Furthermore,
      the implemented victim TLB is architecture independent and is expected to
      benefit other architectures in QEMU as well.
      
      Although there are measurement fluctuations, the performance
      improvement is significant and well outside the range of
      noise.
      Signed-off-by: Xin Tong <trent.tong@gmail.com>
      Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
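The "After Victim TLB" lookup order listed in this commit can be sketched as a direct-mapped table backed by a small fully associative one. This is a simplified model under stated assumptions: the table sizes, the round-robin replacement, the swap on a victim hit, and the fake page walk are all choices made for illustration and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TLB_SIZE   4            /* direct-mapped entries (assumed) */
#define SKETCH_VTLB_SIZE  2            /* victim entries (assumed) */
#define SKETCH_PAGE_BITS  12
#define SKETCH_INVALID    UINT64_MAX   /* never matches a real page number */

typedef struct { uint64_t vpage, ppage; } SketchEntry;

typedef struct {
    SketchEntry main_tlb[SKETCH_TLB_SIZE];
    SketchEntry victim[SKETCH_VTLB_SIZE];
    unsigned victim_next;              /* round-robin replacement slot */
    unsigned walks;                    /* number of full page table walks */
} SketchTLB;

static void sketch_tlb_init(SketchTLB *t)
{
    for (int i = 0; i < SKETCH_TLB_SIZE; i++) {
        t->main_tlb[i].vpage = SKETCH_INVALID;
    }
    for (int i = 0; i < SKETCH_VTLB_SIZE; i++) {
        t->victim[i].vpage = SKETCH_INVALID;
    }
    t->victim_next = 0;
    t->walks = 0;
}

/* Hypothetical stand-in for the expensive guest page table walk. */
static uint64_t sketch_page_walk(SketchTLB *t, uint64_t vpage)
{
    t->walks++;
    return vpage + 0x100;
}

static uint64_t sketch_translate(SketchTLB *t, uint64_t vaddr)
{
    uint64_t vpage = vaddr >> SKETCH_PAGE_BITS;
    uint64_t offset = vaddr & ((1ull << SKETCH_PAGE_BITS) - 1);
    SketchEntry *e = &t->main_tlb[vpage % SKETCH_TLB_SIZE];

    if (e->vpage != vpage) {                       /* main TLB miss */
        bool hit = false;
        for (int i = 0; i < SKETCH_VTLB_SIZE; i++) {
            if (t->victim[i].vpage == vpage) {     /* victim hit: swap */
                SketchEntry tmp = *e;
                *e = t->victim[i];
                t->victim[i] = tmp;
                hit = true;
                break;
            }
        }
        if (!hit) {                                /* evict, then refill */
            t->victim[t->victim_next] = *e;
            t->victim_next = (t->victim_next + 1) % SKETCH_VTLB_SIZE;
            e->vpage = vpage;
            e->ppage = sketch_page_walk(t, vpage);
        }
    }
    return (e->ppage << SKETCH_PAGE_BITS) | offset;
}
```

In this model, two pages that map to the same direct-mapped slot can alternate without triggering a new walk, which is the conflict-miss case the commit message targets.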