1. 26 Oct 2016, 4 commits
  2. 16 Sep 2016, 2 commits
    • tcg: Merge GETPC and GETRA · 01ecaf43
      Committed by Richard Henderson
      The return address argument to the softmmu template helpers was
      confused.  In the legacy case, we wanted to indicate that there
      is no return address, and so passed in NULL.  However, we then
      immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
      value, indicating the presence of an (invalid) return address.
      
      Push the GETPC_ADJ subtraction down to the only point it's required:
      immediately before use within cpu_restore_state_from_tb, after all
      NULL pointer checks have been completed.
      
      This makes GETPC and GETRA identical.  Remove GETRA as the lesser
      used macro, replacing all uses with GETPC.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      01ecaf43
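
      A minimal sketch of the resulting shape, for readers unfamiliar with these
      macros (simplified and hedged; the real definitions and the exact signature
      of cpu_restore_state_from_tb live in QEMU's sources and differ in detail):

        /* Simplified sketch, not QEMU's verbatim code. */
        #include <stdint.h>
        #include <stdbool.h>

        #define GETPC_ADJ 2
        /* With GETRA removed, GETPC is just the raw host return address. */
        #define GETPC() ((uintptr_t)__builtin_return_address(0))

        static bool restore_state_sketch(uintptr_t retaddr)
        {
            if (retaddr == 0) {
                return false;          /* no return address to restore from */
            }
            retaddr -= GETPC_ADJ;      /* the adjustment happens here, after the check */
            /* ... locate the TB containing retaddr, restore guest state ... */
            return true;
        }

        /* Helpers now pass GETPC() (or 0 for "no return address") unmodified,
         * instead of pre-subtracting GETPC_ADJ at every call site. */
        void helper_sketch(void)
        {
            restore_state_sketch(GETPC());
        }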
    • tcg: Support arbitrary size + alignment · 85aa8081
      Committed by Richard Henderson
      Previously we allowed fully unaligned operations, but not operations
      that are aligned but with less alignment than the operation size.
      
      In addition, arm32, ia64, mips, and sparc had been omitted from the
      previous overalignment patch, which would have led to that alignment
      being enforced.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      85aa8081
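
      In other words, the required alignment and the access size are now
      independent. A purely illustrative sketch of the check this permits
      (names and values are assumptions, not QEMU's code):

        /* Illustrative only.  An 8-byte access that only requires 4-byte
         * alignment:
         *   addr = 0x1000  -> allowed (4-byte aligned)
         *   addr = 0x1004  -> allowed (4-byte aligned, though not 8-byte aligned)
         *   addr = 0x1002  -> alignment fault
         */
        #include <stdbool.h>
        #include <stdint.h>

        static bool aligned_to(uint64_t addr, uint64_t align_bytes)
        {
            /* align_bytes is a power of two and may be smaller (or larger)
             * than the access size. */
            return (addr & (align_bytes - 1)) == 0;
        }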
  3. 09 Jul 2016, 3 commits
    • cputlb: Fix for self-modifying writes across page boundaries · 81daabaf
      Committed by Samuel Damashek
      As it currently stands, QEMU does not properly handle self-modifying code
      when the write is unaligned and crosses a page boundary. The procedure
      for handling a write to the current translation block is to write-protect
      the current translation block, catch the write, split up the translation
      block into the current instruction (which remains write-protected so that
      the current instruction is not modified) and the remaining instructions
      in the translation block, and then restore the CPU state to before the
      write occurred so the write will be retried and successfully executed.
      However, since unaligned writes across pages are split into one-byte
      writes for simplicity, writes to the second page (which is not the
      current TB) may succeed before a write to the current TB is attempted,
      and since these writes are not invalidated before resuming state after
      splitting the TB, these writes will be performed a second time, thus
      corrupting the second page. Credit goes to Patrick Hulin for
      discovering this.
      
      In recent 64-bit versions of Windows running in emulated mode, this
      results in either being very unstable (a BSOD after a couple minutes of
      uptime), or being entirely unable to boot. Windows performs one or more
      8-byte unaligned self-modifying writes (xors) which intersect the end
      of the current TB and the beginning of the next TB, which runs into the
      aforementioned issue. This commit fixes that issue by making the
      unaligned write loop perform the writes in forwards order, instead of
      reverse order. This way, QEMU immediately tries to write to the current
      TB, and splits the TB before any write to the second page is executed.
      The write then proceeds as intended. With this patch applied, I am able
      to boot and use Windows 7 64-bit and Windows 10 64-bit in QEMU without
      KVM.
      
      Per Richard Henderson's input, this patch also ensures the second page
      is in the TLB before executing the write loop, to ensure the second
      page is mapped.
      
      The original discussion of the issue is located at
      http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg02161.html.
      Signed-off-by: Samuel Damashek <samuel.damashek@invincea.com>
      Message-Id: <20160706182652.16190-1-samuel.damashek@invincea.com>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      81daabaf
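
      A self-contained sketch of the fixed slow path described above (the helper
      names below are placeholders, not QEMU's actual softmmu functions):

        #include <stdint.h>

        #define TARGET_PAGE_SIZE 4096
        #define TARGET_PAGE_MASK (~(uint64_t)(TARGET_PAGE_SIZE - 1))

        /* Placeholder hooks standing in for the real softmmu machinery. */
        static void ensure_page_in_tlb(uint64_t page_addr) { (void)page_addr; }
        static void store_single_byte(uint64_t addr, uint8_t byte)
        {
            (void)addr; (void)byte;
        }

        static void store_unaligned_cross_page(uint64_t addr, uint64_t val, int size)
        {
            /* Make sure the second page is mapped before the loop starts. */
            ensure_page_in_tlb((addr + size - 1) & TARGET_PAGE_MASK);

            /* Forward order (ascending addresses), not reverse: the byte that
             * hits the write-protected page containing the current TB faults
             * first, the TB is split, and the whole write is retried before
             * anything has landed on the second page. */
            for (int i = 0; i < size; i++) {
                store_single_byte(addr + i, (uint8_t)(val >> (i * 8)));  /* LE split */
            }
        }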
    • cputlb: Add address parameter to VICTIM_TLB_HIT · a390284b
      Committed by Samuel Damashek
      [rth: Split out from the original patch.]
      Signed-off-by: Samuel Damashek <samuel.damashek@invincea.com>
      Message-Id: <20160706182652.16190-1-samuel.damashek@invincea.com>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      a390284b
    • cputlb: Move VICTIM_TLB_HIT out of line · 7e9a7c50
      Committed by Richard Henderson
      There are currently 22 invocations of this function,
      and we're about to increase that number.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      7e9a7c50
  4. 06 Jul 2016, 1 commit
    • tcg: Improve the alignment check infrastructure · 1f00b27f
      Committed by Sergey Sorokin
      Some architectures (e.g. ARMv8) require an address to be aligned to a
      size larger than the size of the memory access itself. The existing
      zero-cost alignment check in QEMU is sufficient to implement such a
      check, but it needs a way for the alignment size to be specified
      separately from the access size.
      Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
      Message-Id: <1466705806-679898-1-git-send-email-afarallax@yandex.ru>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
      [rth: Assert in tcg_canonicalize_memop.  Leave get_alignment_bits
      available for, though unused by, user-mode.  Retain logging difference
      based on ALIGNED_ONLY.]
      1f00b27f
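
      A hedged sketch of the idea (the field layout below is an illustration, not
      QEMU's actual TCGMemOp encoding): the required alignment travels with the
      memory operation as its own power of two, so it can exceed the access size,
      e.g. a 4-byte load that must be 16-byte aligned.

        #include <stdbool.h>
        #include <stdint.h>

        /* Illustrative packing: low bits hold log2(access size), a separate
         * field holds log2(required alignment). */
        #define MO_SIZE_MASK    0x3u
        #define MO_ALIGN_SHIFT  4
        #define MO_ALIGN_MASK   (0x7u << MO_ALIGN_SHIFT)

        static unsigned get_alignment_bits_sketch(uint32_t memop)
        {
            return (memop & MO_ALIGN_MASK) >> MO_ALIGN_SHIFT;
        }

        static bool address_ok(uint64_t addr, uint32_t memop)
        {
            uint64_t amask = ((uint64_t)1 << get_alignment_bits_sketch(memop)) - 1;
            return (addr & amask) == 0;   /* e.g. 16-byte alignment on a 4-byte load */
        }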
  5. 21 Jan 2016, 1 commit
  6. 11 Sep 2015, 2 commits
  7. 15 Aug 2015, 1 commit
    • exec: drop cpu_can_do_io, just read cpu->can_do_io · 414b15c9
      Committed by Paolo Bonzini
      After commit 626cf8f4 (icount: set can_do_io outside TB execution,
      2014-12-08), can_do_io is set to 1 if not executing code.  It is
      no longer necessary to make this assumption in cpu_can_do_io.
      
      It is also possible to remove the use_icount test, simply by
      never setting cpu->can_do_io to 0 unless use_icount is true.
      
      With these changes cpu_can_do_io boils down to a read of
      cpu->can_do_io.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      414b15c9
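
      A minimal sketch of what the simplification amounts to (struct and names
      reduced to the bare minimum, not QEMU's definitions):

        /* With can_do_io kept at 1 whenever icount is not in use, the old
         * helper reduces to a plain field read, and callers can simply read
         * cpu->can_do_io themselves. */
        typedef struct CPUState {
            int can_do_io;    /* 0 only while executing a TB under icount */
        } CPUState;

        static inline int cpu_can_do_io_sketch(const CPUState *cpu)
        {
            return cpu->can_do_io;
        }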
  8. 11 Jun 2015, 1 commit
  9. 15 May 2015, 2 commits
  10. 26 Apr 2015, 3 commits
  11. 17 Feb 2015, 1 commit
    • exec: make iotlb RCU-friendly · 9d82b5a7
      Committed by Paolo Bonzini
      After the previous patch, TLBs will be flushed on every change to
      the memory mapping.  This patch augments that with synchronization
      of the MemoryRegionSections referred to in the iotlb array.
      
      With this change, it is guaranteed that iotlb_to_region will access
      the correct memory map, even once the TLB is accessed outside
      the BQL.
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9d82b5a7
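
      A hedged sketch of the access pattern this enables (all names below are
      placeholders; QEMU's real code uses its own rcu_read_lock()/rcu_read_unlock()
      and dispatch structures): the sections referenced by the iotlb are resolved
      inside an RCU read-side critical section, so a concurrent memory-map change
      cannot free them out from under the reader.

        #include <stddef.h>

        typedef struct MemoryRegionSection { int mr_data; } MemoryRegionSection;
        typedef struct Dispatch {
            MemoryRegionSection *sections;   /* array referenced by iotlb entries */
        } Dispatch;

        static void rcu_read_lock_sketch(void)   { /* stub */ }
        static void rcu_read_unlock_sketch(void) { /* stub */ }

        static MemoryRegionSection section_storage[16];
        static Dispatch dispatch_storage = { section_storage };
        static Dispatch *current_dispatch = &dispatch_storage;   /* RCU-published */

        static int access_via_iotlb_sketch(size_t index)
        {
            rcu_read_lock_sketch();
            /* The dispatch pointer is published with RCU, so the sections it
             * points to stay valid for this whole read-side critical section,
             * even if a memory-map change swaps in a new dispatch meanwhile. */
            MemoryRegionSection *s = &current_dispatch->sections[index];
            int data = s->mr_data;            /* stand-in for the device access */
            rcu_read_unlock_sketch();
            return data;
        }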
  12. 03 Nov 2014, 1 commit
  13. 02 Sep 2014, 1 commit
    • implementing victim TLB for QEMU system emulated TLB · 88e89a57
      Committed by Xin Tong
      QEMU system mode page table walks are expensive. Measurements taken by
      running qemu-system-x86_64 under Intel PIN show that a TLB miss and the
      walk of a 4-level page table in the guest Linux OS take ~450 x86
      instructions on average.
      
      The QEMU system mode TLB is implemented as a direct-mapped hash table.
      This structure suffers from conflict misses. Increasing the
      associativity of the TLB is not necessarily the answer to conflict
      misses, as all the ways may have to be searched serially.
      
      A victim TLB is a TLB used to hold translations evicted from the
      primary TLB upon replacement. The victim TLB lies between the main TLB
      and its refill path. The victim TLB has greater associativity (fully
      associative in this patch). It takes longer to look up the victim TLB,
      but it is likely still cheaper than a full page table walk. The memory
      translation path is changed as follows:
      
      Before Victim TLB:
      1. Inline TLB lookup
      2. Exit code cache on TLB miss.
      3. Check for unaligned, IO accesses
      4. TLB refill.
      5. Do the memory access.
      6. Return to code cache.
      
      After Victim TLB:
      1. Inline TLB lookup
      2. Exit code cache on TLB miss.
      3. Check for unaligned, IO accesses
      4. Victim TLB lookup.
      5. If victim TLB misses, TLB refill
      6. Do the memory access.
      7. Return to code cache
      
      The advantage is that the victim TLB adds associativity behind the
      direct-mapped TLB, and thus potentially saves page table walks, while
      still keeping the time taken to flush within reasonable limits.
      However, placing the victim TLB before the refill path lengthens that
      path, since the victim TLB is consulted before the TLB refill. The
      performance results demonstrate that the pros outweigh the cons.
      
      Some performance results, taken on the SPECINT2006 train datasets, a
      kernel boot, and the QEMU configure script on an Intel(R) Xeon(R) CPU
      E5620 @ 2.40GHz Linux machine, are shown in the Google Doc linked
      below.
      
      https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing
      
      In summary, the victim TLB improves the performance of qemu-system-x86_64
      by 11% on average across SPECINT2006, the kernel boot, and the QEMU
      configure script, with the highest improvement of 26% in 456.hmmer. The
      victim TLB does not cause a performance degradation in any of the
      measured benchmarks. Furthermore, the implementation is architecture
      independent and is expected to benefit other architectures in QEMU as
      well.
      
      Although there are measurement fluctuations, the performance
      improvement is significant and well outside the range of measurement
      noise.
      Signed-off-by: Xin Tong <trent.tong@gmail.com>
      Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      88e89a57
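
      A hedged sketch of step 4 in the "After Victim TLB" path above (structure
      and names are illustrative, not QEMU's actual cputlb code): the victim TLB
      is scanned associatively, and on a hit the entry is swapped back into the
      direct-mapped primary TLB so the next access hits the inline lookup.

        #include <stdbool.h>
        #include <stdint.h>

        #define TLB_SIZE       256       /* direct-mapped primary TLB (illustrative) */
        #define CPU_VTLB_SIZE  8         /* small fully associative victim TLB */

        typedef struct TLBEntry {
            uint64_t  page_addr;         /* guest page this entry translates */
            uintptr_t addend;            /* host offset for the translation */
        } TLBEntry;

        static TLBEntry tlb_main[TLB_SIZE];
        static TLBEntry tlb_victim[CPU_VTLB_SIZE];

        static bool victim_tlb_hit_sketch(unsigned primary_index, uint64_t page_addr)
        {
            for (int i = 0; i < CPU_VTLB_SIZE; i++) {
                if (tlb_victim[i].page_addr == page_addr) {
                    /* Swap the victim entry with the conflicting primary entry,
                     * so the next access to this page hits the inline lookup. */
                    TLBEntry tmp = tlb_main[primary_index];
                    tlb_main[primary_index] = tlb_victim[i];
                    tlb_victim[i] = tmp;
                    return true;
                }
            }
            return false;                /* fall through to the page-table walk */
        }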
  14. 05 Jun 2014, 3 commits
  15. 14 Mar 2014, 4 commits
  16. 11 Feb 2014, 2 commits
  17. 01 Feb 2014, 1 commit
  18. 11 Oct 2013, 2 commits
  19. 03 Sep 2013, 3 commits
  20. 27 Aug 2013, 2 commits