1. 14 Nov, 2012 2 commits
  2. 13 Nov, 2012 1 commit
  3. 11 Nov, 2012 2 commits
  4. 10 Nov, 2012 14 commits
  5. 09 Nov, 2012 9 commits
  6. 06 Nov, 2012 2 commits
    • tools: initialize main loop before block layer · 2592c59a
      Paolo Bonzini committed
      Tools were broken because they initialized the block layer while
      qemu_aio_context was still NULL.
      Reported-by: malc <av1474@comtv.ru>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: malc <av1474@comtv.ru>
    • tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors · c878da3b
      malc committed
      mmu access looks something like:
      
      <check tlb>
      if miss goto slow_path
      <fast path>
      done:
      ...
      
      ; end of the TB
      slow_path:
       <pre process>
       mr r3, r27         ; move areg0 to r3
                          ; (r3 holds the first argument for all the PPC32 ABIs)
       <call mmu_helper>
       b $+8
       .long done
       <post process>
       b done
      
      On ppc32 <call mmu_helper> is:
      
      (SysV and Darwin)
      
      mmu_helper is most likely not within direct branching distance from
      the call site, necessitating
      
      a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes
      b. moving GPR to CTR/LR                          ; 4 bytes
      c. (finally) branching to CTR/LR                 ; 4 bytes
      
      r3 setting              - 4 bytes
      call                    - 16 bytes
      dummy jump over retaddr - 4 bytes
      embedded retaddr        - 4 bytes
               Total overhead - 28 bytes
      
      (PowerOpen (AIX))
      a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes
      b. loading 32 bit function pointer into GPR2            ; 4 bytes
      c. moving GPR2 to CTR/LR                                ; 4 bytes
      d. loading 32 bit small area pointer into R2            ; 4 bytes
      e. (finally) branching to CTR/LR                        ; 4 bytes
      
      r3 setting              - 4 bytes
      call                    - 24 bytes
      dummy jump over retaddr - 4 bytes
      embedded retaddr        - 4 bytes
               Total overhead - 36 bytes
      
      Following is done to trim the code size of slow path sections:
      
      In tcg_target_qemu_prologue trampolines are emitted that look like this:
      
      trampoline:
      mfspr r3, LR
      addi  r3, r3, 4
      mtspr LR, r3      ; fixup LR to point over embedded retaddr
      mr    r3, r27
      <jump mmu_helper> ; tail call of sorts
      
      And slow path becomes:
      
      slow_path:
       <pre process>
       <call trampoline>
       .long done
       <post process>
       b done
      
      call                    - 4 bytes (trampoline is within code gen buffer
                                         and most likely accessible via
                                         direct branch)
      embedded retaddr        - 4 bytes
               Total overhead - 8 bytes
      
      In the end the icache pressure is decreased by 20 bytes (SysV and
      Darwin) or 28 bytes (PowerOpen) per slow path, at the cost of an extra
      jump to the trampoline and adjusting LR (to skip over the embedded
      retaddr) once inside.
      Signed-off-by: malc <av1474@comtv.ru>
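
The byte counts in the commit message above can be checked with a small sketch. This is not part of the commit. It only re-adds the figures the message gives and models the LR fixup as plain address arithmetic. All names (`WORD`, `CALL_BYTES`, `overhead_before`, `overhead_after`, `savings`, `resume_address`) are hypothetical and chosen for illustration, and the sketch assumes every instruction and the embedded retaddr occupy 4 bytes, as the message states.

```python
WORD = 4  # assumption from the message: each instruction and the .long are 4 bytes

# Length of the <call mmu_helper> sequence per ABI, summed from the
# lettered steps in the commit message (a..c and a..e respectively).
CALL_BYTES = {
    "SysV/Darwin": 8 + 4 + 4,
    "PowerOpen (AIX)": 8 + 4 + 4 + 4 + 4,
}


def overhead_before(call_bytes):
    """r3 setting + call sequence + dummy jump over retaddr + embedded retaddr."""
    return WORD + call_bytes + WORD + WORD


def overhead_after():
    """Direct branch to the trampoline + embedded retaddr."""
    return WORD + WORD


def savings():
    """Bytes saved per slow path section, keyed by ABI."""
    return {abi: overhead_before(n) - overhead_after()
            for abi, n in CALL_BYTES.items()}


def resume_address(call_addr):
    """Address execution resumes at after a call made through the trampoline.

    The call at call_addr leaves LR pointing at the next word, which holds
    the embedded '.long done'. The trampoline adds 4 to LR, so the helper
    returns to <post process> rather than to the data word.
    """
    lr = call_addr + WORD  # LR as set by the call instruction
    return lr + WORD       # LR after the trampoline's fixup


if __name__ == "__main__":
    for abi, n in CALL_BYTES.items():
        print(f"{abi}: {overhead_before(n)} -> {overhead_after()} bytes, "
              f"saves {savings()[abi]}")
```

The two results from `savings()` correspond to the "20/28 bytes" in the message: the first figure is for SysV and Darwin, the second for PowerOpen.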
  7. 05 Nov, 2012 9 commits
  8. 04 Nov, 2012 1 commit