1. 06 Jun, 2017 3 commits
    • target/i386: optimize indirect branches · b4aa2977
      Committed by Emilio G. Cota
      Speed up indirect branches by jumping to the target if it is valid.
      
      Softmmu measurements (see later commit for user-mode numbers):
      
      Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
      
      -                  SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest). Host: Intel i7-4790K @ 4.00GHz
      
       2.4x +-+--------------------------------------------------------------------------------------------------------------+-+
            |                                                                                                                  |
            |   cross                                                                                                          |
       2.2x +cross+jr..........................................................................+++...........................+-+
            |                                                                                   |                              |
            |                                                                               +++ |                              |
         2x +-+..............................................................................|..|............................+-+
            |                                                                                |  |                              |
            |                                                                                |  |                              |
       1.8x +-+..............................................................................|####...........................+-+
            |                                                                                |# |#                             |
            |                                                                              **** |#                             |
       1.6x +-+............................................................................*.|*.|#...........................+-+
            |                                                                              * |* |#                             |
            |                                                                              * |* |#                             |
       1.4x +-+.......................................................................+++..*.|*.|#...........................+-+
            |                                                      ++++++             #### * |*++#             +++             |
            |                        +++                            |  |              #++# *++*  #          +++ |              |
       1.2x +-+......................###.....####....+++............|..|...........****..#.*..*..#....####...|.###.....####..+-+
            |        +++          **** #  ****  #    ####          ***###          *++*  # *  *  #    #++#  ****|#  +++#++#    |
            |    ****###     +++  *++* #  *++*  #  ++#  #    ####  *|* |#     +++  *  *  # *  *  #  ***  #  *| *|#  ****  #    |
         1x +-++-*++*++#++***###++*++*+#++*+-*++#+****++#++***++#+-*+*++#-+****##++*++*-+#+*++*-+#++*+*++#++*-+*+#++*++*++#-++-+
            |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *|* |#  *++* #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
            |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *+*++#  *  * #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
       0.8x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
               astar   bzip2      gcc   gobmk h264ref  hmmer libquantum   mcf omnetpp perlbench  sjeng xalancbmk   hmean
        png: http://imgur.com/DU36YFU
      
      NB. 'cross' represents the previous commit.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1493263764-18657-11-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target/i386: optimize cross-page direct jumps in softmmu · fe620895
      Committed by Emilio G. Cota
      Instead of unconditionally exiting to the exec loop, use the
      gen_jr helper to jump to the target if it is valid.
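
The "jump to the target if it is valid" idea can be pictured with a toy model in C. This is only a sketch, not QEMU's code: `tb_cache`, `insert_tb`, `lookup_tb`, and `next_action` are hypothetical names, and the direct-mapped cache is an assumption made for brevity. A lookup either finds translated code for the target address, in which case execution can jump straight to it, or finds nothing, in which case control goes back to the exec loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not QEMU's API. */
#define TB_CACHE_SIZE 256

typedef struct {
    uint64_t pc;      /* guest address this block was translated from */
    const void *code; /* host code for the block, NULL if the slot is empty */
} tb_entry;

static tb_entry tb_cache[TB_CACHE_SIZE];

/* Record that guest address pc has translated code at 'code'. */
static void insert_tb(uint64_t pc, const void *code)
{
    tb_entry *e = &tb_cache[pc % TB_CACHE_SIZE];
    e->pc = pc;
    e->code = code;
}

/* Return the translated code for pc, or NULL if no valid entry exists. */
static const void *lookup_tb(uint64_t pc)
{
    const tb_entry *e = &tb_cache[pc % TB_CACHE_SIZE];
    return (e->code != NULL && e->pc == pc) ? e->code : NULL;
}

typedef enum { EXIT_TO_EXEC_LOOP, JUMP_TO_TARGET } jump_action;

/* Jump directly when the target is valid, otherwise take the slow path. */
static jump_action next_action(uint64_t target_pc)
{
    return lookup_tb(target_pc) ? JUMP_TO_TARGET : EXIT_TO_EXEC_LOOP;
}
```

In this model the saving comes from skipping the round trip through the exec loop whenever the lookup hits.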
      
      Perf impact: see next commit's log.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1493263764-18657-10-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr · 1ebb1af1
      Committed by Emilio G. Cota
      This helper will be used by subsequent changes.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1493263764-18657-9-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
  2. 24 Mar, 2017 1 commit
  3. 13 Jan, 2017 1 commit
  4. 11 Jan, 2017 3 commits
  5. 22 Dec, 2016 1 commit
  6. 21 Dec, 2016 1 commit
    • Move target-* CPU file into a target/ folder · fcf5ef2a
      Committed by Thomas Huth
      We've currently got 18 architectures in QEMU, and thus 18 target-xxx
      folders in the root folder of the QEMU source tree. More architectures
      (e.g. RISC-V, AVR) are likely to be included soon, too, so the main
      folder of the QEMU sources slowly gets quite overcrowded with the
      target-xxx folders.
      To disburden the main folder a little bit, let's move the target-xxx
      folders into a dedicated target/ folder, so that target-xxx/ simply
      becomes target/xxx/ instead.
      
      Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part]
      Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part]
      Acked-by: Michael Walle <michael@walle.cc> [lm32 part]
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part]
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part]
      Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part]
      Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part]
      Acked-by: Richard Henderson <rth@twiddle.net> [alpha part]
      Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part]
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part]
      Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part]
      Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part]
      Signed-off-by: Thomas Huth <thuth@redhat.com>
  7. 02 Nov, 2016 1 commit
    • log: Add locking to large logging blocks · 1ee73216
      Committed by Richard Henderson
      Reuse the existing locking provided by stdio to keep in_asm, cpu,
      op, op_opt, op_ind, and out_asm as contiguous blocks.
      
      While it isn't possible to interleave e.g. in_asm or op_opt logs
      because of the TB lock protecting all code generation, it is
      possible to interleave cpu logs, or to interleave a cpu dump with
      an out_asm dump.
      
      For mingw32, we appear to have no viable solution for this.  The locking
      functions are not properly exported from the system runtime library.
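
As an illustration of keeping a multi-line log block contiguous with the lock that stdio already provides, here is a minimal C sketch. It assumes the POSIX `flockfile()`/`funlockfile()` pair is the locking meant above; `log_block` is a hypothetical helper, not a QEMU function.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write n lines to f as one contiguous block.
 * Returns the number of lines written successfully.
 */
static int log_block(FILE *f, const char *const *lines, int n)
{
    int written = 0;

    flockfile(f);                 /* take the stream's own stdio lock */
    for (int i = 0; i < n; i++) {
        if (fputs(lines[i], f) != EOF && fputc('\n', f) != EOF) {
            written++;
        }
    }
    funlockfile(f);               /* other threads may write again */
    return written;
}
```

Because the stdio lock is recursive, the `fputs` and `fputc` calls inside the locked region still work, and other threads writing to the same stream wait until the whole block is out.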
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
  8. 26 Oct, 2016 9 commits
    • target-i386: remove helper_lock() · 37b995f6
      Committed by Emilio G. Cota
      It's been superseded by the atomic helpers.
      
      The use of the atomic helpers provides a significant performance and scalability
      improvement. Below is the result of running the atomic_add-bench microbenchmark with:
       $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
      where $n is the number of threads and $r is the allowed range for the additions.
      
      The scenarios measured are:
      - atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
      - cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
      - master: before this patchset
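
The difference between the "atomic" and "cmpxchg" scenarios can be sketched on the host side with C11 atomics. This is an analogy under stated assumptions, not the TCG helpers themselves: `add_atomic` and `add_cmpxchg` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* "atomic" scenario: a single atomic read-modify-write. Returns the old value. */
static uint32_t add_atomic(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add(p, v);
}

/* "cmpxchg" scenario: retry a compare-and-swap until no other thread interfered. */
static uint32_t add_cmpxchg(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load(p);

    while (!atomic_compare_exchange_weak(p, &old, old + v)) {
        /* On failure 'old' now holds the current value; try again. */
    }
    return old;
}
```

Under contention the compare-and-swap loop may retry many times, whereas the single read-modify-write always completes in one operation.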
      
      Results sorted in ascending range, i.e. descending degree of contention.
      Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
      Opteron 6376 cores.
      
                      atomic_add-bench: 5000000 ops/thread, [0,1] range
      
        25 ++---------+----------+---------+----------+----------+----------+---++
           + atomic +-E--+       +         +          +          +          +    |
           |cmpxchg +-H--+                                                       |
        20 +Emaster +-N--+                                                      ++
           ||                                                                    |
           |++                                                                   |
           ||                                                                    |
        15 +++                                                                  ++
           |N|                                                                   |
           |+|                                                                   |
        10 ++|                                                                  ++
           |+|+                                                                  |
           | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
           |+E+E+- +++     +E+------+E+--                                        |
         5 ++|+                                                                 ++
           |+N+H+---                                 +++                         |
           ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
         0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 5000000 ops/thread, [0,2] range
      
        25 ++---------+----------+---------+----------+----------+----------+---++
           ++atomic +-E--+       +         +          +          +          +    |
           |cmpxchg +-H--+                                                       |
        20 ++master +-N--+                                                      ++
           |E|                                                                   |
           |++                                                                   |
           ||E                                                                   |
        15 ++|                                                                  ++
           |N||                                                                  |
           |+||                                   ---+E+------+E+-----+E+------+E|
        10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
           ||H+E+--+E+--                                                         |
           |+++++                                                                |
           | ||                                                                  |
         5 ++|+H+--                                  +++                        ++
           |+N+    -                              ---+H+------+H+------          |
           +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
         0 ++---------+----------+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                      atomic_add-bench: 5000000 ops/thread, [0,8] range
      
        40 ++---------+----------+---------+----------+----------+----------+---++
           ++atomic +-E--+       +         +          +          +          +    |
        35 +cmpxchg +-H--+                                                      ++
           | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
        30 ++|                   ---+E+--   +++                                 ++
           | |            -+E+---                                                |
        25 ++E        ---- +++                                                  ++
           |+++++ -+E+                                                           |
        20 +E+ E-- +++                                                          ++
           |H|+++                                                                |
           |+|                                       +H+-------                  |
        15 ++H+                                   ---+++      +H+------         ++
           |N++H+--                         +++---                    +H+------++|
        10 ++ +++  -       +++           ---+H+                       +++      +H+
           | |     +H+-----+H+------+H+--                                        |
         5 ++|                      +++                                         ++
           ++N+N+--+N++          +         +          +          +          +    |
         0 ++---------+----------+---------+----------+----------+----------+---++
           0          10         20        30         40         50         60
                                      Number of threads
      
                     atomic_add-bench: 5000000 ops/thread, [0,128] range
      
        160 ++---------+---------+----------+---------+----------+----------+---++
            + atomic +-E--+      +          +         +          +          +    |
        140 +cmpxchg +-H--+                          +++      +++               ++
            | master +-N--+                           E--------E------+E+------++|
        120 ++                                      --|        |      +++       E+
            |                                     -- +++      +++              ++|
        100 ++                                   -                              ++
            |                                +++-                     +++      ++|
         80 ++                              -+E+    -+H+------+H+------H--------++
            |                           ----    ----                  +++       H|
            |            ---+E+-----+E+-  ---+H+                               ++|
         60 ++     +E+---   +++  ---+H+---                                      ++
            |    --+++   ---+H+--                                                |
         40 ++ +E+-+H+---                                                       ++
            |  +H+                                                               |
         20 +EE+                                                                ++
            +N+        +         +          +         +          +          +    |
          0 ++N-N---N--+---------+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
                    atomic_add-bench: 5000000 ops/thread, [0,1024] range
      
        350 ++---------+---------+----------+---------+----------+----------+---++
            + atomic +-E--+      +          +         +          +          +    |
        300 +cmpxchg +-H--+                                                    +++
            | master +-N--+                                           +++       ||
            |                                                 +++      |    ----E|
        250 ++                                                 |   ----E----    ++
            |                                              ----E---    |    ---+H|
        200 ++                                      -+E+---   +++  ---+H+---    ++
            |                                   ----         -+H+--              |
            |                                +E+     +++ ---- +++                |
        150 ++                            ---+++  ---+H+-                       ++
            |                          ---  -+H+--                               |
        100 ++                   ---+E+ ---- +++                                ++
            |      +++   ---+E+-----+H+-                                         |
            |     -+E+------+H+--                                                |
         50 ++ +E+                                                              ++
            +EE+       +         +          +         +          +          +    |
          0 ++N-N---N--+---------+----------+---------+----------+----------+---++
            0          10        20         30        40         50         60
                                      Number of threads
      
        hi-res: http://imgur.com/a/fMRmq
      
      For master I stopped measuring after 8 threads, because there is little
      point in measuring the well-known performance collapse of a contended lock.
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate XCHG using atomic helper · ea97ebe8
      Committed by Emilio G. Cota
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate LOCK'ed BTX ops using atomic helpers · cfe819d3
      Committed by Emilio G. Cota
      [rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
      incorrect zero-extension of address in register-offset case.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate LOCK'ed XADD using atomic helper · f53b0181
      Committed by Emilio G. Cota
      [rth: Move load of reg value to common location.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate LOCK'ed NEG using cmpxchg helper · 8eb8c738
      Committed by Emilio G. Cota
      [rth: Move redundant qemu_load out of cmpxchg loop.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate LOCK'ed NOT using atomic helper · 2a5fe8ae
      Committed by Emilio G. Cota
      [rth: Avoid qemu_load that's redundant with the atomic op.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate LOCK'ed INC using atomic helper · 60e57346
      Committed by Emilio G. Cota
      [rth: Merge gen_inc_locked back into gen_inc to share cc update.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate LOCK'ed OP instructions using atomic helpers · a7cee522
      Committed by Emilio G. Cota
      [rth: Eliminate some unnecessary temporaries.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
    • target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers · ae03f8de
      Committed by Emilio G. Cota
      The diff here is uglier than necessary. All this does is to turn
      
      FOO
      
      into:
      
      if (s->prefix & PREFIX_LOCK) {
        BAR
      } else {
        FOO
      }
      
      where FOO is the original implementation of an unlocked cmpxchg.
      
      [rth: Adjust unlocked cmpxchg to use movcond instead of branches.
      Adjust helpers to use atomic helpers.]
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <rth@twiddle.net>
  9. 24 Oct, 2016 1 commit
    • target-i386: fix 32-bit addresses in LEA · 620abfb0
      Committed by Paolo Bonzini
      This was found with test-i386.  The issue is that instructions
      such as
      
          addr32 lea (%eax), %rax
      
      did not perform a 32-bit extension, because the LEA translation
      skipped the gen_lea_v_seg step.  That step does not just add
      segments, it also takes care of extending from address size to
      pointer size.
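
The extension step that LEA was skipping can be modelled in a few lines of C. This is a sketch of the architectural behaviour only; `lea_result` is a hypothetical function and not the translator's code.

```c
#include <stdint.h>

/*
 * Hypothetical model of LEA's last step: truncate the effective address
 * to the address size, then zero-extend it to pointer size.
 */
static uint64_t lea_result(uint64_t effective_addr, int addr_size_bits)
{
    if (addr_size_bits == 32) {
        return (uint32_t)effective_addr;  /* addr32: keep only the low 32 bits */
    }
    return effective_addr;                /* 64-bit address size: unchanged */
}
```

With the addr32 prefix, a source register holding 0xffffffff12345678 should therefore produce 0x12345678 in the destination rather than the full 64-bit value.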
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 16 Sep, 2016 1 commit
  11. 02 Aug, 2016 1 commit
    • target-i386: fix typo in xsetbv implementation · ba03584f
      Committed by Dave Hansen
      QEMU 2.6 added support for the XSAVE family of instructions, which
      includes the XSETBV instruction which allows setting the XCR0
      register.
      
      But, when booting Linux kernels with XSAVE support enabled, I was
      getting very early crashes where the instruction pointer was set
      to 0x3.  I tracked it down to a jump instruction generated by this:
      
              gen_jmp_im(s->pc - pc_start);
      
      where s->pc is pointing to the instruction after XSETBV and pc_start
      is pointing _at_ XSETBV.  Subtract the two and you get 0x3.  Whoops.
      
      The fix is to replace this typo with the pattern found everywhere
      else in the file when folks want to end the translation buffer.
      
      Richard Henderson confirmed that this is a bug and that this is the
      correct fix.
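
The arithmetic behind the 0x3 instruction pointer can be reproduced with a small C sketch. The struct and function names are hypothetical, and the corrected expression `s->pc - s->cs_base` is an assumption about "the pattern found everywhere else in the file", since the message does not quote the replacement line.

```c
#include <stdint.h>

/* Hypothetical stand-in for the translator state described above. */
typedef struct {
    uint64_t pc;       /* address of the instruction after XSETBV */
    uint64_t cs_base;  /* code segment base (assumed field, see lead-in) */
} disas_state;

/* The typo: this is the length of XSETBV, not a jump target. */
static uint64_t buggy_target(const disas_state *s, uint64_t pc_start)
{
    return s->pc - pc_start;
}

/* Assumed corrected pattern: next instruction as an offset from the segment base. */
static uint64_t fixed_target(const disas_state *s)
{
    return s->pc - s->cs_base;
}
```

XSETBV is encoded in three bytes (0f 01 d1), so with the instruction at 0x1000 the buggy expression evaluates to 0x1003 - 0x1000 = 0x3, which matches the crash address reported above.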
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: qemu-stable@nongnu.org
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 19 Jul, 2016 1 commit
  13. 20 Jun, 2016 1 commit
  14. 06 Jun, 2016 1 commit
  15. 24 May, 2016 1 commit
  16. 23 May, 2016 1 commit
    • target-i386: key sfence availability on CPUID_SSE, not CPUID_SSE2 · 14cb949a
      Committed by Paolo Bonzini
      sfence was introduced before lfence and mfence.  This fixes Linux
      2.4's measurement of checksumming speeds for the pIII_sse
      algorithm:
      
      md: linear personality registered as nr 1
      md: raid0 personality registered as nr 2
      md: raid1 personality registered as nr 3
      md: raid5 personality registered as nr 4
      raid5: measuring checksumming speed
         8regs     :   384.400 MB/sec
         32regs    :   259.200 MB/sec
      invalid operand: 0000
      CPU:    0
      EIP:    0010:[<c0240b2a>]    Not tainted
      EFLAGS: 00000246
      eax: c15d8000   ebx: 00000000   ecx: 00000000   edx: c15d5000
      esi: 8005003b   edi: 00000004   ebp: 00000000   esp: c15bdf50
      ds: 0018   es: 0018   ss: 0018
      Process swapper (pid: 1, stackpage=c15bd000)
      Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      00000000
             00000000 00000000 00000000 00000000 00000000 00000000 00000000
      00000000
             00000000 00000206 c0241c6c 00001000 c15d4000 c15d7000 c15d4000
      c15d4000
      Call Trace:    [<c0241c6c>] [<c0105000>] [<c0241db4>] [<c010503b>]
      [<c0105000>]
        [<c0107416>] [<c0105030>]
      
      Code: 0f ae f8 0f 10 04 24 0f 10 4c 24 10 0f 10 54 24 20 0f 10 5c
       <0>Kernel panic: Attempted to kill init!
      Reported-by: Stefan Weil <sw@weilnetz.de>
      Fixes: 121f3157
      Cc: qemu-stable@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  17. 19 May, 2016 1 commit
  18. 13 May, 2016 3 commits
  19. 24 Mar, 2016 1 commit
  20. 15 Mar, 2016 7 commits