1. 27 Oct, 2017 26 commits
  2. 26 Oct, 2017 1 commit
    • P
      Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2017-10-24-1' into staging · 325a084c
      Peter Maydell committed
      Merge tpm 2017/10/24 v1
      
      # gpg: Signature made Wed 25 Oct 2017 06:06:55 BST
      # gpg:                using RSA key 0x75AD65802A0B4211
      # gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>"
      # gpg: WARNING: This key is not certified with a trusted signature!
      # gpg:          There is no indication that the signature belongs to the owner.
      # Primary key fingerprint: B818 B9CA DF90 89C2 D5CE  C66B 75AD 6580 2A0B 4211
      
      * remotes/stefanberger/tags/pull-tpm-2017-10-24-1:
        tpm: print buffers received from TPM when debugging
        vl: remove unnecessary #ifdef CONFIG_TPM
        tpm: remove unnecessary #ifdef CONFIG_TPM
        tpm: add stubs
        tpm: add missing include
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      325a084c
  3. 25 Oct, 2017 13 commits
    • P
      Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20171025' into staging · ae49fbbc
      Peter Maydell committed
      TCG patch queue
      
      # gpg: Signature made Wed 25 Oct 2017 10:30:18 BST
      # gpg:                using RSA key 0x64DF38E8AF7E215F
      # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
      # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F
      
      * remotes/rth/tags/pull-tcg-20171025: (51 commits)
        translate-all: exit from tb_phys_invalidate if qht_remove fails
        tcg: Initialize cpu_env generically
        tcg: enable multiple TCG contexts in softmmu
        tcg: introduce regions to split code_gen_buffer
        translate-all: use qemu_protect_rwx/none helpers
        osdep: introduce qemu_mprotect_rwx/none
        tcg: allocate optimizer temps with tcg_malloc
        tcg: distribute profiling counters across TCGContext's
        tcg: introduce **tcg_ctxs to keep track of all TCGContext's
        gen-icount: fold exitreq_label into TCGContext
        tcg: define tcg_init_ctx and make tcg_ctx a pointer
        tcg: take tb_ctx out of TCGContext
        translate-all: report correct avg host TB size
        exec-all: rename tb_free to tb_remove
        translate-all: use a binary search tree to track TBs in TBContext
        tcg: Remove CF_IGNORE_ICOUNT
        tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK
        cpu-exec: lookup/generate TB outside exclusive region during step_atomic
        tcg: check CF_PARALLEL instead of parallel_cpus
        target/sparc: check CF_PARALLEL instead of parallel_cpus
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      ae49fbbc
    • P
      Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20171023' into staging · 4e1b31db
      Peter Maydell committed
      migration/next for 20171023
      
      # gpg: Signature made Mon 23 Oct 2017 17:05:14 BST
      # gpg:                using RSA key 0xF487EF185872D723
      # gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
      # gpg:                 aka "Juan Quintela <quintela@trasno.org>"
      # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723
      
      * remotes/juanquintela/tags/migration/20171023: (21 commits)
        migration: Improve migration thread error handling
        qapi: Fix grammar in x-multifd-page-count descriptions
        migration: add bitmap for received page
        migration: introduce qemu_ufd_copy_ioctl helper
        migration: postcopy_place_page factoring out
        migration: new ram_init_bitmaps()
        migration: clean up xbzrle cache init/destroy
        migration: provide ram_state_cleanup
        migration: provide ram_state_init()
        migration: pause-before-switchover for postcopy
        migration: allow cancel to unpause
        migrate: HMP migate_continue
        migration: migrate-continue
        migration: Wait for semaphore before completing migration
        migration: Add 'pre-switchover' and 'device' statuses
        migration: Add 'pause-before-switchover' capability
        migration: Make cache_init() take an error parameter
        migration: Move xbzrle cache resize error handling to xbzrle_cache_resize
        migration: Make cache size elements use the right types
        migratiom: Remove max_item_age parameter
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      4e1b31db
    • P
      2f0a1153
    • P
      tpm: remove unnecessary #ifdef CONFIG_TPM · 3fdde7e0
      Philippe Mathieu-Daudé committed
      Makefile.objs now checks for $(CONFIG_TPM).
      Suggested-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
      3fdde7e0
    • P
      tpm: add stubs · c39f95dc
      Philippe Mathieu-Daudé committed
      Commit c37cacab moved tpm_cleanup() to the main loop exit; however, this
      function is not available when compiling with --disable-tpm.
      
      Provide the necessary stubs to keep the code free of #ifdef clutter.
      Reported-by: BALATON Zoltan <balaton@eik.bme.hu>
      Message-Id: <20171023102903.256AF7456A0@zero.eik.bme.hu>
      Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
      c39f95dc
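The stub approach described in this commit can be illustrated with a minimal sketch. All names here (`DEMO_CONFIG_FEATURE`, `demo_feature_init`, `demo_feature_cleanup`, `demo_main_loop_exit`) are hypothetical and are not QEMU's; a real build would more likely pick between two source files in the makefile than use a preprocessor switch.

```c
/* Hypothetical feature switch, used only to keep the sketch in one file. */
#ifdef DEMO_CONFIG_FEATURE
int demo_feature_init(void);       /* real implementation lives elsewhere */
void demo_feature_cleanup(void);
#else
/* Stubs: same signatures, no behavior, so callers need no #ifdef. */
int demo_feature_init(void) { return 0; }
void demo_feature_cleanup(void) { }
#endif

/* The caller is identical whether or not the feature is built in. */
int demo_main_loop_exit(void)
{
    demo_feature_cleanup();
    return 0;
}
```

With the feature disabled, the calls resolve to the empty stubs, so the call site stays free of conditional compilation.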
    • E
      translate-all: exit from tb_phys_invalidate if qht_remove fails · cc689485
      Emilio G. Cota committed
      Two or more threads might race while invalidating the same TB. We currently
      do not check for this at all despite taking tb_lock, which means we would
      wrongly invalidate the same TB more than once. This bug has actually been
      hit by users: I recently saw a report on IRC, although I have yet to see
      the corresponding test case.
      
      Fix this by using qht_remove as the synchronization point; if it fails,
      that means the TB has already been invalidated, and therefore there
      is nothing left to do in tb_phys_invalidate.
      
      Note that this solution works now that we still have tb_lock, and will
      continue working once we remove tb_lock.
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1508445114-4717-1-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      cc689485
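The idea of using removal from a table as the synchronization point can be sketched as follows. This is not QEMU's qht API: `DemoBlock`, `demo_table_remove` and `demo_invalidate` are hypothetical names, and a single atomic flag stands in for hash-table membership.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a translation block; not QEMU's TranslationBlock. */
typedef struct {
    atomic_bool in_table;    /* true while the block is still reachable */
    int invalidate_count;    /* how many times the teardown actually ran */
} DemoBlock;

/* Returns true only for the single caller that flips in_table to false. */
bool demo_table_remove(DemoBlock *b)
{
    bool expected = true;
    return atomic_compare_exchange_strong(&b->in_table, &expected, false);
}

void demo_invalidate(DemoBlock *b)
{
    if (!demo_table_remove(b)) {
        return;              /* another caller already invalidated this block */
    }
    b->invalidate_count++;   /* teardown runs exactly once */
}
```

Whichever caller wins the removal performs the teardown; every other caller sees the failed removal and returns early, so the result does not depend on an outer lock.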
    • R
      tcg: Initialize cpu_env generically · 1c2adb95
      Richard Henderson committed
      This is identical for each target.  So, move the initialization to
      common code.  Move the variable itself out of tcg_ctx and name it
      cpu_env to minimize changes within targets.
      
      This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
      since there are no longer global-register temps created by targets.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      1c2adb95
    • E
      tcg: enable multiple TCG contexts in softmmu · 3468b59e
      Emilio G. Cota committed
      This enables parallel TCG code generation. However, we do not take
      advantage of it yet since tb_lock is still held during tb_gen_code.
      
      In user-mode we use a single TCG context; see the documentation
      added to tcg_region_init for the rationale.
      
      Note that targets do not need any conversion: targets initialize a
      TCGContext (e.g. defining TCG globals), and after this initialization
      has finished, the context is cloned by the vCPU threads, each of
      them keeping a separate copy.
      
      TCG threads claim one entry in tcg_ctxs[] by atomically increasing
      n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
      of that variable and tcg_ctxs; they are there just to play nice with
      analysis tools such as thread sanitizer.
      
      Note that we do not allocate an array of contexts (we allocate
      an array of pointers instead) because when tcg_context_init
      is called, we do not know yet how many contexts we'll use since
      the bool behind qemu_tcg_mttcg_enabled() isn't set yet.
      
      Previous patches folded some TCG globals into TCGContext. The non-const
      globals remaining are only set at init time, i.e. before the TCG
      threads are spawned. Here is a list of these set-at-init-time globals
      under tcg/:
      
      Only written by tcg_context_init:
      - indirect_reg_alloc_order
      - tcg_op_defs
      Only written by tcg_target_init (called from tcg_context_init):
      - tcg_target_available_regs
      - tcg_target_call_clobber_regs
      - arm: arm_arch, use_idiv_instructions
      - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
              have_movbe, have_popcnt
      - mips: use_movnz_instructions, use_mips32_instructions,
              use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
      - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
      - s390: tb_ret_addr, s390_facilities
      - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
               use_vis3_instructions
      
      Only written by tcg_prologue_init:
      - 'struct jit_code_entry one_entry'
      - aarch64: tb_ret_addr
      - arm: tb_ret_addr
      - i386: tb_ret_addr, guest_base_flags
      - ia64: tb_ret_addr
      - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      3468b59e
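A minimal sketch of claiming a slot in an array of context pointers by atomically incrementing a counter, as the message above describes for tcg_ctxs[] and n_tcg_ctxs. The names `DemoCtx`, `demo_ctxs`, `demo_n_ctxs`, `demo_register_ctx` and the bound `DEMO_MAX_CTXS` are assumptions made for illustration; the real code differs in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_MAX_CTXS 8              /* hypothetical upper bound for the sketch */

typedef struct {
    int id;
} DemoCtx;

DemoCtx *demo_ctxs[DEMO_MAX_CTXS];   /* array of pointers, filled as threads register */
atomic_uint demo_n_ctxs;             /* number of slots claimed so far */

/* Claim the next free slot for ctx and return its index. */
unsigned demo_register_ctx(DemoCtx *ctx)
{
    /* fetch_add returns the old value, so each caller gets a unique index */
    unsigned n = atomic_fetch_add(&demo_n_ctxs, 1);
    assert(n < DEMO_MAX_CTXS);
    demo_ctxs[n] = ctx;
    ctx->id = (int)n;
    return n;
}
```

Storing pointers rather than the contexts themselves means the array can be sized before the number of threads is known, which matches the rationale given in the commit message.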
    • E
      tcg: introduce regions to split code_gen_buffer · e8feb96f
      Emilio G. Cota committed
      This is groundwork for supporting multiple TCG contexts.
      
      The naive solution here is to split code_gen_buffer statically
      among the TCG threads; this however results in poor utilization
      if translation needs are different across TCG threads.
      
      What we do here is to add an extra layer of indirection, assigning
      regions that act just like pages do in virtual memory allocation.
      (BTW if you are wondering about the chosen naming, I did not want
      to use blocks or pages because those are already heavily used in QEMU).
      
      We use a global lock to serialize allocations as well as statistics
      reporting (we now export the size of the used code_gen_buffer with
      tcg_code_size()). Note that for the allocator we could just use
      a counter and atomic_inc; however, that would complicate the gathering
      of tcg_code_size()-like stats. So given that the region operations are
      not a fast path, a lock seems the most reasonable choice.
      
      The effectiveness of this approach is clear after seeing some numbers.
      I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
      Note that I'm evaluating this after enabling per-thread TCG (which
      is done by a subsequent commit).
      
      * -smp 1, 1 region (entire buffer):
          qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
          qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
          qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
          qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
          qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
          qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
          qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
          qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367
      
      That is, 8 flushes.
      
      * -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
      
          qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
          qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
          qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
          qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
          qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
          qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
          qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
          qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368
      
      Again, 8 flushes. Note how buffer utilization is not 100%, but it
      is close. Smaller region sizes would yield higher utilization,
      but we want region allocation to be rare (it acquires a lock), so
      we do not want to go too small.
      
      * -smp 8, static partitioning of 8 regions (10 MB per region):
          qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
          qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
          qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
          qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
          qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
          qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
          qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
          qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
          qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
          qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
          qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
          qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
          qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
          qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
          qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
          qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
          qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
          qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
          qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
          qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364
      
      That is, 20 flushes. Note how a static partitioning approach uses
      the code buffer poorly, leading to many unnecessary flushes.
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      e8feb96f
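The region scheme described above (a shared buffer split into equally sized regions, handed out on demand under one lock) might look roughly like this. It is a sketch under stated assumptions: `DemoRegionPool`, `demo_region_alloc` and `demo_regions_assigned_bytes` are hypothetical names, and a pthread mutex stands in for whatever lock the real code uses.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    size_t n_regions;      /* total regions the buffer is split into */
    size_t next;           /* index of the next unassigned region */
    size_t region_size;    /* bytes per region */
} DemoRegionPool;

/*
 * Hand the caller the next free region. A false return means the whole
 * buffer has been assigned and the caller would have to flush.
 */
bool demo_region_alloc(DemoRegionPool *p, size_t *region_idx)
{
    bool ok = false;

    pthread_mutex_lock(&p->lock);
    if (p->next < p->n_regions) {
        *region_idx = p->next++;
        ok = true;
    }
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/* Bytes handed out so far, at region granularity, read under the same lock. */
size_t demo_regions_assigned_bytes(DemoRegionPool *p)
{
    pthread_mutex_lock(&p->lock);
    size_t bytes = p->next * p->region_size;
    pthread_mutex_unlock(&p->lock);
    return bytes;
}
```

Because allocation and statistics share the lock, a reader always sees a consistent count, which is the trade-off the message gives for not using a bare atomic counter.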
    • E
      translate-all: use qemu_protect_rwx/none helpers · f51f315a
      Emilio G. Cota committed
      The helpers require the address and size to be page-aligned, so
      do that before calling them.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      f51f315a
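One common way to page-align an address range before passing it to an mprotect-style helper is to round the start down and the end up. The sketch below assumes a power-of-two page size; `DemoRange` and `demo_page_align` are hypothetical names, and the commit itself may align differently.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t start;   /* page-aligned start address */
    size_t size;       /* page-aligned length covering the original range */
} DemoRange;

/* Round addr down and addr + size up to page boundaries. */
DemoRange demo_page_align(uintptr_t addr, size_t size, size_t page_size)
{
    uintptr_t mask = ~((uintptr_t)page_size - 1);   /* page_size must be a power of two */
    uintptr_t start = addr & mask;
    uintptr_t end = (addr + size + page_size - 1) & mask;
    DemoRange r = { start, (size_t)(end - start) };
    return r;
}
```

For example, with 4 KiB pages the range starting at 0x1ff0 with length 0x20 straddles a page boundary, so the aligned range starts at 0x1000 and spans two pages.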
    • E
      tcg: allocate optimizer temps with tcg_malloc · 34184b07
      Emilio G. Cota committed
      Groundwork for supporting multiple TCG contexts.
      
      While at it, also allocate temps_used directly as a bitmap of the
      required size, instead of using a bitmap of TCG_MAX_TEMPS via
      TCGTempSet.
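
Sizing a bitmap for exactly the number of bits in use, rather than for a fixed maximum, can be sketched as below. The helper names are hypothetical, and `calloc` merely stands in for a pool allocator such as tcg_malloc, whose interface is not shown in this log.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold nbits bits. */
size_t demo_bits_to_longs(size_t nbits)
{
    return (nbits + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
}

/* Allocate a zeroed bitmap sized for nbits entries; the caller frees it. */
unsigned long *demo_bitmap_new(size_t nbits)
{
    return calloc(demo_bits_to_longs(nbits), sizeof(unsigned long));
}

void demo_set_bit(unsigned long *map, size_t bit)
{
    map[bit / DEMO_BITS_PER_LONG] |= 1UL << (bit % DEMO_BITS_PER_LONG);
}

int demo_test_bit(const unsigned long *map, size_t bit)
{
    return (int)((map[bit / DEMO_BITS_PER_LONG] >> (bit % DEMO_BITS_PER_LONG)) & 1UL);
}
```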
      
      Performance-wise we lose about 1.12% in a translation-heavy workload
      such as booting+shutting down debian-arm:
      
      Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
      	-machine type=virt -nographic -smp 1 -m 4096 \
      	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
      	-device virtio-net-device,netdev=unet \
      	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
      	-device virtio-blk-device,drive=myblock \
      	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
      	-name arm,debug-threads=on -smp 1' (10 runs):
      
                   exec time (s)  Relative slowdown wrt original (%)
      ---------------------------------------------------------------
       original     20.213321616                                  0.
       tcg_malloc   20.441130078                           1.1270214
       TCGContext   20.477846517                           1.3086662
       g_malloc     20.780527895                           2.8061013
      
      The other two alternatives shown in the table are:
      - TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
        in TCGContext. This is simple enough but it isn't faster than using
        tcg_malloc; moreover, it wastes memory.
      - g_malloc: allocate/deallocate both temps and used_temps every time
        tcg_optimize is executed.
      Suggested-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      34184b07
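The "relative slowdown" column in the table above appears to be the ratio of each run time to the original run time, minus one, expressed in percent. A small helper, with the hypothetical name `demo_slowdown_pct`, reproduces the figures from the exec-time column:

```c
/* Relative slowdown of run time t with respect to baseline, in percent. */
double demo_slowdown_pct(double baseline, double t)
{
    return (t / baseline - 1.0) * 100.0;
}
```

Feeding in 20.213321616 s as the baseline and 20.441130078 s for tcg_malloc gives roughly 1.127%, in line with the table.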