1. 25 Jan 2018, 1 commit
    • accel/tcg: add size parameter in tlb_fill() · 98670d47
      Committed by Laurent Vivier
      The MC68040 MMU provides the size of the access that
      triggers the page fault.
      
      This size is set in the Special Status Word which
      is written in the stack frame of the access fault
      exception.
      
      So we need the size in m68k_cpu_unassigned_access() and
      m68k_cpu_handle_mmu_fault().
      
      To be able to do that, this patch modifies the prototypes of the
      handle_mmu_fault handler, tlb_fill() and probe_write();
      do_unassigned_access() already takes a size parameter.
      
      This patch also updates the handle_mmu_fault handlers and tlb_fill()
      of all targets (parameter addition only, no code change); a simplified
      sketch of the prototype change follows this entry.
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Message-Id: <20180118193846.24953-2-laurent@vivier.eu>
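      For illustration, the shape of the prototype change looks roughly like the
      sketch below; the stand-in types and the _before/_after names are mine, and
      only the added size argument reflects what the message above describes.

      /* Illustrative sketch only: stand-in types, not QEMU's real headers. */
      #include <stdint.h>

      typedef uint64_t target_ulong;
      typedef struct CPUState CPUState;
      typedef enum { MMU_DATA_LOAD, MMU_DATA_STORE, MMU_INST_FETCH } MMUAccessType;

      /* Before: the fault path had no way of knowing the width of the access. */
      void tlb_fill_before(CPUState *cs, target_ulong addr,
                           MMUAccessType access_type, int mmu_idx, uintptr_t retaddr);

      /* After: a size parameter is threaded through, so a target such as the
       * MC68040 can record the width of the faulting access (e.g. in the Special
       * Status Word written into the access-fault stack frame). */
      void tlb_fill_after(CPUState *cs, target_ulong addr, int size,
                          MMUAccessType access_type, int mmu_idx, uintptr_t retaddr);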
  2. 19 Jan 2018, 1 commit
    • hostmem-file: add "align" option · 98376843
      Committed by Haozhong Zhang
      When mmap(2)'ing the backend files, QEMU by default uses the host page size
      (getpagesize(2)) as the alignment of the mapping address. However, some
      backends may require an alignment different from the page size. For
      example, mmap'ing a device DAX (e.g., /dev/dax0.0) on Linux kernel 4.13 to
      an address that is 4K-aligned but not 2M-aligned fails with a kernel
      message like
      
      [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)
      
      Because there is no common way to query such an alignment requirement, we
      add the 'align' option to 'memory-backend-file', so that users or
      management utilities that know enough about the backend can specify a
      proper alignment via this option (a usage example and a mapping sketch
      follow this entry).
      Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Message-Id: <20171211072806.2812-2-haozhong.zhang@intel.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      [ehabkost: fixed typo, fixed error_setg() format string]
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
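      As a usage example (mine, not from the commit), a device-DAX backend can be
      given a 2M alignment with:
      -object memory-backend-file,id=mem1,size=4G,mem-path=/dev/dax0.0,align=2M
      The snippet below is a hypothetical sketch of the general over-map-and-place
      technique a user-space program can use to honour such an alignment; it is not
      QEMU's actual code.

      #define _GNU_SOURCE
      #include <stdint.h>
      #include <stddef.h>
      #include <sys/mman.h>

      /* Map 'size' bytes of 'fd' at an address aligned to 'align' (a power of two):
       * reserve a larger anonymous region, then place the real mapping at the first
       * suitably aligned address inside it with MAP_FIXED. */
      static void *mmap_file_aligned(int fd, size_t size, size_t align)
      {
          size_t total = size + align;
          void *guard = mmap(NULL, total, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (guard == MAP_FAILED) {
              return MAP_FAILED;
          }
          uintptr_t base = ((uintptr_t)guard + align - 1) & ~(uintptr_t)(align - 1);
          void *p = mmap((void *)base, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_FIXED, fd, 0);
          if (p == MAP_FAILED) {
              munmap(guard, total);
          }
          /* For brevity this sketch leaves the unused head/tail of the reserved
           * region mapped PROT_NONE. */
          return p;
      }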
  3. 16 Jan 2018, 1 commit
    • cpu_physical_memory_sync_dirty_bitmap: Another alignment fix · aa777e29
      Committed by Dr. David Alan Gilbert
      This code has an optimised, word-aligned version, and a boring unaligned
      version. My commit f70d3451 fixed one alignment issue, but there's another.

      The optimised version operates on 'long's, dealing with (typically) 64
      pages at a time: it replaces the whole long with 0 and counts the bits that
      were set. If the RAMBlock covers fewer than 64 pages (so its part of the
      bitmap is shorter than one long), that long can contain bits representing
      two different RAMBlocks, but the code will update the bmap belonging to the
      1st RAMBlock only, while having updated the total dirty page count for both.
      
      This probably didn't matter prior to 6b6712ef, which split the dirty bitmap
      by RAMBlock; but now that each RAMBlock has its own bitmap, we end up with
      a count that doesn't match the state in the bitmaps.
      
      Symptom:
        Migration constantly shows a few dirty pages left to be sent.
        Seen on aarch64, and on x86 with OVMF.
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reported-by: Wei Huang <wei@redhat.com>
      Fixes: 6b6712ef
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
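      A minimal sketch of the accounting mismatch (illustrative code, not QEMU's):
      the optimised sync path consumes the global dirty bitmap one 64-bit word
      (64 pages) at a time, and trouble starts when such a word straddles two
      RAMBlocks.

      #include <stdint.h>

      /* Sync one word of the global dirty bitmap into one block's bitmap and
       * return how many dirty pages it contributed to the total. */
      static uint64_t sync_one_word(uint64_t *global_word, uint64_t *block_bmap_word)
      {
          /* Grab-and-clear the whole word, then count the bits that were set. */
          uint64_t bits = __atomic_exchange_n(global_word, 0, __ATOMIC_RELAXED);
          *block_bmap_word |= bits;                    /* this block's bitmap   */
          return (uint64_t)__builtin_popcountll(bits); /* added to dirty total  */
      }

      /* If the RAMBlock ends mid-word, 'bits' may also contain pages of the next
       * RAMBlock: their popcount is added to the dirty-page total, but they never
       * land in that next block's bitmap, so the count and the bitmaps disagree.
       * One cure (roughly what an alignment fix amounts to) is to take this fast
       * path only when the whole word belongs to a single block, and fall back to
       * the unaligned per-page path otherwise. */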
  4. 30 Dec 2017, 2 commits
  5. 21 Dec 2017, 1 commit
  6. 18 Dec 2017, 1 commit
  7. 21 Nov 2017, 1 commit
    • exec.c: Factor out before/after actions for notdirty memory writes · 27266271
      Committed by Peter Maydell
      The function notdirty_mem_write() has a sequence of actions it has to do
      before and after the actual business of writing data to host RAM, to ensure
      that dirty flags are correctly updated and that any TCG translations for
      the region are flushed. We also need to do this in other places that write
      directly to host RAM, most notably the TCG atomic helper functions. Pull
      the before and after pieces out into their own functions.
      
      We use an API where the prepare function stashes the various bits of
      information about the write into a struct for the complete function to use,
      because at the point where the atomic helpers call the complete function
      that information is no longer to hand (a sketch of the pattern follows this
      entry).
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1511201308-23580-2-git-send-email-peter.maydell@linaro.org
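      A sketch of the prepare/complete pattern the message describes; the struct
      and function names below are illustrative stand-ins, not necessarily the
      identifiers the patch introduces.

      /* Illustrative sketch of the prepare/complete split (stand-in names). */
      #include <stdint.h>
      #include <stdbool.h>

      typedef struct CPUState CPUState;
      typedef uint64_t ram_addr_t;

      /* The "prepare" step stashes everything the "complete" step will need,
       * because at the point where complete runs (e.g. inside an atomic helper)
       * that information is no longer at hand. */
      typedef struct {
          CPUState *cpu;
          ram_addr_t ram_addr;
          unsigned size;
          bool locked;              /* e.g. whether a lock had to be taken */
      } NotDirtyWriteInfo;

      void notdirty_write_prepare(NotDirtyWriteInfo *ndi, CPUState *cpu,
                                  ram_addr_t ram_addr, unsigned size);
      /* ... the caller writes directly to host RAM here ... */
      void notdirty_write_complete(NotDirtyWriteInfo *ndi);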
  8. 15 Nov 2017, 1 commit
  9. 13 Nov 2017, 1 commit
  10. 25 Oct 2017, 15 commits
  11. 24 Oct 2017, 1 commit
  12. 20 Oct 2017, 1 commit
    • accel/tcg: allow to invalidate a write TLB entry immediately · f52bfb12
      Committed by David Hildenbrand
      Background: s390x implements Low-Address Protection (LAP). If LAP is
      enabled, writing to effective addresses (before any translation)
      0-511 and 4096-4607 triggers a protection exception.
      
      So we have subpage protection on the first two pages of every address
      space (where the lowcore, the CPU's private data, resides).
      
      By immediately invalidating the write entry but allowing the caller to
      continue, we force every write access to these first two pages onto the
      slow path. We will then get a TLB fault with the specific address being
      accessed and can evaluate whether protection applies or not (see the sketch
      after this entry).
      
      We have to make sure to ignore the invalid bit if tlb_fill() succeeds.
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20171016202358.3633-2-david@redhat.com>
      Signed-off-by: Cornelia Huck <cohuck@redhat.com>
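      A condensed sketch of the mechanism (stand-in types and flag values; the
      message does not spell these out): the write entry is installed but
      immediately marked invalid, so every store to the page refaults through
      tlb_fill() with the exact address and can be checked individually.

      #include <stdint.h>

      typedef uint64_t target_ulong;

      #define TLB_INVALID_MASK   (1 << 3)   /* illustrative bit value */
      #define PAGE_WRITE         0x0002
      #define PAGE_WRITE_INV     0x0040     /* "writable for this access only" */

      typedef struct CPUTLBEntry {
          target_ulong addr_read;
          target_ulong addr_write;
          target_ulong addr_code;
      } CPUTLBEntry;

      /* Install the write portion of a TLB entry: if the target asked for
       * PAGE_WRITE_INV (as s390x would for the low-address-protected pages),
       * create the entry but mark it invalid right away, forcing the next
       * write to the page back onto the slow path. */
      static void set_write_entry(CPUTLBEntry *te, target_ulong address, int prot)
      {
          if (prot & PAGE_WRITE) {
              te->addr_write = address;
              if (prot & PAGE_WRITE_INV) {
                  te->addr_write |= TLB_INVALID_MASK;
              }
          } else {
              te->addr_write = (target_ulong)-1;
          }
      }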
  13. 11 Oct 2017, 1 commit
  14. 10 Oct 2017, 6 commits
    • exec-all: extract tb->tc_* into a separate struct tc_tb · e7e168f4
      Committed by Emilio G. Cota
      In preparation for adding tc.size, so that TBs can be tracked using the
      binary search tree implementation from glib (a sketch of the resulting
      struct follows this entry).
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
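      A minimal sketch of the extraction (field layout illustrative; tc.size is
      the follow-up addition the message refers to):

      #include <stddef.h>
      #include <stdint.h>

      /* Translated-code descriptor pulled out of TranslationBlock. */
      struct tb_tc {
          void *ptr;      /* pointer to the translated host code */
          size_t size;    /* to be added next, so TBs can be kept in a glib
                             binary search tree keyed on host-code ranges */
      };

      struct TranslationBlock {
          uint64_t pc;          /* illustrative subset of the real fields */
          uint32_t flags;
          struct tb_tc tc;      /* replaces the former tb->tc_* members */
      };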
    • exec-all: introduce TB_PAGE_ADDR_FMT · 67a5b5d2
      Committed by Emilio G. Cota
      And fix the following warning when DEBUG_TB_INVALIDATE is enabled
      in translate-all.c:
      
        CC      mipsn32-linux-user/accel/tcg/translate-all.o
      /data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’:
      /data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=]
               printf("protecting code page: 0x" TARGET_FMT_lx "\n",
                      ^
      cc1: all warnings being treated as errors
      /data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed
      make[1]: *** [accel/tcg/translate-all.o] Error 1
      Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
      make: *** [subdir-mipsn32-linux-user] Error 2
      cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
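      The point of such a macro can be shown with a self-contained sketch: the
      printf format tracks the typedef of the page-address type, so the same
      debug print compiles warning-free on every target. The widths and names
      below are illustrative, not the real exec-all.h definitions.

      #include <inttypes.h>
      #include <stdio.h>

      #if defined(ILLUSTRATION_32BIT_PAGE_ADDR)
      typedef uint32_t tb_page_addr_t;
      # define TB_PAGE_ADDR_FMT "%" PRIx32
      #else
      typedef uint64_t tb_page_addr_t;
      # define TB_PAGE_ADDR_FMT "%" PRIx64
      #endif

      int main(void)
      {
          tb_page_addr_t page_addr = 0x1234000;
          /* warning-free replacement for a hard-coded "%lx"/TARGET_FMT_lx print */
          printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
          return 0;
      }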
    • exec-all: bring tb->invalid into tb->cflags · 84f1c148
      Committed by Emilio G. Cota
      This gets rid of a hole in struct TranslationBlock.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
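      A minimal sketch of the idea (bit position illustrative): the standalone
      flag becomes a bit inside cflags, so the struct loses the padding hole and
      the flag is read with the same kind of atomic access as the rest of cflags.

      #include <stdint.h>
      #include <stdbool.h>

      #define CF_INVALID  (1u << 31)   /* illustrative bit position */

      struct TranslationBlock {
          uint64_t pc;          /* illustrative subset of the real fields */
          uint32_t cflags;      /* compile flags; now also carries the invalid bit */
          /* the separate 'invalid' member, and the hole it forced, are gone */
      };

      static inline bool tb_is_invalid(const struct TranslationBlock *tb)
      {
          return __atomic_load_n(&tb->cflags, __ATOMIC_RELAXED) & CF_INVALID;
      }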
    • tcg: consolidate TB lookups in tb_lookup__cpu_state · f6bb84d5
      Committed by Emilio G. Cota
      This avoids duplicating code. cpu_exec_step will also use the
      new common function once we integrate parallel_cpus into tb->cflags.
      
      Note that in this commit we also fix a race, described by Richard Henderson
      during review. Think of this scenario with threads A and B:
      
         (A) Lookup succeeds for TB in hash without tb_lock
              (B) Sets the TB's tb->invalid flag
              (B) Removes the TB from tb_htable
              (B) Clears all CPU's tb_jmp_cache
         (A) Store TB into local tb_jmp_cache
      
      Given that order of events, (A) will keep executing that invalid TB until
      another flush of its tb_jmp_cache happens, which in theory might never happen.
      We can fix this by checking the tb->invalid flag every time we look up a TB
      from tb_jmp_cache, so that in the above scenario, next time we try to find
      that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
      up in tb_htable.
      
      Performance-wise, I measured a small improvement when booting debian-arm.
      Note that inlining pays off:
      
       Performance counter stats for 'taskset -c 0 qemu-system-arm \
      	-machine type=virt -nographic -smp 1 -m 4096 \
      	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
      	-device virtio-net-device,netdev=unet \
      	-drive file=jessie.qcow2,id=myblock,index=0,if=none \
      	-device virtio-blk-device,drive=myblock \
      	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
      	-name arm,debug-threads=on -smp 1' (10 runs):
      
      Before:
            18714.917392 task-clock                #    0.952 CPUs utilized            ( +-  0.95% )
                  23,142 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                       1 CPU-migrations            #    0.000 M/sec
                  10,558 page-faults               #    0.001 M/sec                    ( +-  0.95% )
          53,957,727,252 cycles                    #    2.883 GHz                      ( +-  0.91% ) [83.33%]
          24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +-  1.20% ) [83.33%]
          16,495,714,424 stalled-cycles-backend    #   30.57% backend  cycles idle     ( +-  0.95% ) [66.66%]
          76,267,572,582 instructions              #    1.41  insns per cycle
                                                   #    0.32  stalled cycles per insn  ( +-  0.87% ) [83.34%]
          12,692,186,323 branches                  #  678.186 M/sec                    ( +-  0.92% ) [83.35%]
             263,486,879 branch-misses             #    2.08% of all branches          ( +-  0.73% ) [83.34%]
      
            19.648474449 seconds time elapsed                                          ( +-  0.82% )
      
      After, w/ inline (this patch):
            18471.376627 task-clock                #    0.955 CPUs utilized            ( +-  0.96% )
                  23,048 context-switches          #    0.001 M/sec                    ( +-  0.48% )
                       1 CPU-migrations            #    0.000 M/sec
                  10,708 page-faults               #    0.001 M/sec                    ( +-  0.81% )
          53,208,990,796 cycles                    #    2.881 GHz                      ( +-  0.98% ) [83.34%]
          23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +-  0.95% ) [83.34%]
          16,161,773,848 stalled-cycles-backend    #   30.37% backend  cycles idle     ( +-  0.76% ) [66.67%]
          75,786,269,766 instructions              #    1.42  insns per cycle
                                                   #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
          12,573,617,143 branches                  #  680.708 M/sec                    ( +-  1.34% ) [83.33%]
             260,235,550 branch-misses             #    2.07% of all branches          ( +-  0.66% ) [83.33%]
      
            19.340502161 seconds time elapsed                                          ( +-  0.56% )
      
      After, w/o inline:
            18791.253967 task-clock                #    0.954 CPUs utilized            ( +-  0.78% )
                  23,230 context-switches          #    0.001 M/sec                    ( +-  0.42% )
                       1 CPU-migrations            #    0.000 M/sec
                  10,563 page-faults               #    0.001 M/sec                    ( +-  1.27% )
          54,168,674,622 cycles                    #    2.883 GHz                      ( +-  0.80% ) [83.34%]
          24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +-  1.37% ) [83.33%]
          16,288,648,572 stalled-cycles-backend    #   30.07% backend  cycles idle     ( +-  0.95% ) [66.66%]
          77,659,755,503 instructions              #    1.43  insns per cycle
                                                   #    0.31  stalled cycles per insn  ( +-  0.97% ) [83.34%]
          12,922,780,045 branches                  #  687.702 M/sec                    ( +-  1.06% ) [83.34%]
             261,962,386 branch-misses             #    2.03% of all branches          ( +-  0.71% ) [83.35%]
      
            19.700174670 seconds time elapsed                                          ( +-  0.56% )
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
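      A condensed sketch of the consolidated lookup (stand-in types, simplified
      signature): check tb_jmp_cache first, reject invalidated TBs, and fall back
      to the htable lookup, repopulating the cache on success.

      #include <stdint.h>
      #include <stddef.h>

      typedef uint64_t target_ulong;

      typedef struct TranslationBlock {
          target_ulong pc;
          target_ulong cs_base;
          uint32_t flags;
          int invalid;
      } TranslationBlock;

      typedef struct CPUState {
          TranslationBlock *tb_jmp_cache[1 << 12];
      } CPUState;

      /* Provided elsewhere in this sketch: pc hashing and the slow htable lookup. */
      unsigned tb_jmp_cache_hash_func(target_ulong pc);
      TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
                                         target_ulong cs_base, uint32_t flags);

      static inline TranslationBlock *
      tb_lookup__cpu_state(CPUState *cpu, target_ulong pc,
                           target_ulong cs_base, uint32_t flags)
      {
          unsigned hash = tb_jmp_cache_hash_func(pc);
          TranslationBlock *tb = __atomic_load_n(&cpu->tb_jmp_cache[hash],
                                                 __ATOMIC_RELAXED);

          /* The invalid check closes the race described above: an invalidated TB
           * is never returned from (or left lingering in) tb_jmp_cache. */
          if (tb && tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags &&
              !__atomic_load_n(&tb->invalid, __ATOMIC_RELAXED)) {
              return tb;
          }
          tb = tb_htable_lookup(cpu, pc, cs_base, flags);
          if (tb) {
              __atomic_store_n(&cpu->tb_jmp_cache[hash], tb, __ATOMIC_RELAXED);
          }
          return tb;
      }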
    • cputlb: bring back tlb_flush_count under !TLB_DEBUG · 83974cf4
      Committed by Emilio G. Cota
      Commit f0aff0f1 ("cputlb: add assert_cpu_is_self checks") buried
      the increment of tlb_flush_count under TLB_DEBUG. This results in
      "info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG.
      
      Besides, under MTTCG tlb_flush_count is updated by several threads, so in
      order not to lose counts we'd either have to use atomic ops or distribute
      the counter; the latter is more scalable.
      
      This patch does the latter by embedding tlb_flush_count in CPUArchState.
      The global count is then easily obtained by iterating over the CPU list.
      
      Note that this change also requires updating the accessors to
      tlb_flush_count to use atomic_read/set whenever there may be conflicting
      accesses (as defined in C11) to it.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
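      A sketch of the distributed counter (stand-in types; the atomic accessor
      pattern is what the message describes): each vCPU keeps its own count, and
      the global figure is the sum over the CPU list.

      #include <stddef.h>

      typedef struct CPUArchState {
          size_t tlb_flush_count;
      } CPUArchState;

      typedef struct CPUState {
          CPUArchState *env_ptr;
          struct CPUState *next;    /* stand-in for QEMU's CPU list */
      } CPUState;

      extern CPUState *first_cpu;

      static void tlb_flush_one(CPUArchState *env)
      {
          /* Writer side: only this vCPU writes its counter, but the store is
           * atomic so concurrent readers never see a torn value. */
          __atomic_store_n(&env->tlb_flush_count, env->tlb_flush_count + 1,
                           __ATOMIC_RELAXED);
          /* ... the actual TLB flushing work ... */
      }

      /* Reader side, e.g. for "info jit": sum the per-CPU counters. */
      static size_t total_tlb_flush_count(void)
      {
          size_t total = 0;
          for (CPUState *cpu = first_cpu; cpu; cpu = cpu->next) {
              total += __atomic_load_n(&cpu->env_ptr->tlb_flush_count,
                                       __ATOMIC_RELAXED);
          }
          return total;
      }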
  15. 22 Sep 2017, 6 commits