Commits · b0706b716769494f321a0d2bfd9fa9893992f995 · openeuler / qemu

24 Feb, 2017 · 7 commits

cputlb: atomically update tlb fields used by tlb_reset_dirty · b0706b71

Authored by Alex Bennée on Feb 23, 2017

The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code otherwise a race could be introduced.

To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.

On 32 bit systems attempting to emulate 64 bit guests we don't even
bother as we might not have the atomic primitives available. MTTCG is
disabled in this case and can't be forced on. The copy_tlb_helper
function helps keep the atomic semantics in one place to avoid
confusion.

The dirty helper function is made static as it isn't used outside of
cputlb.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

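As a rough illustration of the b0706b71 message, the C11 sketch below sets a not-dirty flag with a compare-and-swap so that a concurrent refill of the same entry is not overwritten. The `Sketch*` type, the `SKETCH_*` constants and `sketch_reset_dirty_range` are hypothetical stand-ins chosen for this example; QEMU's own structures, flag values and atomic wrappers differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: low 12 bits of addr_write hold flags. */
#define SKETCH_PAGE_MASK (~(uintptr_t)0xfff)
#define SKETCH_FLAG_MASK ((uintptr_t)0xfff)
#define SKETCH_NOTDIRTY  ((uintptr_t)1 << 4)   /* forces the write slow path */

typedef struct {
    _Atomic uintptr_t addr_write;  /* guest page address plus flag bits */
    uintptr_t addend;              /* host address = guest address + addend */
} SketchTLBEntry;

/*
 * Set the not-dirty flag if the entry maps plain RAM whose host address
 * lies in [start, start + length). Returns true if this call set the flag.
 */
static bool sketch_reset_dirty_range(SketchTLBEntry *e,
                                     uintptr_t start, uintptr_t length)
{
    assert(e != NULL);
    uintptr_t old = atomic_load(&e->addr_write);

    /* Entries that already carry any flag are left alone. */
    if (old & SKETCH_FLAG_MASK) {
        return false;
    }

    uintptr_t host = (old & SKETCH_PAGE_MASK) + e->addend;
    if (host - start >= length) {
        return false;   /* outside the requested range */
    }

    /* Only publish the flag if nobody changed the entry in the meantime. */
    return atomic_compare_exchange_strong(&e->addr_write, &old,
                                          old | SKETCH_NOTDIRTY);
}
```

If the entry changes between the load and the compare-and-swap, the swap fails and the entry keeps whatever the other thread wrote.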

cputlb: add tlb_flush_by_mmuidx async routines · e7218445

Authored by Alex Bennée on Feb 23, 2017

This converts the remaining TLB flush routines to use async work when
detecting a cross-vCPU flush. The only minor complication is having to
serialise the var_list of MMU indexes into a form that can be punted
to an asynchronous job.

The pending_tlb_flush field on QOM's CPU structure also becomes a
bitfield rather than a boolean.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

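The serialisation step mentioned in the e7218445 message can be pictured with the sketch below, which folds a variable-length list of MMU indexes into a single integer that is easy to hand to deferred work. The function name, the negative terminator and `SKETCH_NB_MMU_MODES` are assumptions made for this example and are not taken from QEMU.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

#define SKETCH_NB_MMU_MODES 16   /* hypothetical upper bound on MMU indexes */

/* Fold a list of MMU indexes, terminated by a negative value, into a bitmap. */
static uint16_t sketch_mmuidx_list_to_bitmap(int first_idx, ...)
{
    uint16_t bitmap = 0;
    va_list ap;

    va_start(ap, first_idx);
    for (int idx = first_idx; idx >= 0; idx = va_arg(ap, int)) {
        assert(idx < SKETCH_NB_MMU_MODES);
        bitmap |= (uint16_t)(1u << idx);
    }
    va_end(ap);

    return bitmap;
}
```

A single integer can be copied by value into an asynchronous job, whereas a `va_list` is only valid while the calling function is still active.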

cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap · 0336cbf8

Authored by Alex Bennée on Feb 23, 2017

While the vargs approach was flexible, the original MTTCG ended up
having to munge the bits into a bitmap so the data could be used in
deferred work helpers. Instead of hiding that in cputlb we push the
change to the API to make it take a bitmap of MMU indexes instead.

For ARM some of the resulting flushes end up being quite long, so to aid
readability I've tended to move the index shifting to a new line so
all the bits being or-ed together line up nicely, for example:

    tlb_flush_page_by_mmuidx(other_cs, pageaddr,
                             (1 << ARMMMUIdx_S1SE1) |
                             (1 << ARMMMUIdx_S1SE0));

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[AT: SPARC parts only]
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
[PM: ARM parts only]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

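To show what a bitmap-taking flush interface like the one described in 0336cbf8 might do on the receiving side, here is a minimal sketch that walks the bitmap and invalidates one table per set bit. `SketchCPU`, the table sizes and `sketch_flush_by_mmuidx` are hypothetical and only loosely modelled on the idea in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NB_MMU_MODES 4   /* hypothetical number of MMU indexes */
#define SKETCH_TLB_SIZE     8   /* hypothetical entries per index */

typedef struct {
    uint64_t tlb[SKETCH_NB_MMU_MODES][SKETCH_TLB_SIZE];
} SketchCPU;

/* Invalidate every TLB table whose MMU index has its bit set in idxmap. */
static void sketch_flush_by_mmuidx(SketchCPU *cpu, uint16_t idxmap)
{
    /* Bits beyond the last MMU index would indicate a caller bug. */
    assert((idxmap >> SKETCH_NB_MMU_MODES) == 0);

    for (int idx = 0; idx < SKETCH_NB_MMU_MODES; idx++) {
        if (idxmap & (1u << idx)) {
            /* All-ones marks an entry as invalid in this sketch. */
            memset(cpu->tlb[idx], 0xff, sizeof(cpu->tlb[idx]));
        }
    }
}
```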

cputlb: introduce tlb_flush_* async work. · e3b9ca81

Authored by KONRAD Frederic on Feb 23, 2017

Some architectures allow flushing the TLB of other vCPUs. This is not a problem
when we have only one thread for all vCPUs, but it definitely needs to be
asynchronous work when we are in true multithreaded mode.

We take the tb_lock() when doing this to avoid racing with other threads
which may be invalidating TBs at the same time. The alternative would
be to use proper atomic primitives to clear the TLB entries en masse.

This patch doesn't do anything to protect other cputlb functions being
called in MTTCG mode making cross-vCPU changes.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

cputlb: tweak qemu_ram_addr_from_host_nofail reporting · 857baec1

Authored by Alex Bennée on Feb 23, 2017

This moves the helper function closer to where it is called and updates
the error message to report via error_report instead of the deprecated
fprintf.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

cputlb: add assert_cpu_is_self checks · f0aff0f1

Authored by Alex Bennée on Feb 23, 2017

For SoftMMU the TLB flushes are an example of a task that can be
triggered on one vCPU by another. To deal with this properly we need to
use safe work to ensure these changes are done safely. The new assert
can be enabled while debugging to catch these cases.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

tcg: drop global lock during TCG code execution · 8d04fb55

Authored by Jan Kiszka on Feb 23, 2017

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independently on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <cota@braap.org>
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
[PM: target-arm changes]
Acked-by: Peter Maydell <peter.maydell@linaro.org>

13 Jan, 2017 · 1 commit

cputlb: drop flush_global flag from tlb_flush · d10eb08f

Authored by Alex Bennée on Nov 14, 2016

We have never had the concept of global TLB entries which would avoid
the flush, so we never actually use this flag. Drop it and make clear
that tlb_flush is the sledge-hammer it has always been.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
[DG: ppc portions]
Acked-by: David Gibson <david@gibson.dropbear.id.au>

28 Oct, 2016 · 1 commit

clean-up: removed duplicate #includes · 814bb12a

Authored by Anand J on Oct 21, 2016

Some files contain multiple #includes of the same header file.
Removed most of those unnecessary duplicate entries using
scripts/clean-includes.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Anand J <anand.indukala@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

26 Oct, 2016 · 7 commits

tcg: Add CONFIG_ATOMIC64 · df79b996

Authored by Richard Henderson on Sep 02, 2016

Allow qemu to build on 32-bit hosts without 64-bit atomic ops.

Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation.  Do so by dropping back to single-step.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg: Add atomic128 helpers · 7ebee43e

Authored by Richard Henderson on Jun 29, 2016

Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction.  Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truly care about these, then we could check this at startup time
and then avoid executing paths that use it.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

tcg: Add atomic helpers · c482cb11

Authored by Richard Henderson on Jun 28, 2016

Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

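The four operation families named in the c482cb11 message differ mainly in which value they hand back. The sketch below expresses that distinction with C11 `<stdatomic.h>` on a 32-bit value; the `sketch_*` wrappers are hypothetical names for this example and are not the TCG helpers themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* fetch_op: return the value seen before the operation. */
static uint32_t sketch_fetch_add(_Atomic uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    return atomic_fetch_add(p, v);
}

/* op_fetch: return the value produced by the operation. */
static uint32_t sketch_add_fetch(_Atomic uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    return atomic_fetch_add(p, v) + v;
}

/* xchg: store unconditionally and return the previous value. */
static uint32_t sketch_xchg(_Atomic uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    return atomic_exchange(p, v);
}

/* cmpxchg: store only if the current value matches; return what was seen. */
static uint32_t sketch_cmpxchg(_Atomic uint32_t *p,
                               uint32_t expected, uint32_t desired)
{
    assert(p != NULL);
    /* On failure, expected is overwritten with the value actually found. */
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}
```

A caller of `sketch_cmpxchg` can tell whether the store happened by comparing the return value with the `expected` argument it passed in.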

cputlb: Tidy some macros · c86c6e4c

Authored by Richard Henderson on Jul 08, 2016

TGT_LE and TGT_BE are not size dependent and do not need to be
redefined.  The others are no longer used at all.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

cputlb: Move most of iotlb code out of line · 82a45b96

Authored by Richard Henderson on Jul 08, 2016

Saves 2k code size off of a cold path.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

cputlb: Move probe_write out of softmmu_template.h · 3b08f0a9

Authored by Richard Henderson on Jul 08, 2016

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

cputlb: Replace SHIFT with DATA_SIZE · dea21982

Authored by Richard Henderson on Jul 08, 2016

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

16 Sep, 2016 · 1 commit

tcg: Merge GETPC and GETRA · 01ecaf43

Authored by Richard Henderson on Jul 26, 2016

The return address argument to the softmmu template helpers was
confused.  In the legacy case, we wanted to indicate that there
is no return address, and so passed in NULL.  However, we then
immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
value, indicating the presence of an (invalid) return address.

Push the GETPC_ADJ subtraction down to the only point it's required:
immediately before use within cpu_restore_state_from_tb, after all
NULL pointer checks have been completed.

This makes GETPC and GETRA identical.  Remove GETRA as the lesser
used macro, replacing all uses with GETPC.

Signed-off-by: Richard Henderson <rth@twiddle.net>

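The ordering problem described in the 01ecaf43 message comes down to unsigned arithmetic: subtracting an adjustment from a zero "no return address" marker wraps to a non-zero value. The sketch below contrasts the two orderings; `SKETCH_GETPC_ADJ`, its value and both function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GETPC_ADJ 2   /* hypothetical adjustment, for illustration */

/* Old ordering: adjust at the call site, so "no return address" (0)
 * wraps around to a non-zero value and later looks like a real address. */
static uintptr_t sketch_adjust_early(uintptr_t retaddr)
{
    return retaddr - SKETCH_GETPC_ADJ;
}

/* New ordering: test for 0 first, adjust only immediately before use. */
static bool sketch_restore_state(uintptr_t retaddr, uintptr_t *search_pc)
{
    assert(search_pc != NULL);
    if (retaddr == 0) {
        return false;   /* nothing to restore */
    }
    *search_pc = retaddr - SKETCH_GETPC_ADJ;
    return true;
}
```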

09 Jul, 2016 · 2 commits

cputlb: Add address parameter to VICTIM_TLB_HIT · a390284b

Authored by Samuel Damashek on Jul 06, 2016

[rth: Split out from the original patch.]

Signed-off-by: Samuel Damashek <samuel.damashek@invincea.com>
Message-Id: <20160706182652.16190-1-samuel.damashek@invincea.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>

cputlb: Move VICTIM_TLB_HIT out of line · 7e9a7c50

Authored by Richard Henderson on Jul 08, 2016

There are currently 22 invocations of this function,
and we're about to increase that number.

Signed-off-by: Richard Henderson <rth@twiddle.net>

29 Jun, 2016 · 1 commit

cputlb: don't cpu_abort() if guest tries to execute outside RAM or ROM · d7f30403

Authored by Peter Maydell on Jun 20, 2016

In get_page_addr_code(), if the guest program counter turns out not to
be in ROM or RAM, we can't handle executing from it, and we call
cpu_abort(). This results in the message
  qemu: fatal: Trying to execute code outside RAM or ROM at 0x08000000
followed by a guest register dump, and then QEMU dumps core.

This situation happens in one of two cases:
 (1) a guest kernel bug, where it jumped off into nowhere
 (2) a user command line mistake, where they tried to run an image for
     board A on a QEMU model of board B, or where they didn't provide
     an image at all, and QEMU executed through a ROM or RAM full of
     NOP instructions and then fell off the end

In either case, a core dump of QEMU itself is entirely useless, and
only confuses users into thinking that this is a bug in QEMU rather
than a bug in the guest or a problem with their command line. (This
is a variation on the general idea that we shouldn't assert() on
something the user can accidentally provoke.)

Replace the cpu_abort() with something that explains the situation
a bit better and exits QEMU without dumping core.

(See LP:1062220 for several examples of confused users.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1466442425-11885-1-git-send-email-peter.maydell@linaro.org

29 May, 2016 · 1 commit

memory: split memory_region_from_host from qemu_ram_addr_from_host · 07bdaa41

Authored by Paolo Bonzini on Mar 25, 2016

Move the old qemu_ram_addr_from_host to memory_region_from_host and
make it return an offset within the region.  For qemu_ram_addr_from_host
return the ram_addr_t directly, similar to what it was before
commit 1b5ec234 ("memory: return MemoryRegion from qemu_ram_addr_from_host",
2013-07-04).

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

19 May, 2016 · 1 commit

cpu: move exec-all.h inclusion out of cpu.h · 63c91552

Authored by Paolo Bonzini on Mar 15, 2016

exec-all.h contains TCG-specific definitions.  It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.

One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

13 May, 2016 · 1 commit

tcg: Remove needless CPUState::current_tb · 3213525f

Authored by Sergey Fedorov on May 03, 2016

This field was used for telling cpu_interrupt() to unlink a chain of TBs
being executed when it worked that way. Now, cpu_interrupt() doesn't do
this anymore, so we don't need this field anymore.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1462273462-14036-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

23 Mar, 2016 · 1 commit

cputlb: modernise the debug support · 8526e1f4

Authored by Alex Bennée on Mar 15, 2016

To avoid cluttering the code with #ifdef legs we wrap up the print
statements into a tlb_debug() macro. As access to the virtual TLB can
get quite heavy defining DEBUG_TLB_LOG will ensure all the logs go to
the qemu_log target of CPU_LOG_MMU instead of stderr. This remains
compile time optional as these debug statements haven't been considered
for usefulness for user visible logging.

I've also removed DEBUG_TLB_CHECK which wasn't used.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1458052224-9316-11-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

07 Mar, 2016 · 1 commit

memory: Drop MemoryRegion.ram_addr · 8e41fb63

Authored by Fam Zheng on Mar 01, 2016

All references to mr->ram_addr are replaced by
memory_region_get_ram_addr(mr) (except for a few assertions that are
replaced with mr->ram_block).

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-5-git-send-email-famz@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29 Jan, 2016 · 1 commit

exec: Clean up includes · 7b31bbc2

Authored by Peter Maydell on Jan 26, 2016

Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-4-git-send-email-peter.maydell@linaro.org

21 Jan, 2016 · 2 commits

exec.c: Pass MemTxAttrs to iotlb_to_region so it uses the right AS · a54c87b6

Authored by Peter Maydell on Jan 21, 2016

Pass the MemTxAttrs for the memory access to iotlb_to_region(); this
allows it to determine the correct AddressSpace to use for the lookup.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

cputlb.c: Use correct address space when looking up MemoryRegionSection · d7898cda

Authored by Peter Maydell on Jan 21, 2016

When looking up the MemoryRegionSection for the new TLB entry in
tlb_set_page_with_attrs(), use cpu_asidx_from_attrs() to determine
the correct address space index for the lookup, and pass it into
address_space_translate_for_iotlb().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

16 Sep, 2015 · 2 commits

cputlb: Change tlb_set_dirty() arg to cpu · bcae01e4

Authored by Peter Crosthwaite on Sep 10, 2015

Change tlb_set_dirty() to accept a CPU instead of an env pointer. This
allows for removal of another CPUArchState usage from prototypes that
need to be QOMified.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <d2b1dcbe7945112989861d8ba7369449c11cc273.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cputlb: move CPU_LOOP() for tlb_reset() to exec.c · 9a13565d

Authored by Peter Crosthwaite on Sep 10, 2015

To prepare for multi-arch, cputlb.c should only have awareness of one
single architecture. This means it should not have access to the full
CPU lists which may be heterogeneous. Instead, push the CPU_LOOP() up
to the one and only caller in exec.c.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <db06dc6c49f8970caaf116d0385f00ee10a56f2f.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

11 Sep, 2015 · 1 commit

tlb: Add "ifetch" argument to cpu_mmu_index() · 97ed5ccd

Authored by Benjamin Herrenschmidt on Aug 17, 2015

This is set to true when the index is for an instruction fetch
translation.

The core get_page_addr_code() sets it, as do the SOFTMMU_CODE_ACCESS
accessors.

All targets ignore it for now, and all other callers pass "false".

This will allow targets that wish to split the mmu index between
instruction and data accesses to do so. A subsequent patch will
do just that for PowerPC.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Message-Id: <1439796853-4410-2-git-send-email-benh@kernel.crashing.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>

25 Aug, 2015 · 1 commit

cputlb: Add functions for flushing TLB for a single MMU index · d7a74a9d

Authored by Peter Maydell on Aug 25, 2015

Guest CPU TLB maintenance operations may be sufficiently
specialized to only need to flush TLB entries corresponding
to a particular MMU index. Implement cputlb functions for
this, to avoid the inefficiency of flushing TLB entries
which we don't need to.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1439548879-1972-2-git-send-email-peter.maydell@linaro.org

05 Jun, 2015 · 2 commits

memory: replace cpu_physical_memory_reset_dirty() with test-and-clear · 03eebc9e

Authored by Stefan Hajnoczi on Dec 02, 2014

The cpu_physical_memory_reset_dirty() function is sometimes used
together with cpu_physical_memory_get_dirty().  This is not atomic since
two separate accesses to the dirty memory bitmap are made.

Turn cpu_physical_memory_reset_dirty() and
cpu_physical_memory_clear_dirty_range_type() into the atomic
cpu_physical_memory_test_and_clear_dirty().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <1417519399-3166-6-git-send-email-stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

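The difference between a separate get followed by a reset and a single test-and-clear, as discussed in the 03eebc9e message, can be sketched with one atomic read-modify-write on a bitmap word. `sketch_test_and_clear_dirty` and the one-bit-per-page layout are assumptions for this example, not QEMU's dirty memory implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically clear one page's dirty bit and report whether it was set. */
static bool sketch_test_and_clear_dirty(_Atomic unsigned long *bitmap,
                                        size_t page)
{
    assert(bitmap != NULL);
    unsigned long mask = 1UL << (page % SKETCH_BITS_PER_WORD);

    /* One read-modify-write: no window between the test and the clear. */
    unsigned long old =
        atomic_fetch_and(&bitmap[page / SKETCH_BITS_PER_WORD], ~mask);

    return (old & mask) != 0;
}
```

With two separate accesses, a write that dirties the page between the read and the clear would be lost; the single `atomic_fetch_and` either observes that write or leaves its bit for the next caller.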

cputlb: remove useless arguments to tlb_unprotect_code_phys, rename · 9564f52d

Authored by Paolo Bonzini on Apr 22, 2015

These days modification of the TLB is done in notdirty_mem_write,
so the virtual address and env pointer are unnecessary.

The new name of the function, tlb_unprotect_code, is consistent with
tlb_protect_code.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

26 Apr, 2015 · 2 commits

Add MemTxAttrs to the IOTLB · fadc1cbe

Authored by Peter Maydell on Apr 26, 2015

Add a MemTxAttrs field to the IOTLB, and allow target-specific
code to set it via a new tlb_set_page_with_attrs() function;
pass the attributes through to the device when making IO accesses.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Make CPU iotlb a structure rather than a plain hwaddr · e469b22f

Authored by Peter Maydell on Apr 26, 2015

Make the CPU iotlb a structure rather than a plain hwaddr;
this will allow us to add transaction attributes to it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

17 Feb, 2015 · 2 commits

exec: RCUify AddressSpaceDispatch · 79e2b9ae

Authored by Paolo Bonzini on Jan 21, 2015

Note that even after this patch, most callers of address_space_*
functions must still be under the big QEMU lock, otherwise the memory
region returned by address_space_translate can disappear as soon as
address_space_translate returns.  This will be fixed in the next part
of this series.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

exec: make iotlb RCU-friendly · 9d82b5a7

Authored by Paolo Bonzini on Aug 16, 2013

After the previous patch, TLBs will be flushed on every change to
the memory mapping.  This patch augments that with synchronization
of the MemoryRegionSections referred to in the iotlb array.

With this change, it is guaranteed that iotlb_to_region will access
the correct memory map, even once the TLB is accessed outside
the BQL.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

17 Dec, 2014 · 1 commit

qemu-log: add log category for MMU info · 339aaf5b

Authored by Antony Pavlov on Dec 13, 2014

Running barebox on qemu-system-mips* with '-d unimp' floods
stderr with a great many mips_cpu_handle_mmu_fault() messages:

  mips_cpu_handle_mmu_fault address=b80003fd ret 0 physical 00000000180003fd prot 3
  mips_cpu_handle_mmu_fault address=a0800884 ret 0 physical 0000000000800884 prot 3
  mips_cpu_handle_mmu_fault pc a080cd80 ad b80003fd rw 0 mmu_idx 0

So it's very difficult to find the LOG_UNIMP messages.

The mips_cpu_handle_mmu_fault() messages appear when ANY logging is
enabled, which is not very handy.

Adding a separate log category for *_cpu_handle_mmu_fault()
logging fixes the problem.

Signed-off-by: Antony Pavlov <antonynpavlov@gmail.com>
Acked-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1418489298-1184-1-git-send-email-antonynpavlov@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

02 Sep, 2014 · 1 commit

implementing victim TLB for QEMU system emulated TLB · 88e89a57

Authored by Xin Tong on Aug 04, 2014

QEMU system mode page table walks are expensive. Measured by running
qemu-system-x86_64 in system mode under Intel PIN, a TLB miss followed by
a walk of the 4-level page tables in a guest Linux OS takes ~450 x86
instructions on average.

QEMU system mode TLB is implemented using a directly-mapped hashtable.
This structure suffers from conflict misses. Increasing the
associativity of the TLB may not be the solution to conflict misses as
all the ways may have to be walked in serial.

A victim TLB is a TLB used to hold translations evicted from the
primary TLB upon replacement. The victim TLB lies between the main TLB
and its refill path. The victim TLB has greater associativity (fully
associative in this patch). It takes longer to look up the victim TLB,
but it's likely better than a full page table walk. The memory
translation path is changed as follows:

Before Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. TLB refill.
5. Do the memory access.
6. Return to code cache.

After Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. Victim TLB lookup.
5. If victim TLB misses, TLB refill
6. Do the memory access.
7. Return to code cache

The advantage is that the victim TLB can offer more associativity to a
directly mapped TLB and thus potentially fewer page table walks while
still keeping the time taken to flush within reasonable limits.
However, placing a victim TLB before the refill path lengthens the TLB
refill path, as the victim TLB is consulted before the TLB refill. The
performance results demonstrate that the pros outweigh the cons.

Some performance results taken on the SPECINT2006 train
datasets, a kernel boot and the qemu configure script on an
Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine are shown in the
Google Doc link below.

https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing

In summary, the victim TLB improves the performance of qemu-system-x86_64 by
11% on average on SPECINT2006, the kernel boot and the qemu configure script,
with the highest improvement of 26% in 456.hmmer. The victim TLB does not result
in any performance degradation in any of the measured benchmarks. Furthermore,
the implemented victim TLB is architecture independent and is expected to
benefit other architectures in QEMU as well.

Although there are measurement fluctuations, the performance
improvement is very significant and by no means in the range of
noise.

Signed-off-by: Xin Tong <trent.tong@gmail.com>
Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

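The lookup order described in the 88e89a57 message (main TLB first, then the victim TLB, then a refill) can be sketched as follows. Everything here is a simplified assumption for illustration: the `Sketch*` types, the table sizes, the round-robin replacement and the swap on a victim hit are choices made for this example rather than a description of the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TLB_SIZE  256          /* direct-mapped main TLB */
#define SKETCH_VTLB_SIZE 8            /* small fully associative victim TLB */
#define SKETCH_INVALID   UINT64_MAX   /* marks an unused entry */

typedef struct { uint64_t vpage; uint64_t ppage; } SketchEntry;

typedef struct {
    SketchEntry main[SKETCH_TLB_SIZE];
    SketchEntry victim[SKETCH_VTLB_SIZE];
    size_t next_victim;               /* round-robin replacement cursor */
} SketchTLB;

static void sketch_tlb_init(SketchTLB *t)
{
    for (size_t i = 0; i < SKETCH_TLB_SIZE; i++) {
        t->main[i].vpage = SKETCH_INVALID;
    }
    for (size_t i = 0; i < SKETCH_VTLB_SIZE; i++) {
        t->victim[i].vpage = SKETCH_INVALID;
    }
    t->next_victim = 0;
}

/* Refill: install a translation; a displaced valid entry moves to the
 * victim TLB instead of being dropped. */
static void sketch_tlb_fill(SketchTLB *t, uint64_t vpage, uint64_t ppage)
{
    assert(vpage != SKETCH_INVALID);
    SketchEntry *slot = &t->main[vpage % SKETCH_TLB_SIZE];

    if (slot->vpage != SKETCH_INVALID && slot->vpage != vpage) {
        t->victim[t->next_victim] = *slot;
        t->next_victim = (t->next_victim + 1) % SKETCH_VTLB_SIZE;
    }
    slot->vpage = vpage;
    slot->ppage = ppage;
}

/* Lookup: main TLB first, then every victim entry. On a victim hit the
 * two entries are swapped so the next access hits the main TLB. */
static bool sketch_tlb_lookup(SketchTLB *t, uint64_t vpage, uint64_t *ppage)
{
    assert(vpage != SKETCH_INVALID);
    SketchEntry *slot = &t->main[vpage % SKETCH_TLB_SIZE];

    if (slot->vpage == vpage) {
        *ppage = slot->ppage;
        return true;
    }
    for (size_t i = 0; i < SKETCH_VTLB_SIZE; i++) {
        if (t->victim[i].vpage == vpage) {
            SketchEntry tmp = *slot;
            *slot = t->victim[i];
            t->victim[i] = tmp;
            *ppage = slot->ppage;
            return true;
        }
    }
    return false;   /* caller falls back to the page table walk */
}
```

Two pages that map to the same main-TLB slot would evict each other on every access in a purely direct-mapped design; with the victim TLB the loser of each conflict stays reachable without a page table walk.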