1. 24 Feb 2017, 5 commits
    • cputlb: atomically update tlb fields used by tlb_reset_dirty · b0706b71
      Alex Bennée authored
      The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
      in TLB entries to force the slow path on writes. This is used to mark
      page ranges containing code which has been translated so it can be
      invalidated if written to. To do this safely we need to ensure the TLB
      entries in question for all vCPUs are updated before we attempt to run
      the code; otherwise a race could be introduced.
      
      To achieve this we atomically set the flag in tlb_reset_dirty_range and
      take care when setting it as the TLB entry is filled (sketched below).
      
      On 32-bit systems attempting to emulate 64-bit guests we don't even
      bother, as we might not have the atomic primitives available. MTTCG is
      disabled in this case and can't be forced on. The copy_tlb_helper
      function helps keep the atomic semantics in one place to avoid
      confusion.
      
      The dirty helper function is made static as it isn't used outside of
      cputlb.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
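      A minimal sketch of the atomic update, assuming illustrative names: the
      addr_write field and the TLB_NOTDIRTY bit value below are placeholders
      standing in for QEMU's real definitions.

          #include <stdint.h>

          #define TLB_NOTDIRTY ((uintptr_t)1 << 4)   /* assumed flag bit */

          typedef struct CPUTLBEntrySketch {
              uintptr_t addr_write;   /* guest page address plus low flag bits */
          } CPUTLBEntrySketch;

          /* Tag the entry so writes to [start, start + length) take the slow
           * path.  The OR is atomic so a vCPU concurrently filling or reading
           * the entry never observes a torn value. */
          static void tlb_entry_set_notdirty(CPUTLBEntrySketch *e,
                                             uintptr_t start, uintptr_t length)
          {
              uintptr_t addr = e->addr_write & ~TLB_NOTDIRTY;

              if (addr >= start && addr < start + length) {
                  __atomic_fetch_or(&e->addr_write, TLB_NOTDIRTY,
                                    __ATOMIC_SEQ_CST);
              }
          }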
    • cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap · 0336cbf8
      Alex Bennée authored
      While the varargs approach was flexible, the original MTTCG series ended
      up having to munge the bits into a bitmap so the data could be used in
      deferred work helpers. Instead of hiding that in cputlb we push the
      change out to the API, making it take a bitmap of MMU indexes instead.
      
      For ARM some of the resulting flushes end up being quite long, so to aid
      readability I've tended to move the index shifting onto a new line so
      all the bits being OR-ed together line up nicely, for example:
      
          tlb_flush_page_by_mmuidx(other_cs, pageaddr,
                                   (1 << ARMMMUIdx_S1SE1) |
                                   (1 << ARMMMUIdx_S1SE0));
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      [AT: SPARC parts only]
      Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      [PM: ARM parts only]
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
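      A hedged before/after view of the call-site change; the variadic "before"
      form is reconstructed from the description above, so treat both snippets
      as illustrative rather than exact QEMU code.

          /* Before: MMU indexes listed one by one, terminated by -1. */
          tlb_flush_page_by_mmuidx(other_cs, pageaddr,
                                   ARMMMUIdx_S1SE1, ARMMMUIdx_S1SE0, -1);

          /* After: the same request as a bitmap, which can be stored as-is
           * in a deferred-work item for another vCPU. */
          tlb_flush_page_by_mmuidx(other_cs, pageaddr,
                                   (1 << ARMMMUIdx_S1SE1) |
                                   (1 << ARMMMUIdx_S1SE0));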
    • cputlb: introduce tlb_flush_* async work. · e3b9ca81
      KONRAD Frederic authored
      Some architectures allow flushing the TLB of other vCPUs. This is not a
      problem when we have only one thread for all vCPUs, but it definitely
      needs to be done as asynchronous work when we are truly multi-threaded
      (see the sketch at the end of this entry).
      
      We take the tb_lock() when doing this to avoid racing with other threads
      which may be invalidating TBs at the same time. The alternative would
      be to use proper atomic primitives to clear the TLB entries en masse.
      
      This patch doesn't do anything to protect other cputlb functions being
      called in MTTCG mode and making cross-vCPU changes.
      Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
      [AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
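      A minimal sketch of the deferral, assuming QEMU's async_run_on_cpu()
      interface; the helper names and the simplified tlb_flush() call are
      illustrative only.

          /* Work item that runs in the target vCPU's own thread. */
          static void flush_tlb_work(CPUState *cpu, run_on_cpu_data data)
          {
              tlb_flush(cpu);            /* signature simplified for the sketch */
          }

          /* Flush another vCPU's TLB: do it directly for ourselves, defer it
           * as asynchronous work otherwise. */
          static void flush_other_vcpu_tlb(CPUState *other_cs)
          {
              if (qemu_cpu_is_self(other_cs)) {
                  tlb_flush(other_cs);
              } else {
                  async_run_on_cpu(other_cs, flush_tlb_work, RUN_ON_CPU_NULL);
              }
          }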
    • tcg: remove global exit_request · e5143e30
      Alex Bennée authored
      There are now only two uses of the global exit_request left.
      
      The first ensures we exit the run_loop when we first start to process
      pending work and in the kick handler. This is just as easily done by
      setting the first_cpu->exit_request flag.
      
      The second use is in the round robin kick routine. The global
      exit_request ensured every vCPU would set its local exit_request and
      cause a full exit of the loop. Now that the iothread isn't being held
      while running, we can just rely on the kick handler to push us out as
      intended.
      
      We lightly refactor the main vCPU thread to ensure a cpu->exit_request
      causes us to exit the main loop and process any IO requests that might
      come along. As a cpu->exit_request may legitimately get squashed
      while processing the EXCP_INTERRUPT exception, we also check
      cpu->queued_work_first to ensure queued work is expedited as soon as
      possible (a simplified sketch of this flow follows below).
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
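      A much-simplified sketch of that flow; the field names follow CPUState,
      but the loop structure itself is illustrative, not the real cpus.c code.

          static void rr_run_one_cpu(CPUState *cpu)
          {
              while (!cpu->exit_request) {
                  int r = tcg_cpu_exec(cpu);

                  if (r == EXCP_INTERRUPT && cpu->queued_work_first) {
                      /* Handling the interrupt may have squashed exit_request,
                       * so break out explicitly when work has been queued. */
                      break;
                  }
              }
              /* Back in the main loop: service IO, then clear the flag. */
          }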
    • tcg: rename tcg_current_cpu to tcg_current_rr_cpu · 791158d9
      Alex Bennée authored
      ...and make the definition local to cpus.c. In preparation for MTTCG the
      concept of a global tcg_current_cpu will no longer make sense. However,
      we still need to keep track of it in the single-threaded case to be able
      to exit quickly when required.
      
      qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
      emphasise its use case. qemu_cpu_kick now kicks the relevant cpu as
      well as calling qemu_cpu_kick_rr_cpu(), which will become a no-op in
      MTTCG.
      
      For the time being the setting of the global exit_request remains.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
  2. 18 Feb 2017, 1 commit
  3. 16 Feb 2017, 1 commit
  4. 01 Feb 2017, 1 commit
  5. 28 Jan 2017, 1 commit
  6. 17 Jan 2017, 1 commit
  7. 13 Jan 2017, 1 commit
  8. 10 Jan 2017, 2 commits
  9. 22 Dec 2016, 2 commits
  10. 23 Nov 2016, 1 commit
  11. 31 Oct 2016, 4 commits
    • memory: Don't use memcpy for ram_device regions · 4a2e242b
      Alex Williamson authored
      With a vfio assigned device we lay down a base MemoryRegion registered
      as an IO region, giving us read & write accessors.  If the region
      supports mmap, we lay down a higher priority sub-region MemoryRegion
      on top of the base layer initialized as a RAM device pointer to the
      mmap.  Finally, if we have any quirks for the device (i.e. address
      ranges that need additional virtualization support), we put another IO
      sub-region on top of the mmap MemoryRegion.  When this is flattened,
      we now potentially have sub-page mmap MemoryRegions exposed which
      cannot be directly mapped through KVM.
      
      This is as expected, but a subtle detail of this is that we end up
      with two different access mechanisms through QEMU.  If we disable the
      mmap MemoryRegion, we make use of the IO MemoryRegion and service
      accesses using pread and pwrite to the vfio device file descriptor.
      If the mmap MemoryRegion is enabled and results in one of these
      sub-page gaps, QEMU handles the access as RAM, using memcpy to the
      mmap.  Using either pread/pwrite or the mmap directly should be
      correct, but using memcpy causes us problems.  I expect that not only
      does memcpy not necessarily honor the original width and alignment in
      performing a copy, but it potentially also uses processor instructions
      not intended for MMIO spaces.  It turns out that this has been a
      problem for Realtek NIC assignment, which has such a quirk that
      creates a sub-page mmap MemoryRegion access.
      
      To resolve this, we disable memory_access_is_direct() for ram_device
      regions since QEMU assumes that it can use memcpy for those regions.
      Instead we access through MemoryRegionOps, which replaces the memcpy
      with simple de-references of standard sizes to the host memory.
      
      With this patch we attempt to provide unrestricted access to the RAM
      device, allowing byte through qword access as well as unaligned
      access.  The assumption here is that accesses initiated by the VM are
      driven by a device specific driver, which knows the device
      capabilities.  If unaligned accesses are not supported by the device,
      we don't want them to work in a VM by performing multiple aligned
      accesses to compose the unaligned access.  A down-side of this
      philosophy is that the xp command from the monitor attempts to use
      the largest available access width, unaware of the underlying
      device.  Using memcpy had this same restriction, but at least now an
      operator can dump individual registers, even if blocks of device
      memory may result in access widths beyond the capabilities of a
      given device (RTL NICs only support up to dword).
      Reported-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
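      The sized-dereference access can be pictured with a small, self-contained
      sketch; it mirrors the idea described above rather than QEMU's exact
      MemoryRegionOps callbacks.

          #include <stdint.h>
          #include <stdlib.h>

          /* opaque is the host pointer backing the ram_device region. */
          static uint64_t ram_device_read_sketch(void *opaque, uint64_t addr,
                                                 unsigned size)
          {
              void *host = (uint8_t *)opaque + addr;

              switch (size) {
              case 1: return *(volatile uint8_t  *)host;
              case 2: return *(volatile uint16_t *)host;
              case 4: return *(volatile uint32_t *)host;
              case 8: return *(volatile uint64_t *)host;
              default: abort();   /* only power-of-two widths up to a qword */
              }
          }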
    • memory: Replace skip_dump flag with "ram_device" · 21e00fa5
      Alex Williamson authored
      Setting skip_dump on a MemoryRegion allows us to modify one specific
      code path, but the restriction we're trying to address encompasses
      more than that.  If we have a RAM MemoryRegion backed by a physical
      device, it not only restricts our ability to dump that region, but
      also affects how we should manipulate it.  Here we recognize that a
      MemoryRegion does not change over time to sometimes allow dumps and
      other times not, so we replace setting the skip_dump flag with a new
      initializer so that we know exactly the type of region to which we're
      applying this behavior (a usage sketch follows below).
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
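      A hedged usage sketch of the new initializer; the function name follows
      this entry's description, while the wrapper and its arguments are
      placeholders rather than actual vfio code.

          static void map_device_backed_ram(MemoryRegion *mr, Object *owner,
                                            void *host_mmap, uint64_t size)
          {
              /* One call both marks the region as device-backed (skipped in
               * dumps) and records that it must not be accessed via memcpy. */
              memory_region_init_ram_device_ptr(mr, owner, "device-backed-ram",
                                                size, host_mmap);
          }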
    • tcg: comment on which functions have to be called with tb_lock held · 7d7500d9
      Paolo Bonzini authored
      softmmu requires more functions to be thread-safe, because translation
      blocks can be invalidated from e.g. notdirty callbacks.  Probably the
      same holds for user-mode emulation; it's just that no one has ever
      tried to produce a coherent locking scheme there.
      
      This patch will guide the introduction of more tb_lock and tb_unlock
      calls for system emulation.
      
      Note that after this patch some (most) of the mentioned functions are
      still called outside tb_lock/tb_unlock.  The next patch will rectify this.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <20161027151030.20863-7-alex.bennee@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • translate-all: add DEBUG_LOCKING asserts · 301e40ed
      Alex Bennée authored
      This adds asserts to check the locking on the various translation
      engine structures. There are two sets of structures that are protected
      by locks.
      
      The first is the l1map and PageDesc structures used to track which
      translation blocks are associated with which physical addresses. In
      user-mode this is covered by the mmap_lock.
      
      The second case is the TB-context-related structures, protected by
      tb_lock, which is likewise user-mode only.
      
      Currently the asserts do nothing in SoftMMU mode but this will change
      for MTTCG.
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Message-Id: <20161027151030.20863-4-alex.bennee@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
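      A minimal sketch of the kind of assertion being added, assuming a
      per-thread flag that tb_lock()/tb_unlock() toggle; the macro and variable
      names are illustrative, not QEMU's exact ones.

          #include <assert.h>
          #include <stdbool.h>

          #ifdef DEBUG_LOCKING
          static __thread bool have_tb_lock;  /* toggled in tb_lock()/tb_unlock() */
          #define assert_tb_locked()   assert(have_tb_lock)
          #define assert_tb_unlocked() assert(!have_tb_lock)
          #else
          #define assert_tb_locked()   do { } while (0)
          #define assert_tb_unlocked() do { } while (0)
          #endif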
  12. 26 Oct 2016, 1 commit
  13. 25 Oct 2016, 1 commit
  14. 24 Oct 2016, 3 commits
    • cpu: Support a target CPU having a variable page size · 20bccb82
      Peter Maydell authored
      Support target CPUs having a page size which isn't known
      at compile time (see the sketch at the end of this entry). To use this,
      the CPU implementation should:
       * define TARGET_PAGE_BITS_VARY
       * not define TARGET_PAGE_BITS
       * define TARGET_PAGE_BITS_MIN to the smallest value it
         might possibly want for TARGET_PAGE_BITS
       * call set_preferred_target_page_bits() in its realize
         function to indicate the actual preferred target page
         size for the CPU (and report any error from it)
      
      In CONFIG_USER_ONLY, the CPU implementation should continue
      to define TARGET_PAGE_BITS appropriately for the guest
      OS page size.
      
      Machines which want to take advantage of a page size larger than
      TARGET_PAGE_BITS_MIN must set the MachineClass minimum_page_bits field
      to a value which they guarantee will be no greater than the preferred
      page size for any CPU they create.
      
      Note that changing the target page size by setting
      minimum_page_bits is a migration compatibility break
      for that machine.
      
      For debugging purposes, attempts to use TARGET_PAGE_SIZE
      before it has been finally confirmed will assert.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
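      Sketched for a hypothetical target below, following the recipe in this
      entry; the target name, file split and error message are assumptions,
      with set_preferred_target_page_bits() used as described above.

          /* In the hypothetical target's cpu.h: */
          #define TARGET_PAGE_BITS_VARY
          #define TARGET_PAGE_BITS_MIN 10   /* smallest page size it might use */

          /* In its realize function: */
          static void foo_cpu_realizefn(DeviceState *dev, Error **errp)
          {
              /* Report the page size this CPU instance actually prefers and
               * fail realize if a conflicting value was already committed. */
              if (!set_preferred_target_page_bits(12)) {
                  error_setg(errp, "conflicting preferred target page size");
                  return;
              }
              /* ... the rest of realize ... */
          }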
    • memory: add a per-AddressSpace list of listeners · 9a54635d
      Paolo Bonzini authored
      This speeds up MEMORY_LISTENER_CALL noticeably.  Right now,
      with many PCI devices you have N regions added to M AddressSpaces
      (M = # PCI devices with bus-master enabled) and each call looks
      up the whole listener list, with at least M listeners in it.
      Because most of the regions in N are BARs, which are also roughly
      proportional to M, the whole thing is O(M^3).  This changes it
      to O(M^2), which is the best we can do without rewriting the
      whole thing.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • memory: eliminate global MemoryListeners · d45fa784
      Paolo Bonzini authored
      There is none, so just drop the code.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. 13 Oct 2016, 1 commit
  16. 27 Sep 2016, 5 commits
  17. 16 Sep 2016, 1 commit
    • tcg: Merge GETPC and GETRA · 01ecaf43
      Richard Henderson authored
      The return address argument to the softmmu template helpers was
      confused.  In the legacy case, we wanted to indicate that there
      is no return address, and so passed in NULL.  However, we then
      immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
      value, indicating the presence of an (invalid) return address.
      
      Push the GETPC_ADJ subtraction down to the only point it's required:
      immediately before use within cpu_restore_state_from_tb, after all
      NULL pointer checks have been completed.
      
      This makes GETPC and GETRA identical.  Remove GETRA as the lesser
      used macro, replacing all uses with GETPC.
      Signed-off-by: Richard Henderson <rth@twiddle.net>
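      The relationship can be summarised with a simplified sketch; the value of
      GETPC_ADJ and the helper below are illustrative rather than the exact
      QEMU code.

          #include <stdint.h>

          /* Return address of the caller, i.e. a host PC inside the caller. */
          #define GETPC()    ((uintptr_t)__builtin_return_address(0))
          #define GETPC_ADJ  2    /* illustrative: step back into the call insn */

          static void cpu_restore_state_sketch(uintptr_t host_pc)
          {
              if (host_pc == 0) {
                  return;            /* "no return address" stays exactly 0 */
              }
              host_pc -= GETPC_ADJ;  /* adjust only here, after the NULL check */
              /* ... look up the TB containing host_pc, restore guest state ... */
          }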
  18. 14 Sep 2016, 1 commit
  19. 06 Aug 2016, 1 commit
  20. 04 Aug 2016, 1 commit
  21. 27 Jul 2016, 1 commit
  22. 12 Jul 2016, 4 commits