1. 30 Jan, 2013 1 commit
  2. 26 Jan, 2013 6 commits
    • mirror: support arbitrarily-sized iterations · 884fea4e
      Paolo Bonzini committed
      Yet another optimization is to extend the mirroring iteration to include more
      adjacent dirty blocks.  This limits the number of I/O operations and makes
      mirroring efficient even with a small granularity.  Most of the infrastructure
      is already in place; we only need to put a loop around the computation of
      the origin and sector count of the iteration.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
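The "loop around the computation of the origin and sector count" described in this commit can be pictured with a small sketch. This is not the code from block/mirror.c: the function name, the list-of-booleans dirty map, and the `max_chunks` cap are all assumptions made for illustration.

```python
def next_dirty_run(dirty, start, max_chunks):
    """Return (first_chunk, nb_chunks) for the next run of adjacent dirty chunks.

    `dirty` is a list of booleans, one per granularity-sized chunk (a stand-in
    for the dirty bitmap).  `max_chunks` is a hypothetical cap, for example the
    size of the copy buffer expressed in chunks.
    """
    n = len(dirty)
    first = start
    # Find the origin: the first dirty chunk at or after `start`.
    while first < n and not dirty[first]:
        first += 1
    if first == n:
        return None
    # Extend the iteration while the following chunks are dirty too.
    count = 1
    while first + count < n and count < max_chunks and dirty[first + count]:
        count += 1
    return first, count
```

With a small granularity, a single call covers a whole run of dirty chunks, so one I/O request replaces several.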
    • mirror: support more than one in-flight AIO operation · 402a4741
      Paolo Bonzini committed
      With AIO support in place, we can start copying more than one chunk
      in parallel.  This patch introduces the required infrastructure for
      this: the buffer is split into multiple granularity-sized chunks,
      and there is a free list to access them.
      
      Because of copy-on-write, a single operation may already require
      multiple chunks to be available on the free list.
      
      In addition, two different iterations on the HBitmap may want to
      copy the same cluster.  We avoid this by keeping a bitmap of in-flight
      I/O operations, and blocking until the previous iteration completes.
      This should be a pretty rare occurrence, though; as long as there is
      no overlap the next iteration can start before the previous one finishes.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
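The interaction between the free list and the in-flight bitmap can be modelled as follows. This is a toy model only: the class and method names are invented here, a Python set stands in for the in-flight bitmap, and a refused request simply returns `None` where the real job would block.

```python
from collections import deque


class ChunkPool:
    """Toy model of a copy buffer split into granularity-sized chunks."""

    def __init__(self, nb_buffers):
        self.free = deque(range(nb_buffers))   # free list of buffer slots
        self.in_flight = set()                 # stand-in for the in-flight bitmap

    def try_start(self, first_chunk, nb_chunks):
        """Reserve buffers for a copy, or return None if it has to wait."""
        wanted = set(range(first_chunk, first_chunk + nb_chunks))
        if wanted & self.in_flight:            # overlaps an earlier iteration
            return None
        if nb_chunks > len(self.free):         # not enough free buffers
            return None
        self.in_flight |= wanted
        return [self.free.popleft() for _ in range(nb_chunks)]

    def finish(self, first_chunk, buffers):
        """Completion callback: clear the in-flight bits, release the buffers."""
        self.in_flight -= set(range(first_chunk, first_chunk + len(buffers)))
        self.free.extend(buffers)
```

Non-overlapping requests proceed in parallel as long as buffers remain; only a request that touches chunks still being copied is held back.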
    • mirror: switch mirror_iteration to AIO · bd48bde8
      Paolo Bonzini committed
      There is really no change in the behavior of the job here, since
      there is still a maximum of one in-flight I/O operation between
      the source and the target.  However, this patch already introduces
      the AIO callbacks (which are unmodified in the next patch)
      and some of the logic to count in-flight operations and only
      complete the job when there is none.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • mirror: perform COW if the cluster size is bigger than the granularity · b812f671
      Paolo Bonzini committed
      When mirroring runs, the backing files for the target may not yet be
      ready.  However, this means that a copy-on-write operation on the target
      would fill the missing sectors with zeros.  Copy-on-write only happens
      if the granularity of the dirty bitmap is smaller than the cluster size
      (and only for clusters that are allocated in the source after the job
      has started copying).  So far, the granularity was fixed to 1MB; to avoid
      the problem we detected the situation and required the backing files to
      be available in that case only.
      
      However, we want to lower the granularity for efficiency, so we need
      a better solution.  The solution is to always copy a whole cluster the
      first time it is touched.  The code keeps a bitmap of clusters that
      have already been allocated by the mirroring job, and only does "manual"
      copy-on-write if the chunk being copied is zero in the bitmap.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
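The "manual" copy-on-write rule can be sketched as below. The function name, its parameters, and the use of a Python set in place of the bitmap of already-allocated clusters are assumptions for illustration; clipping at the end of the image is left out.

```python
def align_for_cow(sector, nb_sectors, cluster_sectors, copied_clusters):
    """Widen a copy to whole clusters the first time those clusters are touched.

    `copied_clusters` is a set of cluster indices that the job has already
    written to the target (a stand-in for the bitmap in the commit message).
    """
    end = sector + nb_sectors
    clusters = range(sector // cluster_sectors, (end - 1) // cluster_sectors + 1)
    if all(c in copied_clusters for c in clusters):
        # Every target cluster is already allocated: copy exactly what is dirty.
        return sector, nb_sectors
    # First touch: copy whole clusters so the target never zero-fills the rest.
    copied_clusters.update(clusters)
    start = clusters[0] * cluster_sectors
    return start, (clusters[-1] + 1) * cluster_sectors - start
```

For example, with 128-sector clusters, a first request for sectors 130 to 137 is widened to the full cluster 128 to 255, while a later request for the same range is copied as-is.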
    • block: implement dirty bitmap using HBitmap · 8f0720ec
      Paolo Bonzini committed
      This actually uses the dirty bitmap in the block layer, and converts
      mirroring to use an HBitmapIter.
      
      Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • add hierarchical bitmap data type and test cases · e7c033c3
      Paolo Bonzini committed
      HBitmap provides an array of bits.  The bits are stored as usual in an
      array of unsigned longs, but HBitmap is also optimized to provide fast
      iteration over set bits; going from one bit to the next is O(log_B n)
      worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
      that the number of levels is in fact fixed.
      
      In order to do this, it stacks multiple bitmaps with progressively coarser
      granularity; in all levels except the last, bit N is set iff the N-th
      unsigned long is nonzero in the immediately next level.  When iteration
      completes on the last level it can examine the 2nd-last level to quickly
      skip entire words, and even do so recursively to skip blocks of 64 words or
      powers thereof (32 on 32-bit machines).
      
      Given an index in the bitmap, it can be split into groups of bits like
      this (for the 64-bit case):
      
           bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
           bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
           bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
      
      So it is easy to move up simply by shifting the index right by
      log2(BITS_PER_LONG) bits.  To move down, you shift the index left
      similarly, and add the word index within the group.  Iteration uses
      ffs (find first set bit) to find the next word to examine; this
      operation can be done in constant time in most current architectures.
      
      When setting or clearing a range of m bits on all levels, the work to
      perform is O(m + m/W + m/W^2 + ...), where W is the number of bits per
      word; this is O(m), as on a regular bitmap.

      When iterating on a bitmap, each bit (on any level) is only visited
      once.  Hence, the total cost of visiting a bitmap with m bits in it is
      the number of bits that are set in all bitmaps.  Unless the bitmap is
      extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
      cost of advancing from one bit to the next is usually constant.
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
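The stacked-bitmap idea from this commit message can be illustrated with a short sketch. It is not QEMU's HBitmap and shares none of its API: the class name is invented, the 64-bit word size is assumed, the number of levels grows with the size instead of being fixed, and iteration is recursive for brevity.

```python
BITS = 64    # assumed word size (BITS_PER_LONG on a 64-bit host)
SHIFT = 6    # log2(BITS)


class TinyHBitmap:
    """Illustrative hierarchical bitmap; levels[-1] is the finest level."""

    def __init__(self, size):
        # Build levels from the finest one up to a single-word top level.
        self.levels = []
        nwords = (size + BITS - 1) >> SHIFT
        while True:
            self.levels.insert(0, [0] * max(nwords, 1))
            if nwords <= 1:
                break
            nwords = (nwords + BITS - 1) >> SHIFT

    def set(self, index):
        # Move up by shifting the index right: bit N of a level records
        # that word N of the next finer level is nonzero.
        for level in reversed(self.levels):
            word, bit = index >> SHIFT, index & (BITS - 1)
            level[word] |= 1 << bit
            index = word

    def __iter__(self):
        yield from self._walk(0, 0)

    def _walk(self, depth, word_index):
        word = self.levels[depth][word_index]
        while word:
            bit = (word & -word).bit_length() - 1   # find first set bit
            word &= word - 1                        # clear that bit
            # Move down: shift left and add the bit position in the word.
            index = (word_index << SHIFT) | bit
            if depth == len(self.levels) - 1:
                yield index
            else:
                yield from self._walk(depth + 1, index)
```

Iterating visits only words that are known to be nonzero, so long runs of clear bits are skipped one coarse-level bit at a time.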
  3. 22 Jan, 2013 1 commit
  4. 19 Jan, 2013 2 commits
  5. 08 Jan, 2013 1 commit
  6. 02 Jan, 2013 2 commits
    • dataplane: add virtio-blk data plane code · e72f66a0
      Stefan Hajnoczi committed
      virtio-blk-data-plane is a subset implementation of virtio-blk.  It only
      handles read, write, and flush requests.  It does this using a dedicated
      thread that executes an epoll(2)-based event loop and processes I/O
      using Linux AIO.
      
      This approach performs very well but can be used for raw image files
      only.  The number of IOPS achieved has been reported to be several times
      higher than the existing virtio-blk implementation.
      
      Eventually it should be possible to unify virtio-blk-data-plane with the
      main body of QEMU code once the block layer and hardware emulation are
      able to run outside the global mutex.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • dataplane: add virtqueue vring code · 88807f89
      Stefan Hajnoczi committed
      The virtio-blk-data-plane cannot access memory using the usual QEMU
      functions since it executes outside the global mutex and the memory APIs
      are at this time not thread-safe.
      
      This patch introduces a virtqueue module based on the kernel's vhost
      vring code.  The trick is that we map guest memory ahead of time and
      access it cheaply outside the global mutex.
      
      Once the hardware emulation code can execute outside the global mutex it
      will be possible to drop this code.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  7. 17 Dec, 2012 1 commit
    • spice-qemu-char: add spiceport chardev · 5a49d3e9
      Marc-André Lureau committed
      Add a new spice chardev to allow arbitrary communication between the
      host and the Spice client via the spice server.
      
      Examples:
      
      This allows the Spice client to have a special port for the qemu
      monitor:
      
      ... -chardev spiceport,name=org.qemu.monitor,id=monitorport
          -mon chardev=monitorport
      
      v2:
      - remove support for chardev to chardev linking
      - conditionally compile with SPICE_SERVER_VERSION
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
  8. 14 Dec, 2012 1 commit
  9. 16 Nov, 2012 2 commits
    • usb-host: update tracing · 8c908fca
      Gerd Hoffmann committed
      Now that we have separate status and length fields in USBPacket,
      update the completion tracepoint to log both.
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
    • ehci: handle dma errors · 55903f1d
      Gerd Hoffmann committed
      Starting with commit 1c380f94, dma
      transfers can actually fail.  This patch makes ehci keep track
      of the busmaster bit in pci config space, by setting/clearing the
      dma_context pointer.  Attempts to dma without context will result
      in raising HSE (Host System Error) interrupt and stopping the host
      controller.
      
      This patch fixes WinXP not booting with a usb stick attached to ehci.
      Root cause is seabios activating ehci so you can boot from the stick,
      and WinXP clearing the busmaster bit before resetting the host
      controller, leading to ehci actually trying dma while it is disabled.
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
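The behaviour described in this commit, tracking the bus master bit through the DMA context and failing DMA attempts when it is clear, can be modelled in a few lines. The class, its fields, and its methods are hypothetical and do not mirror the QEMU source; only the two bit values come from the PCI and EHCI specifications.

```python
PCI_COMMAND_MASTER = 0x4   # bus master enable bit of the PCI command register
USBSTS_HSE = 1 << 4        # Host System Error bit of the EHCI USBSTS register


class ToyEhci:
    """Toy controller model; field and method names are illustrative."""

    def __init__(self, guest_memory):
        self.guest_memory = guest_memory
        self.dma_context = None    # no DMA until the guest enables bus mastering
        self.usbsts = 0
        self.running = True

    def write_pci_command(self, value):
        # Track the bus master bit by setting or clearing the DMA context.
        self.dma_context = self.guest_memory if value & PCI_COMMAND_MASTER else None

    def dma_read(self, addr, length):
        if self.dma_context is None:
            self.usbsts |= USBSTS_HSE   # raise Host System Error
            self.running = False        # and stop the host controller
            return None
        return self.dma_context[addr:addr + length]
```

In the WinXP scenario from the commit message, the guest clears the bus master bit first, so the next transfer attempt stops the controller instead of touching guest memory.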
  10. 05 Nov, 2012 1 commit
  11. 01 Nov, 2012 1 commit
  12. 31 Oct, 2012 1 commit
    • aio: add generic thread-pool facility · d354c7ec
      Paolo Bonzini committed
      Add a generic thread-pool.  The code is roughly based on posix-aio-compat.c,
      with some changes, especially the following:
      
      - use QemuSemaphore instead of QemuCond;
      
      - separate the state of the thread from the return code of the worker
      function.  The return code is totally opaque for the thread pool;
      
      - do not busy wait when doing cancellation.
      
      A more generic threadpool (but still specific to I/O so that in the future
      it can use special scheduling classes or PI mutexes) can have many uses:
      it allows more flexibility in raw-posix.c and can more easily be extended
      to Win32, and it will also be used to do an msync of the persistent bitmap.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
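Two of the design points listed above, waking workers through a semaphore and keeping the request state separate from the worker's return code, can be shown in a minimal sketch. This is not the QEMU thread pool: every name is made up, Python's standard `threading` and `queue` modules replace the QEMU primitives, and cancellation is not modelled.

```python
import queue
import threading


class TinyThreadPool:
    """Minimal worker pool; names and structure are illustrative only."""

    def __init__(self, nb_threads):
        self.requests = queue.Queue()
        self.pending = threading.Semaphore(0)    # counts queued requests
        for _ in range(nb_threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, func, arg):
        # The state field belongs to the pool; `ret` belongs to the caller.
        req = {"func": func, "arg": arg, "state": "queued",
               "ret": None, "done": threading.Event()}
        self.requests.put(req)
        self.pending.release()                   # wake one sleeping worker
        return req

    def _worker(self):
        while True:
            self.pending.acquire()               # sleep until work arrives
            req = self.requests.get()
            req["state"] = "active"
            req["ret"] = req["func"](req["arg"])  # return code is opaque here
            req["state"] = "done"
            req["done"].set()
```

Because the pool never interprets `ret`, a worker function is free to return a negative error code without the request being mistaken for unfinished.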
  13. 25 Oct, 2012 1 commit
    • uhci: Verify queue has not been changed by guest · 66a08cbe
      Hans de Goede committed
      According to the spec a guest can unlink a qh, and then as soon as frindex
      has changed by 1 since the unlink, assume it is idle and re-use it. However
      for various reasons, we cannot simply consider a qh as unlinked if we've not
      seen it for 1 frame. This means that it is possible for a guest to re-use /
      restart the queue while we still see its old state. This patch adds a safety
      check for this, and "early" retires queues when they were changed by the guest.
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
  14. 24 Oct, 2012 2 commits
    • mirror: introduce mirror job · 893f7eba
      Paolo Bonzini committed
      This patch adds the implementation of a new job that mirrors a disk to
      a new image while letting the guest continue using the old image.
      The target is treated as a "black box" and data is copied from the
      source to the target in the background.  This can be used for several
      purposes, including storage migration, continuous replication, and
      observation of the guest I/O in an external program.  It is also a
      first step in replacing the inefficient block migration code that is
      part of QEMU.
      
      The job is possibly never-ending, but it is logically structured into
      two phases: 1) copy all data as fast as possible until the target
      first gets in sync with the source; 2) keep target in sync and
      ensure that reopening to the target gets a correct (full) copy
      of the source data.
      
      The second phase is indicated by the progress in "info block-jobs"
      reporting the current offset to be equal to the length of the file.
      When the job is cancelled in the second phase, QEMU will run the
      job until the source is clean and quiescent, then it will report
      successful completion of the job.
      
      In other words, the BLOCK_JOB_CANCELLED event means that the target
      may _not_ be consistent with a past state of the source; the
      BLOCK_JOB_COMPLETED event means that the target is consistent with
      a past state of the source.  (Note that it could already happen
      that management lost the race against QEMU and got a completion
      event instead of cancellation).
      
      It is not yet possible to complete the job and switch over to the target
      disk.  The next patches will fix this and add many refinements to the
      basic idea introduced here.  These include improved error management,
      some tunable knobs and performance optimizations.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: add block-job-complete · aeae883b
      Paolo Bonzini committed
      While streaming can be dropped as soon as it has progressed through the whole
      image, mirroring needs to be completed manually for two reasons: 1) so that
      management knows exactly when the VM switches to the target; 2) because
      for other use cases such as replication, we may leave the operation running
      for the whole life of the virtual machine.
      
      Add a new block job command that manually completes background operations.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
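As a rough illustration of how management software might issue the new command over QMP, the sketch below only builds the JSON message. The helper function is invented for this example and "drive0" is a made-up device name; sending the message over a QMP socket is left out.

```python
import json


def block_job_complete(device):
    """Build the QMP message that asks the block job on `device` to complete."""
    return json.dumps({"execute": "block-job-complete",
                       "arguments": {"device": device}})


print(block_job_complete("drive0"))
```

Management would typically send this once the mirror job has reached its second, in-sync phase, so that it knows exactly when the VM switches to the target.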
  15. 18 Oct, 2012 1 commit
  16. 29 Sep, 2012 3 commits
  17. 13 Sep, 2012 3 commits
  18. 12 Sep, 2012 1 commit
    • ehci: switch to new-style memory ops · 3e4f910c
      Gerd Hoffmann committed
      Also register different memory regions for capabilities,
      operational registers and port status registers.  Create
      separate tracepoints for operational regs and port status
      regs.  Ditch a bunch of sanity checks because the memory
      core will do this for us now.
      
      Offloading the byte, word and dword access handling to the
      memory core also has the side effect of fixing ehci register
      access on bigendian hosts.
      
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
  19. 11 Sep, 2012 8 commits
  20. 05 Sep, 2012 1 commit
    • qxl: add QXL_IO_MONITORS_CONFIG_ASYNC · 020af1c4
      Alon Levy committed
      Revision bumped to 4 for new IO support, enabled for spice-server >=
      0.11.1. New io enabled if revision is 4. Revision can be set to 4.
      
      [ kraxel: 3 continues to be the default revision.  Once we have a new
                stable spice-server release and the qemu patches to enable
                the new bits merged we'll go flip the switch and make rev4
                the default ]
      
      This io calls the corresponding new spice api
      spice_qxl_monitors_config_async to let spice-server read a new guest-set
      monitors config and notify the client.
      
      On migration reissue spice_qxl_monitors_config_async.
      
      RHBZ: 770842
      Signed-off-by: Alon Levy <alevy@redhat.com>
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

      fixup
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>