1. 24 Oct, 2012 2 commits
    • mirror: add support for on-source-error/on-target-error · b952b558
      Paolo Bonzini authored
      Error management is important for mirroring; otherwise, an error on the
      target (even something as "innocent" as ENOSPC) requires starting again
      with a full copy.  Similar to on_read_error/on_write_error, two separate
      knobs are provided for on_source_error (reads) and on_target_error (writes).
      The default is 'report' for both.
      
      The 'ignore' policy will leave the sector dirty, so that it will be
      retried later.  Thus, it will not cause corruption.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
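To make the 'ignore' policy concrete, here is a minimal Python sketch of one copy pass over the dirty sectors. It is a simplified model, not QEMU's code: `OnError`, `mirror_pass` and the `copy_sector` callback are hypothetical names, and only the two policies named in the commit message are modelled.

```python
from enum import Enum


class OnError(Enum):
    REPORT = "report"   # fail the job and report the error (the default)
    IGNORE = "ignore"   # keep going; the sector stays dirty and is retried


def mirror_pass(dirty, copy_sector, on_target_error=OnError.REPORT):
    """Try to copy every dirty sector once; return the sectors still dirty.

    `copy_sector(n)` is a hypothetical callback that raises OSError when
    the write to the target fails.
    """
    still_dirty = set()
    for sector in sorted(dirty):
        try:
            copy_sector(sector)
        except OSError:
            if on_target_error is OnError.REPORT:
                raise
            # 'ignore': do not clear the dirty bit, so a later pass retries.
            still_dirty.add(sector)
    return still_dirty
```

Because a failed sector is never marked clean, the target cannot silently miss data under 'ignore'; it only takes longer to converge.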
    • mirror: introduce mirror job · 893f7eba
      Paolo Bonzini authored
      This patch adds the implementation of a new job that mirrors a disk to
      a new image while letting the guest continue using the old image.
      The target is treated as a "black box" and data is copied from the
      source to the target in the background.  This can be used for several
      purposes, including storage migration, continuous replication, and
      observation of the guest I/O in an external program.  It is also a
      first step in replacing the inefficient block migration code that is
      part of QEMU.
      
      The job is possibly never-ending, but it is logically structured into
      two phases: 1) copy all data as fast as possible until the target
      first gets in sync with the source; 2) keep target in sync and
      ensure that reopening to the target gets a correct (full) copy
      of the source data.
      
      The second phase is indicated by the progress in "info block-jobs"
      reporting the current offset to be equal to the length of the file.
      When the job is cancelled in the second phase, QEMU will run the
      job until the source is clean and quiescent, then it will report
      successful completion of the job.
      
      In other words, the BLOCK_JOB_CANCELLED event means that the target
      may _not_ be consistent with a past state of the source; the
      BLOCK_JOB_COMPLETED event means that the target is consistent with
      a past state of the source.  (Note that it could already happen
      that management lost the race against QEMU and got a completion
      event instead of cancellation).
      
      It is not yet possible to complete the job and switch over to the target
      disk.  The next patches will fix this and add many refinements to the
      basic idea introduced here.  These include improved error management,
      some tunable knobs and performance optimizations.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
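The two phases and the cancel semantics described above can be sketched with a toy model. This is an illustration under simplifying assumptions (one sector copied per step, no errors, no real I/O); the `MirrorJob` class and its method names are hypothetical and do not reflect QEMU's internal API.

```python
class MirrorJob:
    def __init__(self, source, target):
        self.source = source                  # list of sector contents
        self.target = target                  # same length; the "black box" copy
        self.dirty = set(range(len(source)))  # everything starts out dirty
        self.synced = False                   # True once phase 2 is reached

    def guest_write(self, sector, data):
        self.source[sector] = data
        self.dirty.add(sector)                # target is stale for this sector

    def step(self):
        """Copy one dirty sector; enter phase 2 when none are left."""
        if self.dirty:
            sector = min(self.dirty)
            self.dirty.discard(sector)
            self.target[sector] = self.source[sector]
        if not self.dirty:
            self.synced = True

    def cancel(self):
        """Return the event name that management would observe."""
        if not self.synced:
            return "BLOCK_JOB_CANCELLED"      # target may be inconsistent
        while self.dirty:                     # phase 2: drain, then complete
            self.step()
        return "BLOCK_JOB_COMPLETED"
```

Cancelling before the first sync yields `BLOCK_JOB_CANCELLED`; cancelling afterwards drains the remaining dirty sectors and yields `BLOCK_JOB_COMPLETED`, matching the distinction the commit message draws.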
  2. 29 Sep, 2012 7 commits
  3. 24 Sep, 2012 2 commits
    • block: remove keep_read_only flag from BlockDriverState struct · dc1c13d9
      Jeff Cody authored
      The keep_read_only flag is no longer used, in favor of the bdrv
      flag BDRV_O_ALLOW_RDWR.
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Framework for reopening files safely · e971aa12
      Jeff Cody authored
      This is based on Supriya Kannery's bdrv_reopen() patch series.
      
      This provides a transactional method to reopen multiple
      image files safely.
      
      Image files are queued for reopen via bdrv_reopen_queue(), and the
      reopen occurs when bdrv_reopen_multiple() is called.  Changes are
      staged in bdrv_reopen_prepare() and in the equivalent driver level
      functions.  If any of the staged images fails a prepare, then all
      of the images are left untouched, and the staged changes for each
      image are abandoned.
      
      Block drivers are passed a reopen state structure that contains:
          * BDS to reopen
          * flags for the reopen
          * opaque pointer for any driver-specific data that needs to be
            persistent from _prepare to _commit/_abort
          * reopen queue pointer, if the driver needs to queue additional
            BDS for a reopen
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
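The queue/prepare/commit/abort flow described above follows an all-or-nothing pattern. The Python sketch below models only that pattern; the `Image` class, the `can_reopen` flag and the dictionary layout of the reopen state are illustrative assumptions, not the actual C structures.

```python
class Image:
    def __init__(self, name, flags, can_reopen=True):
        self.name = name
        self.flags = flags
        self.can_reopen = can_reopen   # hypothetical: does prepare succeed?


def reopen_queue(queue, image, flags):
    """Queue an image for reopen; nothing is changed yet."""
    queue.append({"image": image, "flags": flags, "opaque": None})
    return queue


def reopen_multiple(queue):
    """Apply all queued reopens, or none of them. Returns True on success."""
    prepared = []
    for state in queue:
        if not state["image"].can_reopen:      # prepare step failed
            for done in prepared:              # abort: drop staged changes
                done["opaque"] = None
            return False
        state["opaque"] = ("staged", state["flags"])
        prepared.append(state)
    for state in prepared:                     # commit: not allowed to fail
        state["image"].flags = state["flags"]
    return True
```

The key design point is that no image's flags change until every prepare has succeeded, so a single failure leaves the whole set untouched.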
  4. 14 Aug, 2012 1 commit
  5. 07 Aug, 2012 1 commit
    • qcow2: implement lazy refcounts · bfe8043e
      Stefan Hajnoczi authored
      Lazy refcounts is a performance optimization for qcow2 that postpones
      refcount metadata updates and instead marks the image dirty.  In the
      case of crash or power failure the image will be left in a dirty state
      and repaired next time it is opened.
      
      Reducing metadata I/O is important for cache=writethrough and
      cache=directsync because these modes guarantee that data is on disk
      after each write (hence we cannot take advantage of caching updates in
      RAM).  Refcount metadata is not needed for guest->file block address
      translation and therefore does not need to be on-disk at the time of
      write completion - this is the motivation behind the lazy refcount
      optimization.
      
      The lazy refcount optimization must be enabled at image creation time:
      
        qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
        qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
      
      Update qemu-iotests 031 and 036 since the extension header size changes
      when we add feature bit table entries.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
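As a rough illustration of the idea, the toy model below counts metadata writes when refcount updates are postponed and the image is only marked dirty. It is a deliberately simplified assumption-laden sketch: the class, its fields and the repair-by-rescan strategy are hypothetical and do not describe the qcow2 on-disk format.

```python
class LazyRefcountImage:
    def __init__(self):
        self.l2 = {}              # guest cluster -> host cluster (on "disk")
        self.refcounts = {}       # host cluster -> refcount (on "disk")
        self.dirty = False        # dirty flag (on "disk")
        self.metadata_writes = 0
        self.next_host = 0

    def write(self, guest_cluster):
        if not self.dirty:
            self.dirty = True             # one-time flag update
            self.metadata_writes += 1
        if guest_cluster not in self.l2:
            self.l2[guest_cluster] = self.next_host
            self.next_host += 1
            self.metadata_writes += 1     # the mapping must reach disk
            # The refcount update is intentionally not written here.

    def repair(self):
        """Rebuild refcounts from the mappings, then clear the dirty flag."""
        self.refcounts = {host: 1 for host in self.l2.values()}
        self.dirty = False
```

The mapping is still written on every allocation because it is needed for address translation; only the refcounts, which can be reconstructed from the mappings, are deferred.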
  6. 17 Jul, 2012 1 commit
    • block: Geometry and translation hints are now useless, purge them · 2b584959
      Markus Armbruster authored
      There are two producers of these hints: drive_init() on behalf of
      -drive, and hd_geometry_guess().
      
      The only consumer of the hint is hd_geometry_guess().
      
      The callers of hd_geometry_guess() call it only when drive_init()
      didn't set the hints.  Therefore, drive_init()'s hints are never used.
      
      Thus, hd_geometry_guess() only ever sees hints it produced itself in a
      prior call.  Only the first call computes something, subsequent calls
      just repeat the first call's results.  However, hd_geometry_guess() is
      never called more than once: the device models don't, and the block
      device is destroyed on unplug.  Thus, dropping the repeat feature
      doesn't break anything now.
      
      If a block device wasn't destroyed on unplug and could be reused with
      a new device, then repeating old results would be wrong.  Thus,
      dropping the repeat feature prevents future breakage.
      
      This renders the hints unused.  Purge them from the block layer.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  7. 15 Jun, 2012 1 commit
    • qemu-img check -r for repairing images · 4534ff54
      Kevin Wolf authored
      The QED block driver already provides the functionality to not only
      detect inconsistencies in images, but also fix them. However, this
      functionality cannot be invoked manually with qemu-img; the
      check happens only automatically during bdrv_open().
      
      This adds a -r switch to qemu-img check that allows manual invocation
      of an image repair.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  8. 30 May, 2012 2 commits
    • block: prevent snapshot mode $TMPDIR symlink attack · c2d76497
      Jim Meyering authored
      In snapshot mode, bdrv_open creates an empty temporary file without
      checking for mkstemp or close failure, and ignoring the possibility
      of a buffer overrun given a surprisingly long $TMPDIR.
      Change the get_tmp_filename function to return int (not void),
      so that it can inform its two callers of those failures.
      Also avoid the risk of buffer overrun and do not ignore mkstemp
      or close failure.
      Update both callers (in block.c and vvfat.c) to propagate
      temp-file-creation failure to their callers.
      
      get_tmp_filename creates and closes an empty file, while its
      callers later open that presumed-existing file with O_CREAT.
      The problem was that a malicious user could provoke mkstemp failure
      and race to create a symlink with the selected temporary file name,
      thus causing the qemu process (usually root owned) to open through
      the symlink, overwriting an attacker-chosen file.
      
      This addresses CVE-2012-2652.
      http://bugzilla.redhat.com/CVE-2012-2652
      Signed-off-by: Jim Meyering <meyering@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
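The shape of the fix (return a status instead of nothing, check the buffer size, and report mkstemp or close failure) can be sketched in Python as follows. This is an analogy, not the C patch: the `vl.` prefix, the `EOVERFLOW` choice and the `(status, path)` return convention are assumptions made for illustration, and the length check is approximate because Python's `tempfile.mkstemp` picks its own random suffix.

```python
import errno
import os
import tempfile


def get_tmp_filename(size, tmpdir=None):
    """Create and close an empty temp file.

    Returns (0, path) on success or (-errno, None) on failure, loosely
    mirroring a C function that returns int instead of void.
    `size` plays the role of the caller's buffer size.
    """
    tmpdir = tmpdir or os.environ.get("TMPDIR", "/tmp")
    needed = len(os.path.join(tmpdir, "vl.XXXXXX")) + 1   # +1 for the NUL
    if needed > size:
        return -errno.EOVERFLOW, None          # would overrun the buffer
    try:
        fd, path = tempfile.mkstemp(prefix="vl.", dir=tmpdir)
    except OSError as e:
        return -(e.errno or errno.EIO), None   # mkstemp failure is reported
    try:
        os.close(fd)
    except OSError as e:
        os.unlink(path)
        return -(e.errno or errno.EIO), None   # so is close failure
    return 0, path
```

A caller would check the status before opening the file, so that a failed creation is never followed by an open of a name an attacker could have claimed.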
    • block: prevent snapshot mode $TMPDIR symlink attack · eba25057
      Jim Meyering authored
      In snapshot mode, bdrv_open creates an empty temporary file without
      checking for mkstemp or close failure, and ignoring the possibility
      of a buffer overrun given a surprisingly long $TMPDIR.
      Change the get_tmp_filename function to return int (not void),
      so that it can inform its two callers of those failures.
      Also avoid the risk of buffer overrun and do not ignore mkstemp
      or close failure.
      Update both callers (in block.c and vvfat.c) to propagate
      temp-file-creation failure to their callers.
      
      get_tmp_filename creates and closes an empty file, while its
      callers later open that presumed-existing file with O_CREAT.
      The problem was that a malicious user could provoke mkstemp failure
      and race to create a symlink with the selected temporary file name,
      thus causing the qemu process (usually root owned) to open through
      the symlink, overwriting an attacker-chosen file.
      
      This addresses CVE-2012-2652.
      http://bugzilla.redhat.com/CVE-2012-2652
      Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Jim Meyering <meyering@redhat.com>
      Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
  9. 10 May, 2012 3 commits
    • block: wait for job callback in block_job_cancel_sync · fa4478d5
      Paolo Bonzini authored
      The limitation on not having I/O after cancellation cannot really be
      kept.  Even streaming has a very small race window where you could
      cancel a job and have it report completion.  If this window is hit,
      bdrv_change_backing_file() will yield and possibly cause accesses to
      dangling pointers etc.
      
      So, let's just assume that we cannot know exactly what will happen
      after the coroutine has set busy to false.  We can set a very lax
      condition:
      
      - if we cancel the job, the coroutine won't set it to false again
      (and hence will not call co_sleep_ns again).
      
      - block_job_cancel_sync will wait for the coroutine to exit, which
      pretty much ensures no race.
      
      Instead, we track the coroutine that executes the job and put very
      strict conditions on what to do while it is quiescent (busy = false).
      First of all, the coroutine must never set busy = false while the job
      has been cancelled.  Second, the coroutine can be reentered arbitrarily
      while it is quiescent, so you cannot really do anything but co_sleep_ns at
      that time.  This condition is obeyed by the block_job_sleep_ns function.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
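The two rules stated above (never set busy to false after cancellation, and do nothing but sleep while quiescent) can be modelled with a Python generator standing in for the coroutine. This is a toy sketch: the `BlockJob` class here, its `enter` method and the `log` list are hypothetical, and a bare `yield` stands in for `co_sleep_ns`.

```python
class BlockJob:
    def __init__(self, work_items):
        self.cancelled = False
        self.busy = True
        self.done = False
        self.log = []
        self._co = self._run(work_items)

    def _sleep(self):
        """Model of block_job_sleep_ns: go quiescent only if not cancelled."""
        if not self.cancelled:
            self.busy = False
            yield                  # stands in for co_sleep_ns()
            self.busy = True

    def _run(self, work_items):
        for item in work_items:
            if self.cancelled:
                break
            self.log.append(item)
            yield from self._sleep()
        self.done = True           # the completion callback would run here

    def enter(self):
        """Re-enter the coroutine (timer expiry or cancellation)."""
        try:
            next(self._co)
        except StopIteration:
            pass

    def cancel_sync(self):
        """Cancel, then keep re-entering until the coroutine has exited."""
        self.cancelled = True
        while not self.done:
            self.enter()
```

Since `_sleep` checks the cancelled flag before clearing `busy`, a cancelled job can never become quiescent again, and `cancel_sync` returns only after the coroutine has finished.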
    • block: add block_job_sleep_ns · 4513eafe
      Paolo Bonzini authored
      This function abstracts the pretty complex semantics of the "busy"
      member of BlockJob.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: fix snapshot on QED · e023b2e2
      Paolo Bonzini authored
      QED's opaque data includes a pointer back to the BlockDriverState.
      This breaks when bdrv_append shuffles data between bs_new and bs_top.
      To avoid this, add a "rebind" function that tells the driver about
      the new relationship between the BlockDriverState and its opaque.
      
      The patch also adds rebind to VVFAT for completeness, even though
      it is not used with live snapshots.
      Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
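The back-pointer problem and the "rebind" remedy can be shown with two small Python classes. The names `DriverState`, `BlockState` and `append` are hypothetical stand-ins chosen for this sketch; the real swap in bdrv_append moves more than one field.

```python
class DriverState:
    """Driver-private data that keeps a pointer back to its owner."""

    def __init__(self, bs):
        self.bs = bs

    def rebind(self, bs):
        self.bs = bs


class BlockState:
    def __init__(self, name):
        self.name = name
        self.opaque = DriverState(self)


def append(bs_new, bs_top):
    """Swap driver data between the two states, then fix back-pointers."""
    bs_new.opaque, bs_top.opaque = bs_top.opaque, bs_new.opaque
    # Without the two calls below, each opaque would still point at the
    # object it used to belong to.
    bs_new.opaque.rebind(bs_new)
    bs_top.opaque.rebind(bs_top)
```

After the swap, each driver state is told which block state now owns it, so following the back-pointer from the opaque always leads to the current owner.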
  10. 27 Apr, 2012 4 commits
  11. 20 Apr, 2012 1 commit
    • qcow2: Version 3 images · 6744cbab
      Kevin Wolf authored
      This adds the basic infrastructure to qcow2 to handle version 3 images.
      It includes code to create v3 images, allow header updates for v3 images
      and check feature bits.
      
      It still misses support for zero clusters, so this is not a fully
      compliant implementation of v3 yet.
      
      The default for creating new images stays at v2 for now.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
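The commit message does not spell out how feature bits are checked. The sketch below reflects one common reading of the qcow2 version 3 format, in which unknown incompatible bits prevent opening and unknown autoclear bits are dropped; the function name, parameters and return convention are assumptions made for illustration only.

```python
def check_feature_bits(version, incompatible, autoclear,
                       known_incompatible=0, known_autoclear=0):
    """Return (ok_to_open, autoclear_bits_to_keep).

    `known_*` are bit masks of the features this hypothetical reader
    understands. Compatible feature bits are not passed in because they
    may always be ignored.
    """
    if version < 3:
        return True, 0                      # v2 headers carry no feature bits
    unknown = incompatible & ~known_incompatible
    if unknown:
        return False, autoclear             # refuse: image cannot be read safely
    return True, autoclear & known_autoclear
```

For example, a reader that knows only incompatible bit 0 would accept an image with that bit set and reject one with bit 1 set.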
  12. 05 Apr, 2012 3 commits
  13. 29 Feb, 2012 3 commits
  14. 23 Feb, 2012 1 commit
  15. 09 Feb, 2012 1 commit
    • block: add .bdrv_co_write_zeroes() interface · f08f2dda
      Stefan Hajnoczi authored
      The ability to zero regions of an image file is a useful primitive for
      higher-level features such as image streaming or zero write detection.
      
      Image formats may support an optimized metadata representation instead
      of writing zeroes into the image file.  This allows zero writes to be
      potentially faster than regular write operations and also preserve
      sparseness of the image file.
      
      The .bdrv_co_write_zeroes() interface should be implemented by block
      drivers that wish to provide efficient zeroing.
      
      Note that this operation is different from the discard operation, which
      may leave the contents of the region indeterminate.  That means
      discarded blocks are not guaranteed to contain zeroes and may contain
      junk data instead.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
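To illustrate what an "optimized metadata representation" of zeroes might look like, here is a toy image format in Python in which zeroing a cluster records a marker instead of storing data. The `SparseImage` class, the 4-byte cluster size and the absence of a backing file are all simplifying assumptions for this sketch.

```python
CLUSTER = 4          # hypothetical cluster size in bytes
ZERO = object()      # metadata marker: "this cluster reads as zeroes"


class SparseImage:
    def __init__(self):
        self.clusters = {}            # cluster index -> bytes or ZERO

    def write(self, idx, data):
        assert len(data) == CLUSTER
        self.clusters[idx] = bytes(data)

    def write_zeroes(self, idx):
        self.clusters[idx] = ZERO     # metadata update only, no data written

    def read(self, idx):
        value = self.clusters.get(idx)
        if value is None or value is ZERO:
            return bytes(CLUSTER)     # guaranteed to read back as zeroes
        return value

    def allocated_bytes(self):
        return sum(len(v) for v in self.clusters.values() if v is not ZERO)
```

Unlike a discard, whose result may be indeterminate, a cluster zeroed this way always reads back as zeroes while occupying no data space.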
  16. 26 Jan, 2012 4 commits
  17. 05 Dec, 2011 3 commits