1. 14 Aug, 2012 1 commit
  2. 07 Aug, 2012 1 commit
    • qcow2: implement lazy refcounts · bfe8043e
      Committed by Stefan Hajnoczi
      Lazy refcounts is a performance optimization for qcow2 that postpones
      refcount metadata updates and instead marks the image dirty.  In the
      case of crash or power failure the image will be left in a dirty state
      and repaired next time it is opened.
      
      Reducing metadata I/O is important for cache=writethrough and
      cache=directsync because these modes guarantee that data is on disk
      after each write (hence we cannot take advantage of caching updates in
      RAM).  Refcount metadata is not needed for guest->file block address
      translation and therefore does not need to be on-disk at the time of
      write completion - this is the motivation behind the lazy refcount
      optimization.
      
      The lazy refcount optimization must be enabled at image creation time:
      
        qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
        qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
      
      Update qemu-iotests 031 and 036 since the extension header size changes
      when we add feature bit table entries.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
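The ordering described in this commit message can be illustrated with a toy model. The Python sketch below is not qcow2 code: the `ToyImage` class, its fields, and the `open` repair step are hypothetical names invented for illustration, and it assumes each guest cluster is allocated at most once. It only shows the idea of writing a dirty flag once, skipping per-allocation refcount writes, and rebuilding refcounts when a dirty image is opened.

```python
class ToyImage:
    """Toy stand-in for an image file; this is not the qcow2 on-disk format."""

    def __init__(self, lazy_refcounts):
        self.lazy_refcounts = lazy_refcounts
        self.dirty = False        # simulated on-disk dirty flag
        self.mapping = {}         # guest cluster -> host cluster (needed for translation)
        self.refcounts = {}       # host cluster -> reference count
        self.metadata_writes = 0  # number of simulated metadata writes

    def allocate(self, guest_cluster):
        """Map a not-yet-mapped guest cluster to a new host cluster."""
        if self.lazy_refcounts and not self.dirty:
            self.dirty = True     # written once, before refcount updates are skipped
            self.metadata_writes += 1
        host_cluster = len(self.mapping)
        self.mapping[guest_cluster] = host_cluster
        self.metadata_writes += 1
        if not self.lazy_refcounts:
            self.refcounts[host_cluster] = 1  # eager: one refcount write per allocation
            self.metadata_writes += 1
        return host_cluster

    def open(self):
        """Repair a dirty image by rebuilding refcounts from the mapping."""
        if self.dirty:
            self.refcounts = {host: 1 for host in self.mapping.values()}
            self.dirty = False
```

In this model, four allocations cost 8 simulated metadata writes with eager refcounts and 5 with lazy refcounts (one dirty-flag write plus four mapping writes); the price is the rebuild in `open` after an unclean shutdown.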
  3. 17 Jul, 2012 1 commit
    • block: Geometry and translation hints are now useless, purge them · 2b584959
      Committed by Markus Armbruster
      There are two producers of these hints: drive_init() on behalf of
      -drive, and hd_geometry_guess().
      
      The only consumer of the hint is hd_geometry_guess().
      
      The callers of hd_geometry_guess() call it only when drive_init()
      didn't set the hints.  Therefore, drive_init()'s hints are never used.
      
      Thus, hd_geometry_guess() only ever sees hints it produced itself in a
      prior call.  Only the first call computes something, subsequent calls
      just repeat the first call's results.  However, hd_geometry_guess() is
      never called more than once: the device models don't, and the block
      device is destroyed on unplug.  Thus, dropping the repeat feature
      doesn't break anything now.
      
      If a block device wasn't destroyed on unplug and could be reused with
      a new device, then repeating old results would be wrong.  Thus,
      dropping the repeat feature prevents future breakage.
      
      This renders the hints unused.  Purge them from the block layer.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  4. 15 Jun, 2012 1 commit
    • qemu-img check -r for repairing images · 4534ff54
      Committed by Kevin Wolf
      The QED block driver already provides the functionality to not only
      detect inconsistencies in images, but also fix them. However, this
      functionality cannot be manually invoked with qemu-img, but the
      check happens only automatically during bdrv_open().
      
      This adds a -r switch to qemu-img check that allows manual invocation
      of an image repair.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  5. 30 May, 2012 2 commits
    • block: prevent snapshot mode $TMPDIR symlink attack · c2d76497
      Committed by Jim Meyering
      In snapshot mode, bdrv_open creates an empty temporary file without
      checking for mkstemp or close failure, and ignoring the possibility
      of a buffer overrun given a surprisingly long $TMPDIR.
      Change the get_tmp_filename function to return int (not void),
      so that it can inform its two callers of those failures.
      Also avoid the risk of buffer overrun and do not ignore mkstemp
      or close failure.
      Update both callers (in block.c and vvfat.c) to propagate
      temp-file-creation failure to their callers.
      
      get_tmp_filename creates and closes an empty file, while its
      callers later open that presumed-existing file with O_CREAT.
      The problem was that a malicious user could provoke mkstemp failure
      and race to create a symlink with the selected temporary file name,
      thus causing the qemu process (usually root owned) to open through
      the symlink, overwriting an attacker-chosen file.
      
      This addresses CVE-2012-2652.
      http://bugzilla.redhat.com/CVE-2012-2652
      Signed-off-by: Jim Meyering <meyering@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: prevent snapshot mode $TMPDIR symlink attack · eba25057
      Committed by Jim Meyering
      In snapshot mode, bdrv_open creates an empty temporary file without
      checking for mkstemp or close failure, and ignoring the possibility
      of a buffer overrun given a surprisingly long $TMPDIR.
      Change the get_tmp_filename function to return int (not void),
      so that it can inform its two callers of those failures.
      Also avoid the risk of buffer overrun and do not ignore mkstemp
      or close failure.
      Update both callers (in block.c and vvfat.c) to propagate
      temp-file-creation failure to their callers.
      
      get_tmp_filename creates and closes an empty file, while its
      callers later open that presumed-existing file with O_CREAT.
      The problem was that a malicious user could provoke mkstemp failure
      and race to create a symlink with the selected temporary file name,
      thus causing the qemu process (usually root owned) to open through
      the symlink, overwriting an attacker-chosen file.
      
      This addresses CVE-2012-2652.
      http://bugzilla.redhat.com/CVE-2012-2652
      Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Jim Meyering <meyering@redhat.com>
      Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
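The error-propagation pattern described in the two commits above can be sketched in Python. This is an illustration only, not the QEMU C code: the return convention `(status, path)`, the `max_len` parameter, the `vl.` prefix, and the `open_snapshot_file` caller are assumptions made for the example. The point it shows is that the helper reports `mkstemp` and `close` failures as a negative errno value, and that the caller stops instead of going on to open a name that was never created.

```python
import errno
import os
import tempfile


def get_tmp_filename(max_len, tmpdir=None):
    """Create and close an empty temp file.

    Returns (0, path) on success or (-errno, None) on failure, so callers
    can see that temp-file creation failed.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="vl.", dir=tmpdir)
    except OSError as exc:
        return -(exc.errno or errno.EIO), None
    try:
        os.close(fd)
    except OSError as exc:
        return -(exc.errno or errno.EIO), None
    if len(path) >= max_len:
        # Stand-in for the fixed-size buffer check in C: refuse names that
        # would not fit instead of truncating or overrunning.
        os.unlink(path)
        return -errno.ENAMETOOLONG, None
    return 0, path


def open_snapshot_file(max_len=4096, tmpdir=None):
    """Hypothetical caller: propagate the failure rather than ignore it."""
    ret, path = get_tmp_filename(max_len, tmpdir)
    if ret < 0:
        return ret, None
    return 0, path
```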
  6. 10 May, 2012 3 commits
    • block: wait for job callback in block_job_cancel_sync · fa4478d5
      Committed by Paolo Bonzini
      The limitation on not having I/O after cancellation cannot really be
      kept.  Even streaming has a very small race window where you could
      cancel a job and have it report completion.  If this window is hit,
      bdrv_change_backing_file() will yield and possibly cause accesses to
      dangling pointers etc.
      
      So, let's just assume that we cannot know exactly what will happen
      after the coroutine has set busy to false.  We can set a very lax
      condition:
      
      - if we cancel the job, the coroutine won't set it to false again
      (and hence will not call co_sleep_ns again).
      
      - block_job_cancel_sync will wait for the coroutine to exit, which
      pretty much ensures no race.
      
      Instead, we track the coroutine that executes the job and put very
      strict conditions on what to do while it is quiescent (busy = false).
      First of all, the coroutine must never set busy = false while the job
      has been cancelled.  Second, the coroutine can be reentered arbitrarily
      while it is quiescent, so you cannot really do anything but co_sleep_ns at
      that time.  This condition is obeyed by the block_job_sleep_ns function.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: add block_job_sleep_ns · 4513eafe
      Committed by Paolo Bonzini
      This function abstracts the pretty complex semantics of the "busy"
      member of BlockJob.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
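The rule stated in the two commits above (never set busy to false once the job has been cancelled, and do nothing but sleep while quiescent) can be sketched with a toy model. The names `ToyJob` and `toy_job_sleep` are hypothetical, and a plain callback stands in for the coroutine sleep; this is not the QEMU implementation.

```python
class ToyJob:
    """Toy stand-in for a block job; only the two flags discussed above."""

    def __init__(self):
        self.busy = True
        self.cancelled = False


def toy_job_sleep(job, sleep_fn):
    """Sleep only if the job is not cancelled.

    Cancellation is checked before busy is cleared, so a cancelled job
    never becomes quiescent again.
    """
    if job.cancelled:
        return
    job.busy = False   # quiescent: only the sleep happens in this window
    sleep_fn()
    job.busy = True
```

A caller that waits for the job to finish can then rely on `busy` staying true after cancellation until the job exits.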
    • block: fix snapshot on QED · e023b2e2
      Committed by Paolo Bonzini
      QED's opaque data includes a pointer back to the BlockDriverState.
      This breaks when bdrv_append shuffles data between bs_new and bs_top.
      To avoid this, add a "rebind" function that tells the driver about
      the new relationship between the BlockDriverState and its opaque.
      
      The patch also adds rebind to VVFAT for completeness, even though
      it is not used with live snapshots.
      Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
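The back-pointer problem described above can be reproduced in a few lines. The sketch below is a simplified model with hypothetical names (`ToyState`, `ToyOpaque`, `swap_contents`); the real `bdrv_append` does considerably more. It shows why swapping driver data between two state objects leaves the back-pointers wrong, and how a rebind hook corrects them.

```python
class ToyState:
    """Toy stand-in for a BlockDriverState."""

    def __init__(self, name):
        self.name = name
        self.opaque = None


class ToyOpaque:
    """Driver-private data that keeps a pointer back to its owner."""

    def __init__(self, bs):
        self.bs = bs

    def rebind(self, bs):
        # Tell the driver data which state object owns it now.
        self.bs = bs


def swap_contents(a, b, rebind):
    """Swap driver data between two states, optionally rebinding it."""
    a.opaque, b.opaque = b.opaque, a.opaque
    if rebind:
        for bs in (a, b):
            if bs.opaque is not None:
                bs.opaque.rebind(bs)
```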
  7. 27 Apr, 2012 4 commits
  8. 20 Apr, 2012 1 commit
    • qcow2: Version 3 images · 6744cbab
      Committed by Kevin Wolf
      This adds the basic infrastructure to qcow2 to handle version 3 images.
      It includes code to create v3 images, allow header updates for v3 images
      and checks feature bits.
      
      It still misses support for zero clusters, so this is not a fully
      compliant implementation of v3 yet.
      
      The default for creating new images stays at v2 for now.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  9. 05 Apr, 2012 3 commits
  10. 29 Feb, 2012 3 commits
  11. 23 Feb, 2012 1 commit
  12. 09 Feb, 2012 1 commit
    • block: add .bdrv_co_write_zeroes() interface · f08f2dda
      Committed by Stefan Hajnoczi
      The ability to zero regions of an image file is a useful primitive for
      higher-level features such as image streaming or zero write detection.
      
      Image formats may support an optimized metadata representation instead
      of writing zeroes into the image file.  This allows zero writes to be
      potentially faster than regular write operations and also preserve
      sparseness of the image file.
      
      The .bdrv_co_write_zeroes() interface should be implemented by block
      drivers that wish to provide efficient zeroing.
      
      Note that this operation is different from the discard operation, which
      may leave the contents of the region indeterminate.  That means
      discarded blocks are not guaranteed to contain zeroes and may contain
      junk data instead.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
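The distinction between zeroing and discarding can be made concrete with a toy cluster map. Everything in the sketch below is hypothetical (`ToyFormat`, the `ZERO` and `DISCARDED` markers, and the choice to return `None` for indeterminate contents); it is not a block driver. It shows a zero write recorded purely as metadata, with no data bytes written, while a discarded cluster gives no guarantee about what a later read returns.

```python
ZERO = object()       # metadata-only marker: cluster reads back as zeroes
DISCARDED = object()  # contents indeterminate after discard


class ToyFormat:
    """Toy image format tracking one state per cluster."""

    def __init__(self, cluster_size=4):
        self.cluster_size = cluster_size
        self.clusters = {}           # cluster index -> bytes or marker
        self.data_bytes_written = 0  # simulated data I/O

    def write(self, index, data):
        assert len(data) == self.cluster_size
        self.clusters[index] = bytes(data)
        self.data_bytes_written += len(data)

    def write_zeroes(self, index):
        # Optimized representation: flip metadata, write no data.
        self.clusters[index] = ZERO

    def discard(self, index):
        self.clusters[index] = DISCARDED

    def read(self, index):
        # Assumption of this model: unallocated clusters read as zeroes.
        state = self.clusters.get(index, ZERO)
        if state is ZERO:
            return bytes(self.cluster_size)
        if state is DISCARDED:
            return None  # caller must not rely on the contents
        return state
```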
  13. 26 Jan, 2012 4 commits
  14. 05 Dec, 2011 6 commits
  15. 22 Nov, 2011 1 commit
    • block: allow migration to work with image files (v3) · 0f15423c
      Committed by Anthony Liguori
      Image files have two types of data: immutable data that describes things like
      image size, backing files, etc. and mutable data that includes offset and
      reference count tables.
      
      Today, image formats aggressively cache mutable data to improve performance.  In
      some cases, this happens before a guest even starts.  When dealing with live
      migration, since a file is open on two machines, the caching of meta data can
      lead to data corruption.
      
      This patch addresses this by introducing a mechanism to invalidate any cached
      mutable data a block driver may have which is then used by the live migration
      code.
      
      NB, this still requires coherent shared storage.  Addressing migration without
      coherent shared storage (i.e. NFS) requires additional work.
      Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
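The stale-cache hazard and the invalidation step can be sketched as follows. The model is hypothetical: `ToyDriver` and its methods are invented for illustration, and a shared Python dict stands in for coherent shared storage. It shows a destination that cached mutable metadata before the source changed it, and that sees the new value only after its cache is invalidated.

```python
class ToyDriver:
    """Toy block driver that caches a mutable metadata table."""

    def __init__(self, storage):
        self.storage = storage    # shared dict standing in for coherent shared storage
        self.cached_table = None  # mutable metadata, loaded lazily

    def lookup(self, key):
        if self.cached_table is None:
            self.cached_table = dict(self.storage)  # load and cache
        return self.cached_table.get(key)

    def update(self, key, value):
        self.lookup(key)                # make sure the cache is loaded
        self.cached_table[key] = value
        self.storage[key] = value       # write the change through to storage

    def invalidate_cache(self):
        # Drop cached mutable data so the next access reloads it.
        self.cached_table = None
```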
  16. 11 Nov, 2011 2 commits
    • block: Introduce bdrv_co_flush_to_os · eb489bb1
      Committed by Kevin Wolf
      qcow2 has a writeback metadata cache, so flushing a qcow2 image actually
      consists of writing back that cache to the protocol and only then flushes the
      protocol in order to get everything stable on disk.
      
      This introduces a separate bdrv_co_flush_to_os to reflect the split.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Rename bdrv_co_flush to bdrv_co_flush_to_disk · c68b89ac
      Committed by Kevin Wolf
      There are two different types of flush that you can do: Flushing one level up
      to the OS (i.e. writing data to the host page cache) or flushing it all the way
      down to the disk. The existing functions flush to the disk; reflect this in the
      function name.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
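The same two-level split exists for ordinary buffered files, which gives a convenient analogy. In the Python sketch below, the function names `flush_to_os` and `flush_to_disk` are chosen to echo the commit titles and are not QEMU functions; Python's userspace write buffer plays the role of a writeback cache. `file.flush()` hands data to the operating system, and `os.fsync()` then asks the operating system to make it stable on disk.

```python
import os


def flush_to_os(f):
    """Push the userspace buffer to the OS (host page cache)."""
    f.flush()


def flush_to_disk(f):
    """Flush to the OS first, then ask the OS to write through to disk."""
    flush_to_os(f)
    os.fsync(f.fileno())
```

After `flush_to_os` other processes on the same host can see the data, but only `flush_to_disk` is meant to survive a power failure.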
  17. 27 Oct, 2011 2 commits
  18. 21 Oct, 2011 3 commits