1. 18 Aug 2016: 1 commit
    • block: fix deadlock in bdrv_co_flush · ce83ee57
      Committed by Evgeny Yakovlev
      The following commit
          commit 3ff2f67a
          Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
          Date:   Mon Jul 18 22:39:52 2016 +0300
          block: ignore flush requests when storage is clean
      has introduced a regression.
      
      It is still possible for two flush requests to execute out of order,
      and this sometimes results in a deadlock when bdrv_drain_one/all are
      called for a BDS with such stalled requests.
      
      1. Initially, flushed_gen and flush_started_gen are both 1.
      2. Request 1 enters bdrv_co_flush with write_gen 1 (i.e. the same
         as flushed_gen). It gets past the flushed_gen != flush_started_gen
         check and sets flush_started_gen to 1 (the same value as before).
      3. Request 1 yields somewhere before exiting bdrv_co_flush.
      4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past the
         flushed_gen != flush_started_gen check and sets flush_started_gen
         to 2.
      5. Request 2 runs to completion and sets flushed_gen to 2.
      6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
         However, flush_started_gen is now 2.
      
      From then on, flushed_gen is always != flush_started_gen, so all
      further requests wait on flush_queue forever. This change replaces
      flush_started_gen with an explicitly tracked active flush request.
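      The six steps above can be replayed in a minimal single-threaded
      model. This is a sketch, not QEMU code: the FlushModel struct and the
      helper names are hypothetical, and coroutine yields and waits on
      flush_queue are flattened into straight-line steps. It shows that
      the old generation check can never pass again after step 6, while an
      explicit "flush in flight" flag simply makes request 2 wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the fields bdrv_co_flush() touches.
 * The real code runs in coroutines and waits on flush_queue instead of
 * returning false from a "may start" check. */
typedef struct {
    int write_gen;          /* bumped by every write */
    int flushed_gen;        /* generation covered by the last finished flush */
    int flush_started_gen;  /* old scheme only */
    bool active_flush;      /* new scheme: is a flush currently in flight? */
} FlushModel;

/* Old scheme: a new flush proceeds only if the two generations match. */
static bool old_may_start(const FlushModel *m)
{
    return m->flushed_gen == m->flush_started_gen;
}

/* New scheme: a new flush waits only while another one is in flight. */
static bool new_may_start(const FlushModel *m)
{
    return !m->active_flush;
}

/* Replays steps 1-6 with the old scheme; returns whether any later
 * flush could still proceed afterwards. */
static bool replay_old_scheme(void)
{
    FlushModel m = { .write_gen = 1, .flushed_gen = 1, .flush_started_gen = 1 };

    int req1_gen = m.write_gen;             /* step 2 */
    assert(old_may_start(&m));
    m.flush_started_gen = req1_gen;
    m.write_gen = 2;                        /* step 3: req 1 yields, a write lands */
    int req2_gen = m.write_gen;             /* step 4 */
    assert(old_may_start(&m));
    m.flush_started_gen = req2_gen;
    m.flushed_gen = req2_gen;               /* step 5 */
    m.flushed_gen = req1_gen;               /* step 6: now 1 != 2 for good */
    return old_may_start(&m);
}

/* Same interleaving, but with an explicitly tracked active flush. */
static bool replay_new_scheme(void)
{
    FlushModel m = { .write_gen = 1, .flushed_gen = 1 };

    int req1_gen = m.write_gen;
    assert(new_may_start(&m));
    m.active_flush = true;                  /* req 1 in flight, then yields */
    m.write_gen = 2;
    int req2_gen = m.write_gen;
    bool req2_waited = !new_may_start(&m);  /* req 2 must queue up */
    m.flushed_gen = req1_gen;               /* req 1 finishes... */
    m.active_flush = false;                 /* ...and wakes the queue */
    assert(new_may_start(&m));
    m.active_flush = true;                  /* req 2 runs to completion */
    m.flushed_gen = req2_gen;
    m.active_flush = false;
    return req2_waited && new_may_start(&m) && m.flushed_gen == m.write_gen;
}
```

      replay_old_scheme() returns false (every later flush would wait
      forever), while replay_new_scheme() returns true.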
      Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Message-id: 1471457214-3994-2-git-send-email-den@openvz.org
      CC: Stefan Hajnoczi <stefanha@redhat.com>
      CC: Fam Zheng <famz@redhat.com>
      CC: Kevin Wolf <kwolf@redhat.com>
      CC: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  2. 04 Aug 2016: 1 commit
    • block: Cater to iscsi with non-power-of-2 discard · b8d0a980
      Committed by Eric Blake
      Dell Equallogic iSCSI SANs have a very unusual advertised geometry:
      
      $ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
      wsnz:0
      maximum compare and write length:1
      optimal transfer length granularity:0
      maximum transfer length:0
      optimal transfer length:0
      maximum prefetch xdread xdwrite transfer length:0
      maximum unmap lba count:30720
      maximum unmap block descriptor count:2
      optimal unmap granularity:30720
      ugavalid:1
      unmap granularity alignment:0
      maximum write same length:30720
      
      which says that both the maximum and the optimal discard size
      are 15M (30720 blocks of 512 bytes).  It is not immediately
      apparent whether the device allows discard requests not aligned
      to the optimal size, nor whether it allows discards at a finer
      granularity than the optimal size.
      
      I tried to find details in the SCSI Commands Reference Manual
      Rev. A on which maximum and optimal sizes are valid, but while
      that document mentions a "Block Limits VPD Page", I couldn't
      find documentation of that page, the values it would contain,
      or whether a SCSI device advertises its minimal unmap
      granularity.  So it is not obvious to me whether the Dell
      Equallogic device is compliant with the SCSI specification.
      
      Fortunately, it is easy enough to support non-power-of-2 sizing,
      even if it means we are less efficient than truly possible when
      targeting that device (for example, we refuse to unmap anything
      that is not a multiple of 15M and aligned to a 15M boundary, even
      if the device actually supports a smaller granularity at which
      unmapping works).
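      The usual bit trick for alignment, masking with ~(align - 1), only
      works when align is a power of 2. A minimal sketch of the
      non-power-of-2 alternative uses integer division instead; the helper
      name and the EQL_UNMAP_GRANULARITY constant are hypothetical, chosen
      to match the 15M geometry reported above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: 30720 blocks of 512 bytes = 15 MiB. */
static const int64_t EQL_UNMAP_GRANULARITY = 30720 * 512;

/* Hypothetical helper: shrink [*offset, *offset + *bytes) to the largest
 * sub-range whose start and end are multiples of 'align'. Division works
 * for any positive 'align', unlike masking with (align - 1). If no
 * aligned sub-range remains, *bytes is set to 0. */
static void clamp_to_alignment(int64_t *offset, int64_t *bytes, int64_t align)
{
    assert(align > 0 && *offset >= 0 && *bytes >= 0);
    int64_t start = (*offset + align - 1) / align * align;  /* round up */
    int64_t end = (*offset + *bytes) / align * align;       /* round down */
    if (end <= start) {
        *bytes = 0;                 /* nothing aligned: ignore the request */
        return;
    }
    *offset = start;
    *bytes = end - start;
}
```

      For example, a 20M discard at offset 0 shrinks to the first 15M,
      while a 10M discard is dropped entirely, matching the trade-off the
      commit message describes.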
      Reported-by: Peter Lieven <pl@kamp.de>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 20 Jul 2016: 9 commits
    • block: Kill .bdrv_co_discard() · 02aefe43
      Committed by Eric Blake
      Now that all drivers have a byte-based .bdrv_co_pdiscard(), we
      no longer need to worry about the sector-based version.  We can
      also relax our minimum alignment to 1 for drivers that support it.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Add .bdrv_co_pdiscard() driver callback · 47a5486d
      Committed by Eric Blake
      There are enough drivers with a sector-based callback that it will
      be easier to switch them one at a time.  This patch adds a
      byte-based callback; once all drivers have been switched, we'll
      drop the sector-based callback.
      
      [checkpatch doesn't like the space after coroutine_fn in
      block_int.h, but it's consistent with the rest of the file]
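      One way to picture the one-driver-at-a-time transition is a dispatcher
      that prefers the new byte-based callback and falls back to the old
      sector-based one. This is a simplified sketch under stated
      assumptions: the DiscardDriver table, dispatch_discard(), the
      512-byte sector size constant and the recording legacy driver are all
      hypothetical, not QEMU's BlockDriver API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512   /* assumption: 512-byte sectors */

/* Hypothetical driver table: during the transition a driver fills in
 * either the new byte-based callback or the old sector-based one. */
typedef struct {
    int (*pdiscard)(int64_t offset, int bytes);          /* byte-based */
    int (*discard)(int64_t sector_num, int nb_sectors);  /* sector-based */
} DiscardDriver;

static int dispatch_discard(const DiscardDriver *drv, int64_t offset, int bytes)
{
    if (drv->pdiscard) {
        return drv->pdiscard(offset, bytes);   /* converted driver */
    }
    if (drv->discard) {
        /* Unconverted driver: it can only express whole sectors. */
        assert(offset % SKETCH_SECTOR_SIZE == 0 && bytes % SKETCH_SECTOR_SIZE == 0);
        return drv->discard(offset / SKETCH_SECTOR_SIZE, bytes / SKETCH_SECTOR_SIZE);
    }
    return -ENOTSUP;                           /* no discard support at all */
}

/* Example unconverted driver that records what it was asked to discard. */
static int64_t legacy_last_sector;
static int legacy_last_count;

static int legacy_discard(int64_t sector_num, int nb_sectors)
{
    legacy_last_sector = sector_num;
    legacy_last_count = nb_sectors;
    return 0;
}
```

      Once every driver provides the byte-based callback, the fallback
      branch and the sector-based field can be deleted, which is what the
      later "Kill .bdrv_co_discard()" commit does.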
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Convert .bdrv_aio_discard() to byte-based · 4da444a0
      Committed by Eric Blake
      Another step towards byte-based interfaces everywhere.  Replace
      the sector-based driver callback .bdrv_aio_discard() with a new
      byte-based .bdrv_aio_pdiscard().  Only raw-posix and RBD drivers
      are affected, so it was not worth splitting into multiple patches.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Convert bdrv_aio_discard() to byte-based · 60ebac16
      Committed by Eric Blake
      Another step towards byte-based interfaces everywhere.  Replace
      the sector-based bdrv_aio_discard() with a new byte-based
      bdrv_aio_pdiscard(), which silently ignores any unaligned head
      or tail.  Driver callbacks will be converted in followup patches.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 1468624988-423-5-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Switch BlockRequest to byte-based · b15404e0
      Committed by Eric Blake
      BlockRequest is the internal struct used by bdrv_aio_*.  At the
      moment, all such calls are sector-based, but we will eventually
      convert them to byte-based; start by changing the internal
      variables to be byte-based.  There is no change in behavior,
      although the read and write code can now stay byte-based through
      more of the stack.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 1468624988-423-4-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Convert bdrv_discard() to byte-based · 0c51a893
      Committed by Eric Blake
      Another step towards byte-based interfaces everywhere.  Replace
      the sector-based bdrv_discard() with a new byte-based
      bdrv_pdiscard(), which silently ignores any unaligned head
      or tail.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-id: 1468624988-423-3-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Convert bdrv_co_discard() to byte-based · 9f1963b3
      Committed by Eric Blake
      Another step towards byte-based interfaces everywhere.  Replace
      the sector-based bdrv_co_discard() with a new byte-based
      bdrv_co_pdiscard(), which silently ignores any unaligned head
      or tail.  Driver callbacks will be converted in followup patches.
      
      By calculating the alignment outside of the loop, and clamping
      the max discard to an aligned value, we can simplify the actions
      done within the loop.
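      The loop structure described above can be sketched as follows,
      assuming a hypothetical per-chunk callback in place of the driver
      call. The alignment and the clamped maximum are computed once before
      the loop, so every chunk the loop emits is automatically aligned and
      the unaligned head and tail are silently dropped. pdiscard_sketch(),
      DiscardFn and the ChunkLog recorder are illustrative names, not
      QEMU's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-chunk callback standing in for the driver call. */
typedef int (*DiscardFn)(int64_t offset, int64_t bytes, void *opaque);

static int pdiscard_sketch(int64_t offset, int64_t bytes, int64_t align,
                           int64_t max_discard, DiscardFn fn, void *opaque)
{
    assert(align > 0 && max_discard >= align);
    /* Done once, outside the loop: drop the unaligned head and tail... */
    int64_t start = (offset + align - 1) / align * align;
    int64_t end = (offset + bytes) / align * align;
    /* ...and clamp the maximum chunk down to a multiple of align. */
    max_discard = max_discard / align * align;

    /* Inside the loop, start and num are always aligned. */
    while (start < end) {
        int64_t num = end - start < max_discard ? end - start : max_discard;
        int ret = fn(start, num, opaque);
        if (ret < 0) {
            return ret;
        }
        start += num;
    }
    return 0;
}

/* Example callback: counts chunks and checks that each one is aligned. */
typedef struct {
    int chunks;
    int64_t total;
    int64_t align;
} ChunkLog;

static int log_chunk(int64_t offset, int64_t bytes, void *opaque)
{
    ChunkLog *rec = opaque;
    assert(offset % rec->align == 0 && bytes % rec->align == 0);
    rec->chunks++;
    rec->total += bytes;
    return 0;
}
```

      With 4096-byte alignment and a 10000-byte limit (clamped to 8192), a
      discard of 40000 bytes at offset 1000 becomes five aligned chunks
      covering 36864 bytes.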
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-id: 1468624988-423-2-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Fragment writes to max transfer length · 04ed95f4
      Committed by Eric Blake
      Drivers should be able to rely on the block layer honoring the
      max transfer length, rather than needing to return -EINVAL
      (iscsi) or manually fragment things (nbd).  We already fragment
      write zeroes at the block layer; this patch adds the fragmentation
      for normal writes, after requests have been aligned (fragmenting
      before alignment would lead to multiple unaligned requests, rather
      than just the head and tail).
      
      When fragmenting a large request where FUA was requested, but
      where we know that FUA is implemented by flushing all requests
      rather than the given request, then we can still get by with
      only one flush.  Note, however, that we need a followup patch
      to the raw format driver to avoid a regression in the number of
      flushes actually issued.
      
      The return value was previously nebulous on success (sometimes
      zero, sometimes the length written); since we never have a short
      write, and since fragmenting may store yet another positive
      value in 'ret', change the function to always return 0 on success,
      matching what we do in bdrv_aligned_preadv().
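      A minimal sketch of the write fragmentation, assuming a hypothetical
      WriteFn callback and a fua_is_full_flush flag standing in for "the
      driver emulates FUA by flushing everything". When that flag is set,
      only the final fragment carries FUA, since one flush after it covers
      the earlier, already-completed fragments. The function returns 0 on
      success even when the callback returns a positive length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver write callback; 'fua' asks for this write to be
 * durable before completion. May return a positive length on success. */
typedef int (*WriteFn)(int64_t offset, int64_t bytes, bool fua, void *opaque);

static int fragmented_write(int64_t offset, int64_t bytes, int64_t max_transfer,
                            bool fua, bool fua_is_full_flush,
                            WriteFn fn, void *opaque)
{
    assert(max_transfer > 0);
    while (bytes > 0) {
        int64_t num = bytes < max_transfer ? bytes : max_transfer;
        bool last = (num == bytes);
        /* If FUA is emulated by a full flush, one flush after the last
         * fragment is enough; otherwise every fragment needs real FUA. */
        bool frag_fua = fua && (!fua_is_full_flush || last);
        int ret = fn(offset, num, frag_fua, opaque);
        if (ret < 0) {
            return ret;
        }
        offset += num;
        bytes -= num;
    }
    return 0;   /* always 0 on success, whatever positive value fn returned */
}

/* Example callback: counts writes and how many carried FUA. */
typedef struct {
    int writes;
    int fua_writes;
} WriteRec;

static int record_write(int64_t offset, int64_t bytes, bool fua, void *opaque)
{
    WriteRec *rec = opaque;
    (void)offset;
    rec->writes++;
    rec->fua_writes += fua;
    return (int)bytes;   /* mimic a driver that returns the length written */
}
```

      Splitting a 10000-byte FUA write at a 4096-byte limit produces three
      fragments: one FUA fragment when FUA is emulated by flushing, three
      when the driver implements per-request FUA.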
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 1468607524-19021-4-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Fragment reads to max transfer length · 1a62d0ac
      Committed by Eric Blake
      Drivers should be able to rely on the block layer honoring the
      max transfer length, rather than needing to return -EINVAL
      (iscsi) or manually fragment things (nbd).  This patch adds
      the fragmentation in the block layer, after requests have been
      aligned (fragmenting before alignment would lead to multiple
      unaligned requests, rather than just the head and tail).
      
      On success, the return value was previously nebulous: sometimes
      zero, sometimes the length read; and fragmenting may introduce
      yet other non-zero values if we use the last length read.  But
      as at least some callers are sloppy and expect only zero on
      success, it is easiest to just guarantee 0.
      
      [Fix uninitialized ret local variable in bdrv_aligned_preadv().
      --Stefan]
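      Why fragment only after alignment? A small sketch makes the point,
      assuming hypothetical helpers pad_to_alignment() (widen a request to
      alignment boundaries, as padding the head and tail does) and
      count_unaligned_fragments(). Splitting an unaligned request directly
      leaves every fragment misaligned; padding first leaves none.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen [*offset, *offset + *bytes) outward to
 * 'align' boundaries by padding the unaligned head and tail. */
static void pad_to_alignment(int64_t *offset, int64_t *bytes, int64_t align)
{
    assert(align > 0);
    int64_t start = *offset / align * align;                        /* round down */
    int64_t end = (*offset + *bytes + align - 1) / align * align;   /* round up */
    *offset = start;
    *bytes = end - start;
}

/* Splits the range into chunks of at most max_transfer bytes and counts
 * the chunks that start or end off an 'align' boundary. */
static int count_unaligned_fragments(int64_t offset, int64_t bytes,
                                     int64_t max_transfer, int64_t align)
{
    assert(max_transfer > 0 && max_transfer % align == 0);
    int unaligned = 0;
    while (bytes > 0) {
        int64_t num = bytes < max_transfer ? bytes : max_transfer;
        if (offset % align != 0 || (offset + num) % align != 0) {
            unaligned++;
        }
        offset += num;
        bytes -= num;
    }
    return unaligned;
}
```

      With 512-byte alignment and a 4096-byte limit, a 12288-byte read at
      offset 100 yields three misaligned fragments if split as-is; padded
      to [0, 12800) first, it yields four fragments, all aligned.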
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 1468607524-19021-2-git-send-email-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  4. 19 Jul 2016: 1 commit
    • block: ignore flush requests when storage is clean · 3ff2f67a
      Committed by Evgeny Yakovlev
      Some guests (Windows Server 2008, for example) issue many
      unnecessary flushes when the underlying media has not changed.
      Each one adds overhead on the host, which has to call
      fsync/fdatasync.
      
      This change introduces a write generation scheme in BlockDriverState.
      The current write generation is checked against the last flushed
      generation to avoid unnecessary flushes.
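      The write generation check can be sketched in a few lines. This is a
      simplified model under stated assumptions: GenState, model_write(),
      model_flush() and the fsync_calls counter are hypothetical names, and
      a real implementation would run the flush in a coroutine and
      serialize concurrent flushers (the deadlock fix at the top of this
      log is about exactly that).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the write generation scheme. */
typedef struct {
    unsigned write_gen;    /* incremented by every completed write */
    unsigned flushed_gen;  /* write_gen value covered by the last flush */
    int fsync_calls;       /* stands in for the host fsync/fdatasync cost */
} GenState;

static void model_write(GenState *s)
{
    s->write_gen++;
}

/* Returns true if a real flush was issued, false if it was skipped. */
static bool model_flush(GenState *s)
{
    assert(s->flushed_gen <= s->write_gen);
    /* Snapshot first, so writes landing during the flush are not
     * recorded as flushed. */
    unsigned current_gen = s->write_gen;
    if (s->flushed_gen == current_gen) {
        return false;          /* nothing written since the last flush */
    }
    s->fsync_calls++;          /* the expensive host-side call */
    s->flushed_gen = current_gen;
    return true;
}
```

      Repeated flushes with no intervening writes cost nothing; only the
      first flush after one or more writes reaches the host.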
      
      The problem with excessive flushing was found by a performance test
      which does parallel directory tree creation (from 2 processes).
      Results improved from 0.424 loops/sec to 0.432 loops/sec.
      Each loop creates 10^3 directories with 10 files in each.
      
      This affected some blkdebug testcases that were expecting error logs from
      failure-injected flushes which are now skipped entirely
      (tests 026 071 089).
      
      This also affects the performance of block jobs: BLOCK_JOB_READY
      events for drive-mirror and active block-commit commands now arrive
      faster, before the QMP send successfully returns to the caller
      (tests 141 144).
      Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
      CC: Kevin Wolf <kwolf@redhat.com>
      CC: Max Reitz <mreitz@redhat.com>
      CC: Stefan Hajnoczi <stefanha@redhat.com>
      CC: Fam Zheng <famz@redhat.com>
      CC: John Snow <jsnow@redhat.com>
      Signed-off-by: John Snow <jsnow@redhat.com>
  5. 13 Jul 2016: 1 commit
    • coroutine: move entry argument to qemu_coroutine_create · 0b8b8753
      Committed by Paolo Bonzini
      In practice the entry argument is always known at creation time, and
      it is confusing that sometimes qemu_coroutine_enter is used with a
      non-NULL argument to re-enter a coroutine (this happens in
      block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
      at creation time, for consistency with e.g. aio_bh_new.
      
      Mostly done with the following semantic patch:
      
      @ entry1 @
      expression entry, arg, co;
      @@
      - co = qemu_coroutine_create(entry);
      + co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);
      
      @ entry2 @
      expression entry, arg;
      identifier co;
      @@
      - Coroutine *co = qemu_coroutine_create(entry);
      + Coroutine *co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);
      
      @ entry3 @
      expression entry, arg;
      @@
      - qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
      + qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
      
      @ reentry @
      expression co;
      @@
      - qemu_coroutine_enter(co, NULL);
      + qemu_coroutine_enter(co);
      
      except for the aforementioned few places where the semantic patch
      stumbled (as expected) and for test_co_queue, which would otherwise
      produce an uninitialized variable warning.
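      The API shape the semantic patch moves to, with the opaque argument
      bound at creation and enter taking only the handle, can be
      illustrated with a toy stand-in. This is not a coroutine
      implementation: Task, task_create(), task_enter() and add_one() are
      hypothetical names, and a real coroutine would also own a stack and
      be able to yield. It only shows why binding the argument once avoids
      callers passing a different value on re-entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry-point type, mirroring the "entry(opaque)" shape. */
typedef void TaskEntry(void *opaque);

typedef struct {
    TaskEntry *entry;
    void *opaque;          /* bound once, at creation time */
} Task;

static Task *task_create(TaskEntry *entry, void *opaque)
{
    Task *t = malloc(sizeof(*t));
    assert(t != NULL);
    t->entry = entry;
    t->opaque = opaque;
    return t;
}

static void task_enter(Task *t)
{
    t->entry(t->opaque);   /* no per-call argument to get out of sync */
}

/* Example entry point: increments the counter it was created with. */
static void add_one(void *opaque)
{
    int *counter = opaque;
    (*counter)++;
}
```

      Entering the same task twice always runs add_one() on the counter
      supplied at creation, just as qemu_coroutine_enter(co) no longer
      takes an argument after this change.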
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  6. 05 Jul 2016: 20 commits
  7. 20 Jun 2016: 3 commits
  8. 16 Jun 2016: 4 commits