1. 28 Jul 2020 (1 commit)
  2. 17 Jul 2020 (1 commit)
  3. 23 Jun 2020 (1 commit)
    • block/nvme: support nested aio_poll() · 7838c67f
      Authored by Stefan Hajnoczi
      QEMU block drivers are supposed to support aio_poll() from I/O
      completion callback functions. This means completion processing must be
      re-entrant.
      
      The standard approach is to schedule a BH during completion processing
      and cancel it at the end of processing. If aio_poll() is invoked by a
      callback function then the BH will run. The BH continues the suspended
      completion processing.
      
      All of this means that request A's cb() can synchronously wait for
      request B to complete. Previously the nvme block driver would hang
      because it didn't process completions from nested aio_poll().
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Sergio Lopez <slp@redhat.com>
      Message-id: 20200617132201.1832152-8-stefanha@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      7838c67f
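      A minimal C sketch of the re-entrant completion pattern this commit describes,
      assuming QEMU's AioContext bottom-half API (aio_bh_new(), qemu_bh_schedule(),
      qemu_bh_cancel()); the queue structure and process_one_completion() helper are
      illustrative stand-ins, not the driver's actual definitions:

          #include "qemu/osdep.h"
          #include "block/aio.h"

          typedef struct QueueSketch {
              QEMUBH *completion_bh;   /* created once with aio_bh_new() */
              /* ... queue state ... */
          } QueueSketch;

          /* Process one completion entry and run its cb(); returns false when
           * there is nothing left to process. */
          static bool process_one_completion(QueueSketch *q);

          static void process_completions(QueueSketch *q)
          {
              /* Schedule the BH up front: if a cb() below re-enters aio_poll()
               * and waits for another request, the BH resumes completion
               * processing on our behalf, so that request can still finish. */
              qemu_bh_schedule(q->completion_bh);

              while (process_one_completion(q)) {
                  /* each cb() may itself call aio_poll() */
              }

              /* Nobody re-entered us (or we are done): drop the safety net. */
              qemu_bh_cancel(q->completion_bh);
          }

          static void process_completions_bh(void *opaque)
          {
              /* BH body: continue the suspended completion processing. */
              process_completions(opaque);
          }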
  4. 11 Mar 2020 (1 commit)
  5. 31 Jan 2020 (1 commit)
  6. 28 Oct 2019 (3 commits)
  7. 10 Oct 2019 (3 commits)
  8. 17 Aug 2019 (1 commit)
    • block/backup: teach TOP to never copy unallocated regions · 7e30dd61
      Authored by John Snow
      Presently, if sync=TOP is selected, we mark the entire bitmap as dirty.
      In the write notifier handler, we then dutifully copy out such regions,
      even when they were never allocated in the top layer.
      
      Fix this in three parts:
      
      1. Mark the bitmap as being initialized before the first yield.
      2. After the first yield but before the backup loop, interrogate the
      allocation status asynchronously and initialize the bitmap.
      3. Teach the write notifier to interrogate allocation status if it is
      invoked during bitmap initialization.
      
      As a result of this patch, the job progress for TOP backups
      now behaves like this:
      
      - Total progress starts at bdrv_length.
      - As allocation status is interrogated, total progress decreases.
      - As blocks are copied, current progress increases.
      
      Taken together, the floor and ceiling move to meet each other.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-id: 20190716000117.25219-10-jsnow@redhat.com
      [Remove ret = -ECANCELED change. --js]
      [Squash in conflict resolution based on Max's patch --js]
      Message-id: c8b0ab36-79c8-0b4b-3193-4e12ed8c848b@redhat.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: John Snow <jsnow@redhat.com>
      7e30dd61
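      A hedged C sketch of step 2 above: instead of starting with every byte marked
      dirty, walk the top layer's allocation status and populate the copy bitmap,
      shrinking the job's total progress as unallocated ranges are skipped. It uses
      bdrv_is_allocated() and the dirty-bitmap setters, but the function itself and
      the use of BdrvDirtyBitmap as the copy bitmap are illustrative, not the backup
      job's actual code:

          #include "qemu/osdep.h"
          #include "block/block.h"
          #include "block/dirty-bitmap.h"

          static int init_copy_bitmap_from_allocation(BlockDriverState *bs,
                                                      BdrvDirtyBitmap *copy_bitmap,
                                                      int64_t len)
          {
              int64_t offset = 0;

              while (offset < len) {
                  int64_t count;
                  int ret = bdrv_is_allocated(bs, offset, len - offset, &count);
                  if (ret < 0) {
                      return ret;   /* caller may fall back to copying everything */
                  }
                  if (ret) {
                      /* allocated in the top layer: must be copied */
                      bdrv_set_dirty_bitmap(copy_bitmap, offset, count);
                  } else {
                      /* unallocated: skip it and lower the progress ceiling */
                      bdrv_reset_dirty_bitmap(copy_bitmap, offset, count);
                  }
                  offset += count;
              }
              return 0;
          }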
  9. 24 Jun 2019 (1 commit)
    • ssh: switch from libssh2 to libssh · b10d49d7
      Authored by Pino Toscano
      Rewrite the implementation of the ssh block driver to use libssh instead
      of libssh2.  The libssh library has various advantages over libssh2:
      - easier API for authentication (for example for using ssh-agent)
      - easier API for known_hosts handling
      - supports newer types of keys in known_hosts
      
      Use APIs/features available in libssh 0.8 conditionally, so that older
      versions keep working (though they are not recommended).
      
      Adjust iotest 207 for the different error message, and make it look up
      the default key type for localhost (so that the fingerprint can be
      compared properly).
      Contributed-by: Max Reitz <mreitz@redhat.com>
      
      Adjust the various Docker/Travis scripts to use libssh when available
      instead of libssh2. The mingw/mxe testing is dropped for now, as there
      are no packages for it.
      Signed-off-by: Pino Toscano <ptoscano@redhat.com>
      Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Acked-by: Alex Bennée <alex.bennee@linaro.org>
      Message-id: 20190620200840.17655-1-ptoscano@redhat.com
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      b10d49d7
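      A hedged C sketch of the "use libssh 0.8 APIs conditionally" point: prefer the
      newer known_hosts call when building against libssh >= 0.8 and fall back to the
      deprecated one otherwise. The version-macro check is an assumption made for the
      sketch; the real driver may gate this on a configure-time test instead:

          #include <errno.h>
          #include <libssh/libssh.h>

          static int check_host_key_known(ssh_session session)
          {
          #if LIBSSH_VERSION_INT >= SSH_VERSION_INT(0, 8, 0)
              switch (ssh_session_is_known_server(session)) {
              case SSH_KNOWN_HOSTS_OK:
                  return 0;
              case SSH_KNOWN_HOSTS_CHANGED:
              case SSH_KNOWN_HOSTS_OTHER:
                  return -EPERM;    /* host key mismatch: refuse to connect */
              default:
                  return -ENOENT;   /* unknown host or missing known_hosts */
              }
          #else
              switch (ssh_is_server_known(session)) {
              case SSH_SERVER_KNOWN_OK:
                  return 0;
              case SSH_SERVER_KNOWN_CHANGED:
              case SSH_SERVER_FOUND_OTHER:
                  return -EPERM;
              default:
                  return -ENOENT;
              }
          #endif
          }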
  10. 18 Jun 2019 (1 commit)
    • block: drop bs->job · b23c580c
      Authored by Vladimir Sementsov-Ogievskiy
      Drop the remaining users of bs->job:
      1. assertions that are already duplicated by assert(!bs->refcnt)
      2. a trace point, which is not enough reason to change stream_start to
         return a BlockJob pointer
      3. restricting the creation of two jobs based on the same bs, which is
         a bad idea, because:
         3.1 Some jobs create filters to be their main node, so this check
         doesn't actually prevent creating a second job on the same real node
         (which will create another filter node); hopefully that is
         restricted by other mechanisms.
         3.2 Even without bs->job we have two systems of permissions:
         op-blockers and BLK_PERM.
         3.3 We may want to run several jobs on one node one day.
      
      And finally, drop bs->job pointer itself. Hurrah!
      Suggested-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      b23c580c
  11. 13 Jun 2019 (2 commits)
  12. 04 Jun 2019 (1 commit)
  13. 29 May 2019 (1 commit)
  14. 18 Apr 2019 (1 commit)
    • block/ssh: Do not report read/write/flush errors to the user · 6b3048ce
      Authored by Markus Armbruster
      The callbacks ssh_co_readv(), ssh_co_writev() and ssh_co_flush() report
      errors to the user with error_printf().  They shouldn't; that is their
      caller's job.  Replace the reports with a suitable trace point.  While
      there, drop the unreachable !s->sftp case.
      
      Perhaps we should convert this part of the block driver interface to
      Error, so block drivers can pass more detail to their callers.  Not
      today.
      
      Cc: "Richard W.M. Jones" <rjones@redhat.com>
      Cc: Kevin Wolf <kwolf@redhat.com>
      Cc: Max Reitz <mreitz@redhat.com>
      Cc: qemu-block@nongnu.org
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190417190641.26814-3-armbru@redhat.com>
      6b3048ce
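      A hedged before/after C fragment of the change described here; error_printf()
      is QEMU's real helper, but trace_ssh_write_return() is a hypothetical trace
      point (it would be declared in block/trace-events), not necessarily the name
      the actual patch adds:

          /* Before: the driver callback reported straight to the user. */
          /*   error_printf("ssh: write failed (%zd)\n", ret);           */

          /* After: keep only a developer breadcrumb; the caller already
           * receives the failure through the normal return value and can
           * report it properly. */
          if (ret < 0) {
              trace_ssh_write_return(ret);   /* hypothetical trace name */
          }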
  15. 01 Apr 2019 (1 commit)
    • nbd/client: Trace server noncompliance on structured reads · 75d34eb9
      Authored by Eric Blake
      Just as we recently added a trace for a server sending block status
      that doesn't match the server's advertised minimum block alignment,
      let's do the same for read chunks.  However, qemu 3.1 is itself such a
      server: it advertised 512-byte alignment, but when serving a file that
      ends in data and is not sector-aligned, NBD_CMD_READ would detect a
      mid-sector change between data and hole at EOF, so the resulting read
      chunks were unaligned.  We therefore don't want to change our existing
      behavior of tolerating unaligned reads.
      
      Note that even though we fixed the server for 4.0 to advertise an
      actual block alignment (which gets rid of the unaligned reads at EOF
      for posix files), we can still trigger it via other means:
      
      $ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file
      
      Arguably, that is a bug in the blkdebug block status function, for
      leaking a block status that is not aligned. It may also be possible to
      observe issues with a backing layer with smaller alignment than the
      active layer, although so far I have been unable to write a reliable
      iotest for that scenario.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190330165349.32256-1-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      75d34eb9
  16. 30 Mar 2019 (1 commit)
    • nbd: Tolerate some server non-compliance in NBD_CMD_BLOCK_STATUS · a39286dd
      Authored by Eric Blake
      The NBD spec states that NBD_CMD_FLAG_REQ_ONE (which we currently
      always use) should not reply with an extent larger than our request,
      and that the server's response should be exactly one extent. Right
      now, that means that if a server sends more than one extent, we treat
      the server as broken, fail the block status request, and disconnect,
      which prevents all further use of the block device. But while good
      software should be strict in what it sends, it should be tolerant in
      what it receives.
      
      While trying to implement NBD_CMD_BLOCK_STATUS in nbdkit, we
      temporarily had a non-compliant server sending too many extents in
      spite of REQ_ONE. Oddly enough, 'qemu-img convert' with qemu 3.1
      failed with a somewhat useful message:
        qemu-img: Protocol error: invalid payload for NBD_REPLY_TYPE_BLOCK_STATUS
      
      which then disappeared with commit d8b4bad8, on the grounds that an
      error message flagged only at the time of coroutine teardown is
      pointless, and instead we should rely on the actual failed API to
      report an error - in other words, the 3.1 behavior was masking the
      fact that qemu-img was not reporting an error. That has since been
      fixed in the previous patch, where qemu-img convert now fails with:
        qemu-img: error while reading block status of sector 0: Invalid argument
      
      But even that is harsh.  Since we already partially relaxed things in
      commit acfd8f7a to tolerate a server that exceeds the cap (although
      that change was made prior to the NBD spec actually putting a cap on
      the extent length during REQ_ONE - in fact, the NBD spec change was
      BECAUSE of the qemu behavior prior to that commit), it's not that much
      harder to argue that we should also tolerate a server that sends too
      many extents.  But at the same time, it's nice to trace when we are
      being tolerant of server non-compliance, in order to help server
      writers fix their implementations to be more portable (if they refer
      to our traces, rather than just stderr).
      Reported-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20190323212639.579-3-eblake@redhat.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      a39286dd
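      A hedged C sketch of the tolerance being added: when NBD_CMD_FLAG_REQ_ONE was
      set but the server replied with several extents, keep the first extent and
      trace the violation instead of failing the request. The function and trace
      names are illustrative, not the actual nbd client code:

          #include "qemu/osdep.h"
          #include "block/nbd.h"
          #include "trace.h"

          static int handle_blockstatus_extents(NBDExtent *extents, uint32_t count,
                                                bool req_one)
          {
              if (count == 0) {
                  return -EINVAL;   /* an empty reply is still fatal */
              }
              if (req_one && count > 1) {
                  /* Non-compliant (REQ_ONE means exactly one extent), but
                   * harmless: use extents[0] and leave a trace so server
                   * authors can spot and fix the problem. */
                  trace_nbd_extra_blockstatus_extents(count);   /* illustrative */
              }
              /* ... act on extents[0] only ... */
              return 0;
          }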
  17. 23 Mar 2019 (2 commits)
  18. 31 Jan 2019 (4 commits)
  19. 05 Jan 2019 (1 commit)
    • block/nbd-client: use traces instead of noisy error_report_err · d8b4bad8
      Authored by Vladimir Sementsov-Ogievskiy
      Reduce the extra noise from nbd-client, and change iotest 083 correspondingly.
      
      In various commits (be41c100 in 2.10, f140e300 in 2.11, 78a33ab5
      in 2.12), we added spots where qemu as an NBD client would report
      problems communicating with the server to stderr, because there
      was nowhere else to send the error.  However, this is racy,
      particularly since the most common source of these errors is when
      either the client or the server abruptly hangs up, leaving one
      coroutine to report the error only if it wins (or loses) the
      race in attempting the read from the server before another
      thread completes its cleanup of a protocol error that caused the
      disconnect in the first place.  The race is also apparent in the
      fact that differences in the flush behavior of the server can
      alter the frequency of encountering the race in the client (see
      commit 6d39db96).
      
      Rather than polluting stderr, it's better to just trace these
      situations, for use by developers debugging a flaky connection,
      particularly since the real error that either triggers the abrupt
      disconnection in the first place, or that results from the EIO
      when a request can't receive a reply, DOES make it back to the
      user in the normal Error propagation channels.
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20181102151152.288399-4-vsementsov@virtuozzo.com>
      [eblake: drop dependence on error hint, enhance commit message]
      Signed-off-by: Eric Blake <eblake@redhat.com>
      d8b4bad8
  20. 10 Jul 2018 (2 commits)
  21. 03 Jul 2018 (1 commit)
    • backup: Use copy offloading · 9ded4a01
      Authored by Fam Zheng
      The implementation is similar to that of 'qemu-img convert'.  At the
      beginning of the job, an offloaded copy is attempted.  If it fails,
      further I/O goes through the existing bounce buffer code path.
      
      Then, as Kevin pointed out, both this and qemu-img convert can benefit
      from a local check: one request may fail because, for example, its
      offset is beyond EOF, while another may well be accepted by the
      protocol layer.  This will be implemented separately.
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Message-id: 20180703023758.14422-4-famz@redhat.com
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      9ded4a01
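      A hedged C sketch of the fallback logic described here: try the offloaded path
      first and permanently switch to the bounce-buffer path once it fails. The two
      helpers are stand-ins for the job's real routines (the offloaded one would wrap
      blk_co_copy_range()):

          #include "qemu/osdep.h"
          #include "qemu/coroutine.h"

          typedef struct BackupSketch {
              bool use_copy_range;   /* start optimistic */
              /* ... */
          } BackupSketch;

          static int coroutine_fn copy_range_offloaded(BackupSketch *job,
                                                       int64_t offset, int64_t bytes);
          static int coroutine_fn copy_with_bounce_buffer(BackupSketch *job,
                                                          int64_t offset, int64_t bytes);

          static int coroutine_fn do_copy(BackupSketch *job,
                                          int64_t offset, int64_t bytes)
          {
              if (job->use_copy_range) {
                  int ret = copy_range_offloaded(job, offset, bytes);
                  if (ret >= 0) {
                      return ret;
                  }
                  /* Offloading failed (e.g. not supported by the protocol
                   * layer): disable it and fall through to the bounce buffer. */
                  job->use_copy_range = false;
              }
              return copy_with_bounce_buffer(job, offset, bytes);
          }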
  22. 23 May 2018 (2 commits)
  23. 19 Mar 2018 (5 commits)
    • blockjobs: add block-job-finalize · 11b61fbc
      Authored by John Snow
      Instead of automatically transitioning from PENDING to CONCLUDED, gate
      the .prepare() and .commit() phases behind an explicit acknowledgement
      provided by the QMP monitor if auto_finalize = false has been requested.
      
      This allows us to perform graph changes in .prepare() and/or .commit()
      so that such changes do not occur autonomously, i.e. without the
      knowledge of the controlling management layer.
      
      Transactions that have reached the "PENDING" state together can all be
      moved to invoke their finalization methods by issuing block_job_finalize
      to any one job in the transaction.
      
      Jobs in a transaction with mixed job->auto_finalize settings will all
      remain stuck in the "PENDING" state, as if the entire transaction was
      specified with auto_finalize = false. Jobs that specified
      auto_finalize = true, however, will still not emit the PENDING event.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      11b61fbc
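      A hedged C sketch of the gate this commit introduces: with
      auto_finalize = false, a completed job parks in PENDING until the management
      layer issues block-job-finalize. The type and helpers are stand-ins for the
      real blockjob internals:

          #include <stdbool.h>

          typedef struct PendingJobSketch {
              bool auto_finalize;   /* true preserves the old automatic behaviour */
              /* ... */
          } PendingJobSketch;

          /* Runs .prepare() across the transaction, then .commit() or .abort(). */
          static void job_do_finalize(PendingJobSketch *job);

          static void job_completed(PendingJobSketch *job)
          {
              /* Work is done: enter PENDING (the event is only emitted for jobs
               * created with auto_finalize = false, as noted above). */
              if (job->auto_finalize) {
                  job_do_finalize(job);   /* old behaviour: no extra QMP step */
              }
              /* Otherwise stay in PENDING; the QMP block-job-finalize handler
               * calls job_do_finalize() for every job in the transaction. */
          }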
    • blockjobs: ensure abort is called for cancelled jobs · 35d6b368
      Authored by John Snow
      Presently, even if a job is canceled post-completion as a result of
      a failing peer in a transaction, it will still call .commit because
      nothing has updated or changed its return code.
      
      The reason why this does not cause problems currently is because
      backup's implementation of .commit checks for cancellation itself.
      
      I'd like to simplify this contract:
      
      (1) Abort is called if the job/transaction fails
      (2) Commit is called if the job/transaction succeeds
      
      To this end: A job's return code, if 0, will be forcibly set as
      -ECANCELED if that job has already concluded. Remove the now
      redundant check in the backup job implementation.
      
      We need to check for cancellation in both block_job_completed
      AND block_job_completed_single, because jobs may be cancelled between
      those two calls; for instance in transactions. This also necessitates
      an ABORTING -> ABORTING transition to be allowed.
      
      The check in block_job_completed could be removed, but there's no
      point in starting to attempt to succeed a transaction that we know
      in advance will fail.
      
      This does NOT affect mirror jobs that are "canceled" during their
      synchronous phase. The mirror job itself forcibly sets the canceled
      property to false prior to ceding control, so such cases will invoke
      the "commit" callback.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      35d6b368
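      A hedged C sketch of the contract stated above: a job that was cancelled must
      not look successful, so its return code is forced to -ECANCELED before choosing
      between .commit() and .abort(). The struct is a stand-in for the real BlockJob:

          #include <errno.h>
          #include <stdbool.h>

          typedef struct CancellableJobSketch {
              int ret;          /* 0 while the job still looks successful */
              bool cancelled;
          } CancellableJobSketch;

          static void job_update_rc(CancellableJobSketch *job)
          {
              if (!job->ret && job->cancelled) {
                  job->ret = -ECANCELED;   /* cancelled post-completion still fails */
              }
          }

          static void job_completed_single(CancellableJobSketch *job)
          {
              job_update_rc(job);
              if (job->ret) {
                  /* transition to ABORTING; .abort() runs across the transaction */
              } else {
                  /* run .commit() */
              }
          }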
    • blockjobs: add block_job_dismiss · 75f71059
      Authored by John Snow
      For jobs that have reached their CONCLUDED state, prior to having their
      last reference put down (meaning jobs that have completed successfully,
      unsuccessfully, or have been canceled), allow the user to dismiss the
      job's lingering status report via block-job-dismiss.
      
      This gives management APIs the chance to conclusively determine if a job
      failed or succeeded, even if the event broadcast was missed.
      
      Note: block_job_do_dismiss and block_job_decommission happen to do
      exactly the same thing, but they're called from different semantic
      contexts, so both aliases are kept to improve readability.
      
      Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS; a
      companion flag coming in a future patch will fill the hole where 0x02 is.
      
      Verbs:
      Dismiss: operates on CONCLUDED jobs only.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      75f71059
    • blockjobs: add block_job_verb permission table · 0ec4dfb8
      Authored by John Snow
      Keeping track of which commands ("verbs") are appropriate for a job in
      a given state is also somewhat burdensome.
      
      As of this commit, it looks rather useless, but begins to look more
      interesting the more states we add to the STM table.
      
      A recurring theme is that no verb will apply to an 'undefined' job.
      
      Further, it's not presently possible to restrict the "pause" or "resume"
      verbs any more than they are in this commit because of the asynchronous
      nature of how jobs enter the PAUSED state; justifications for some
      seemingly erroneous applications are given below.
      
      =====
      Verbs
      =====
      
      Cancel:    Any state except undefined.
      Pause:     Any state except undefined;
                 'created': Requests that the job pauses as it starts.
                 'running': Normal usage. (PAUSED)
                 'paused':  The job may be paused for internal reasons,
                            but the user may wish to force an indefinite
                            user-pause, so this is allowed.
                 'ready':   Normal usage. (STANDBY)
                 'standby': Same logic as above.
      Resume:    Any state except undefined;
                 'created': Will lift a user's pause-on-start request.
                 'running': Will lift a pause request before it takes effect.
                 'paused':  Normal usage.
                 'ready':   Will lift a pause request before it takes effect.
                 'standby': Normal usage.
      Set-speed: Any state except undefined, though ready may not be meaningful.
      Complete:  Only a 'ready' job may accept a complete request.
      
      =======
      Changes
      =======
      
      (1)
      
      To facilitate "nice" error checking, all five major block-job verb
      interfaces in blockjob.c now support an errp parameter:
      
      - block_job_user_cancel is added as a new interface.
      - block_job_user_pause gains an errp parameter
      - block_job_user_resume gains an errp parameter
      - block_job_set_speed already had an errp parameter.
      - block_job_complete already had an errp parameter.
      
      (2)
      
      block-job-pause and block-job-resume will no longer no-op when trying
      to pause an already paused job, or trying to resume a job that isn't
      paused. These functions will now report that they did not perform the
      action requested because it was not possible.
      
      iotests have been adjusted to address this new behavior.
      
      (3)
      
      block-job-complete doesn't worry about checking !block_job_started,
      because the permission table guards against this.
      
      (4)
      
      test-bdrv-drain's job implementation needs to announce that it is
      'ready' now, in order to be completed.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      0ec4dfb8
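      A standalone C illustration of the verb/state permission table described above,
      simplified to the states and verbs listed in this commit message (the real
      table is indexed by the QAPI-generated enums):

          #include <stdbool.h>
          #include <stdio.h>

          enum Status { UNDEFINED, CREATED, RUNNING, PAUSED, READY, STANDBY,
                        STATUS__MAX };
          enum Verb   { CANCEL, PAUSE, RESUME, SET_SPEED, COMPLETE, VERB__MAX };

          /* verb_table[verb][status] == true means the verb is allowed. */
          static const bool verb_table[VERB__MAX][STATUS__MAX] = {
              /*               UNDEF CREAT RUN  PAUSE READY STBY */
              [CANCEL]    = {  0,    1,    1,   1,    1,    1  },
              [PAUSE]     = {  0,    1,    1,   1,    1,    1  },
              [RESUME]    = {  0,    1,    1,   1,    1,    1  },
              [SET_SPEED] = {  0,    1,    1,   1,    1,    1  },
              [COMPLETE]  = {  0,    0,    0,   0,    1,    0  },
          };

          static bool verb_allowed(enum Verb v, enum Status s)
          {
              return verb_table[v][s];
          }

          int main(void)
          {
              printf("complete a RUNNING job? %d\n", verb_allowed(COMPLETE, RUNNING));
              printf("pause a CREATED job?    %d\n", verb_allowed(PAUSE, CREATED));
              return 0;
          }

      Centralising the policy in one table is what makes the "nice" error checking
      cheap: each verb handler does a single lookup and fills errp on refusal instead
      of scattering ad-hoc state checks.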
    • blockjobs: add state transition table · c9de4050
      Authored by John Snow
      The state transition table has mostly been implied. We're about to make
      it a bit more complex, so let's make the STM explicit instead.
      
      Perform state transitions with a function that for now just asserts the
      transition is appropriate.
      
      Transitions:
      Undefined -> Created: During job initialization.
      Created   -> Running: Once the job is started.
                            Jobs cannot transition from "Created" to "Paused"
                            directly, but will instead synchronously transition
                            through Running to Paused immediately.
      Running   -> Paused:  Normal workflow for pauses.
      Running   -> Ready:   Normal workflow for jobs reaching their sync point.
                            (e.g. mirror)
      Ready     -> Standby: Normal workflow for pausing ready jobs.
      Paused    -> Running: Normal resume.
      Standby   -> Ready:   Resume of a Standby job.
      
      +---------+
      |UNDEFINED|
      +--+------+
         |
      +--v----+
      |CREATED|
      +--+----+
         |
      +--v----+     +------+
      |RUNNING<----->PAUSED|
      +--+----+     +------+
         |
      +--v--+       +-------+
      |READY<------->STANDBY|
      +-----+       +-------+
      
      Notably, there is no state presently defined as of this commit that
      deals with a job after the "running" or "ready" states, so this table
      will be adjusted alongside the commits that introduce those states.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      c9de4050
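      A standalone C illustration of making the STM explicit, using only the
      transitions listed above: legal moves live in a table and every state change
      goes through one function that (for now) just asserts the move is allowed:

          #include <assert.h>
          #include <stdbool.h>

          enum Status { UNDEFINED, CREATED, RUNNING, PAUSED, READY, STANDBY,
                        STATUS__MAX };

          /* stt[from][to] == true means the transition is legal. */
          static const bool stt[STATUS__MAX][STATUS__MAX] = {
              /*               UNDEF CREAT RUN  PAUSE READY STBY */
              [UNDEFINED] = {  0,    1,    0,   0,    0,    0  },  /* -> Created */
              [CREATED]   = {  0,    0,    1,   0,    0,    0  },  /* -> Running */
              [RUNNING]   = {  0,    0,    0,   1,    1,    0  },  /* -> Paused/Ready */
              [PAUSED]    = {  0,    0,    1,   0,    0,    0  },  /* -> Running */
              [READY]     = {  0,    0,    0,   0,    0,    1  },  /* -> Standby */
              [STANDBY]   = {  0,    0,    0,   0,    1,    0  },  /* -> Ready */
          };

          static void state_transition(enum Status *state, enum Status next)
          {
              assert(stt[*state][next]);   /* for now, just assert it is legal */
              *state = next;
          }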
  24. 14 Mar 2018 (1 commit)
  25. 08 Feb 2018 (1 commit)
    • block: Add VFIO based NVMe driver · bdd6a90a
      Authored by Fam Zheng
      This is a new protocol driver that exclusively opens a host NVMe
      controller through VFIO. It achieves better latency than linux-aio by
      completely bypassing the host kernel vfs/block layer.
      
          $rw-$bs-$iodepth  linux-aio     nvme://
          ----------------------------------------
          randread-4k-1     10.5k         21.6k
          randread-512k-1   745           1591
          randwrite-4k-1    30.7k         37.0k
          randwrite-512k-1  1945          1980
      
          (unit: IOPS)
      
      The driver also integrates with the polling mechanism of iothread.
      
      This patch is co-authored by Paolo and me.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Message-Id: <20180116060901.17413-4-famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Fam Zheng <famz@redhat.com>
      bdd6a90a