提交 · dd7fdaad654d7484b66410b4b002b14644396587 · openeuler / qemu

14 7月, 2017 2 次提交

nbd: Implement NBD_INFO_BLOCK_SIZE on client · 081dd1fe

由 Eric Blake 提交于 7月 07, 2017

The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server whether it intends to
obey block sizes.

When using the block layer as the client, we will obey block
sizes; but when used as 'qemu-nbd -c' to hand off to the
kernel nbd module as the client, we are still waiting for the
kernel to implement a way for us to learn if it will honor
block sizes (perhaps by an addition to sysfs, rather than an
ioctl), as well as any way to tell the kernel what additional
block sizes to obey (NBD_SET_BLKSIZE appears to be accurate
for the minimum size, but preferred and maximum sizes would
probably be new ioctl()s), so until then, we need to make our
request for block sizes conditional.

When using ioctl(NBD_SET_BLKSIZE) to hand off to the kernel,
use the minimum block size as the sector size if it is larger
than 512, which also has the nice effect of cooperating with
(non-qemu) servers that don't do read-modify-write when
exposing a block device with 4k sectors; it might also allow
us to visit a file larger than 2T on a 32-bit kernel.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-10-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

081dd1fe

nbd: Create struct for tracking export info · 004a89fc

由 Eric Blake 提交于 7月 07, 2017

The NBD Protocol is introducing some additional information
about exports, such as minimum request size and alignment, as
well as an advertised maximum request size.  It will be easier
to feed this information back to the block layer if we gather
all the information into a struct, rather than adding yet more
pointer parameters during negotiation.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <20170707203049.534-2-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

004a89fc

04 7月, 2017 1 次提交

nbd: fix NBD over TLS · 96d06835

由 Paolo Bonzini 提交于 6月 15, 2017

When attaching the NBD QIOChannel to an AioContext, the TLS channel should
be used, not the underlying socket channel.  This is because, trivially,
the TLS channel will be the one that we read/write to and thus the one
that will get the qio_channel_yield() call.

Fixes: ff82911c
Cc: qemu-stable@nongnu.org
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: NDaniel P. Berrange <berrange@redhat.com>
Tested-by: NDaniel P. Berrange <berrange@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

96d06835

26 6月, 2017 1 次提交

block: change variable names in BlockDriverState · f5a5ca79

由 Manos Pitsidianakis 提交于 6月 09, 2017

Change the 'int count' parameter in *pwrite_zeros, *pdiscard related
functions (and some others) to 'int bytes', as they both refer to bytes.
This helps with code legibility.
Signed-off-by: NManos Pitsidianakis <el13635@mail.ntua.gr>
Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

f5a5ca79

15 6月, 2017 1 次提交

nbd: rename read_sync and friends · d1fdf257

由 Vladimir Sementsov-Ogievskiy 提交于 6月 02, 2017

Rename
  nbd_wr_syncv -> nbd_rwv
  read_sync -> nbd_read
  read_sync_eof -> nbd_read_eof
  write_sync -> nbd_write
  drop_sync -> nbd_drop

1. nbd_ prefix
   read_sync and write_sync are already shared, so it is good to have a
   namespace prefix. drop_sync will be shared, and read_sync_eof is
   related to read_sync, so let's rename them all.

2. _sync suffix
   _sync is related to the fact that nbd_wr_syncv doesn't return if a
   write to socket returns EAGAIN. The first implementation of
   nbd_wr_syncv (was wr_sync in 7a5ca864) just loops while getting
   EAGAIN, the current implementation yields in this case.
   Why we want to get rid of it:
   - it is normal for r/w functions to be synchronous, so having an
     additional suffix for it looks redundant (contrariwise, we have
     _aio suffix for async functions)
   - _sync suffix in block layer is used when function does flush (so
     using it for other thing is confusing a bit)
   - keep function names short after adding nbd_ prefix

3. for nbd_wr_syncv let's use more common notation 'rw'
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170602150150.258222-2-vsementsov@virtuozzo.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d1fdf257

08 6月, 2017 1 次提交

nbd: make it thread-safe, fix qcow2 over nbd · 6bdcc018

由 Paolo Bonzini 提交于 6月 01, 2017

NBD is not thread safe, because it accesses s->in_flight without
a CoMutex.  Fixing this will be required for multiqueue.
CoQueue doesn't have spurious wakeups but, when another coroutine can
run between qemu_co_queue_next's wakeup and qemu_co_queue_wait's
re-locking of the mutex, the wait condition can become false and
a loop is necessary.

In fact, it turns out that the loop is necessary even without this
multi-threaded scenario.  A particular sequence of coroutine wakeups
is happening ~80% of the time when starting a guest with qcow2 image
served over NBD (i.e. qemu-nbd --format=raw, and QEMU's -drive option
has -format=qcow2).  This patch fixes that issue too.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6bdcc018

07 6月, 2017 2 次提交

nbd/client.c: use errp instead of LOG · be41c100

由 Vladimir Sementsov-Ogievskiy 提交于 5月 26, 2017

Move to modern errp scheme from just LOGging errors.
Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170526110913.89098-1-vsementsov@virtuozzo.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

be41c100

nbd: add errp parameter to nbd_wr_syncv() · f2609565

由 Vladimir Sementsov-Ogievskiy 提交于 5月 16, 2017

Will be used in following patch to provide actual error message in
some cases.
Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20170516094533.6160-4-vsementsov@virtuozzo.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f2609565

27 3月, 2017 1 次提交

nbd-client: fix handling of hungup connections · a12a712a

由 Paolo Bonzini 提交于 3月 14, 2017

After the switch to reading replies in a coroutine, nothing is
reentering pending receive coroutines if the connection hangs.
Move nbd_recv_coroutines_enter_all to the reply read coroutine,
which is the place where hangups are detected.  nbd_teardown_connection
can simply wait for the reply read coroutine to detect the hangup
and clean up after itself.

This wouldn't be enough though because nbd_receive_reply returns 0
(rather than -EPIPE or similar) when reading from a hung connection.
Fix the return value check in nbd_read_reply_entry.

This fixes qemu-iotests 083.
Reported-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 20170314111157.14464-1-pbonzini@redhat.com
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

a12a712a

21 2月, 2017 2 次提交

coroutine-lock: add mutex argument to CoQueue APIs · 1ace7cea

由 Paolo Bonzini 提交于 2月 13, 2017

All that CoQueue needs in order to become thread-safe is help
from an external mutex.  Add this to the API.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

1ace7cea

nbd: convert to use qio_channel_yield · ff82911c

由 Paolo Bonzini 提交于 2月 13, 2017

In the client, read the reply headers from a coroutine, switching the
read side between the "read header" coroutine and the I/O coroutine that
reads the body of the reply.

In the server, if the server can read more requests it will create a new
"read request" coroutine as soon as a request has been read.  Otherwise,
the new coroutine is created in nbd_request_put.
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NDaniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-8-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

ff82911c

04 1月, 2017 1 次提交

aio: add AioPollFn and io_poll() interface · f6a51c84

由 Stefan Hajnoczi 提交于 12月 01, 2016

The new AioPollFn io_poll() argument to aio_set_fd_handler() and
aio_set_event_handler() is used in the next patch.

Keep this code change separate due to the number of files it touches.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 20161201192652.9509-3-stefanha@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

f6a51c84

23 11月, 2016 1 次提交

nbd: Allow unmap and fua during write zeroes · 169407e1

由 Eric Blake 提交于 11月 17, 2016

Commit fa778fff wired up support to send the NBD_CMD_WRITE_ZEROES,
but forgot to inform the block layer that FUA unmapping of zeroes is
supported.  Without BDRV_REQ_MAY_UNMAP listed as a supported flag,
the block layer will always insist on the NBD layer passing
NBD_CMD_FLAG_NO_HOLE, resulting in the server always allocating
things even when it was desired to let the server punch holes.
Similarly, failing to set BDRV_REQ_FUA means that the client may
send unnecessary NBD_CMD_FLUSH when it could have instead used the
NBD_CMD_FLAG_FUA bit.

CC: qemu-stable@nongnu.org
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <1479413642-22463-2-git-send-email-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

169407e1

02 11月, 2016 4 次提交

nbd: Implement NBD_CMD_WRITE_ZEROES on client · fa778fff

由 Eric Blake 提交于 10月 14, 2016

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants a hole.

The generic block code takes care of falling back to the obvious
write of lots of zeroes if we return -ENOTSUP because the server
does not have WRITE_ZEROES.

Ideally, since NBD_CMD_WRITE_ZEROES does not involve any data
over the wire, we want to support transactions that are much
larger than the normal 32M limit imposed on NBD_CMD_WRITE.  But
the server may still have a limit smaller than UINT_MAX, so
until experimental NBD protocol additions for advertising various
command sizes is finalized (see [1], [2]), for now we just stick to
the same limits as normal writes.

[1] https://github.com/yoe/nbd/blob/extension-info/doc/proto.md
[2] https://sourceforge.net/p/nbd/mailman/message/35081223/Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-17-git-send-email-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fa778fff

nbd: Rename struct nbd_request and nbd_reply · ed2dd912

由 Eric Blake 提交于 10月 14, 2016

Our coding convention prefers CamelCase names, and we already
have other existing structs with NBDFoo naming.  Let's be
consistent, before later patches add even more structs.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-6-git-send-email-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed2dd912

nbd: Rename NbdClientSession to NBDClientSession · 10676b81

由 Eric Blake 提交于 10月 14, 2016

It's better to use consistent capitalization of the namespace
used for NBD functions; we have more instances of NBD* than
Nbd*.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-5-git-send-email-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

10676b81

nbd: Treat flags vs. command type as separate fields · b626b51a

由 Eric Blake 提交于 10月 14, 2016

Current upstream NBD documents that requests have a 16-bit flags,
followed by a 16-bit type integer; although older versions mentioned
only a 32-bit field with masking to find flags. Since the protocol
is in network order (big-endian over the wire), the ABI is unchanged;
but dealing with the flags as a separate field rather than masking
will make it easier to add support for upcoming NBD extensions that
increase the number of both flags and commands.

Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <1476469998-28592-3-git-send-email-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b626b51a

01 11月, 2016 1 次提交

nbd: Use CoQueue for free_sema instead of CoMutex · 9bc9732f

由 Changlong Xie 提交于 10月 12, 2016

NBD is using the CoMutex in a way that wasn't anticipated. For example, if there are
N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke nbd_client_co_pwritev
N times.
----------------------------------------------------------------------------------------
time request Actions
1    1       in_flight=1, Coroutine=C1
2    2       in_flight=2, Coroutine=C2
...
15   15      in_flight=15, Coroutine=C15
16   16      in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true
17   17      in_flight=16, Coroutine=C17, queue C17 into free_sema->queue
18   18      in_flight=16, Coroutine=C18, queue C18 into free_sema->queue
...
26   N       in_flight=16, Coroutine=C26, queue C26 into free_sema->queue
----------------------------------------------------------------------------------------

Once nbd client recieves request No.16' reply, we will re-enter C16. It's ok, because
it's equal to 'free_sema->holder'.
----------------------------------------------------------------------------------------
time request Actions
27   16      in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false
----------------------------------------------------------------------------------------

Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop coroutines from
free_sema->queue's head and enter C17. More free_sema->holder is C17 now.
----------------------------------------------------------------------------------------
time request Actions
28   17      in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true
----------------------------------------------------------------------------------------

In above scenario, we only recieves request No.16' reply. As time goes by, nbd client will
almostly recieves replies from requests 1 to 15 rather than request 17 who owns C17. In this
case, we will encounter assert "mutex->holder == self" failed since Kevin's commit 0e438cdc
"coroutine: Let CoMutex remember who holds it". For example, if nbd client recieves request
No.15' reply, qemu will stop unexpectedly:
----------------------------------------------------------------------------------------
time request       Actions
29   15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false
----------------------------------------------------------------------------------------

Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition
variable", this patch replaces CoMutex with CoQueue.

Cc: Wen Congyang <wency@cn.fujitsu.com>
Reported-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NChanglong Xie <xiecl.fnst@cn.fujitsu.com>
Message-Id: <1476267508-19499-1-git-send-email-xiecl.fnst@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9bc9732f

20 7月, 2016 4 次提交

nbd: Convert to byte-based interface · 70c4fb26

由 Eric Blake 提交于 7月 15, 2016

The NBD protocol doesn't have any notion of sectors, so it is
a fairly easy conversion to use byte-based read and write.
Signed-off-by: NEric Blake <eblake@redhat.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1468624988-423-19-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

70c4fb26

nbd: Switch .bdrv_co_discard() to byte-based · 447e57c3

由 Eric Blake 提交于 7月 15, 2016

Another step towards killing off sector-based block APIs.

While at it, call directly into nbd-client.c instead of having
a pointless trivial wrapper in nbd.c.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-14-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

447e57c3

nbd: Drop unused offset parameter · 1e2a77a8

由 Eric Blake 提交于 7月 15, 2016

Now that NBD relies on the block layer to fragment things, we no
longer need to track an offset argument for which fragment of
a request we are actually servicing.

While at it, use true and false instead of 0 and 1 for a bool
parameter.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

1e2a77a8

nbd: Rely on block layer to break up large requests · fb1a6de1

由 Eric Blake 提交于 7月 15, 2016

Now that the block layer will honor max_transfer, we can simplify
our code to rely on that guarantee.

The readv code can call directly into nbd-client, just as the
writev code has done since commit 52a46505.

Interestingly enough, while qemu-io 'w 0 40m' splits into a 32M
and 8M transaction, 'w -z 0 40m' splits into two 16M and an 8M,
because the block layer caps the bounce buffer for writing zeroes
at 16M.  When we later introduce support for NBD_CMD_WRITE_ZEROES,
we can get a full 32M zero write (or larger, if the client and
server negotiate that write zeroes can use a larger size than
ordinary writes).
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-5-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

fb1a6de1

13 7月, 2016 1 次提交

coroutine: move entry argument to qemu_coroutine_create · 0b8b8753

由 Paolo Bonzini 提交于 7月 04, 2016

In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.

Mostly done with the following semantic patch:

@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));

@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);

except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

0b8b8753

05 7月, 2016 1 次提交

nbd: Allow larger requests · 476b923c

由 Eric Blake 提交于 6月 23, 2016

The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d821488 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work.  It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

476b923c

12 5月, 2016 2 次提交

nbd: Simplify client FUA handling · 52a46505

由 Eric Blake 提交于 5月 03, 2016

Now that the block layer honors per-bds FUA support, we don't
have to duplicate the fallback flush at the NBD layer.  The
static function nbd_co_writev_flags() is no longer needed, and
the driver can just directly use nbd_client_co_writev().
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

52a46505

block: Make supported_write_flags a per-bds property · 4df863f3

由 Eric Blake 提交于 5月 03, 2016

Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer.  But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).

The fix is to make supported flags as a per-BDS option, set during
.bdrv_open().  This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

4df863f3

05 4月, 2016 1 次提交

nbd: don't request FUA on FLUSH · a89ef0c3

由 Eric Blake 提交于 4月 01, 2016

The NBD protocol does not clearly document what will happen
if a client sends NBD_CMD_FLAG_FUA on NBD_CMD_FLUSH.
Historically, both the qemu and upstream NBD servers silently
ignored that flag, but that feels a bit risky.  Meanwhile, the
qemu NBD client unconditionally sends the flag (without even
bothering to check whether the caller cares; at least with
NBD_CMD_WRITE the client only sends FUA if requested by a
higher layer).

There is ongoing discussion on the NBD list to fix the
protocol documentation to require that the server MUST ignore
the flag (unless the kernel folks can better explain what FUA
means for a flush), but until those doc improvements land, the
current nbd.git master was recently changed to reject the flag
with EINVAL (see nbd commit ab22e082), which now makes it
impossible for a qemu client to use FLUSH with an upstream NBD
server.

We should not send FUA with flush unless the upstream protocol
documents what it will do, and even then, it should be something
that the caller can opt into, rather than being unconditional.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <1459526902-32561-1-git-send-email-eblake@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a89ef0c3

30 3月, 2016 1 次提交

nbd: Support BDRV_REQ_FUA · 2b556518

由 Kevin Wolf 提交于 3月 10, 2016

The NBD server already used to send a FUA flag when the writethrough
mode was set. This code was a remnant from the times where protocol
drivers actually had to implement writethrough modes. Since nowadays the
block layer sends flushes in writethrough mode and non-root nodes are
always writeback, this was mostly dead code - only mostly because if NBD
was configured to be used without a format, we sent _both_ FUA and an
explicit flush afterwards, which makes the code not technically dead,
but useless overhead.

This patch changes the code so that the block layer's FUA flag is
recognised and translated into a NBD FUA flag. The additional flush is
avoided now.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

2b556518

17 2月, 2016 4 次提交

nbd: enable use of TLS with NBD block driver · 75822a12

由 Daniel P. Berrange 提交于 2月 10, 2016

This modifies the NBD driver so that it is possible to request
use of TLS. This is done by providing the 'tls-creds' parameter
with the ID of a previously created QCryptoTLSCreds object.

For example

  $QEMU -object tls-creds-x509,id=tls0,endpoint=client,\
                dir=/home/berrange/security/qemutls \
        -drive driver=nbd,host=localhost,port=9000,tls-creds=tls0

The client will drop the connection if the NBD server does not
provide TLS.
Signed-off-by: NDaniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-15-git-send-email-berrange@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

75822a12

nbd: implement TLS support in the protocol negotiation · f95910fe

由 Daniel P. Berrange 提交于 2月 10, 2016

This extends the NBD protocol handling code so that it is capable
of negotiating TLS support during the connection setup. This involves
requesting the STARTTLS protocol option before any other NBD options.
Signed-off-by: NDaniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-14-git-send-email-berrange@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f95910fe

nbd: convert to using I/O channels for actual socket I/O · 1c778ef7

由 Daniel P. Berrange 提交于 2月 10, 2016

Now that all callers are converted to use I/O channels for
initial connection setup, it is possible to switch the core
NBD protocol handling core over to use QIOChannel APIs for
actual sockets I/O.
Signed-off-by: NDaniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-7-git-send-email-berrange@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1c778ef7

nbd: convert block client to use I/O channels for connection setup · 064097d9

由 Daniel P. Berrange 提交于 2月 10, 2016

This converts the NBD block driver client to use the QIOChannelSocket
class for initial connection setup. The NbdClientSession struct has
two pointers, one to the master QIOChannelSocket providing the raw
data channel, and one to a QIOChannel which is the current channel
used for I/O. Initially the two point to the same object, but when
TLS support is added, they will point to different objects.

The qemu-img & qemu-io tools now need to use MODULE_INIT_QOM to
ensure the QIOChannel object classes are registered. The qemu-nbd
tool already did this.

In this initial conversion though, all I/O is still actually done
using the raw POSIX sockets APIs.
Signed-off-by: NDaniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-4-git-send-email-berrange@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

064097d9

20 1月, 2016 1 次提交

block: Clean up includes · 80c71a24

由 Peter Maydell 提交于 1月 18, 2016

Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

80c71a24

24 10月, 2015 1 次提交

aio: Add "is_external" flag for event handlers · dca21ef2

由 Fam Zheng 提交于 10月 23, 2015

All callers pass in false, and the real external ones will switch to
true in coming patches.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NJeff Cody <jcody@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

dca21ef2

18 3月, 2015 1 次提交

nbd: Set block size to BDRV_SECTOR_SIZE · 3f472659

由 Max Reitz 提交于 2月 25, 2015

Signed-off-by: NMax Reitz <mreitz@redhat.com>
Message-Id: <1424887718-10800-13-git-send-email-mreitz@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3f472659

16 2月, 2015 2 次提交

nbd: fix the co_queue multi-adding bug · 141cabe6

由 Bin Wu 提交于 2月 10, 2015

When we tested the VM migartion between different hosts with NBD
devices, we found if we sent a cancel command after the drive_mirror
was just started, a coroutine re-enter error would occur. The stack
was as follow:

(gdb) bt
00)  0x00007fdfc744d885 in raise () from /lib64/libc.so.6
01)  0x00007fdfc744ee61 in abort () from /lib64/libc.so.6
02)  0x00007fdfca467cc5 in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0)
at qemu-coroutine.c:118
03)  0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedb400) at
qemu-coroutine-lock.c:59
04)  0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8,
to=0x7fdfcaedb400) at qemu-coroutine.c:96
05)  0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0)
at qemu-coroutine.c:123
06)  0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedbdc0) at
qemu-coroutine-lock.c:59
07)  0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8,
to=0x7fdfcaedbdc0) at qemu-coroutine.c:96
08)  0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedbdc0, opaque=0x0)
at qemu-coroutine.c:123
09)  0x00007fdfca4a1fa4 in nbd_recv_coroutines_enter_all (s=0x7fdfcaef7dd0) at
block/nbd-client.c:41
10) 0x00007fdfca4a1ff9 in nbd_teardown_connection (client=0x7fdfcaef7dd0) at
block/nbd-client.c:50
11) 0x00007fdfca4a20f0 in nbd_reply_ready (opaque=0x7fdfcaef7dd0) at
block/nbd-client.c:92
12) 0x00007fdfca45ed80 in aio_dispatch (ctx=0x7fdfcae15e90) at aio-posix.c:144
13) 0x00007fdfca45ef1b in aio_poll (ctx=0x7fdfcae15e90, blocking=false) at
aio-posix.c:222
14) 0x00007fdfca448c34 in aio_ctx_dispatch (source=0x7fdfcae15e90, callback=0x0,
user_data=0x0) at async.c:212
15) 0x00007fdfc8f2f69a in g_main_context_dispatch () from
/usr/lib64/libglib-2.0.so.0
16) 0x00007fdfca45c391 in glib_pollfds_poll () at main-loop.c:190
17) 0x00007fdfca45c489 in os_host_main_loop_wait (timeout=1483677098) at
main-loop.c:235
18) 0x00007fdfca45c57b in main_loop_wait (nonblocking=0) at main-loop.c:484
19) 0x00007fdfca25f403 in main_loop () at vl.c:2249
20) 0x00007fdfca266fc2 in main (argc=42, argv=0x7ffff517d638,
envp=0x7ffff517d790) at vl.c:4814

We find the nbd_recv_coroutines_enter_all function (triggered by a cancel
command or a network connection breaking down) will enter a coroutine which
is waiting for the sending lock. If the lock is still held by another coroutine,
the entering coroutine will be added into the co_queue again. Latter, when the
lock is released, a coroutine re-enter error will occur.

This bug can be fixed simply by delaying the setting of recv_coroutine as
suggested by paolo. After applying this patch, we have tested the cancel
operation in mirror phase looply for more than 5 hous and everything is fine.
Without this patch, a coroutine re-enter error will occur in 5 minutes.
Signed-off-by: NBn Wu <wu.wubin@huawei.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1423552846-3896-1-git-send-email-wu.wubin@huawei.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

141cabe6

nbd: Drop BDS backpointer · f53a829b

由 Max Reitz 提交于 2月 06, 2015

Before this patch, the "opaque" pointer in an NBD BDS points to a
BDRVNBDState, which contains an NbdClientSession object, which in turn
contains a pointer to the BDS. This pointer may become invalid due to
bdrv_swap(), so drop it, and instead pass the BDS directly to the
nbd-client.c functions which then retrieve the NbdClientSession object
from there.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1423256778-3340-2-git-send-email-mreitz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

f53a829b

07 2月, 2015 1 次提交

nbd: Improve error messages · 1ce52846

由 Max Reitz 提交于 1月 26, 2015

This patch makes use of the Error object for nbd_receive_negotiate() so
that errors during negotiation look nicer.

Furthermore, this patch adds an additional error message if the received
magic was wrong, but would be correct for the other protocol version,
respectively: So if an export name was specified, but the NBD server
magic corresponds to an old handshake, this condition is explicitly
signaled to the user, and vice versa.

As these messages are now part of the "Could not open image" error
message, additional filtering has to be employed in iotest 083, which
this patch does as well.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

1ce52846

04 6月, 2014 1 次提交

nbd: implement .bdrv_detach/attach_aio_context() · 69447cd8

由 Stefan Hajnoczi 提交于 5月 08, 2014

Drop the assumption that we're using the main AioContext.  Convert
qemu_aio_set_fd_handler() calls to aio_set_fd_handler().

The .bdrv_detach/attach_aio_context() interfaces also need to be
implemented to move the socket fd handler from the old to the new
AioContext.
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

69447cd8

14 3月, 2014 1 次提交

nbd: close socket if connection breaks · 4a41a2d6

由 Stefan Hajnoczi 提交于 2月 26, 2014

nbd_receive_reply() is called by the event loop whenever data is
available or the socket has been closed by the remote side.

This patch closes the socket when an error occurs to prevent the
nbd_receive_reply() handler from being called indefinitely after the
connection has failed.

Note that we were already correctly returning EIO for pending requests
but leaving the nbd_receive_reply() handler registered resulted in high
CPU consumption and a flood of error messages.

Reuse nbd_teardown_connection() to close the socket.
Reported-by: NZhifeng Cai <bluewindow@h3c.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

4a41a2d6