Commits · 88ab85384da431950d319ab69438891c29e4a5dd · openeuler / qemu

13 Feb, 2018 15 commits

ui: correctly advance output buffer when writing SASL data · 88ab8538

Committed by Daniel P. Berrangé on Feb 01, 2018

In this previous commit:

  commit 8f61f1c5
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Dec 18 19:12:20 2017 +0000

    ui: track how much decoded data we consumed when doing SASL encoding

I attempted to fix a flaw with tracking how much data had actually been
processed when encoding with SASL. With that flaw, the VNC server could
mistakenly discard queued data that had not been sent.

The fix was not quite right though, because it merely decremented the
vs->output.offset value. This is effectively discarding data from the
end of the pending output buffer. We actually need to discard data from
the start of the pending output buffer. We also want to free memory that
is no longer required. The correct way to handle this is to use the
buffer_advance() helper method instead of directly manipulating the
offset value.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 20180201155841.27509-1-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 627ebec2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
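The difference between decrementing the offset and advancing the buffer can be sketched as follows. This is a minimal illustration only: `ToyBuffer`, `toy_truncate_tail()` and `toy_advance()` are hypothetical names invented here, not QEMU's `Buffer` type or its `buffer_advance()` implementation, and the sketch does not release memory as the real helper is described as doing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pending output buffer (not QEMU's Buffer). */
typedef struct {
    unsigned char data[64];
    size_t offset;              /* number of valid bytes queued in data[] */
} ToyBuffer;

/* Flawed approach: only shrinks the count, so the bytes at the END of the
 * queue are lost while the already-sent bytes at the start stay queued. */
static void toy_truncate_tail(ToyBuffer *b, size_t len)
{
    assert(len <= b->offset);
    b->offset -= len;
}

/* Intended behaviour: drop the sent bytes from the START of the queue and
 * shift the unsent remainder down to the front. */
static void toy_advance(ToyBuffer *b, size_t len)
{
    assert(len <= b->offset);
    memmove(b->data, b->data + len, b->offset - len);
    b->offset -= len;
}
```

With "ABCDEF" queued and four bytes sent, the first function leaves "AB" queued (already sent), whereas the second leaves "EF" (still unsent).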

ui: avoid sign extension using client width/height · 64653b7f

Committed by Daniel P. Berrange on Jan 18, 2018

Pixman returns a signed int for the image width/height, but the VNC
protocol only permits an unsigned int16. Effective framebuffer size
is determined by the guest, limited by the video RAM size, so the
dimensions are unlikely to exceed the range of an unsigned int16,
but this is not currently validated.

With the current use of 'int' for client width/height, the calculation
of offsets in vnc_update_throttle_offset() suffers from integer size
promotion and sign extension, causing Coverity warnings:

*** CID 1385147:  Integer handling issues  (SIGN_EXTENSION)
/ui/vnc.c: 979 in vnc_update_throttle_offset()
973      * than that the client would already suffering awful audio
974      * glitches, so dropping samples is no worse really).
975      */
976     static void vnc_update_throttle_offset(VncState *vs)
977     {
978         size_t offset =
>>>     CID 1385147:  Integer handling issues  (SIGN_EXTENSION)
>>>     Suspicious implicit sign extension:
    "vs->client_pf.bytes_per_pixel" with type "unsigned char" (8 bits,
    unsigned) is promoted in "vs->client_width * vs->client_height *
    vs->client_pf.bytes_per_pixel" to type "int" (32 bits, signed), then
    sign-extended to type "unsigned long" (64 bits, unsigned).  If
    "vs->client_width * vs->client_height * vs->client_pf.bytes_per_pixel"
    is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.
979             vs->client_width * vs->client_height * vs->client_pf.bytes_per_pixel;

Change client_width / client_height to be a size_t to avoid sign
extension and integer promotion. Then validate that dimensions are in
range wrt the RFB protocol u16 limits.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20180118155254.17053-1-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 4c956bd8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
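The sign-extension hazard and the size_t remedy described above can be illustrated with a small sketch. The function names are hypothetical and the range check is only an assumed stand-in for the RFB u16 validation the commit mentions; this is not the code from ui/vnc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converting a negative int to a wider unsigned type sets all upper bits,
 * which is the effect the Coverity report warns about. */
static uint64_t widen_signed(int value)
{
    return (uint64_t)value;
}

/* With size_t operands the whole product is computed in an unsigned type,
 * so there is no intermediate signed int to sign-extend. */
static size_t frame_bytes(size_t width, size_t height, uint8_t bytes_per_pixel)
{
    /* Assumed check mirroring the RFB limit of an unsigned 16-bit dimension. */
    assert(width <= UINT16_MAX && height <= UINT16_MAX);
    return width * height * bytes_per_pixel;
}
```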

ui: mix misleading comments & return types of VNC I/O helper methods · 9a26ca6b

Committed by Daniel P. Berrange on Dec 18, 2017

While the QIOChannel APIs for reading/writing data return ssize_t, with negative
value indicating an error, the VNC code passes this return value through the
vnc_client_io_error() method. This detects the error condition, disconnects the
client and returns 0 to indicate error. Thus all the VNC helper methods should
return size_t (unsigned), and misleading comments which refer to the possibility
of negative return values need fixing.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-14-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 30b80fd5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
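The pattern described above, where a signed I/O result is filtered so that callers only ever see an unsigned byte count, might look roughly like this. `ToyClient` and `toy_io_result()` are hypothetical; the real vnc_client_io_error() handles more cases than this sketch assumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical client state; only tracks whether we dropped the client. */
typedef struct {
    bool disconnected;
} ToyClient;

/* Turn a signed channel result into an unsigned count. A negative value
 * marks the client as disconnected and is reported to the caller as 0,
 * so helpers built on top of this can return size_t. */
static size_t toy_io_result(ToyClient *c, ssize_t ret)
{
    assert(c != NULL);
    if (ret < 0) {
        c->disconnected = true;
        return 0;
    }
    return (size_t)ret;
}
```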

ui: add trace events related to VNC client throttling · 172f4e5a

Committed by Daniel P. Berrange on Dec 18, 2017

The VNC client throttling is quite subtle so will benefit from having trace
points available for live debugging.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-13-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6aa22a29)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ui: place a hard cap on VNC server output buffer size · 0c85a40e

Committed by Daniel P. Berrange on Dec 18, 2017

The previous patches fix problems with throttling of forced framebuffer updates
and audio data capture that would cause the QEMU output buffer size to grow
without bound. Those fixes are graceful in that once the client catches up with
reading data from the server, everything continues operating normally.

There is some data which the server sends to the client that is impractical to
throttle. Specifically there are various pseudo framebuffer update encodings to
inform the client of things like desktop resizes, pointer changes, audio
playback start/stop, LED state and so on. These generally only involve sending
a very small amount of data to the client, but a malicious guest might be able
to do things that trigger these changes at a very high rate. Throttling them is
not practical as missed or delayed events would cause broken behaviour for the
client.

This patch thus takes a more forceful approach of setting an absolute upper
bound on the amount of data we permit to be present in the output buffer at
any time. The previous patch set a threshold for throttling the output buffer
by allowing an amount of data equivalent to one complete framebuffer update and
one second's worth of audio data. On top of this it allowed for one further
forced framebuffer update to be queued.

To be conservative, we thus take that throttling threshold and multiply it by
5 to form an absolute upper bound. If this bound is hit during vnc_write() we
forcibly disconnect the client, refusing to queue further data. This limit is
high enough that it should never be hit unless a malicious client is trying to
exploit the server, or the network is completely saturated preventing any sending
of data on the socket.

This completes the fix for CVE-2017-15124 started in the previous patches.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-12-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f887cf16)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
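The hard cap described above amounts to a single comparison against five times the throttling threshold. The sketch below assumes that reading; the macro and function names are hypothetical and it ignores arithmetic overflow, so it should not be taken as the check in vnc_write() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Multiplier stated in the commit message; the name is made up here. */
#define TOY_HARD_CAP_SCALE 5

/* Returns true when the queued output exceeds the absolute upper bound,
 * i.e. when the caller should disconnect the client instead of queuing
 * more data. */
static bool toy_over_hard_cap(size_t queued, size_t throttle_threshold)
{
    assert(throttle_threshold > 0);
    return queued > throttle_threshold * TOY_HARD_CAP_SCALE;
}
```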

ui: fix VNC client throttling when forced update is requested · f9e53c77

Committed by Daniel P. Berrange on Dec 18, 2017

The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).

The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check is disabled if the client has requested
a forced update, because we want to send these as soon as possible.

As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates, e.g. play a YouTube video. Then repeatedly send full framebuffer update
requests, but never read data back from the server. This can easily make QEMU's
VNC server send buffer consume 100MB of RAM per second, until the OOM killer
starts reaping processes (hopefully the rogue QEMU process, but it might pick
others...).

To address this we make the throttling more intelligent, so we can throttle
full updates. When we get a forced update request, we keep track of exactly how
much data we put on the output buffer. We will not process a subsequent forced
update request until this data has been fully sent on the wire. We always allow
one forced update request to be in flight, regardless of what data is queued
for incremental updates or audio data. The slight complication is that we do
not initially know how much data an update will send, as this is done in the
background by the VNC job thread. So we must track the fact that the job thread
has an update pending, and not process any further updates until this job
has completed and put data on the output buffer.

This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but it's possible other processes can get taken out as
collateral damage.

This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:

  commit a7b20a8e
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Oct 9 14:43:42 2017 +0100

      io: monitor encoutput buffer size from websocket GSource

This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-11-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit ada8d2e4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
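The "one forced update in flight" rule described above can be modelled as a tiny state machine. Everything here is a simplified assumption: the struct and function names are hypothetical, and the sketch pretends that every byte sent belongs to the forced update, which the real code cannot assume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client tracking state. */
typedef struct {
    bool job_pending;       /* background job is still encoding the update */
    size_t forced_unsent;   /* bytes of the last forced update not yet sent */
} ToyForced;

/* A new forced update is allowed only when none is being encoded and the
 * previous one has been fully written to the wire. */
static bool toy_can_force(const ToyForced *s)
{
    return !s->job_pending && s->forced_unsent == 0;
}

static void toy_force_start(ToyForced *s)
{
    assert(toy_can_force(s));
    s->job_pending = true;
}

/* The job finished and placed 'queued' bytes on the output buffer. */
static void toy_job_done(ToyForced *s, size_t queued)
{
    s->job_pending = false;
    s->forced_unsent = queued;
}

/* 'n' bytes went out on the socket (assumed to be forced-update bytes). */
static void toy_sent(ToyForced *s, size_t n)
{
    s->forced_unsent -= (n < s->forced_unsent) ? n : s->forced_unsent;
}
```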

ui: fix VNC client throttling when audio capture is active · f9c87678

Committed by Daniel P. Berrange on Dec 18, 2017

The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check must be disabled if audio capture is
enabled, because when streaming audio the output buffer offset will rarely be
zero due to queued audio data, and so this would starve framebuffer updates.

As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates, e.g. play a YouTube video. Then enable audio capture, and simply never
read data back from the server. This can easily make QEMU's VNC server send
buffer consume 100MB of RAM per second, until the OOM killer starts reaping
processes (hopefully the rogue QEMU process, but it might pick others...).

To address this we make the throttling more intelligent, so we can throttle
when audio capture is active too. To determine how to throttle incremental
updates or audio data, we calculate a size threshold. Normally the threshold is
the approximate number of bytes associated with a single complete framebuffer
update, i.e. width * height * bytes per pixel. We'll send incremental updates
until we hit this threshold, at which point we'll stop sending updates until
data has been written to the wire, causing the output buffer offset to fall
back below the threshold.

If audio capture is enabled, we increase the size of the threshold to also
allow for up to 1 second's worth of audio data samples, i.e. nchannels * bytes
per sample * frequency. This allows the output buffer to have a mixture of
incremental framebuffer updates and audio data queued, but once the threshold
is exceeded, audio data will be dropped and incremental updates will be
throttled.

This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:

  commit a7b20a8e
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Oct 9 14:43:42 2017 +0100

      io: monitor encoutput buffer size from websocket GSource

This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-10-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit e2b72cb6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
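The threshold arithmetic described above can be written out as a short sketch. The function and parameter names are hypothetical and chosen for readability; only the two formulas (one framebuffer, plus one second of audio when capture is enabled) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One complete framebuffer update, plus one second of audio samples when
 * audio capture is active. */
static size_t toy_throttle_threshold(size_t width, size_t height,
                                     size_t bytes_per_pixel, bool audio,
                                     size_t nchannels, size_t bytes_per_sample,
                                     size_t frequency)
{
    assert(bytes_per_pixel > 0);
    size_t threshold = width * height * bytes_per_pixel;
    if (audio) {
        threshold += nchannels * bytes_per_sample * frequency;
    }
    return threshold;
}

/* Incremental updates are held back (and audio dropped) once the queued
 * output has reached the threshold. */
static bool toy_should_throttle(size_t queued, size_t threshold)
{
    return queued >= threshold;
}
```

For an 800x600 client at 4 bytes per pixel the framebuffer term is 1,920,000 bytes; stereo 16-bit audio at 44,100 Hz adds 176,400 bytes.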

ui: refactor code for determining if an update should be sent to the client · 5af9f250

Committed by Daniel P. Berrange on Dec 18, 2017

The logic for determining if it is possible to send an update to the client
will become more complicated shortly, so pull it out into a separate method
for easier extension later.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-9-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0bad8342)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ui: correctly reset framebuffer update state after processing dirty regions · 2e6571e6

Committed by Daniel P. Berrange on Dec 18, 2017

According to the RFB protocol, a client sends one or more framebuffer update
requests to the server. The server can reply with a single framebuffer update
response, that covers all previously received requests. Once the client has
read this update from the server, it may send further framebuffer update
requests to monitor future changes. The client is free to delay sending the
framebuffer update request if it needs to throttle the amount of data it is
reading from the server.

The QEMU VNC server, however, has never correctly handled the framebuffer
update requests. Once QEMU has received an update request, it will continue to
send client updates forever, even if the client hasn't asked for further
updates. This prevents the client from throttling back data it gets from the
server. This change fixes the flawed logic such that after a set of updates are
sent out, QEMU waits for a further update request before sending more data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 728a7ac9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ui: introduce enum to track VNC client framebuffer update request state · 126617e6

Committed by Daniel P. Berrange on Dec 18, 2017

Currently the VNC server tracks whether a client has requested an incremental
or forced update with two boolean flags. There are only really 3 distinct
states to track, so create an enum to more accurately reflect permitted states.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-7-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fef1bbad)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
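Collapsing two booleans into a three-valued enum, as described above, could look like the sketch below. The enumerator and function names are hypothetical rather than the identifiers the patch introduces; the mapping assumes a forced request always implies that an update is wanted.

```c
#include <stdbool.h>

/* The three distinct states the commit message refers to. */
typedef enum {
    TOY_UPDATE_NONE,         /* client has not asked for an update */
    TOY_UPDATE_INCREMENTAL,  /* client wants only changed regions */
    TOY_UPDATE_FORCE         /* client wants a full update */
} ToyUpdateState;

/* Map the old pair of flags onto the enum. Two booleans allow four
 * combinations, but only three are meaningful. */
static ToyUpdateState toy_from_flags(bool need_update, bool force_update)
{
    if (force_update) {
        return TOY_UPDATE_FORCE;
    }
    return need_update ? TOY_UPDATE_INCREMENTAL : TOY_UPDATE_NONE;
}
```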

ui: track how much decoded data we consumed when doing SASL encoding · 8a9c5c34

Committed by Daniel P. Berrange on Dec 18, 2017

When we encode data for writing with SASL, we encode the entire pending output
buffer. The subsequent write, however, may not be able to send the full encoded
data in one go, particularly with a slow network. So we delay setting the
output buffer offset back to zero until all the SASL encoded data is sent.

Between encoding the data and completing sending of the SASL encoded data,
however, more data might have been placed on the pending output buffer. So it
is not valid to set offset back to zero. Instead we must keep track of how much
data we consumed during encoding and subtract only that amount.

With the current bug we would be throwing away some pending data without having
sent it at all. By sheer luck this did not previously cause any serious problem
because appending data to the send buffer is always an atomic action, so we
only ever throw away complete RFB protocol messages. In the case of frame buffer
updates we'd catch up fairly quickly, so no obvious problem was visible.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-6-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 8f61f1c5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ui: avoid pointless VNC updates if framebuffer isn't dirty · 616d64ac

Committed by Daniel P. Berrange on Dec 18, 2017

The vnc_update_client() method checks the 'has_dirty' flag to see if there are
dirty regions that are pending to send to the client. Regardless of this flag,
if a forced update is requested, updates must be sent. For unknown reasons
though, the code also tries to send updates if audio capture is enabled. This
makes no sense as audio capture state does not impact framebuffer contents, so
this check is removed.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-5-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3541b084)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ui: remove redundant indentation in vnc_client_update · a7b2537f

Committed by Daniel P. Berrange on Dec 18, 2017

Now that previous dead / unreachable code has been removed, we can simplify
the indentation in the vnc_client_update method.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-4-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit b939eb89)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ui: remove unreachable code in vnc_update_client · de1e7a91

Committed by Daniel P. Berrange on Dec 18, 2017

A previous commit:

  commit 5a8be0f7
  Author: Gerd Hoffmann <kraxel@redhat.com>
  Date:   Wed Jul 13 12:21:20 2016 +0200

    vnc: make sure we finish disconnect

Added a check for vs->disconnecting at the very start of the
vnc_update_client method. This means that the very next "if"
statement check for !vs->disconnecting always evaluates true,
and is thus redundant. This in turn means the vs->disconnecting
check at the very end of the method never evaluates true, and
is thus unreachable code.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-3-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit c53df961)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ui: remove 'sync' parameter from vnc_update_client · 0181686a

Committed by Daniel P. Berrange on Dec 18, 2017

There is only one caller of vnc_update_client and that always passes false
for the 'sync' parameter.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-2-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6af998db)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

12 Feb, 2018 7 commits

migration: incoming postcopy advise sanity checks · a3fd64f2

Committed by Greg Kurz on Feb 06, 2018

If postcopy-ram was set on the source but not on the destination,
migration doesn't occur, the destination prints an error and boots
the guest:

qemu-system-ppc64: Expected vmdescription section, but got 0

We end up with two running instances.

This behaviour was introduced in 2.11 by commit 58110f0a "migration:
split common postcopy out of ram postcopy" to prepare ground for the
upcoming dirty bitmap postcopy support. It adds a new case where the
source may send an empty postcopy advise because dirty bitmap doesn't
need to check page sizes like RAM postcopy does.

If the source has enabled postcopy-ram, then it sends an advise with
the page size values. If the destination hasn't enabled postcopy-ram,
then loadvm_postcopy_handle_advise() leaves the page size values on
the stream and returns. This confuses qemu_loadvm_state() later on
and causes the destination to start execution.

As discussed several times, postcopy-ram should be enabled on both sides
to be functional. This patch changes the destination to perform some
extra checks on the advise length to ensure this is the case. Otherwise
an error is returned and migration is aborted.

Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <151791621042.19120.3103118434734245776.stgit@bahia>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 875fcd01)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
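The extra length check on the destination could be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, and the 16-byte payload (two 64-bit page size values) is an assumed wire size, since the commit message only says the advise carries "the page size values".

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns 0 when the advise payload length is consistent with the local
 * postcopy-ram setting, and -1 when migration should be aborted. */
static int toy_check_advise(size_t len, bool postcopy_ram_enabled)
{
    const size_t page_sizes_len = 8 + 8;  /* assumed: two 64-bit values */

    if (len == 0) {
        /* Empty advise: the source did not enable postcopy-ram. */
        return postcopy_ram_enabled ? -1 : 0;
    }
    if (len == page_sizes_len) {
        /* Page sizes present: the source enabled postcopy-ram. */
        return postcopy_ram_enabled ? 0 : -1;
    }
    return -1;  /* any other length is malformed */
}
```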

target/sh4: add missing tcg_temp_free() in _decode_opc() · 68d7e244

Committed by Philippe Mathieu-Daudé on Dec 05, 2017

missed in c55497ec and 852d481f.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20171205170013.22337-3-f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit e691e0ed)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32 · 2095c5a2

Committed by Daniel Henrique Barboza on Jan 26, 2018

MAX_VM_CMD_PACKAGED_SIZE is a constant used in qemu_savevm_send_packaged
and loadvm_handle_cmd_packaged to determine whether a package is too
big to be sent or received. qemu_savevm_send_packaged is called inside
postcopy_start (migration/migration.c) to send the MigrationState
in a single blob to the destination, using the MIG_CMD_PACKAGED subcommand,
which will read it up using loadvm_handle_cmd_packaged. If the blob is
larger than MAX_VM_CMD_PACKAGED_SIZE, an error is thrown and the postcopy
migration is aborted. Both MAX_VM_CMD_PACKAGED_SIZE and MIG_CMD_PACKAGED
were introduced by commit 11cf1d98 ("MIG_CMD_PACKAGED: Send a packaged
chunk ..."). The constant has its original value of 1ul << 24 (16MB).

The current MAX_VM_CMD_PACKAGED_SIZE value is not enough to support postcopy
migration of bigger pseries guests. The blob size for a postcopy migration of
a pseries guest with the following setup:

qemu-system-ppc64 --nographic -vga none -machine pseries,accel=kvm -m 64G \
-smp 1,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=f27.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-netdev user,id=u1 -net nic,netdev=u1

is around 12MB. Bumping the RAM to 128G makes the blob size go to 20MB.
With 256G the blob goes to 37MB - more than twice the current maximum size.
At this moment the pseries machine can handle guests with up to 1TB of RAM,
making this postcopy blob go to approximately 128MB in size.

Following the discussions made in [1], there is a need to understand what
devices are aggressively consuming the blob in that manner and see if that
can be mitigated. Until then, we can set MAX_VM_CMD_PACKAGED_SIZE to the
maximum value allowed. Since the size is a 32 bit int variable, we can set
it as 1ul << 32, giving a maximum blob size of 4G that is enough to support
postcopy migration of 32TB RAM guests given the above constraints.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06313.html

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit ee555cdf)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

migration: Recover block devices if failure in device state · c8847f55

Committed by Dr. David Alan Gilbert on Feb 05, 2018

In e91d8951 I added the new pause-before-switchover mechanism
to allow migration completion to be delayed; this changes the
last state prior to completion to MIGRATE_STATUS_DEVICE rather
than MIGRATE_STATUS_ACTIVE.

Fix the failure path in migration_completion to recover the block
devices if it fails in MIGRATE_STATUS_DEVICE, not just the
MIGRATE_STATUS_ACTIVE that it previously had.

This corresponds to rh bz:
  https://bugzilla.redhat.com/show_bug.cgi?id=1538494
whose symptom is an occasional source crash on a failed migration.

Fixes: e91d8951
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 6039dd5b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

migration: Don't leak IO channels · b9eec804

Committed by Ross Lagerwall on Nov 01, 2017

Since qemu_fopen_channel_{in,out}put take references on the underlying
IO channels, make sure to release our references to them.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Message-Id: <20171101142526.1006-2-ross.lagerwall@citrix.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 032b79f7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

s390x/sclp: fix event mask handling · b8aa511b

Committed by Christian Borntraeger on Feb 02, 2018

commit 67915de9 ("s390x/event-facility: variable-length event
masks") switched the sclp receive/send mask. This broke the sclp
lm console.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: commit 67915de9 ("s390x/event-facility: variable-length event masks")
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20180202094241.59537-1-borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 869e676a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

memory: set ioeventfd_update_pending after address_space_update_ioeventfds · ab7b4f67

Committed by linzhecheng on Jan 14, 2018

We should set ioeventfd_update_pending same as memory_region_update_pending.

Signed-off-by: linzhecheng <linzc@zju.edu.cn>
Message-Id: <1515934519-16158-1-git-send-email-linzc@zju.edu.cn>
Cc: qemu-stable@nongnu.org
Fixes: ade9c1aa
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0b152095)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

06 Feb, 2018 18 commits

target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS · ed8b4ecc

Committed by Suraj Jitindar Singh on Jan 19, 2018

The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query
behaviours and available characteristics of the cpu.

Implement the handler for this new H-Call which formulates its response
based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit c59704b2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

target/ppc/spapr_caps: Add new tristate cap safe_indirect_branch · eab4b517

Committed by Suraj Jitindar Singh on Jan 19, 2018

Add new tristate cap cap-ibs to represent the indirect branch
serialisation capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4be8d4e7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

target/ppc/spapr_caps: Add new tristate cap safe_bounds_check · d7aa3d0a

Committed by Suraj Jitindar Singh on Jan 19, 2018

Add new tristate cap cap-sbbc to represent the speculation barrier
bounds checking capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 09114fd8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

target/ppc/spapr_caps: Add new tristate cap safe_cache · 3dc12273

Committed by Suraj Jitindar Singh on Jan 19, 2018

Add new tristate cap cap-cfpc to represent the cache flush on privilege
change capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 8f38eaf8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

target/ppc/spapr_caps: Add support for tristate spapr_capabilities · e9a8747c

Committed by Suraj Jitindar Singh on Jan 19, 2018

spapr_caps are used to represent the level of support for various
capabilities related to the spapr machine type. Currently there is
only support for boolean capabilities.

Add support for tristate capabilities by implementing their get/set
functions. These capabilities can have the values 0, 1 or 2
corresponding to broken, workaround and fixed.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 6898aed7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
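A tristate capability with the values 0, 1 and 2 described above might be represented and parsed as in the following sketch. The type and function names are hypothetical, and using the words "broken", "workaround" and "fixed" as the textual form is an assumption based on the commit message, not a statement about the actual property syntax.

```c
#include <stddef.h>
#include <string.h>

/* Values taken from the commit message: 0, 1 and 2. */
typedef enum {
    TOY_CAP_BROKEN = 0,
    TOY_CAP_WORKAROUND = 1,
    TOY_CAP_FIXED = 2
} ToyTristate;

/* Setter side: parse a textual value; returns 0 on success, -1 on error. */
static int toy_tristate_parse(const char *s, ToyTristate *out)
{
    if (strcmp(s, "broken") == 0) {
        *out = TOY_CAP_BROKEN;
    } else if (strcmp(s, "workaround") == 0) {
        *out = TOY_CAP_WORKAROUND;
    } else if (strcmp(s, "fixed") == 0) {
        *out = TOY_CAP_FIXED;
    } else {
        return -1;
    }
    return 0;
}

/* Getter side: turn a value back into its textual form. */
static const char *toy_tristate_name(ToyTristate v)
{
    switch (v) {
    case TOY_CAP_BROKEN:     return "broken";
    case TOY_CAP_WORKAROUND: return "workaround";
    case TOY_CAP_FIXED:      return "fixed";
    }
    return NULL;  /* value outside the three known states */
}
```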

target/ppc/kvm: Add cap_ppc_safe_[cache/bounds_check/indirect_branch] · 49b1fa33

Committed by Suraj Jitindar Singh on Jan 19, 2018

Add three new kvm capabilities used to represent the level of host support
for three corresponding workarounds.

Host support for each of the capabilities is queried through the
new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The
first two, character and behaviour, represent the available
characteristics of the cpu and the behaviour of the cpu respectively.
The second two, c_mask and b_mask, represent the mask of known bits for
the character and behaviour dwords respectively.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Correct some compile errors due to name change in final kernel
 patch version]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

(cherry picked from commit 8acc2ae5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

target/ppc/spapr_caps: Add macro to generate spapr_caps migration vmstate · 43a29f00

Committed by Suraj Jitindar Singh on Jan 19, 2018

The vmstate description and the contained needed function for migration
of spapr_caps is the same for each cap, with the name of the cap
substituted. As such introduce a macro to allow for easier generation of
these.

Convert the three existing spapr_caps (htm, vsx, and dfp) to use this
macro.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1f63ebaa)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

43a29f00

target/ppc: introduce the PPC_BIT() macro · d72e0a69

Committed by Cédric Le Goater on Dec 06, 2017

and use them in a couple of obvious places. Other macros will be used
in the model of the XIVE interrupt controller.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2a83f997)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
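
The commit message does not spell out the macro, so the following is only a sketch under one assumption: that it follows the MSB-0 bit numbering used in IBM POWER documentation, where bit 0 is the most significant bit of a 64-bit doubleword. The names `PPC_BIT_SKETCH` and `msb0_mask` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: bit 0 is the most significant bit of a 64-bit value. */
#define PPC_BIT_SKETCH(bit) (0x8000000000000000ULL >> (bit))

/* Build a mask with a single MSB-0 numbered bit set; valid for 0..63. */
uint64_t msb0_mask(unsigned bit)
{
    assert(bit < 64);
    return PPC_BIT_SKETCH(bit);
}
```

With this numbering, register field definitions can be written with the same bit numbers that appear in the architecture manuals.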

d72e0a69

spapr: fix device tree properties when using compatibility mode · 4374cbca

Committed by Greg Kurz on Jan 17, 2018

Commit 51f84465 changed the compatibility mode setting logic:
- machine reset only sets compatibility mode for the boot CPU
- compatibility mode is set for other CPUs when they are put online
  by the guest with the "start-cpu" RTAS call

This causes a regression for machines started with max-cpu-compat:
the device tree nodes related to secondary CPU cores contain wrong
"cpu-version" and "ibm,pa-features" values, as shown below.

Guest started on a POWER8 host with:
     -smp cores=2 -machine pseries,max-cpu-compat=compat7

                        ibm,pa-features = [18 00 f6 3f c7 c0 80 f0 80 00
 00 00 00 00 00 00 00 00 80 00 80 00 80 00 00 00];
                        cpu-version = <0x4d0200>;

                               ^^^
                        second CPU core

                        ibm,pa-features = <0x600f63f 0xc70080c0>;
                        cpu-version = <0xf000003>;

                               ^^^
                          boot CPU core

The second core is advertised in raw POWER8 mode. This happens because
CAS assumes all CPUs to have the same compatibility mode. Since the
boot CPU already has the requested compatibility mode, the CAS code
does not set it for the secondary one, and exposes the bogus device
tree properties in the CAS response to the guest.

A similar situation is observed when hot-plugging a CPU core. The
related device tree properties are generated and exposed to guest
with the "ibm,configure-connector" RTAS before "start-cpu" is called.
The CPU core is advertised to the guest in raw mode as well.

In both cases, it boils down to the fact that "start-cpu" happens too
late. This can be fixed globally by propagating the compatibility mode
of the boot CPU to the other CPUs during reset.  For this to work, the
compatibility mode of the boot CPU must be set before the machine code
actually resets all CPUs.

Setting the compatibility mode in "start-cpu" is no longer needed,
so that code is dropped.

Fixes: 51f84465
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 9012a53f)
 Conflicts:
	hw/ppc/spapr_cpu_core.c
	hw/ppc/spapr_rtas.c
* drop context dep on d6322252
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
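
The ordering constraint in the fix (set the boot CPU's mode first, then let every CPU inherit it during reset) can be sketched with a simplified model. All types and function names here are hypothetical; `compat_pvr == 0` stands in for raw mode, and `cpus[0]` is assumed to be the boot CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state: compat_pvr == 0 means raw mode. */
struct cpu {
    uint32_t compat_pvr;
};

/* Per-CPU reset: inherit the compatibility mode of the boot CPU. */
void cpu_reset_sketch(struct cpu *cpu, const struct cpu *boot)
{
    cpu->compat_pvr = boot->compat_pvr;
}

/* Machine reset: fix the boot CPU's mode first, then reset all CPUs. */
void machine_reset_sketch(struct cpu *cpus, size_t n, uint32_t requested)
{
    assert(cpus != NULL && n > 0);
    cpus[0].compat_pvr = requested;
    for (size_t i = 0; i < n; i++) {
        cpu_reset_sketch(&cpus[i], &cpus[0]);
    }
}
```

Because every CPU already carries the right mode after reset, device tree properties generated before "start-cpu" see consistent values.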

4374cbca

ppc: Change Power9 compat table to support at most 8 threads/core · a1f33a5b

Committed by Jose Ricardo Ziviani on Jan 14, 2018

Increase the max SMT mode to 8 for Power9. KVM supports SMT emulation
on this platform, so QEMU should allow users to use it as well.

Today if we try to pass -smp ...,threads=8, QEMU will silently truncate
it to smt4 mode and may cause a crash if we try to perform a cpu
hotplug.
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
[dwg: Added an explanatory comment]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

(cherry picked from commit 03ee51d3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

a1f33a5b

hw/ppc/spapr_caps: Rework spapr_caps to use uint8 internal representation · 6a471367

Committed by Suraj Jitindar Singh on Jan 12, 2018

Currently spapr_caps are tied to boolean values (on or off). This patch
reworks the caps so that they can have any uint8 value. This allows more
capabilities with various values to be represented in the same way
internally. Capabilities are numbered in ascending order. The internal
representation of capability values is an array of uint8s in the
sPAPRMachineState, indexed by capability number.

Capabilities can have their own name, description, options, getter and
setter functions, type and allow functions. They also each have their own
section in the migration stream. Capabilities are only migrated if they
were explicitly set on the command line, with the assumption that
otherwise the default will match.

On migration we ensure that the capability value on the destination
is greater than or equal to the capability value from the source. So
long as this remains the case, the migration is considered
compatible and allowed to continue.

This patch implements generic getter and setter functions for boolean
capabilities. It also converts the existing cap-htm, cap-vsx and
cap-dfp capabilities to this new format.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4e5fe368)
 Conflicts:
	include/hw/ppc/spapr.h
* drop context dep on 60c6823b
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
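
The representation and the migration rule described above can be sketched like this. The capability numbering, struct and function name are hypothetical; what comes from the commit message is the array of uint8 values indexed by capability number and the rule that each destination value must be greater than or equal to the source value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability numbers, in ascending order. */
enum { CAP_HTM = 0, CAP_VSX = 1, CAP_DFP = 2, CAP_NUM = 3 };

/* One uint8 per capability, indexed by capability number. */
struct caps {
    uint8_t value[CAP_NUM];
};

/* Compatible when every destination value is >= the source value. */
bool caps_migration_ok(const struct caps *src, const struct caps *dst)
{
    assert(src != NULL && dst != NULL);
    for (int i = 0; i < CAP_NUM; i++) {
        if (dst->value[i] < src->value[i]) {
            return false;
        }
    }
    return true;
}
```

Using one comparison for all capabilities works for boolean caps (0 or 1) and for caps with more levels alike.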

6a471367

spapr: Handle Decimal Floating Point (DFP) as an optional capability · e4f4fa00

Committed by David Gibson on Dec 11, 2017

Decimal Floating Point has been available on POWER7 and later (server)
cpus.  However, it can be disabled on the hypervisor, meaning that it's
not available to guests.

We currently handle this by conditionally advertising DFP support in the
device tree depending on whether the guest CPU model supports it - which
can also depend on what's allowed in the host for -cpu host.  That can lead
to confusion on migration, since host properties are silently affecting
guest visible properties.

This patch handles it by treating it as an optional capability for the
pseries machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 2d1fb9bc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

e4f4fa00

spapr: Handle VMX/VSX presence as an spapr capability flag · ff6f7e10

Committed by David Gibson on Dec 07, 2017

We currently have some conditionals in the spapr device tree code to decide
whether or not to advertise the availability of the VMX (aka Altivec) and
VSX vector extensions to the guest, based on whether the guest cpu has
those features.

This can lead to confusion and subtle failures on migration, since it makes
a guest visible change based only on host capabilities.  We now have a
better mechanism for this, in spapr capabilities flags, which explicitly
depend on user options rather than host capabilities.

Rework the advertisement of VSX and VMX based on a new VSX capability.  We
no longer bother with a conditional for VMX support, because every CPU
that's ever been supported by the pseries machine type supports VMX.

NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on
availability of VSX in libc, so using cap-vsx=off may lead to a fatal
SIGILL in init.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 29386642)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

ff6f7e10

target/ppc: Clean up probing of VMX, VSX and DFP availability on KVM · 7c578cbc

Committed by David Gibson on Dec 11, 2017

When constructing the "host" cpu class we modify whether the VMX and VSX
vector extensions and DFP (Decimal Floating Point) are available
based on whether KVM can support those instructions. This can depend on
policy in the host kernel as well as on the actual host cpu capabilities.

However, the way we probe for this is not very nice: we explicitly check
the host's device tree. That works in practice, but it's not really
correct, since the device tree is a property of the host kernel's platform
which we don't really know about. We get away with it because the only
modern POWER platforms happen to encode VMX, VSX and DFP availability in
the device tree in the same way.

Arguably we should have an explicit KVM capability for this, but we haven't
needed one so far. Barring specific KVM policies which don't yet exist,
each of these instruction classes will be available in the guest if and
only if they're available in the qemu userspace process. We can determine
that from the ELF AUX vector we're supplied with.

Once reworked like this, there are no more callers for kvmppc_get_vmx() and
kvmppc_get_dfp() so remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 3f2ca480)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
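
Probing from the ELF AUX vector can be sketched as a pure decoding step. This is an illustration, not the patch's code: the struct and function names are hypothetical, and the bit values are local copies of what are assumed to be the Linux powerpc AT_HWCAP feature bits for AltiVec, VSX and DFP. On Linux the hwcap word would typically come from `getauxval(AT_HWCAP)` in `<sys/auxv.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the assumed Linux powerpc AT_HWCAP feature bits. */
#define HWCAP_HAS_ALTIVEC 0x10000000UL
#define HWCAP_HAS_VSX     0x00000080UL
#define HWCAP_HAS_DFP     0x00000400UL

/* Which instruction classes the userspace process may use. */
struct insn_classes {
    bool vmx;
    bool vsx;
    bool dfp;
};

/* Decode an AT_HWCAP word into the three feature flags. */
struct insn_classes classes_from_hwcap(unsigned long hwcap)
{
    struct insn_classes c;
    c.vmx = (hwcap & HWCAP_HAS_ALTIVEC) != 0;
    c.vsx = (hwcap & HWCAP_HAS_VSX) != 0;
    c.dfp = (hwcap & HWCAP_HAS_DFP) != 0;
    return c;
}
```

Keeping the decoding separate from the `getauxval()` call makes it testable on any host, including non-POWER ones.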

7c578cbc

spapr: Validate capabilities on migration · 804e5ea9

Committed by David Gibson on Dec 11, 2017

Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration.  Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.

This adds code to migrate the capabilities flags.  These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence.  However,
they are checked against the destination capabilities.

If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour.  If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit be85537d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
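
The fail/warn/allow policy above can be sketched with capability flags held in a bitmask. The enum and function name are hypothetical; the three outcomes follow the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome codes for the migration check. */
enum migrate_result { MIG_OK, MIG_WARN, MIG_FAIL };

/* Compare the source's capability flags with the destination's. */
enum migrate_result check_caps(uint64_t src, uint64_t dst)
{
    if (src & ~dst) {
        return MIG_FAIL;  /* source used a capability the destination lacks */
    }
    if (dst & ~src) {
        return MIG_WARN;  /* destination has capabilities the source lacked */
    }
    return MIG_OK;
}
```

The destination's own flags are never overwritten here, matching the point that the destination command line takes precedence.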

804e5ea9

spapr: Treat Hardware Transactional Memory (HTM) as an optional capability · 9070f408

Committed by David Gibson on Dec 11, 2017

This adds an spapr capability bit for Hardware Transactional Memory.  It is
enabled by default for pseries-2.11 and earlier machine types with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it).  However it is disabled by default for the latest pseries-2.12
machine type.

This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:

 * This way running with -M pseries,accel=tcg will start with whatever cpu
   and will provide the same guest visible model as with accel=kvm.
     - More specifically, this means existing make check tests don't have
       to be modified to use cap-htm=off in order to run with TCG

 * We hope to add a new "HTM without suspend" feature in the not too
   distant future which could work on both POWER8 and POWER9 cpus, and
   could be enabled by default.

 * Best guesses suggest that future POWER cpus may well only support the
   HTM-without-suspend model, not the (frankly, horribly overcomplicated)
   POWER8 style HTM with suspend.

 * Anecdotal evidence suggests problems with HTM being enabled when it
   wasn't wanted are more common than being missing when it was.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit ee76a09f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

9070f408

spapr: Capabilities infrastructure · 78a38cd4

Committed by David Gibson on Dec 08, 2017

Because PAPR is a paravirtual environment, access to certain CPU (or other)
facilities can be blocked by the hypervisor.  PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.

In some places we automatically determine whether to make a feature
available based on whether our host can support it; in most cases this is
based on limitations in the available KVM implementation.

Although we correctly advertise this to the guest, it means that host
factors might change the guest visible environment, which is bad:
as well as generally reducing reproducibility, it means that a migration
between different host environments can easily go bad.

We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability.  But it's still not right, and some limitations on
POWER9 are going to make it more of an issue in future.

This introduces an infrastructure for defining "sPAPR capabilities".  These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overridden with machine properties.

The intention is that at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations).  If not we simply fail, rather than silently modifying the
advertised featureset to the guest.

This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 33face6b)
 Conflicts:
	include/hw/ppc/spapr.h
* drop context dep on 60c6823b
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
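
The flow described above (machine defaults, masked by the CPU, overridden by machine properties, then verified against the host at reset) can be sketched with bitmasks. Function and parameter names are hypothetical, and modelling the overrides as separate forced-on and forced-off masks is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the capabilities the machine will request. */
uint64_t effective_caps(uint64_t machine_default, uint64_t cpu_mask,
                        uint64_t forced_on, uint64_t forced_off)
{
    uint64_t caps = machine_default & cpu_mask;  /* defaults, masked by cpu */
    caps |= forced_on;                           /* user enabled explicitly */
    caps &= ~forced_off;                         /* user disabled explicitly */
    return caps;
}

/* Reset-time check: 0 if the host supports everything requested, else -1. */
int caps_verify_at_reset(uint64_t requested, uint64_t host_supported)
{
    return (requested & ~host_supported) ? -1 : 0;
}
```

Failing at reset, instead of quietly dropping the unsupported bits, keeps the guest visible feature set identical to what was requested.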

78a38cd4

spapr: Add pseries-2.12 machine type · 0fac4aa9

Committed by David Gibson on Nov 13, 2017

While we're at it fix a couple of small errors in the 2.11 and 2.10 models
(they didn't have any real effect, but don't quite match the template).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2b615412)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

0fac4aa9