提交 · fd392cfa8e6fb0dc34bd0327fc356dfbf6edf1fd · openeuler / qemu

15 5月, 2019 2 次提交

migration: Fix use-after-free during process exit · fd392cfa

由 Yury Kotov 提交于 4月 08, 2019

It fixes heap-use-after-free which was found by clang's ASAN.

Control flow of this use-after-free:
main_thread:
    * Got SIGTERM and completes main loop
    * Calls migration_shutdown
      - migrate_fd_cancel (so, migration_thread begins to complete)
      - object_unref(OBJECT(current_migration));

migration_thread:
    * migration_iteration_finish -> schedule cleanup bh
    * object_unref(OBJECT(s)); (Now, current_migration is freed)
    * exits

main_thread:
    * Calls vm_shutdown -> drain bdrvs -> main loop
      -> cleanup_bh -> use after free

If you want to reproduce, these couple of sleeps will help:
vl.c:4613:
     migration_shutdown();
+    sleep(2);
migration.c:3269:
+    sleep(1);
     trace_migration_thread_after_loop();
     migration_iteration_finish(s);

Original output:
qemu-system-x86_64: terminating on signal 15 from pid 31980 (<unknown process>)
=================================================================
==31958==ERROR: AddressSanitizer: heap-use-after-free on address 0x61900001d210
  at pc 0x555558a535ca bp 0x7fffffffb190 sp 0x7fffffffb188
READ of size 8 at 0x61900001d210 thread T0 (qemu-vm-0)
    #0 0x555558a535c9 in migrate_fd_cleanup migration/migration.c:1502:23
    #1 0x5555594fde0a in aio_bh_call util/async.c:90:5
    #2 0x5555594fe522 in aio_bh_poll util/async.c:118:13
    #3 0x555559524783 in aio_poll util/aio-posix.c:725:17
    #4 0x555559504fb3 in aio_wait_bh_oneshot util/aio-wait.c:71:5
    #5 0x5555573bddf6 in virtio_blk_data_plane_stop
      hw/block/dataplane/virtio-blk.c:282:5
    #6 0x5555589d5c09 in virtio_bus_stop_ioeventfd hw/virtio/virtio-bus.c:246:9
    #7 0x5555589e9917 in virtio_pci_stop_ioeventfd hw/virtio/virtio-pci.c:287:5
    #8 0x5555589e22bf in virtio_pci_vmstate_change hw/virtio/virtio-pci.c:1072:9
    #9 0x555557628931 in virtio_vmstate_change hw/virtio/virtio.c:2257:9
    #10 0x555557c36713 in vm_state_notify vl.c:1605:9
    #11 0x55555716ef53 in do_vm_stop cpus.c:1074:9
    #12 0x55555716eeff in vm_shutdown cpus.c:1092:12
    #13 0x555557c4283e in main vl.c:4617:5
    #14 0x7fffdfdb482f in __libc_start_main
      (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #15 0x555556ecb118 in _start (x86_64-softmmu/qemu-system-x86_64+0x1977118)

0x61900001d210 is located 144 bytes inside of 952-byte region
  [0x61900001d180,0x61900001d538)
freed by thread T6 (live_migration) here:
    #0 0x555556f76782 in __interceptor_free
      /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:124:3
    #1 0x555558d5fa94 in object_finalize qom/object.c:618:9
    #2 0x555558d57651 in object_unref qom/object.c:1068:9
    #3 0x555558a55588 in migration_thread migration/migration.c:3272:5
    #4 0x5555595393f2 in qemu_thread_start util/qemu-thread-posix.c:502:9
    #5 0x7fffe057f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

previously allocated by thread T0 (qemu-vm-0) here:
    #0 0x555556f76b03 in __interceptor_malloc
      /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146:3
    #1 0x7ffff6ee37b8 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4f7b8)
    #2 0x555558d58031 in object_new qom/object.c:640:12
    #3 0x555558a31f21 in migration_object_init migration/migration.c:139:25
    #4 0x555557c41398 in main vl.c:4320:5
    #5 0x7fffdfdb482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

Thread T6 (live_migration) created by T0 (qemu-vm-0) here:
    #0 0x555556f5f0dd in pthread_create
      /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:210:3
    #1 0x555559538cf9 in qemu_thread_create util/qemu-thread-posix.c:539:11
    #2 0x555558a53304 in migrate_fd_connect migration/migration.c:3332:5
    #3 0x555558a72bd8 in migration_channel_connect migration/channel.c:92:5
    #4 0x555558a6ef87 in exec_start_outgoing_migration migration/exec.c:42:5
    #5 0x555558a4f3c2 in qmp_migrate migration/migration.c:1922:9
    #6 0x555558bb4f6a in qmp_marshal_migrate qapi/qapi-commands-migration.c:607:5
    #7 0x555559363738 in do_qmp_dispatch qapi/qmp-dispatch.c:131:5
    #8 0x555559362a15 in qmp_dispatch qapi/qmp-dispatch.c:174:11
    #9 0x5555571bac15 in monitor_qmp_dispatch monitor.c:4124:11
    #10 0x55555719a22d in monitor_qmp_bh_dispatcher monitor.c:4207:9
    #11 0x5555594fde0a in aio_bh_call util/async.c:90:5
    #12 0x5555594fe522 in aio_bh_poll util/async.c:118:13
    #13 0x5555595201e0 in aio_dispatch util/aio-posix.c:460:5
    #14 0x555559503553 in aio_ctx_dispatch util/async.c:261:5
    #15 0x7ffff6ede196 in g_main_context_dispatch
      (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4a196)

SUMMARY: AddressSanitizer: heap-use-after-free migration/migration.c:1502:23
  in migrate_fd_cleanup
Shadow bytes around the buggy address:
  0x0c327fffb9f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c327fffba40: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c327fffba90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable: 00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone: fa
  Freed heap region: fd
  Stack left redzone: f1
  Stack mid redzone: f2
  Stack right redzone: f3
  Stack after return: f5
  Stack use after scope: f8
  Global redzone: f9
  Global init order: f6
  Poisoned by user: f7
  Container overflow: fc
  Array cookie: ac
  Intra object redzone: bb
  ASan internal: fe
  Left alloca redzone: ca
  Right alloca redzone: cb
  Shadow gap: cc
==31958==ABORTING
Signed-off-by: NYury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190408113343.2370-1-yury-kotov@yandex-team.ru>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
  Fixed up comment formatting

fd392cfa

migration: remove not used field xfer_limit · 15d2d64c

由 Wei Yang 提交于 3月 26, 2019

MigrationState->xfer_limit is only set to 0 in migrate_init().

Remove this unnecessary field.
Signed-off-by: NWei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190326055726.10539-1-richardw.yang@linux.intel.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

15d2d64c

02 4月, 2019 2 次提交

migration: Support adding migration blockers earlier · daff7f0b

由 Markus Armbruster 提交于 4月 01, 2019

migrate_add_blocker() asserts we have a current_migration object, in
migrate_get_current().  We do only after migration_object_init().

This contributes to the following dependency cycle:

* configure_blockdev() must run before machine_set_property()
  so machine properties can refer to block backends

* machine_set_property() before configure_accelerator()
  so machine properties like kvm-irqchip get applied

* configure_accelerator() before migration_object_init()
  so that Xen's accelerator compat properties get applied.

* migration_object_init() before configure_blockdev()
  so configure_blockdev() can add migration blockers

The cycle was closed when recent commit cda4aa9a "Create block
backends before setting machine properties" added the first
dependency, and satisfied it by violating the last one.  Broke block
backends that add migration blockers, as demonstrated by qemu-iotests
055.

To fix it, break the last dependency: make migrate_add_blocker()
usable before migration_object_init().

The previous commit already removed the use of migrate_get_current()
from migrate_add_blocker() itself.  Didn't quite do the trick, as
there's another one hiding in migration_is_idle().

The use there isn't actually necessary: when no migration object has
been created yet, migration is surely idle.  Make migration_is_idle()
return true then.

Fixes: cda4aa9aSigned-off-by: NMarkus Armbruster <armbru@redhat.com>
Message-Id: <20190401090827.20793-4-armbru@redhat.com>
Reviewed-by: NIgor Mammedov <imammedo@redhat.com>

daff7f0b

Revert "migration: move only_migratable to MigrationState" · 811f8652

由 Markus Armbruster 提交于 4月 01, 2019

This reverts commit 3df663e5.
This reverts commit b605c47b.

Command line option --only-migratable is for disallowing any
configuration that can block migration.

Initially, --only-migratable set global variable @only_migratable.

Commit 3df663e5 "migration: move only_migratable to MigrationState"
replaced it by MigrationState member @only_migratable.  That was a
mistake.

First, it doesn't make sense on the design level.  MigrationState
captures the state of an individual migration, but --only-migratable
isn't a property of an individual migration, it's a restriction on
QEMU configuration.  With fault tolerance, we could have several
migrations at once.  --only-migratable would certainly protect all of
them.  Storing it in MigrationState feels inappropriate.

Second, it contributes to a dependency cycle that manifests itself as
a bug now.

Putting @only_migratable into MigrationState means its available only
after migration_object_init().

We can't set it before migration_object_init(), so we delay setting it
with a global property (this is fixup commit b605c47b "migration:
fix handling for --only-migratable").

We can't get it before migration_object_init(), so anything that uses
it can only run afterwards.

Since migrate_add_blocker() needs to obey --only-migratable, any code
adding migration blockers can run only afterwards.  This contributes
to the following dependency cycle:

* configure_blockdev() must run before machine_set_property()
  so machine properties can refer to block backends

* machine_set_property() before configure_accelerator()
  so machine properties like kvm-irqchip get applied

* configure_accelerator() before migration_object_init()
  so that Xen's accelerator compat properties get applied.

* migration_object_init() before configure_blockdev()
  so configure_blockdev() can add migration blockers

The cycle was closed when recent commit cda4aa9a "Create block
backends before setting machine properties" added the first
dependency, and satisfied it by violating the last one.  Broke block
backends that add migration blockers.

Moving @only_migratable into MigrationState was a mistake.  Revert it.

This doesn't quite break the "migration_object_init() before
configure_blockdev() dependency, since migrate_add_blocker() still has
another dependency on migration_object_init().  To be addressed the
next commit.

Note that the reverted commit made -only-migratable sugar for -global
migration.only-migratable=on below the hood.  Documentation has only
ever mentioned -only-migratable.  This commit removes the arcane &
undocumented alternative to -only-migratable again.  Nobody should be
using it.

Conflicts:
	include/migration/misc.h
	migration/migration.c
	migration/migration.h
	vl.c
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Message-Id: <20190401090827.20793-3-armbru@redhat.com>
Reviewed-by: NIgor Mammedov <imammedo@redhat.com>

811f8652

26 3月, 2019 4 次提交

migration/postcopy: Update the bandwidth during postcopy · c38c1c14

由 Dr. David Alan Gilbert 提交于 3月 08, 2019

The recently added max-postcopy-bandwidth parameter is only read
at the transition from precopy->postcopy where as the older
max-bandwidth parameter updates the migration bandwidth when changed
even if the migration is already running.

Fix this discrepency so that:
  a) You can change the bandwidth during postcopy by setting
     max-postcopy-bandwidth

  b) Changing max-bandwidth during postcopy has no effect
     (it currently changes the postcopy bandwidth which isn't
     expected).

Fixes: 7e555c6c
bz: https://bugzilla.redhat.com/show_bug.cgi?id=1686321Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

c38c1c14

migration: add support for a "tls-authz" migration parameter · d2f1d29b

由 Daniel P. Berrange 提交于 2月 27, 2019

The QEMU instance that runs as the server for the migration data
transport (ie the target QEMU) needs to be able to configure access
control so it can prevent unauthorized clients initiating an incoming
migration. This adds a new 'tls-authz' migration parameter that is used
to provide the QOM ID of a QAuthZ subclass instance that provides the
access control check. This is checked against the x509 certificate
obtained during the TLS handshake.

For example, when starting a QEMU for incoming migration, it is
possible to give an example identity of the source QEMU that is
intended to be connecting later:

  $QEMU \
     -monitor stdio \
     -incoming defer \
     ...other args...

  (qemu) object_add tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
             endpoint=server,verify-peer=yes \
  (qemu) object_add authz-simple,id=auth0,identity=CN=laptop.example.com,,\
             O=Example Org,,L=London,,ST=London,,C=GB \
  (qemu) migrate_incoming tcp:localhost:9000
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NDaniel P. Berrange <berrange@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

d2f1d29b

multifd: Drop x- · cbfd6c95

由 Juan Quintela 提交于 2月 06, 2019

We make it supported from now on.
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

cbfd6c95

multifd: Drop x-multifd-page-count parameter · efd1a1d6

由 Juan Quintela 提交于 2月 20, 2019

Libvirt don't want to expose (and explain it).  From now on we measure
the number of packages in bytes instead of pages, so it is the same
independently of architecture.  We choose the page size of x86.
Notice that in the following patch we make this variable.
Signed-off-by: NJuan Quintela <quintela@redhat.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

efd1a1d6

06 3月, 2019 5 次提交

Migration/colo.c: Make COLO node running after failover · db009729

由 Zhang Chen 提交于 3月 03, 2019

Delay to close COLO for auto start VM after failover.
Signed-off-by: NZhang Chen <chen.zhang@intel.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190303145021.2962-4-chen.zhang@intel.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

db009729

migration: Create socket-address parameter · 9aca82ba

由 Juan Quintela 提交于 2月 27, 2019

It will be used to store the uri parameters. We want this only for
tcp, so we don't set it for other uris.  We need it to know what port
is migration running.
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Removed DummyStruct as suggested by Eric & Markus

--

9aca82ba

migration: Introduce ignore-shared capability · 18269069

由 Yury Kotov 提交于 2月 15, 2019

We want to use local migration to update QEMU for running guests.
In this case we don't need to migrate shared (file backed) RAM.
So, add a capability to ignore such blocks during live migration.
Signed-off-by: NYury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-3-yury-kotov@yandex-team.ru>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

18269069

migration: Cleanup during exit · 892ae715

由 Dr. David Alan Gilbert 提交于 2月 27, 2019

Currently we cleanup the migration object as we exit main after the
main_loop finishes; however if there's a migration running things
get messy and we can end up with the migration thread still trying
to access freed structures.

We now take a ref to the object around the migration thread itself,
so the act of dropping the ref during exit doesn't cause us to lose
the state until the thread quits.

Cancelling the migration during migration also tries to get the thread
to quit.

We do this a bit earlier; so hopefully migration gets out of the way
before all the devices etc are freed.
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: NAlex Bennée <alex.bennee@linaro.org>
Message-Id: <20190227164900.16378-1-dgilbert@redhat.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

892ae715

migration: Fix cancel state · c3c5eae6

由 Dr. David Alan Gilbert 提交于 2月 19, 2019

During a cancelled migration there's a race where the fd can
go into an error state before we get back around the migration loop
and migration_detect_error transitions from cancelling->failed.

Check for cancelled/cancelling and don't change the state.

Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1608649

Fixes: b23c2adeSigned-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190219195928.12289-1-dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>

c3c5eae6

05 3月, 2019 3 次提交

migration: Switch to using announce timer · 7659505c

由 Dr. David Alan Gilbert 提交于 2月 27, 2019

Switch the announcements to using the new announce timer.
Move the code that does it to announce.c rather than savevm
because it really has nothing to do with the actual migration.

Migration starts the announce from bh's and so they're all
in the main thread/bql, and so there's never any racing with
the timers themselves.
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>

7659505c

migration: Add announce parameters · ee3d96ba

由 Dr. David Alan Gilbert 提交于 2月 27, 2019

Add migration parameters that control RARP/GARP announcement timeouts.

Based on earlier patches by myself and
  Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>

ee3d96ba

net: Introduce announce timer · 50510ea2

由 Dr. David Alan Gilbert 提交于 2月 27, 2019

The 'announce timer' will be used by migration, and explicit
requests for qemu to perform network announces.

Based on the work by Germano Veit Michel <germano@redhat.com>
 and Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>

50510ea2

23 1月, 2019 4 次提交

migration: introduce pages-per-second · aecbfe9c

由 Xiao Guangrong 提交于 1月 11, 2019

It introduces a new statistic, pages-per-second, as bandwidth or mbps is
not enough to measure the performance of posting pages out as we have
compression, xbzrle, which can significantly reduce the amount of the
data size, instead, pages-per-second is the one we want
Signed-off-by: NXiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20190111063732.10484-2-xiaoguangrong@tencent.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
  With typo's Eric spotted fixed

aecbfe9c

migration: unify error handling for process_incoming_migration_co · 6d99c2d4

由 Fei Li 提交于 1月 13, 2019

In the current code, if process_incoming_migration_co() fails we do
the same error handing: set the error state, close the source file,
do the cleanup for multifd, and then exit(EXIT_FAILURE). To make the
code clearer, add a "goto fail" to unify the error handling.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NFei Li <fli@suse.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190113140849.38339-6-lifei1214@126.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

6d99c2d4

migration: multifd_save_cleanup() can't fail, simplify · 1398b2e3

由 Fei Li 提交于 1月 13, 2019

multifd_save_cleanup() takes an Error ** argument and returns an
error code even though it can't actually fail.  Its callers
dutifully check for failure.  Remove the useless argument and return
value, and simplify the callers.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: NFei Li <fli@suse.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Message-Id: <20190113140849.38339-4-lifei1214@126.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

1398b2e3

migration: fix the multifd code when receiving less channels · 49ed0d24

由 Fei Li 提交于 1月 13, 2019

In our current code, when multifd is used during migration, if there
is an error before the destination receives all new channels, the
source keeps running, however the destination does not exit but keeps
waiting until the source is killed deliberately.

Fix this by dumping the specific error and let users decide whether
to quit from the destination side when failing to receive packet via
some channel. And update the comment for multifd_recv_new_channel().

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: NFei Li <fli@suse.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20190113140849.38339-3-lifei1214@126.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

49ed0d24

18 12月, 2018 1 次提交

qmp hmp: Make system_wakeup check wake-up support and run state · fb064112

由 Daniel Henrique Barboza 提交于 12月 05, 2018

The qmp/hmp command 'system_wakeup' is simply a direct call to
'qemu_system_wakeup_request' from vl.c. This function verifies if
runstate is SUSPENDED and if the wake up reason is valid before
proceeding. However, no error or warning is thrown if any of those
pre-requirements isn't met. There is no way for the caller to
differentiate between a successful wakeup or an error state caused
when trying to wake up a guest that wasn't suspended.

This means that system_wakeup is silently failing, which can be
considered a bug. Adding error handling isn't an API break in this
case - applications that didn't check the result will remain broken,
the ones that check it will have a chance to deal with it.

Adding to that, the commit before previous created a new QMP API called
query-current-machine, with a new flag called wakeup-suspend-support,
that indicates if the guest has the capability of waking up from suspended
state. Although such guest will never reach SUSPENDED state and erroring
it out in this scenario would suffice, it is more informative for the user
to differentiate between a failure because the guest isn't suspended versus
a failure because the guest does not have support for wake up at all.

All this considered, this patch changes qmp_system_wakeup to check if
the guest is capable of waking up from suspend, and if it is suspended.
After this patch, this is the output of system_wakeup in a guest that
does not have wake-up from suspend support (ppc64):

(qemu) system_wakeup
wake-up from suspend is not supported by this guest
(qemu)

And this is the output of system_wakeup in a x86 guest that has the
support but isn't suspended:

(qemu) system_wakeup
Unable to wake up: guest is not in suspended state
(qemu)
Reported-by: NBalamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: NDaniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20181205194701.17836-4-danielhb413@gmail.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Acked-by: NEduardo Habkost <ehabkost@redhat.com>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>

fb064112

21 11月, 2018 1 次提交

migration/migration.c: Add COLO dependency checks · 7e934f5b

由 Zhang Chen 提交于 11月 15, 2018

Current COLO mode(independent disk mode) need replication module work
together. Suggested by Dr. David Alan Gilbert <dgilbert@redhat.com>.
Signed-off-by: NZhang Chen <chen.zhang@intel.com>
Message-Id: <20181114190912.7242-1-chen.zhang@intel.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

7e934f5b

31 10月, 2018 1 次提交

migration: avoid segmentfault when take a snapshot of a VM which being migrated · 3d63da16

由 Jia Lina 提交于 10月 26, 2018

During an active background migration, snapshot will trigger a
segmentfault. As snapshot clears the "current_migration" struct
and updates "to_dst_file" before it finds out that there is a
migration task, Migration accesses the null pointer in
"current_migration" struct and qemu crashes eventually.
Signed-off-by: NJia Lina <jialina01@baidu.com>
Signed-off-by: NChai Wen <chaiwen@baidu.com>
Signed-off-by: NZhang Yu <zhangyu31@baidu.com>
Message-Id: <20181026083620.10172-1-jialina01@baidu.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

3d63da16

19 10月, 2018 5 次提交

error: Fix use of error_prepend() with &error_fatal, &error_abort · 4b576648

由 Markus Armbruster 提交于 10月 17, 2018

From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NPhilippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>

4b576648

COLO: Load dirty pages into SVM's RAM cache firstly · 13af18f2

由 Zhang Chen 提交于 9月 03, 2018

We should not load PVM's state directly into SVM, because there maybe some
errors happen when SVM is receving data, which will break SVM.

We need to ensure receving all data before load the state into SVM. We use
an extra memory to cache these data (PVM's ram). The ram cache in secondary side
is initially the same as SVM/PVM's memory. And in the process of checkpoint,
we cache the dirty pages of PVM into this ram cache firstly, so this ram cache
always the same as PVM's memory at every checkpoint, then we flush this cached ram
to SVM after we receive all PVM's state.
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NLi Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: NZhang Chen <zhangckid@gmail.com>
Signed-off-by: NZhang Chen <chen.zhang@intel.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>

13af18f2

COLO: Remove colo_state migration struct · aad555c2

由 Zhang Chen 提交于 9月 03, 2018

We need to know if migration is going into COLO state for
incoming side before start normal migration.

Instead by using the VMStateDescription to send colo_state
from source side to destination side, we use MIG_CMD_ENABLE_COLO
to indicate whether COLO is enabled or not.
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NZhang Chen <zhangckid@gmail.com>
Signed-off-by: NZhang Chen <chen.zhang@intel.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>

aad555c2

COLO: Add block replication into colo process · 8e48ac95

由 Zhang Chen 提交于 9月 03, 2018

Make sure master start block replication after slave's block
replication started.

Besides, we need to activate VM's blocks before goes into
COLO state.
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NLi Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: NZhang Chen <zhangckid@gmail.com>
Signed-off-by: NZhang Chen <chen.zhang@intel.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>

8e48ac95

COLO: integrate colo compare with colo frame · 131b2153

由 Zhang Chen 提交于 9月 03, 2018

For COLO FT, both the PVM and SVM run at the same time,
only sync the state while it needs.

So here, let SVM runs while not doing checkpoint, change
DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.

Besides, we forgot to release colo_checkpoint_semd and
colo_delay_timer, fix them here.
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NZhang Chen <zhangckid@gmail.com>
Signed-off-by: NZhang Chen <chen.zhang@intel.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>

131b2153

27 9月, 2018 1 次提交

migration: fix QEMUFile leak · 0284a2a8

由 Marc-André Lureau 提交于 9月 25, 2018

Spotted by ASAN while running:

$ tests/migration-test -p /x86_64/migration/postcopy/recovery

=================================================================
==18034==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 33864 byte(s) in 1 object(s) allocated from:
    #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183
    #3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159
    #4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78
    #5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295
    #6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111
    #7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91
    #8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108
    #9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158
    #10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89
    #11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535
Signed-off-by: NMarc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

0284a2a8

26 9月, 2018 1 次提交

migration: show the statistics of compression · 76e03000

由 Xiao Guangrong 提交于 9月 06, 2018

Currently, it includes:
pages: amount of pages compressed and transferred to the target VM
busy: amount of count that no free thread to compress data
busy-rate: rate of thread busy
compressed-size: amount of bytes after compression
compression-rate: rate of compressed size
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NXiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180906070101.27280-3-xiaoguangrong@tencent.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

76e03000

29 8月, 2018 1 次提交

qapi: Drop qapi_event_send_FOO()'s Error ** argument · 3ab72385

由 Peter Xu 提交于 8月 15, 2018

The generated qapi_event_send_FOO() take an Error ** argument.  They
can't actually fail, because all they do with the argument is passing it
to functions that can't fail: the QObject output visitor, and the
@qmp_emit callback, which is either monitor_qapi_event_queue() or
event_test_emit().

Drop the argument, and pass &error_abort to the QObject output visitor
and @qmp_emit instead.
Suggested-by: NEric Blake <eblake@redhat.com>
Suggested-by: NMarkus Armbruster <armbru@redhat.com>
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20180815133747.25032-4-peterx@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
[Commit message rewritten, update to qapi-code-gen.txt corrected]
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>

3ab72385

22 8月, 2018 4 次提交

migration: do not wait for free thread · 1d58872a

由 Xiao Guangrong 提交于 8月 21, 2018

Instead of putting the main thread to sleep state to wait for
free compression thread, we can directly post it out as normal
page that reduces the latency and uses CPUs more efficiently

A parameter, compress-wait-thread, is introduced, it can be
enabled if the user really wants the old behavior
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NXiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

1d58872a

migration: poll the cm event for destination qemu · 92370989

由 Lidong Chen 提交于 8月 06, 2018

The destination qemu only poll the comp_channel->fd in
qemu_rdma_wait_comp_channel. But when source qemu disconnnect
the rdma connection, the destination qemu should be notified.
Signed-off-by: NLidong Chen <lidongchen@tencent.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

92370989

migration: implement bi-directional RDMA QIOChannel · 74637e6f

由 Lidong Chen 提交于 8月 06, 2018

This patch implements bi-directional RDMA QIOChannel. Because different
threads may access RDMAQIOChannel currently, this patch use RCU to protect it.
Signed-off-by: NLidong Chen <lidongchen@tencent.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

74637e6f

migrate/cpu-throttle: Add max-cpu-throttle migration parameter · 4cbc9c7f

由 Li Qiang 提交于 8月 01, 2018

Currently, the default maximum CPU throttle for migration is
99(CPU_THROTTLE_PCT_MAX). This is too big and can make a remarkable
performance effect for the guest. We see a lot of packets latency
exceed 500ms when the CPU_THROTTLE_PCT_MAX reached. This patch set
adds a new max-cpu-throttle parameter to limit the CPU throttle.
Signed-off-by: NLi Qiang <liq3ea@gmail.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NJuan Quintela <quintela@redhat.com>

4cbc9c7f

25 7月, 2018 3 次提交

migration: fix duplicate initialization for expected_downtime and cleanup_bh · 4b3fb65d

由 Lidong Chen 提交于 7月 24, 2018

migrate_fd_connect duplicate initialize expected_downtime and cleanup_bh.
Signed-off-by: NLidong Chen <lidongchen@tencent.com>
Message-Id: <1532434585-14732-2-git-send-email-lidongchen@tencent.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

4b3fb65d

migration: disallow recovery for release-ram · 97ca211c

由 Peter Xu 提交于 7月 23, 2018

Postcopy recovery won't work well with release-ram capability since
release-ram will drop the page buffer as long as the page is put into
the send buffer.  So if there is a network failure happened, any page
buffers that have not yet reached the destination VM but have already
been sent from the source VM will be lost forever.  Let's refuse the
client from resuming such a postcopy migration.  Luckily release-ram was
designed to only be used when src and destination VMs are on the same
host, so it should be fine.
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20180723123305.24792-3-peterx@redhat.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

97ca211c

migrate: Fix cancelling state warning · 57225e5f

由 Dr. David Alan Gilbert 提交于 7月 19, 2018

We've been getting the warning:

migration_iteration_finish: Unknown ending state 2

on a cancel.

I think that's originally due to 39b9e179;  although
I've only seen the warning, I think that in some cases
that we could find the VM stays paused after a cancel where
it should restart.
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180719092257.12703-1-dgilbert@redhat.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

57225e5f

10 7月, 2018 2 次提交

migration: show pause/recover state on dst host · 3c9928d9

由 Peter Xu 提交于 7月 10, 2018

These two states will be missing when doing "query-migrate" on
destination VM.  Add these states so that we can get the query results
as expected.
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20180710091902.28780-5-peterx@redhat.com>
Reviewed-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

3c9928d9

migration: unify incoming processing · a429e7f4

由 Peter Xu 提交于 6月 27, 2018

This is the 2nd patch to unbreak postcopy recovery.

Let's unify the migration_incoming_process() call at a single place
rather than calling it in connection setup codes.  This fixes a problem
that we will go into incoming migration procedure even if we are trying
to recovery from a paused postcopy migration.

Fixes: 36c2f8be ("migration: Delay start of migration main routines")
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20180627132246.5576-5-peterx@redhat.com>
Reviewed-by: NJuan Quintela <quintela@redhat.com>
Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>

a429e7f4