- 10 July 2018, 4 commits
-
-
Committed by Peter Xu

These two states were missing when doing "query-migrate" on the destination VM. Add them so that we can get the query results as expected.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180710091902.28780-5-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Committed by Peter Xu

This is the 2nd patch to unbreak postcopy recovery. Unify the migration_incoming_process() call at a single place rather than calling it in the connection setup code. This fixes a problem where we would enter the incoming migration procedure even when trying to recover from a paused postcopy migration.

Fixes: 36c2f8be ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180627132246.5576-5-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Committed by Peter Xu

The whole postcopy recovery logic was accidentally broken and we need to fix it in two steps. This is the first step: perform the recovery when needed. It was bypassed after commit 36c2f8be. Introduce a postcopy_try_recovery() helper for the postcopy recovery logic and call it in both migration_fd_process_incoming() and migration_ioc_process_incoming().

Fixes: 36c2f8be ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180627132246.5576-4-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
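A minimal, self-contained sketch of the recovery-dispatch idea described above; the state names and helper here are illustrative stand-ins, not QEMU's actual definitions:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for QEMU's migration states and helpers. */
typedef enum {
    MIG_STATUS_NONE,
    MIG_STATUS_POSTCOPY_PAUSED,
    MIG_STATUS_POSTCOPY_RECOVER,
} MigStatus;

static MigStatus incoming_state = MIG_STATUS_POSTCOPY_PAUSED;

/* Returns true if the new channel was consumed by the recovery logic. */
static bool postcopy_try_recover(void)
{
    if (incoming_state == MIG_STATUS_POSTCOPY_PAUSED) {
        incoming_state = MIG_STATUS_POSTCOPY_RECOVER;
        printf("reusing the new channel to recover a paused postcopy\n");
        return true;
    }
    return false;
}

/* Called for every new incoming connection, whatever its transport. */
static void process_incoming_connection(void)
{
    if (postcopy_try_recover()) {
        return;     /* recovery path: do not restart the incoming setup */
    }
    printf("starting a fresh incoming migration\n");
}

int main(void)
{
    process_incoming_connection();  /* takes the recovery branch here */
    return 0;
}
```

The key point is the early return: when a paused postcopy is pending, the new connection feeds the recovery path instead of re-running the normal incoming setup.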
-
Committed by Peter Xu

Move the call to migration_incoming_process() out of the multifd code. It's a bit strange to have migration-generic calls inside multifd code. Instead, let multifd_recv_new_channel() return a boolean indicating whether it's ready to continue the incoming migration.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180627132246.5576-3-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- 27 June 2018, 5 commits
-
-
Committed by Daniel P. Berrangé

The way we determine whether we can start the incoming migration was changed to use migration_has_all_channels() in:

  commit 428d8908
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Mon Jul 24 13:06:25 2017 +0200

    migration: Create migration_has_all_channels

This method in turn calls multifd_recv_all_channels_created(), which is hardcoded to always return 'true' when multifd is not in use. This is a latent bug...

...activated in a following commit where that return result ends up acting as the flag indicating whether it is possible to start processing the migration:

  commit 36c2f8be
  Author: Juan Quintela <quintela@redhat.com>
  Date:   Wed Mar 7 08:40:52 2018 +0100

    migration: Delay start of migration main routines

This means that if channel initialization fails with normal migration, it'll never notice, will attempt to start the incoming migration regardless, and will crash on a NULL pointer. This can be seen, for example, if a client connects to a server requiring TLS but has an invalid x509 certificate:

  qemu-system-x86_64: The certificate hasn't got a known issuer
  qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.

  #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
  #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
  #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
  #3  0x00007fffebd1d692 in () at /lib64/libc.so.6
  #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
  #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
  #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
  #7  0x0000000000000000 in ()

To handle the non-multifd case, we check whether mis->from_src_file is non-NULL. With this in place, the migration server drops the rejected client and stays around waiting for another, hopefully valid, client to arrive.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180619163552.18206-1-berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
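A tiny self-contained model of the check described above; the struct and helper are simplified stand-ins for QEMU's MigrationIncomingState and migration_has_all_channels(), not the real code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for QEMU's MigrationIncomingState. */
typedef struct {
    void *from_src_file;        /* main channel; NULL if setup failed */
    bool  multifd_in_use;
    bool  multifd_channels_ready;
} IncomingState;

/* Only report "all channels ready" when the main channel really exists. */
static bool has_all_channels(const IncomingState *mis)
{
    if (mis->multifd_in_use) {
        return mis->multifd_channels_ready;
    }
    /* Non-multifd case: a failed TLS handshake leaves this NULL. */
    return mis->from_src_file != NULL;
}

int main(void)
{
    IncomingState rejected = { .from_src_file = NULL };
    IncomingState accepted = { .from_src_file = (void *)0x1 };

    printf("rejected client ready? %d\n", has_all_channels(&rejected)); /* 0 */
    printf("accepted client ready? %d\n", has_all_channels(&accepted)); /* 1 */
    return 0;
}
```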
-
Committed by Juan Quintela

The function still doesn't use multifd, but ram_save_page has been simplified and the xbzrle and RDMA code paths are gone. A new counter has been added.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

--
- Add last_page parameter
- Add comments for done and address
- Remove multifd field, it is the same as normal pages
- Merge next patch, now we send multiple pages at a time
- Remove counter for multifd pages, it is identical to normal pages
- Use iovecs instead of creating the equivalent
- Clear memory used by pages (dave)
- Use g_new0 (danp)
- Define MULTIFD_CONTINUE
- Now the pages member is a pointer
- Fix off-by-one in number of pages in one packet
- Remove RAM_SAVE_FLAG_MULTIFD_PAGE
- s/multifd_pages_t/MultiFDPages_t/ and add a comment explaining what it means
-
Committed by Juan Quintela

This will include how many bytes are sent through multifd.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Committed by Juan Quintela

Right now we use the "position" inside the QEMUFile, but things like RDMA already have to do weird things to keep that counter right, and multifd will have similar problems.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Committed by Juan Quintela

We used to include the setup time in this calculation, but it can be quite large with RDMA or multifd.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- 15 June 2018, 3 commits
-
-
Committed by Balamuruhan S

The expected_downtime value is not accurate when computed as dirty_pages_rate * page_size; using ram_bytes_remaining() yields a more reasonable value. Read the remaining RAM just after the dirty pages count has been updated by migration_bitmap_sync_range() in migration_bitmap_sync(), and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining() for calculating expected_downtime.

Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
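A back-of-the-envelope sketch of the estimate described above; the function name and units are assumptions for illustration and do not reproduce QEMU's exact bookkeeping:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: estimate downtime from the bytes still to be sent
 * (what ram_bytes_remaining() reports) divided by the measured transfer
 * bandwidth, instead of dirty_pages_rate * page_size.
 */
static uint64_t expected_downtime_ms(uint64_t remaining_bytes,
                                     uint64_t bandwidth_bytes_per_ms)
{
    if (bandwidth_bytes_per_ms == 0) {
        return UINT64_MAX;          /* no bandwidth measurement yet */
    }
    return remaining_bytes / bandwidth_bytes_per_ms;
}

int main(void)
{
    /* 512 MiB left at roughly 1 GiB/s (~1048576 bytes/ms) -> ~512 ms. */
    printf("%llu ms\n",
           (unsigned long long)expected_downtime_ms(512ULL << 20, 1 << 20));
    return 0;
}
```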
-
Committed by Dr. David Alan Gilbert

Rate limiting sleeps the migration thread for a while when it runs out of bandwidth, but sometimes we want to wake up to get on with something more urgent (like a postcopy request). Here we use a semaphore with a timed wait instead of a simple sleep; incrementing the semaphore will wake it up sooner. Anything that consumes these urgent events must decrement the semaphore.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
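QEMU has its own semaphore wrappers; a plain POSIX sketch of the same wake-early pattern (compile with -pthread) could look like this, with all names made up for the example:

```c
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

static sem_t rate_limit_sem;

/* Sleep for up to `ms`, but return early if an urgent event is posted. */
static void throttle_or_wake(long ms)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    if (sem_timedwait(&rate_limit_sem, &ts) == 0) {
        printf("woken early by an urgent request\n");  /* consumed the event */
    } else {
        printf("slept for the full interval\n");
    }
}

/* Called from e.g. the thread that received a postcopy page request. */
static void post_urgent_request(void)
{
    sem_post(&rate_limit_sem);
}

int main(void)
{
    sem_init(&rate_limit_sem, 0, 0);
    post_urgent_request();   /* pretend a page request just arrived */
    throttle_or_wake(100);   /* returns immediately instead of sleeping */
    sem_destroy(&rate_limit_sem);
    return 0;
}
```

The design point is that the throttled thread and the urgent-event producers share one counter, so a post either cuts a sleep short or is consumed by the next wait.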
-
Committed by Dr. David Alan Gilbert

Limit the background transfer bandwidth during the postcopy phase to the value set by this new parameter. The default, 0, corresponds to the existing behaviour, which is unlimited bandwidth.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- 04 June 2018, 2 commits
-
-
Committed by Dr. David Alan Gilbert

Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it expects the locks to still be untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' will already do that anyway. This change is tied to the new migration capability 'late-block-activate', which defaults to off, keeping the old behaviour by default.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
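A toy model of the activation decision described above; the helper and flag names are made up for illustration and are not QEMU's actual identifiers:

```c
#include <stdbool.h>
#include <stdio.h>

/* Decide whether to activate block devices at the end of an incoming
 * migration.  Illustrative only. */
static bool should_activate_disks_now(bool autostart,
                                      bool late_block_activate_cap)
{
    if (!autostart && late_block_activate_cap) {
        /* Leave the image locks untouched; 'cont' will activate later. */
        return false;
    }
    return true;    /* old behaviour: activate at the end of migration */
}

int main(void)
{
    printf("-S with capability set: %d\n", should_activate_disks_now(false, true));  /* 0 */
    printf("default (cap off)     : %d\n", should_activate_disks_now(false, false)); /* 1 */
    printf("autostarting VM       : %d\n", should_activate_disks_now(true, true));   /* 1 */
    return 0;
}
```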
-
Committed by Xiao Guangrong

QEMU 3.0 enables strict checks for compression & decompression to make migration more robust; this relies on the source side having fixed the internal design that triggered the unexpected error conditions. To make it work when migrating from an older QEMU version to 2.13 QEMU, we introduce this parameter to disable the error check on the destination. That is the default behaviour for machine types older than 2.13; alternatively, the strict check can be enabled explicitly as follows:

  -M pc-q35-2.11 -global migration.decompress-error-check=true

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
- 16 May 2018, 21 commits
-
-
Committed by Peter Xu

It pauses an ongoing migration. Currently it only supports postcopy. Note that this command works on either side of the migration. Basically, when we trigger it on one side, it'll interrupt the other side as well, since the other side will be notified by the disconnect event. However, it's still possible that the other side is not notified, for example when the network is totally broken or due to some firewall configuration change. In that case we will also need to run the same command on the other side so that both sides go into the paused state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-24-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

---
s/2.12/2.13/
-
Committed by Peter Xu

Let's introduce a lock for that QEMUFile, since we are going to operate on it from multiple threads.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-23-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

The first allow-oob=true command. It's used on the destination side when the postcopy migration is paused and ready for recovery. After execution, a new migration channel will be established for postcopy to continue.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-21-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

---
s/2.12/2.13/
-
Committed by Peter Xu

Though we may not need it yet, we now initialize both the src/dst migration objects in migration_object_init(), so that even the incoming migration object is thread safe (it was not).

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-20-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

Finish the last step of the recovery: the final handshake. First the source sends a MIG_CMD_RESUME to the destination, telling it that the source is ready to resume. Then the destination replies with MIG_RP_MSG_RESUME_ACK, telling the source that the destination is ready to resume as well (after switching to the postcopy-active state). When the source receives the RESUME_ACK, it switches its own state to postcopy-active, and the recovery is complete.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-19-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
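A condensed, single-process model of that handshake sequence; the message and state names follow the description above but are stand-ins, not QEMU's wire-protocol code:

```c
#include <stdio.h>

/* Stand-in constants modelled on the handshake described above. */
enum { MIG_CMD_RESUME = 1, MIG_RP_MSG_RESUME_ACK = 2 };
enum { STATE_POSTCOPY_RECOVER, STATE_POSTCOPY_ACTIVE };

static int src_state = STATE_POSTCOPY_RECOVER;
static int dst_state = STATE_POSTCOPY_RECOVER;

/* Destination: on MIG_CMD_RESUME, go active and answer with the ACK. */
static int dst_handle_cmd(int cmd)
{
    if (cmd == MIG_CMD_RESUME) {
        dst_state = STATE_POSTCOPY_ACTIVE;
        return MIG_RP_MSG_RESUME_ACK;
    }
    return 0;
}

/* Source: send RESUME, wait for the ACK, then go postcopy-active too. */
static void src_resume(void)
{
    if (dst_handle_cmd(MIG_CMD_RESUME) == MIG_RP_MSG_RESUME_ACK) {
        src_state = STATE_POSTCOPY_ACTIVE;
    }
}

int main(void)
{
    src_resume();
    printf("source active: %d, destination active: %d\n",
           src_state == STATE_POSTCOPY_ACTIVE,
           dst_state == STATE_POSTCOPY_ACTIVE);
    return 0;
}
```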
-
Committed by Peter Xu

This patch implements the first part of the core RAM resume logic for postcopy; ram_resume_prepare() is provided for that work. When the migration is interrupted by a network failure, the dirty bitmap on the source side becomes meaningless: even if a dirty bit has been cleared, it is still possible that the sent page was lost on the way to the destination. So instead of continuing the migration with the old dirty bitmap on the source, we ask the destination side to send back its received bitmap, then invert it to form our initial dirty bitmap. The source side's send thread issues the MIG_CMD_RECV_BITMAP requests, one per ramblock, to ask for the received bitmaps. On the destination side, MIG_RP_MSG_RECV_BITMAP is issued, along with the requested bitmap. The data is received on the return-path thread of the source, and the main migration thread is notified when all the ramblock bitmaps have been synchronized.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-17-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
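The bitmap inversion step is simple bit arithmetic; a self-contained sketch (the function name is made up, and real QEMU works on its own bitmap helpers rather than raw uint64_t arrays):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: turn a "received" bitmap (bit set = the destination
 * already has that page) into an initial dirty bitmap (bit set = the page
 * must be re-sent) by inverting every bit.
 */
static void recv_bitmap_to_dirty(uint64_t *dirty, const uint64_t *received,
                                 size_t nr_words)
{
    for (size_t i = 0; i < nr_words; i++) {
        dirty[i] = ~received[i];
    }
}

int main(void)
{
    uint64_t received[1] = { 0xF0F0F0F0F0F0F0F0ULL };
    uint64_t dirty[1];

    recv_bitmap_to_dirty(dirty, received, 1);
    printf("received=%016llx dirty=%016llx\n",
           (unsigned long long)received[0], (unsigned long long)dirty[0]);
    return 0;
}
```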
-
Committed by Peter Xu

This is a hook function to be called when a postcopy migration wants to resume from a failure. Each module should provide its own recovery logic here, before we switch to the postcopy-active state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-16-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

Create a new message to reply to MIG_CMD_POSTCOPY_RESUME. One uint32_t is used as the payload to let the source know whether the destination is ready to continue the migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-15-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

Introduce a new return-path message, MIG_RP_MSG_RECV_BITMAP, to send the received bitmap of a ramblock back to the source. It is the reply to MIG_CMD_RECV_BITMAP: it contains not only the header (including the ramblock name) but also, appended to it, the whole received bitmap of that ramblock on the destination side. When the source receives such a reply (MIG_RP_MSG_RECV_BITMAP), it parses it and converts it to the dirty bitmap by inverting the bits. When we send the receive bitmap, we additionally do the following (see the sketch after this entry):

- convert the bitmap to little endian, to support hosts with different endianness on src/dst;
- align it to 8 bytes, to support hosts with different word sizes (32/64 bits) on src/dst.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-13-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
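A small sketch of those two wire-format steps, assuming the bitmap is stored in 64-bit words for brevity (the real code also copes with 32-bit 'unsigned long' hosts); htole64() is the glibc byte-order helper:

```c
#include <endian.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: pack a received bitmap for the return path.  The
 * size is rounded up to 8 bytes and every word is stored little endian,
 * so src and dst may differ in word size and endianness.
 */
static size_t recv_bitmap_to_wire(uint64_t *out, const uint64_t *in,
                                  size_t nbits)
{
    size_t words = (nbits + 63) / 64;    /* round the size up to 8 bytes */
    for (size_t i = 0; i < words; i++) {
        out[i] = htole64(in[i]);         /* fixed endianness on the wire */
    }
    return words * 8;                    /* bytes to append after the header */
}

int main(void)
{
    uint64_t bitmap[1] = { 0x5 };        /* pages 0 and 2 were received */
    uint64_t wire[1];
    size_t len = recv_bitmap_to_wire(wire, bitmap, 3);

    printf("%zu bytes on the wire, first word 0x%016llx\n",
           len, (unsigned long long)wire[0]);
    return 0;
}
```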
-
Committed by Peter Xu

On the destination side, we cannot wake up all the threads when we get reconnected. The first thing to do is to wake up the main load thread, so that we can continue to receive valid messages from the source again and reply when needed. At this point, we switch the destination VM state from postcopy-paused back to postcopy-recover. Now we are finally ready to do the resume logic.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-11-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

Introduce the new migration state "postcopy-recover". If a migration procedure is paused and the connection is successfully rebuilt afterwards, we switch the source VM state from "postcopy-paused" to the new state "postcopy-recover", then do the resume logic in the migration thread (along with the return-path thread). This patch only does the state switch on the source side; a follow-up patch will handle the state switch on the destination side using the same status bit.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-10-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

---
s/2.11/2.13/
-
Committed by Peter Xu

This patch detects the "resume" flag of the migration command and rebuilds the channels only if the flag is set.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-9-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

It will be used when we want to resume a paused migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-8-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

---
s/2.12/2.13/
-
Committed by Peter Xu

Allow the fault thread to stop handling page faults temporarily. When a network failure happens (and we expect a recovery afterwards), we should not let the fault thread keep sending requests to the source; instead, it should halt for a while until the connection is rebuilt. When the destination main thread notices the failure, it kicks the fault thread to switch to the paused state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-7-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

Let the thread pause for network issues.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-6-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

When there is an IO error on the incoming channel (e.g., the network goes down), instead of bailing out immediately, we allow the destination VM to switch to the new POSTCOPY_PAUSE state. Currently it is still simple: it waits on the new semaphore until someone pokes it for another attempt. One note: on the RAM loading thread we cannot just detect the POSTCOPY_ACTIVE state; we need to detect the more specific POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all the device state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-5-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
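A POSIX-semaphore model of that pause-and-wait flow (compile with -pthread); the names are stand-ins, not QEMU's actual symbols, and the state check is collapsed into a boolean:

```c
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

static sem_t postcopy_pause_sem;
static bool  incoming_running = true;   /* ~ POSTCOPY_INCOMING_RUNNING */

/* Called by the RAM load thread when reading the incoming channel fails. */
static bool ram_load_handle_error(void)
{
    if (!incoming_running) {
        return false;               /* device state not fully loaded: give up */
    }
    printf("entering postcopy-paused, waiting for a new connection...\n");
    sem_wait(&postcopy_pause_sem);  /* parked until recovery pokes us */
    printf("woken up, retrying the load loop\n");
    return true;
}

int main(void)
{
    sem_init(&postcopy_pause_sem, 0, 0);
    sem_post(&postcopy_pause_sem);  /* pretend a recovery already arrived */
    ram_load_handle_error();
    sem_destroy(&postcopy_pause_sem);
    return 0;
}
```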
-
Committed by Peter Xu

Now, when the network goes down during postcopy, the source side will not fail the migration. Instead, we convert the status into the new paused state and wait for a rescue. If a recovery is detected, migration_thread() will reset its local variables to prepare for it.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-4-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Peter Xu

Introduce a new state, "postcopy-paused", which is used when the postcopy migration is paused. It is targeted at postcopy network failure recovery.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-3-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
-
Committed by Juan Quintela

We need to make sure that we have started all the multifd threads.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
-
Committed by Juan Quintela

We need them before we start migration.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
-
Committed by Juan Quintela

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
-
- 04 May 2018, 1 commit
-
-
Committed by Marc-André Lureau

Now that we can safely call QOBJECT() on QObject * as well as its subtypes, we can have macros qobject_ref() / qobject_unref() that work everywhere, instead of having to use QINCREF() / QDECREF() for QObject and qobject_incref() / qobject_decref() for its subtypes. The replacement is mechanical, except that I broke a long line and added a cast in monitor_qmp_cleanup_req_queue_locked(); unlike qobject_decref(), qobject_unref() doesn't accept void *. Note that the new macros evaluate their argument exactly once, so there is no need to shout them.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased, semantic conflict resolved, commit message improved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
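A toy model of the idea, not QEMU's real qobject.h: a common refcounted base plus macros that accept both the base type and subtypes while evaluating their argument exactly once (statement expressions and typeof are GCC/clang extensions, as used by QEMU itself):

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-ins: every "object" starts with the same refcounted base. */
typedef struct { int refcnt; } QObjectBaseToy;
typedef struct { QObjectBaseToy base; } QObjectToy;  /* the "base" type */
typedef struct { QObjectBaseToy base; } QDictToy;    /* one "subtype"   */

static void qobject_ref_impl(QObjectBaseToy *b)   { b->refcnt++; }
static void qobject_unref_impl(QObjectBaseToy *b) { b->refcnt--; }

/* One pair of macros works on any of the types above; `obj` is evaluated
 * exactly once, so side effects in the argument are safe. */
#define qobject_ref(obj)                           \
    ({ typeof(obj) _o = (obj);                     \
       qobject_ref_impl(&_o->base); _o; })
#define qobject_unref(obj)                         \
    do { typeof(obj) _o = (obj);                   \
         qobject_unref_impl(&_o->base); } while (0)

int main(void)
{
    QDictToy d = { .base = { .refcnt = 1 } };

    QDictToy *p = qobject_ref(&d);   /* works on the subtype directly */
    qobject_unref(p);
    assert(d.base.refcnt == 1);
    printf("refcnt back to %d\n", d.base.refcnt);
    return 0;
}
```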
-
- 26 April 2018, 2 commits
-
-
Committed by Alexey Perevalov

Postcopy total blocktime is available on the destination side only, but query-migrate was possible only on the source. This patch adds the ability to call query-migrate on the destination. To be able to see the postcopy blocktime, the postcopy-blocktime capability needs to be requested. The query-migrate command will show a result like the following sample:

  {"return": {"postcopy-vcpu-blocktime": [115, 100], "status": "completed", "postcopy-blocktime": 100}}

postcopy_vcpu_blocktime contains a list where the first item corresponds to the first vCPU in QEMU. This patch has a drawback: it combines the states of incoming and outgoing migration, and an ongoing migration state will overwrite the incoming state. It would probably be better to separate query-migrate for incoming and outgoing migration, or to add a parameter indicating the type of migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-7-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
Committed by Alexey Perevalov

Right now it can be used on the destination side to enable vCPU blocktime calculation for postcopy live migration. vCPU blocktime is the time from when a vCPU thread is put into interruptible sleep until the memory page has been copied and the thread is woken up.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-2-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- 10 April 2018, 1 commit
-
-
Committed by Dr. David Alan Gilbert

This reverts commit 0746a926. Discussion with kwolf suggests this is actually an API change that we need to gate on a capability. Push to 2.13.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-
- 29 March 2018, 1 commit
-
-
Committed by Dr. David Alan Gilbert

Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it expects the locks to still be untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' will already do that anyway.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180328170207.49512-1-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-