Commits · e49f827725d53d2fb1b8ec42db96c442d0caf6cd · openeuler / qemu

13 Sep, 2016 18 commits

Authored by Laurent Vivier on Aug 27, 2016

vq->avail.idx and vq->avail->ring[] are 16-bit values,
so read and write them with readw()/writew() instead of
readl()/writel().

Reading or writing a 16-bit value with a 32-bit accessor works
on little-endian CPUs but not on big-endian CPUs.
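
The effect can be reproduced in isolation. The sketch below is only an illustration: `load16`, `load32`, `store16` and the `be` flag are hypothetical stand-ins that model guest memory accesses in a chosen byte order; they are not the qtest readw()/readl() helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for 16- and 32-bit guest memory accesses.
 * 'be' selects big-endian byte order. */
static void store16(uint8_t *p, uint16_t v, int be)
{
    assert(p != NULL);
    if (be) { p[0] = v >> 8;   p[1] = v & 0xff; }
    else    { p[0] = v & 0xff; p[1] = v >> 8;   }
}

static uint16_t load16(const uint8_t *p, int be)
{
    return be ? (uint16_t)((p[0] << 8) | p[1])
              : (uint16_t)((p[1] << 8) | p[0]);
}

static uint32_t load32(const uint8_t *p, int be)
{
    return be ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                ((uint32_t)p[2] << 8)  |  (uint32_t)p[3]
              : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
                ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Lay out two adjacent 16-bit fields (think avail.idx followed by the
 * next 16-bit field), then read the first one through a 32-bit accessor
 * and truncate the result to 16 bits. */
static uint16_t read_idx_with_32bit_accessor(uint16_t idx, uint16_t next, int be)
{
    uint8_t mem[4];

    store16(mem, idx, be);
    store16(mem + 2, next, be);
    return (uint16_t)load32(mem, be);
}
```

With `idx = 5` and `next = 9`, the little-endian case yields 5, while the big-endian case yields 9, the neighbouring field. A 16-bit accessor such as `load16` returns 5 in both cases.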

[An equivalent patch for the writew() calls was also sent by
Zhang Shuai <zhangshuai13@huawei.com>.
--Stefan]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1472330054-22607-1-git-send-email-lvivier@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

MAINTAINERS: add maintainer for replication · 049105a3

Authored by Changlong Xie on Jul 27, 2016

As per Stefan's suggestion, add Wen and me as co-maintainers
of replication.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-13-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

support replication driver in blockdev-add · 82ac5543

Authored by Wen Congyang on Jul 27, 2016

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1469602913-20979-12-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

tests: add unit test case for replication · b3110466

Authored by Changlong Xie on Jul 27, 2016

[Rename get_error test cases to get_error_all to avoid tripping up
scripts that grep for "error:" in test output.  It also reflects the
actual replication API function name better.
-Stefan]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-11-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

replication: Implement new driver for block replication · 29ff7890

Authored by Wen Congyang on Jul 27, 2016

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-id: 1469602913-20979-10-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

replication: Introduce new APIs to do replication operation · 190b9a8b

Authored by Changlong Xie on Jul 27, 2016

This commit introduces six replication interfaces (for block, network, etc.).
First, replication_(new/remove) create and destroy replication
instances; then, during migration, replication_(start/stop/do_checkpoint/get_error)_all
handle all replication operations. For more detail, please
refer to replication.h.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-id: 1469602913-20979-9-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

configure: support replication · a6b1d4c0

Authored by Changlong Xie on Jul 27, 2016

Use configure --(enable/disable)-replication to switch replication
support on or off; it is on by default.
Replication support itself is introduced in later patches.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-8-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

mirror: auto complete active commit · b49f7ead

Authored by Wen Congyang on Jul 27, 2016

Auto-complete the mirror job in the background to avoid
blocking synchronously.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

docs: block replication's description · 68365a38

Authored by Wen Congyang on Jul 27, 2016

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-6-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Link backup into block core · 258854ad

Authored by Wen Congyang on Jul 27, 2016

Some programs that add a dependency on it will use
the block layer directly.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1469602913-20979-5-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Backup: export interfaces for extra serialization · a8bbee0e

Authored by Changlong Xie on Jul 27, 2016

Normal backup(sync='none') workflow:
step 1. NBD performs I/O write from client to server
   qcow2_co_writev
    bdrv_co_writev
     ...
       bdrv_aligned_pwritev
        notifier_with_return_list_notify -> backup_do_cow
         bdrv_driver_pwritev // write new contents

step 2. drive-backup sync=none
   backup_do_cow
   {
    wait_for_overlapping_requests
    cow_request_begin
    for(; start < end; start++) {
            bdrv_co_readv_no_serialising //read old contents from Secondary disk
            bdrv_co_writev // write old contents to hidden-disk
    }
    cow_request_end
   }

step 3. Then roll back to "step 1" to write new contents to Secondary disk.

And for replication, we must make sure that we only read the old contents from
Secondary disk in order to keep contents consistent.

1) Replication workflow of Secondary
                                                         virtio-blk
                                                              ^
------->  1 NBD                                               |
   ||     server                                       3 replication
   ||        ^                                                ^
   ||        |           backing                 backing      |
   ||  Secondary disk 6<-------- hidden-disk 5 <-------- active-disk 4
   ||        |                         ^
   ||        '-------------------------'
   ||           drive-backup sync=none 2

Hence, we need these interfaces to implement coarse-grained serialization between
COW of Secondary disk and the read operation of replication.

Example code showing how to use them:

#include "block/block_backup.h"

static coroutine_fn int xxx_co_readv()
{
        CowRequest req;
        BlockJob *job = secondary_disk->bs->job;
        int ret;

        if (job) {
              /* serialize against in-flight COW requests on this range */
              backup_wait_for_overlapping_requests(job, start, end);
              backup_cow_request_begin(&req, job, start, end);
              ret = bdrv_co_readv();
              backup_cow_request_end(&req);
              goto out;
        }
        ret = bdrv_co_readv();
out:
        return ret;
}
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-4-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Backup: clear all bitmap when doing block checkpoint · 49d3e828

Authored by Wen Congyang on Jul 27, 2016

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

block: unblock backup operations in backing file · e9d6456e

Authored by Wen Congyang on Jul 27, 2016

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Kashyap Chamarthy <kchamart@redhat.com>
Message-id: 1469602913-20979-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

virtio-blk: rename virtio_device_info to virtio_blk_info · b5c7ceaf

Authored by Changlong Xie on Aug 03, 2016

The old name is easily confused with @virtio_device_info in virtio.c,
so rename it to something more appropriate.
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1470214147-32560-1-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

linux-aio: process completions from ioq_submit() · 0ed93d84

Authored by Roman Pen on Jul 19, 2016

In order to reduce completion latency it makes sense to harvest completed
requests ASAP.  A very fast backend device can complete requests right after
submission, so it is worth checking the ring buffer for completed requests
directly after io_submit() has been called.

Indeed, this patch reduces completion latencies and increases the
overall throughput, e.g. the following are the percentiles of the number of
requests completed at once:

        1st 10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2    4    42   112   128   128   128   128   128   128    128
 After    1    1     4    14    33    45    47    48    50    51    108

That means that before the current patch is applied the ring buffer is
observed as full (128 requests were consumed at once) in 60% of calls.

After the patch is applied the distribution of the number of completed requests
is "smoother" and the queue (requests in-flight) is almost never full.

The fio read results are the following (write results are almost the
same and are not shown here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  ------
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, average completion latencies
decreased from 1051us to 988us.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-4-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

linux-aio: split processing events function · 3407de57

Authored by Roman Pen on Jul 19, 2016

Prepare the event-processing function to be called from ioq_submit()
by splitting it into two parts: the first harvests completed I/O
requests, the second submits pending requests.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-3-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

linux-aio: consume events in userspace instead of calling io_getevents · 9e909a58

Authored by Roman Pen on Jul 19, 2016

The AIO context in userspace is represented as a simple ring buffer, which
can be consumed directly without entering the kernel, and that
can bring some performance gain.  QEMU does not use a timeout value when
waiting for event completions, so we can consume all events from
userspace.
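
The idea of draining a completion ring from userspace can be sketched with a toy structure. This is only an illustration under simplifying assumptions: `struct toy_ring`, `TOY_RING_SIZE` and `toy_ring_consume` are hypothetical names, the events are plain integers, and the memory barriers a real shared ring would need are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 8

/* Toy completion ring: the producer (the kernel, in the real case)
 * advances 'tail'; the consumer advances 'head'. */
struct toy_ring {
    unsigned head;
    unsigned tail;
    int events[TOY_RING_SIZE];
};

/* Copy up to 'max' pending events out of the ring and return how many
 * were taken.  No system call is involved: only head/tail are examined.
 * A real implementation also needs barriers around these accesses. */
static unsigned toy_ring_consume(struct toy_ring *r, int *out, unsigned max)
{
    unsigned n = 0;

    assert(r != NULL && out != NULL);
    while (r->head != r->tail && n < max) {
        out[n++] = r->events[r->head];
        r->head = (r->head + 1) % TOY_RING_SIZE;  /* wrap around */
    }
    return n;
}
```

Because the consumer never blocks, this style only fits callers that, like QEMU as described above, do not rely on a timeout while waiting for completions.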
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-2-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

qcow2: avoid memcpy(dst, NULL, len) · 0647d47c

Authored by Stefan Hajnoczi on Sep 13, 2016

Section "7.1.4 Use of library functions" in the C99 standard says:

If an argument to a function has an invalid value (such as [...]
a null pointer [...]) [...] the behavior is undefined.

Additionally the "searching and sorting" functions are specified as
requiring valid pointer values as described in 7.1.4.

This patch fixes the following sanitizer errors:

block/qcow2.c:1807:41: runtime error: null pointer passed as argument 2, which is declared to never be null
block/qcow2-cluster.c:86:26: runtime error: null pointer passed as argument 2, which is declared to never be null
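
One common way to avoid this class of error is to skip the call when there is nothing to copy. The helper below is a hypothetical sketch of that pattern (`copy_if_nonempty` is not a QEMU function, and the actual patch may be shaped differently).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: only call memcpy() when there is something to
 * copy, so it never receives a null pointer even if len == 0. */
static void copy_if_nonempty(void *dst, const void *src, size_t len)
{
    if (len == 0) {
        return;                 /* nothing to do; src or dst may be NULL */
    }
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
}
```

A call such as `copy_if_nonempty(buf, NULL, 0)` is then well defined, whereas `memcpy(buf, NULL, 0)` is not.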
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1473758138-19260-1-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

12 Sep, 2016 7 commits

Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging · 7263da78

Authored by Peter Maydell on Sep 12, 2016

Update OpenBIOS images

# gpg: Signature made Mon 12 Sep 2016 11:51:09 BST
# gpg:                using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C  C9C4 5BC2 C56F AE0F 321F

* remotes/mcayland/tags/qemu-openbios-signed:
  Update OpenBIOS images to c5542f2 built from submodule.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-09-12-1' into staging · d4c61988

Authored by Peter Maydell on Sep 12, 2016

Merge qcrypto 2016/09/12 v1

# gpg: Signature made Mon 12 Sep 2016 12:02:20 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qcrypto-2016-09-12-1:
  crypto: report enum strings instead of values in errors
  crypto: fix building complaint
  crypto: ensure XTS is only used with ciphers with 16 byte blocks
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

crypto: report enum strings instead of values in errors · 90d6f60d

Authored by Daniel P. Berrange on Sep 05, 2016

Several error messages print out the raw enum value, which
is less than helpful to users, as these values are not
documented, nor stable across QEMU releases. Switch to use
the enum string instead.

The nettle impl also had two typos where it mistakenly
said "algorithm" instead of "mode", and actually reported
the algorithm value too.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

crypto: fix building complaint · d9269b27

Authored by Gonglei on Sep 05, 2016

gnutls commit 846753877d renamed LIBGNUTLS_VERSION_NUMBER to GNUTLS_VERSION_NUMBER.
When building against a gnutls older than that version, we get the warning below:
crypto/tlscredsx509.c:618:5: warning: "GNUTLS_VERSION_NUMBER" is not defined

Because gnutls 3.x still defines LIBGNUTLS_VERSION_NUMBER for backward compatibility,
use LIBGNUTLS_VERSION_NUMBER instead of GNUTLS_VERSION_NUMBER to fix the build
complaint.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

crypto: ensure XTS is only used with ciphers with 16 byte blocks · a5d2f44d

Authored by Daniel P. Berrange on Aug 24, 2016

The XTS cipher mode needs to be used with a cipher which has
a block size of 16 bytes. If a mis-matching block size is used,
the code will either corrupt memory beyond the IV array, or
not fully encrypt/decrypt the IV.

This fixes a memory corruption crash when attempting to use
cast5-128 with xts, since the former has an 8 byte block size.

A test case is added to ensure the cipher creation fails with
such an invalid combination.
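
The kind of check described here can be sketched as follows. The names (`struct cipher_desc`, `xts_compatible`, `XTS_BLOCK_SIZE`) are hypothetical and are not the QEMU crypto API; the block sizes are the ones stated in the message (8 bytes for cast5-128) plus the 16-byte block of AES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XTS_BLOCK_SIZE 16   /* XTS works on 16-byte blocks */

/* Hypothetical cipher descriptor used only for this illustration. */
struct cipher_desc {
    const char *name;
    size_t block_len;       /* cipher block size in bytes */
};

static const struct cipher_desc aes_128   = { "aes-128",   16 };
static const struct cipher_desc cast5_128 = { "cast5-128",  8 };

/* Refuse to pair XTS with a cipher whose block size is not 16 bytes,
 * instead of letting later code over- or under-run the IV buffer. */
static bool xts_compatible(const struct cipher_desc *c)
{
    assert(c != NULL);
    return c->block_len == XTS_BLOCK_SIZE;
}
```

Rejecting the combination at creation time, as the commit does, turns a potential memory corruption into an ordinary, reportable error.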
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging · c569c537

Authored by Peter Maydell on Sep 12, 2016

virtio,vhost,pc: fixes and updates

balloon fixes wrt migration
virtio-vsock device support
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 09 Sep 2016 22:36:13 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  vhost-vsock: add virtio sockets device
  tests/acpi: speedup acpi tests
  virtio-pci: minor refactoring
  vhost: don't set vring call if no vector
  virtio-pci: error out when both legacy and modern modes are disabled
  virtio-balloon: fix stats vq migration
  virtio: add virtqueue_rewind()
  virtio-balloon: discard virtqueue element on reset
  virtio: zero vq->inuse in virtio_reset()
  virtio-pci: reduce modern_mem_bar size
  target-i386: present virtual L3 cache info for vcpus
  pc: Add 2.8 machine
  virtio-pci: use size from correct structure
  virtio: Tell the user what went wrong when event_notifier_init failed
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Update OpenBIOS images to c5542f2 built from submodule. · a26f7f2c

Authored by Mark Cave-Ayland on Sep 12, 2016

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

10 Sep, 2016 14 commits

vhost-vsock: add virtio sockets device · fc0b9b0e

Authored by Stefan Hajnoczi on Aug 16, 2016

Implement the new virtio sockets device for host<->guest communication
using the Sockets API.  Most of the work is done in a vhost kernel
driver so that virtio-vsock can hook into the AF_VSOCK address family.
The QEMU vhost-vsock device handles configuration and live migration
while the rx/tx happens in the vhost_vsock.ko Linux kernel driver.

The vsock device must be given a CID (host-wide unique address):

  # qemu -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3 ...

For more information see:
http://qemu-project.org/Features/VirtioVsock

[Endianness fixes and virtio-ccw support by Claudio Imbrenda
<imbrenda@linux.vnet.ibm.com>]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
[mst: rebase to master]
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests/acpi: speedup acpi tests · 947b205f

Authored by Marcel Apfelbaum on Sep 06, 2016

Use kvm acceleration if available.
Disable kernel-irqchip and use qemu64 cpu
for both kvm and tcg cases.

Using kvm acceleration saves about a second
and disabling kernel-irqchip has no visible
performance impact.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-pci: minor refactoring · 71d19fc5

Authored by Michael S. Tsirkin on Sep 09, 2016

!legacy && !modern is shorter than !(legacy || modern).
I also prefer this (fewer ()s) as a matter of taste.
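
The two spellings are equivalent by De Morgan's law, which a small exhaustive check confirms. The function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The two spellings compared in the commit message. */
static bool neither_v1(bool legacy, bool modern) { return !(legacy || modern); }
static bool neither_v2(bool legacy, bool modern) { return !legacy && !modern; }

/* Compare both forms over all four input combinations. */
static void check_equivalence(void)
{
    for (int l = 0; l <= 1; l++) {
        for (int m = 0; m <= 1; m++) {
            assert(neither_v1(l, m) == neither_v2(l, m));
        }
    }
}
```

Since the truth tables match, the refactoring changes readability only, not behaviour.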

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: don't set vring call if no vector · 96a3d98d

Authored by Jason Wang on Aug 01, 2016

We used to set the vring call fd unconditionally even if the guest driver does
not use MSI-X for this virtqueue at all. This causes lots of
unnecessary userspace accesses and other checks for drivers that do not use
interrupts at all (e.g. virtio-net pmd). So check and clean the vring call
fd if the guest does not use any vector for this virtqueue at
all.

Perf diffs (on rx) show that lots of CPU time wasted in vhost_signal() was saved:

#
    28.12%  -27.82%  [vhost]           [k] vhost_signal
    14.44%   -1.69%  [kernel.vmlinux]  [k] copy_user_generic_string
     7.05%   +1.53%  [kernel.vmlinux]  [k] __free_page_frag
     6.51%   +5.53%  [vhost]           [k] vhost_get_vq_desc
...

Pktgen tests show a 15.8% improvement in rx pps and 6.5% in tx pps.

Before: RX 2.08Mpps TX 1.35Mpps
After:  RX 2.41Mpps TX 1.44Mpps
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-pci: error out when both legacy and modern modes are disabled · 3eff3769

Authored by Greg Kurz on Sep 09, 2016

Without presuming if we got there because of a user mistake or some
more subtle bug in the tooling, it really does not make sense to
implement a non-functional device.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-balloon: fix stats vq migration · 4a1e48be

Authored by Ladi Prosek on Sep 07, 2016

The statistics virtqueue is not migrated properly because virtio-balloon
does not include s->stats_vq_elem in the migration stream.

After migration the statistics virtqueue hangs because the host never
completes the last element (s->stats_vq_elem is NULL on the destination
QEMU).  Therefore the guest never submits new elements and the virtqueue
is hung.

Instead of changing the migration stream format in an incompatible way,
detect the migration case and rewind the virtqueue so the last element
can be completed.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Suggested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: add virtqueue_rewind() · 297a75e6

Authored by Stefan Hajnoczi on Sep 07, 2016

virtqueue_discard() requires a VirtQueueElement but virtio-balloon does
not migrate its in-use element.  Introduce a new function that is
similar to virtqueue_discard() but doesn't require a VirtQueueElement.

This will allow virtio-balloon to access the element again after migration
with the usual proviso that the guest may have modified the vring since
last time.
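
What "rewinding" means can be sketched on a toy model that tracks only two counters. `struct toy_vq`, `toy_pop` and `toy_rewind` are hypothetical names and deliberately not QEMU's VirtQueue; the real function may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the bookkeeping involved; not QEMU's VirtQueue. */
struct toy_vq {
    uint16_t last_avail_idx;  /* next avail ring entry the device pops */
    unsigned int inuse;       /* elements popped but not yet completed */
};

/* Popping an element advances the index and the in-use counter. */
static void toy_pop(struct toy_vq *vq)
{
    vq->last_avail_idx++;
    vq->inuse++;
}

/* Undo the last 'num' pops without needing the popped elements
 * themselves, so they can be popped again later. */
static bool toy_rewind(struct toy_vq *vq, unsigned int num)
{
    assert(vq != NULL);
    if (num > vq->inuse) {
        return false;         /* cannot rewind more than is in flight */
    }
    vq->last_avail_idx -= num;
    vq->inuse -= num;
    return true;
}
```

After a rewind the device sees the same avail ring entry again on its next pop, which is why the guest-may-have-modified-the-vring caveat above applies.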

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-balloon: discard virtqueue element on reset · 104e70ca

Authored by Ladi Prosek on Sep 07, 2016

The one pending element is being freed but not discarded on device
reset, which causes svq->inuse to creep up, eventually hitting the
"Virtqueue size exceeded" error.

Properly discarding the element on device reset makes sure that its
buffers are unmapped and the inuse counter stays balanced.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: zero vq->inuse in virtio_reset() · 4b7f91ed

Authored by Stefan Hajnoczi on Sep 07, 2016

vq->inuse must be zeroed upon device reset like most other virtqueue
fields.

In theory, virtio_reset() just needs assert(vq->inuse == 0) since
devices must clean up in-flight requests during reset (requests must
not be leaked!).

In practice, it is difficult to achieve vq->inuse == 0 across reset
because balloon, blk, 9p, etc implement various different strategies for
cleaning up requests.  Most devices call g_free(elem) directly without
telling virtio.c that the VirtQueueElement is cleaned up.  Therefore
vq->inuse is not decremented during reset.

This patch zeroes vq->inuse and trusts that devices are not leaking
VirtQueueElements across reset.

I will send a follow-up series that refactors request life-cycle across
all devices and converts vq->inuse = 0 into assert(vq->inuse == 0) but
this more invasive approach is not appropriate for stable trees.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>

virtio-pci: reduce modern_mem_bar size · d9997d89

Authored by Marcel Apfelbaum on Sep 07, 2016

Currently each VQ Notification Virtio Capability is allocated
on a different page. The idea is to enable split drivers within
guests; however, there are no known plans to do that.
The allocation results in an 8MB BAR, more than various
guest firmwares pre-allocate for the PCI bridge hotplug process.

Reserve 4 bytes per VQ by default and add a new parameter
"page-per-vq" to be used with split drivers.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

target-i386: present virtual L3 cache info for vcpus · 14c985cf

Authored by Longpeng(Mike) on Sep 07, 2016

Some software algorithms are based on the hardware's cache info. For example,
in the x86 Linux kernel, when cpu1 wants to wake up a task on cpu2, cpu1 triggers
a resched IPI and tells cpu2 to do the wakeup if they don't share a last-level
cache (LLC). Conversely, cpu1 accesses cpu2's runqueue directly if they share the LLC.
The relevant Linux kernel code is as below:

	static void ttwu_queue(struct task_struct *p, int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		......
		if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
			......
			ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
			return;
		}
		......
		ttwu_do_activate(rq, p, 0); /* access target's rq directly */
		......
	}

On real hardware, the cpus on the same socket share an L3 cache, so one won't
trigger resched IPIs when waking up a task on the others. But QEMU doesn't present
virtual L3 cache info to the VM, so the Linux guest triggers lots of RES IPIs
under some workloads even if the virtual cpus belong to the same virtual socket.

For KVM, there will be lots of vmexits due to the guest sending IPIs.
The workload is SAP HANA's testsuite; we ran it for one round (about 40 minutes)
and observed the number of RES IPIs triggered in the (Suse11sp3) guest during
the period:
        No-L3           With-L3(applied this patch)
cpu0:	363890		44582
cpu1:	373405		43109
cpu2:	340783		43797
cpu3:	333854		43409
cpu4:	327170		40038
cpu5:	325491		39922
cpu6:	319129		42391
cpu7:	306480		41035
cpu8:	161139		32188
cpu9:	164649		31024
cpu10:	149823		30398
cpu11:	149823		32455
cpu12:	164830		35143
cpu13:	172269		35805
cpu14:	179979		33898
cpu15:	194505		32754
avg:	268963.6	40129.8

The VM's topology is "1*socket 8*cores 2*threads".
After presenting virtual L3 cache info to the VM, the number of RES IPIs in the guest
is reduced by 85%.

For KVM, vcpus sending IPIs causes vmexits, which are expensive, so it can cause
severe performance degradation. We also tested the overall system performance when
vcpus actually run on separate physical sockets. With L3 cache, the performance
improves by 7.2%~33.1% (avg: 15.7%).
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pc: Add 2.8 machine · a4d3c834

Authored by Longpeng(Mike) on Sep 07, 2016

This will be used by the next patch.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-pci: use size from correct structure · e3aab6c7

Authored by Michael S. Tsirkin on Sep 06, 2016

PIO MR registration should use size from the correct notify struct.
Doesn't affect any visible behaviour because the field values are the
same (both are 4).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: Tell the user what went wrong when event_notifier_init failed · a8bba0ad

Authored by Thomas Huth on Jun 28, 2016

event_notifier_init() can fail in real life, for example when there
are not enough open file handles available (EMFILE) when using a lot
of devices. So instead of leaving the average user with a cryptic
error number only, print out a proper error message with strerror()
instead, so that the user has a better way to figure out what is
going on and that using "ulimit -n" might help here for example.
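
A minimal sketch of the reporting style described above, assuming a negative errno-style return code. `format_notifier_error` and the message wording are hypothetical and do not reproduce the text QEMU actually prints.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: turn a negative errno-style return code into
 * a readable message instead of reporting the bare number. */
static int format_notifier_error(char *buf, size_t len, int ret)
{
    assert(buf != NULL && ret < 0);
    return snprintf(buf, len,
                    "Setting up event notifier failed: %s (%d)",
                    strerror(-ret), ret);
}
```

For `ret == -EMFILE` the message contains the system's description of "too many open files", which points the user towards raising the limit with "ulimit -n".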
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

09 Sep, 2016 1 commit

Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into staging · c2a57aae

Authored by Peter Maydell on Sep 09, 2016

# gpg: Signature made Fri 09 Sep 2016 05:54:35 BST
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/docker-pull-request:
  docker: silence debootstrap when --quiet is given
  docker: build debootstrap after cloning
  docker: make sure debootstrap is at least 1.0.67
  docker: print warning if EXECUTABLE is not set when building debootstrap image
  docker: debian-bootstrap.pre: print helpful message if DEB_ARCH/DEB_TYPE unset
  docker: debian-bootstrap.pre: print error messages to stderr
  docker: avoid dependency on 'realpath' package
  docker.py: don't hang on large docker output
  docker: Add a glib2-2.22 image
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>