1. 01 July 2019, 4 commits
    • pcie: work around for racy guest init · 110c477c
      Committed by Michael S. Tsirkin
      During boot, Linux guests tend to clear all bits in the PCIe slot status
      register, which is used for hotplug.
      If they clear bits that weren't set, this is racy and loses events:
      not a big problem for manual hotplug on bare metal, but a problem for us.
      
      For example, the following is broken ATM:
      
      /x86_64-softmmu/qemu-system-x86_64 -enable-kvm -S -machine q35  \
          -device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
          -device virtio-balloon-pci,id=balloon,bus=pcie_root_port_0 \
          -monitor stdio disk.qcow2
      (qemu)device_del balloon
      (qemu)cont
      
      Balloon isn't deleted as it should.
      
      As a work-around, detect this attempt to clear slot status and revert
      the status to what it was before the write (a sketch follows this entry).
      
      Note: in theory this can be detected as a duplicate button press,
      which cancels the previous press. That does not seem to happen in
      practice, as guests only seem to have this bug during init.
      
      Note2: the right thing to do is probably to fix Linux to
      read status before clearing it, and act on the bits that are set.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Tested-by: Igor Mammedov <imammedo@redhat.com>
      110c477c
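      The idea behind the work-around can be sketched as follows. This is a
      simplified, self-contained illustration, not the actual QEMU patch; the
      helper name and the pre-write status argument are assumptions made for
      the example:

          #include <stdint.h>

          /*
           * Slot status bits are write-1-to-clear.  If the guest tries to
           * clear bits that were not set before the write, treat it as the
           * blanket "clear everything" done during init and keep the old
           * value, so pending hotplug events are not lost.
           */
          static uint16_t slot_status_after_write(uint16_t old_sta, uint16_t w1c_val)
          {
              if (w1c_val & ~old_sta) {
                  return old_sta;            /* racy init-time clear: revert */
              }
              return old_sta & ~w1c_val;     /* normal write-1-to-clear */
          }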
    • pcie: check that slt ctrl changed before deleting · 2841ab43
      Committed by Michael S. Tsirkin
      During boot, Linux would sometimes overwrite the control register of a
      powered-off slot before powering it on. Unfortunately, QEMU interprets
      that as a power-off request and ejects the device.
      
      For example:
      
      /x86_64-softmmu/qemu-system-x86_64 -enable-kvm -S -machine q35  \
          -device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
          -monitor stdio disk.qcow2
      (qemu)device_add virtio-balloon-pci,id=balloon,bus=pcie_root_port_0
      (qemu)cont
      
      Balloon is deleted during guest boot.
      
      To fix this, save the control register beforehand and check that the
      power or LED state actually changed before ejecting (see the sketch
      after this entry).
      
      Note: this is more a hack than a solution; ideally we'd find a better
      way to detect ejects, or move away from ejects completely and instead
      monitor whether it is safe to delete the device based on e.g. its
      power state.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Tested-by: Igor Mammedov <imammedo@redhat.com>
      2841ab43
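      The check can be sketched as follows. The masks reflect the PCIe Slot
      Control register layout (Power Indicator Control in bits 9:8, Power
      Controller Control in bit 10); the macro and helper names are assumptions
      for the example and this is not the actual QEMU code:

          #include <stdbool.h>
          #include <stdint.h>

          #define SLTCTL_PWR_IND   0x0300   /* Power Indicator Control, bits 9:8 */
          #define SLTCTL_PWR_CTRL  0x0400   /* Power Controller Control, bit 10  */

          /*
           * Only let the existing power-off/eject handling run when the power
           * or LED state actually changed relative to the value saved before
           * the guest's write; otherwise ignore the redundant write.
           */
          static bool power_or_led_changed(uint16_t old_ctl, uint16_t new_ctl)
          {
              return ((old_ctl ^ new_ctl) & (SLTCTL_PWR_IND | SLTCTL_PWR_CTRL)) != 0;
          }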
    • pcie: don't skip multi-mask events · 861dc735
      Committed by Michael S. Tsirkin
      If we are trying to set multiple bits at once, testing that just one of
      them is already set gives a false positive. As a result we won't
      interrupt the guest if e.g. presence detect change and attention button
      press are both set, which happens with multi-function device removal
      (see the sketch after this entry).
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      861dc735
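      The distinction can be sketched as follows (illustrative helper, not the
      actual QEMU code):

          #include <stdbool.h>
          #include <stdint.h>

          /*
           * When raising a hotplug event that sets several slot-status bits at
           * once, only skip the interrupt if *all* of those bits were already
           * set, not if *any* one of them was.
           */
          static bool need_interrupt(uint16_t old_status, uint16_t event_bits)
          {
              /* Buggy check: if (old_status & event_bits) -> skip interrupt. */
              /* Correct check: interrupt unless every bit was already set.   */
              return (old_status & event_bits) != event_bits;
          }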
    • Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-06-24' into staging · 7fec76a0
      Committed by Peter Maydell
      Block patches:
      - The SSH block driver now uses libssh instead of libssh2
      - The VMDK block driver gets read-only support for the seSparse
        subformat
      - Various fixes
      
      # gpg: Signature made Mon 24 Jun 2019 15:42:56 BST
      # gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
      # gpg:                issuer "mreitz@redhat.com"
      # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
      # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
      
      * remotes/maxreitz/tags/pull-block-2019-06-24:
        iotests: Fix 205 for concurrent runs
        ssh: switch from libssh2 to libssh
        vmdk: Add read-only support for seSparse snapshots
        vmdk: Reduce the max bound for L1 table size
        vmdk: Fix comment regarding max l1_size coverage
        iotest 134: test cluster-misaligned encrypted write
        blockdev: enable non-root nodes for transaction drive-backup source
        nvme: do not advertise support for unsupported arbitration mechanism
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      7fec76a0
  2. 24 June 2019, 8 commits
    • iotests: Fix 205 for concurrent runs · ab5d4a30
      Committed by Max Reitz
      Tests should place their files into the test directory.  This includes
      Unix sockets.  Test 205 currently fails to do so, which prevents it
      from being run concurrently.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20190618210238.9524-1-mreitz@redhat.com
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      ab5d4a30
    • ssh: switch from libssh2 to libssh · b10d49d7
      Committed by Pino Toscano
      Rewrite the implementation of the ssh block driver to use libssh instead
      of libssh2.  The libssh library has various advantages over libssh2:
      - easier API for authentication (for example for using ssh-agent)
      - easier API for known_hosts handling
      - supports newer types of keys in known_hosts
      
      Use APIs/features available in libssh 0.8 conditionally, so that older
      versions (which are not recommended) remain supported (a minimal
      connection sketch follows this entry).
      
      Adjust iotest 207 to match the different error messages and to find the
      default key type for localhost (so the fingerprint can be compared
      properly).
      Contributed-by: Max Reitz <mreitz@redhat.com>
      
      Adjust the various Docker/Travis scripts to use libssh when available
      instead of libssh2. The mingw/mxe testing is dropped for now, as there
      are no packages for it.
      Signed-off-by: Pino Toscano <ptoscano@redhat.com>
      Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Acked-by: Alex Bennée <alex.bennee@linaro.org>
      Message-id: 20190620200840.17655-1-ptoscano@redhat.com
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      b10d49d7
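      For reference, a minimal libssh connection sequence (using only public
      libssh APIs, assuming libssh >= 0.8; an illustrative sketch, not the
      block driver's actual code) looks roughly like this:

          #include <libssh/libssh.h>
          #include <stdio.h>
          #include <stdlib.h>

          int main(void)
          {
              int ret = EXIT_FAILURE;
              ssh_session s = ssh_new();

              if (!s) {
                  return EXIT_FAILURE;
              }
              ssh_options_set(s, SSH_OPTIONS_HOST, "localhost");
              ssh_options_set(s, SSH_OPTIONS_USER, "user");

              if (ssh_connect(s) != SSH_OK) {
                  fprintf(stderr, "connect: %s\n", ssh_get_error(s));
                  ssh_free(s);
                  return EXIT_FAILURE;
              }
              /* known_hosts check; this is the libssh >= 0.8 spelling,
               * older releases use ssh_is_server_known() instead. */
              if (ssh_session_is_known_server(s) != SSH_KNOWN_HOSTS_OK) {
                  fprintf(stderr, "host key verification failed\n");
              } else if (ssh_userauth_publickey_auto(s, NULL, NULL) != SSH_AUTH_SUCCESS) {
                  /* tries ssh-agent and on-disk keys automatically */
                  fprintf(stderr, "auth: %s\n", ssh_get_error(s));
              } else {
                  /* ... open an sftp session and do block I/O here ... */
                  ret = EXIT_SUCCESS;
              }
              ssh_disconnect(s);
              ssh_free(s);
              return ret;
          }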
    • vmdk: Add read-only support for seSparse snapshots · 98eb9733
      Committed by Sam Eiderman
      Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
      QEMU).
      
      This format was lacking in the following:
      
          * Grain directory (L1) and grain table (L2) entries were 32-bit,
            allowing access to only 2TB (slightly less) of data.
          * The grain size (default) was 512 bytes - leading to data
            fragmentation and many grain tables.
          * For space reclamation purposes, it was necessary to find all the
            grains which are not pointed to by any grain table - so a reverse
            mapping of "offset of grain in vmdk" to "grain table" must be
            constructed - which takes large amounts of CPU/RAM.
      
      The format specification can be found in VMware's documentation:
      https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf
      
      In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
      introduced: SESparse (Space Efficient).
      
      This format fixes the above issues:
      
          * All entries are now 64-bit.
          * The grain size (default) is 4KB.
          * Grain directory and grain tables are now located at the beginning
            of the file.
            + seSparse format reserves space for all grain tables.
            + Grain tables can be addressed using an index.
            + Grains are located in the end of the file and can also be
              addressed with an index.
            - seSparse vmdks of large disks (64TB) have huge preallocated
              headers - mainly due to L2 tables, even for empty snapshots.
          * The header contains a reverse mapping ("backmap") of "offset of
            grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
            specifies for each grain - whether it is allocated or not.
            Using these data structures we can implement space reclamation
            efficiently.
          * Because the header now maintains two mappings:
              * The regular one (grain directory & grain tables)
              * A reverse one (backmap and free bitmap)
            these data structures can lose consistency upon a crash and result
            in a corrupted VMDK.
            Therefore, a journal is also added to the VMDK and is replayed
            when VMware reopens the file after a crash.
      
      Since ESXi 6.7, SESparse is the only snapshot format available.
      
      Unfortunately, VMware does not provide documentation regarding the new
      seSparse format.
      
      This commit is based on black-box research of the seSparse format.
      Various in-guest block operations and their effect on the snapshot file
      were tested.
      
      The only VMware-provided source of information (regarding the underlying
      implementation) was a log file on the ESXi host:
      
          /var/log/hostd.log
      
      Whenever an seSparse snapshot is created, the log is populated with
      seSparse records.
      
      Relevant log records are of the form:
      
      [...] Const Header:
      [...]  constMagic     = 0xcafebabe
      [...]  version        = 2.1
      [...]  capacity       = 204800
      [...]  grainSize      = 8
      [...]  grainTableSize = 64
      [...]  flags          = 0
      [...] Extents:
      [...]  Header         : <1 : 1>
      [...]  JournalHdr     : <2 : 2>
      [...]  Journal        : <2048 : 2048>
      [...]  GrainDirectory : <4096 : 2048>
      [...]  GrainTables    : <6144 : 2048>
      [...]  FreeBitmap     : <8192 : 2048>
      [...]  BackMap        : <10240 : 2048>
      [...]  Grain          : <12288 : 204800>
      [...] Volatile Header:
      [...] volatileMagic     = 0xcafecafe
      [...] FreeGTNumber      = 0
      [...] nextTxnSeqNumber  = 0
      [...] replayJournal     = 0
      
      The sizes that are seen in the log file are in sectors.
      Extents are of the following format: <offset : size>
      
      This commit is a strict implementation which enforces:
          * magics
          * version number 2.1
          * grain size of 8 sectors  (4KB)
          * grain table size of 64 sectors
          * zero flags
          * extent locations
      
      Additionally, this commit provides only a subset of the functionality
      offered by the seSparse format:
          * Read-only
          * No journal replay
          * No space reclamation
          * No unmap support
      
      Hence, the journal header, journal, free bitmap and backmap extents are
      unused; only the "classic" (L1 -> L2 -> data) grain access is
      implemented.
      
      However, there are several differences in the grain access itself.
      Grain directory (L1):
          * Grain directory entries are indexes (not offsets) to grain
            tables.
          * Valid grain directory entries have their highest nibble set to
            0x1.
          * Since grain tables are always located at the beginning of the
            file, the index fits into 32 bits, so we can use its low part
            when it is valid.
      Grain table (L2):
          * Grain table entries are indexes (not offsets) to grains.
          * If the highest nibble of the entry is:
              0x0:
                  The grain is not allocated.
                  The rest of the bytes are 0.
              0x1:
                  The grain is unmapped - guest sees a zero grain.
                  The rest of the bits point to the previously mapped grain,
                  see 0x3 case.
              0x2:
                  The grain is zero.
              0x3:
                  The grain is allocated - to get the index calculate:
                  ((entry & 0x0fff000000000000) >> 48) |
                  ((entry & 0x0000ffffffffffff) << 12)
          * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
            grain which results from the guest using sg_unmap to unmap the
            grain; the grain itself still exists in the grain extent, and a
            space reclamation procedure should delete it.
            Unmapping a zero grain has no effect (0x2 will not change to 0x1),
            but unmapping an unallocated grain naturally will (0x0 to 0x1).
      
      In order to implement seSparse, some fields had to be changed to support
      both 32-bit and 64-bit entry sizes (a grain-table decoding sketch follows
      this entry).
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
      Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      98eb9733
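      The grain table entry decoding described above can be sketched as follows
      (a self-contained illustration based on the black-box description in this
      commit message, not the exact QEMU code):

          #include <stdint.h>

          /* Grain states, taken from the highest nibble of an L2 entry. */
          enum sesparse_grain_type {
              GRAIN_UNALLOCATED = 0x0,
              GRAIN_UNMAPPED    = 0x1,   /* guest reads zeroes */
              GRAIN_ZERO        = 0x2,
              GRAIN_ALLOCATED   = 0x3,
          };

          static enum sesparse_grain_type grain_type(uint64_t entry)
          {
              return (enum sesparse_grain_type)(entry >> 60);
          }

          /* For an allocated grain (type 0x3), recover the grain index
           * exactly as described in the commit message. */
          static uint64_t grain_index(uint64_t entry)
          {
              return ((entry & 0x0fff000000000000ULL) >> 48) |
                     ((entry & 0x0000ffffffffffffULL) << 12);
          }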
    • vmdk: Reduce the max bound for L1 table size · 59d6ee48
      Committed by Sam Eiderman
      512M L1 entries is a very loose bound; only 32M are required to store
      the maximal supported VMDK file size of 2TB.
      
      This also fixes qemu-iotest 59: the failure now occurs earlier, on the
      impossible L1 table size.
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
      Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      59d6ee48
    • vmdk: Fix comment regarding max l1_size coverage · 940a2cd5
      Committed by Sam Eiderman
      Commit b0651b8c ("vmdk: Move l1_size check into vmdk_add_extent")
      extended the l1_size check from VMDK4 to VMDK3 but did not update the
      default coverage in the moved comment.
      
      The previous vmdk4 calculation:
      
          (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB
      
      The added vmdk3 calculation:
      
          (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB
      
      Add the vmdk3 calculation to the comment (both figures are verified by
      the sketch after this entry).
      
      In any case, VMware does not offer virtual disks larger than 2TB for
      vmdk4/vmdk3, or 64TB for the new undocumented seSparse format, which is
      not yet implemented in QEMU.
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
      Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
      Reviewed-by: yuchenlin <yuchenlin@synology.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      940a2cd5
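      Both coverage figures can be verified with a tiny self-contained check
      (illustrative only, not part of the patch):

          #include <inttypes.h>
          #include <stdint.h>
          #include <stdio.h>

          /* Recompute the two coverage figures quoted above. */
          int main(void)
          {
              const uint64_t max_l1_entries = 512ULL * 1024 * 1024;

              uint64_t vmdk4 = max_l1_entries * 512 * 65536;   /* l2 entries * grain */
              uint64_t vmdk3 = max_l1_entries * 4096 * 512;

              printf("vmdk4: %" PRIu64 " PB\n", vmdk4 >> 50);   /* prints 16 */
              printf("vmdk3: %" PRIu64 " PB\n", vmdk3 >> 50);   /* prints 1  */
              return 0;
          }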
    • iotest 134: test cluster-misaligned encrypted write · 6ec889eb
      Committed by Anton Nefedov
      COW areas (even empty/zero ones) require encryption too.
      Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      6ec889eb
    • blockdev: enable non-root nodes for transaction drive-backup source · 85c9d133
      Committed by Vladimir Sementsov-Ogievskiy
      We forgot to enable it for the transaction's .prepare, while it is
      already enabled in do_drive_backup since commit a2d665c1
          ("blockdev: loosen restrictions on drive-backup source node").
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
      Reviewed-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      85c9d133
    • nvme: do not advertise support for unsupported arbitration mechanism · 1cc354ac
      Committed by Klaus Birkelund Jensen
      The device mistakenly reports that the Weighted Round Robin with Urgent
      Priority Class arbitration mechanism is supported.
      
      It is not (see the sketch after this entry).
      Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
      Message-id: 20190606092530.14206-1-klaus@birkelund.eu
      Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      1cc354ac
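      Per the NVMe specification, the supported arbitration mechanisms are
      advertised in the CAP.AMS field (bits 18:17 of the Controller
      Capabilities register), where bit 17 indicates Weighted Round Robin with
      Urgent Priority Class. A hedged sketch of not advertising that bit
      (illustrative only; the macro and helper names are assumptions, not
      QEMU's actual code):

          #include <stdint.h>

          /* CAP.AMS bit 17: Weighted Round Robin with Urgent Priority Class. */
          #define NVME_CAP_AMS_WRRU   (1ULL << 17)

          /* Build the CAP value without claiming WRR-with-Urgent support,
           * since the emulated controller only implements round robin. */
          static uint64_t build_cap(uint64_t cap)
          {
              return cap & ~NVME_CAP_AMS_WRRU;
          }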
  3. 21 June 2019, 28 commits