1. July 5, 2016 (37 commits)
    • block: Split bdrv_merge_limits() from bdrv_refresh_limits() · d9e0dfa2
      Committed by Eric Blake
      During bdrv_merge_limits(), we were computing initial limits
      based on another BDS in two places.  At first glance, the two
      computations are not identical (one is doing straight copying,
      the other is doing merging towards or away from zero) - but
      when you realize that the first round is starting with all-0
      memory, all of the merging happens to work.  Factoring out the
      merging makes it easier to track how two BDS limits are merged,
      in case we have future reasons to merge in even more limits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Drop raw_refresh_limits() · ad82be2f
      Committed by Eric Blake
      The raw block driver was blindly copying all limits from bs->file,
      even though: 1. the main bdrv_refresh_limits() already does this
      for many of the limits, and 2. blindly copying from the children
      can weaken any stricter limits that were already inherited from
      the backing chain during the main bdrv_refresh_limits().  Also,
      a future patch is about to move .request_alignment into
      BlockLimits, and that is a limit that should NOT be copied from
      other layers in the BDS chain.
      
      Thus, we can completely drop raw_refresh_limits(), and rely on
      the block layer setting up the proper limits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Switch discard length bounds to byte-based · b9f7855a
      Committed by Eric Blake
      Sector-based limits are awkward to think about; in our on-going
      quest to move to byte-based interfaces, convert max_discard and
      discard_alignment.  Rename them, using 'pdiscard' as an aid to
      track which remaining discard interfaces need conversion, and so
      that the compiler will help us catch the change in semantics
      across any rebased code.  The BlockLimits type is now completely
      byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
      longer needed.
      
      pdiscard_alignment is made unsigned (we use power-of-2 alignments
      as bitmasks, where unsigned is easier to think about) while
      leaving max_pdiscard signed (since we still have an 'int'
      interface); this is comparable to what commit cf081fca did for
      write zeroes limits.  We may later want to make everything an
      unsigned 64-bit limit - but that requires a bigger code audit.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Wording tweaks to write zeroes limits · 29cc6a68
      Committed by Eric Blake
      Improve the documentation of the write zeroes limits, to mention
      additional constraints that drivers should observe.  Worth squashing
      into commit cf081fca, if that hadn't been pushed already :)
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Switch transfer length bounds to byte-based · 5def6b80
      Committed by Eric Blake
      Sector-based limits are awkward to think about; in our on-going
      quest to move to byte-based interfaces, convert max_transfer_length
      and opt_transfer_length.  Rename them (dropping the _length suffix)
      so that the compiler will help us catch the change in semantics
      across any rebased code, and improve the documentation.  Use unsigned
      values, so that we don't have to worry about negative values and
      so that bit-twiddling is easier; however, we are still constrained
      by 2^31 of signed int in most APIs.
      
      When a value comes from an external source (iscsi and raw-posix),
      sanitize the results to ensure that opt_transfer is a power of 2.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Set default request_alignment during bdrv_refresh_limits() · 79ba8c98
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      Now that all drivers have been updated to supply an override
      of request_alignment during their .bdrv_refresh_limits(), as
      needed, the block layer itself can defer setting the default
      alignment until part of the overall bdrv_refresh_limits().
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Set request_alignment during .bdrv_refresh_limits() · a6506481
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      Add a .bdrv_refresh_limits() to all four of our legacy devices
      that will always be sector-only (bochs, cloop, dmg, vvfat), in
      spite of their recent conversion to expose a byte interface.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • raw-win32: Set request_alignment during .bdrv_refresh_limits() · 2914a1de
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      In this case, raw_probe_alignment() already did what we needed,
      so just fix its signature and wire it in correctly.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • qcow2: Set request_alignment during .bdrv_refresh_limits() · a84178cc
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • iscsi: Set request_alignment during .bdrv_refresh_limits() · c8b3b998
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • blkdebug: Set request_alignment during .bdrv_refresh_limits() · 835db3ee
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      Note that when the user does not provide "align", then we were
      defaulting to bs->request_alignment - but at this stage in the
      initialization, that was always 512.  We were also rejecting an
      explicit "align":0 from the user; this patch now allows that,
      as an explicit request for the default alignment (which may not
      always be 512 in the future).
      
      qemu-iotests 77 is particularly sensitive to the fact that we
      can specify an artificial alignment override in blkdebug, and
      that override must continue to work even when limits are
      refreshed on an already open device.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Give nonzero result to blk_get_max_transfer_length() · 24ce9a20
      Committed by Eric Blake
      Making all callers special-case 0 as unlimited is awkward,
      and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
      our current block layer API limits.
      
      In the case of scsi, this means that we now always advertise a
      limit to the guest, even in cases where the underlying layers
       previously used 0 for no inherent limit beyond the block layer.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • scsi: Advertise limits by blocksize, not 512 · efaf4781
      Committed by Eric Blake
      s->blocksize may be larger than 512, in which case our
      tweaks to max_xfer_len and opt_xfer_len must be scaled
      appropriately.
      
      CC: qemu-stable@nongnu.org
      Reported-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • iscsi: Advertise realistic limits to block layer · f9e95af0
      Committed by Eric Blake
      The function sector_limits_lun2qemu() returns a value in units of
      the block layer's 512-byte sector, and can be as large as
      0x40000000, which is much larger than the block layer's inherent
      limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
      handles '0' as a synonym to the inherent limit, and it is nicer
      to return this value than it is to calculate an arbitrary
      maximum, for two reasons: we want to ensure that the block layer
      continues to special-case '0' as 'no limit beyond the inherent
      limits'; and we want to be able to someday expand the block
      layer to allow 64-bit limits, where auditing for uses of
      BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
      artificially constraining iscsi to old block layer limits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • nbd: Advertise realistic limits to block layer · 20220471
      Committed by Eric Blake
      We were basing the advertisement of maximum discard and transfer
      length off of UINT32_MAX, but since the rest of the block layer
      has signed int limits on a transaction, nothing could ever reach
      that maximum, and we risk overflowing an int once things are
      converted to byte-based rather than sector-based limits.  What's
      more, we DO have a much smaller limit: both the current kernel
      and qemu-nbd have a hard limit of 32M on a read or write
      transaction, and while they may also permit up to a full 32 bits
      on a discard transaction, the upstream NBD protocol is proposing
      wording that without any explicit advertisement otherwise,
      clients should limit ALL requests to the same limits as read and
      write, even though the other requests do not actually require as
      many bytes across the wire.  So the better limit to tell the
      block layer is 32M for both values.
      
      Behavior doesn't actually change with this patch (the block layer
      is currently ignoring the max_transfer advertisements); but when
      that problem is fixed in a later series, this patch will prevent
      the exposure of a latent bug.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • nbd: Allow larger requests · 476b923c
      Committed by Eric Blake
      The NBD layer was breaking up requests at a limit of 2040 sectors
      (just under 1M) to cater to old qemu-nbd. But the server limit
      was raised to 32M in commit 2d821488 to match the kernel, more
      than three years ago; and the upstream NBD Protocol is proposing
      documentation that without any explicit communication to state
      otherwise, a client should be able to safely assume that a 32M
      transaction will work.  It is time to rely on the larger sizing,
      and any downstream distro that cares about maximum
      interoperability with older qemu-nbd servers can just tweak the
      value of #define NBD_MAX_SECTORS.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: qemu-stable@nongnu.org
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Fix harmless off-by-one in bdrv_aligned_preadv() · 82524274
      Committed by Eric Blake
      If the amount of data to read ends exactly on the total size
      of the bs, then we were wasting time creating a local qiov
      to read the data in preparation for what would normally be
      appending zeroes beyond the end, even though this corner case
      has nothing further to do.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Document supported flags during bdrv_aligned_preadv() · a604fa2b
      Committed by Eric Blake
      We don't pass any flags on to drivers to handle.  Tighten an
      assert to explain why we pass 0 to bdrv_driver_preadv(), and add
      some comments on things to be aware of if we want to turn on
      per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
      document that we may want to consider using unmap during
      copy-on-read operations where the read is all zeroes.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Tighter assertions on bdrv_aligned_pwritev() · cff86b38
      Committed by Eric Blake
      For symmetry with bdrv_aligned_preadv(), assert that the caller
      really has aligned things properly. This requires adding an align
      parameter, which is used now only in the new asserts, but will
      come in handy in a later patch that adds auto-fragmentation to the
      max transfer size, since that value need not always be a multiple
      of the alignment, and therefore must be rounded down.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • qemu-img: fix failed autotests · cfef6a45
      Committed by Denis V. Lunev
      9 iotests currently fail on Ubuntu 15.10.
      The problem is that options parsing in qemu-img is broken by the
      following commit:
          commit 10985131
          Author: Denis V. Lunev <den@openvz.org>
          Date:   Fri Jun 17 17:44:13 2016 +0300
          qemu-img: move common options parsing before commands processing
      
      This strange command line reports an error
        ./qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1024
        qemu-img: Invalid image size specified!
      while original code parses it successfully.
      
      The problem is that getopt_long state should be reset. This could be done
      using this assignment according to the manual:
          optind = 0
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Eric Blake <eblake@redhat.com>
      CC: Kevin Wolf <kwolf@redhat.com>
      CC: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • Merge remote-tracking branch 'remotes/kraxel/tags/pull-ipxe-20160704-1' into staging · 60a0f1af
      Committed by Peter Maydell
      ipxe: update submodule from 4e03af8ec to 041863191
      e1000e+vmxnet3: add boot rom
      
      # gpg: Signature made Mon 04 Jul 2016 07:25:46 BST
      # gpg:                using RSA key 0x4CB6D8EED3E87138
      # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
      # gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
      # gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
      # Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138
      
      * remotes/kraxel/tags/pull-ipxe-20160704-1:
        build: add pc-bios to config-host.mak deps
        ipxe: add new roms to BLOBS
        ipxe: update prebuilt binaries
        vmxnet3: add boot rom
        e1000e: add boot rom
        ipxe: add vmxnet3 rom
        ipxe: add e1000e rom
        ipxe: update submodule from 4e03af8ec to 041863191
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160705' into staging · 8662d7db
      Committed by Peter Maydell
      ppc patch queue for 2016-07-05
      
      Here's the current ppc, sPAPR and related drivers patch queue.
      
        * The big addition is dynamic DMA window support (this includes some
          core VFIO changes)
        * There are also several fixes to the MMU emulation for bugs
          introduced with the HV mode patches
        * Several other bugfixes and cleanups
      
      Changes in v2:
        I messed up and forgot to make a fix in the last patch which BenH
        pointed out (introduced by my rebasing).  That's fixed in this
        version, and I'm replacing the tag in place with the revised
        version.
      
      # gpg: Signature made Tue 05 Jul 2016 06:28:58 BST
      # gpg:                using RSA key 0x6C38CACA20D9B392
      # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
      # gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
      # gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
      # gpg: WARNING: This key is not certified with sufficiently trusted signatures!
      # gpg:          It is not certain that the signature belongs to the owner.
      # Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392
      
      * remotes/dgibson/tags/ppc-for-2.7-20160705:
        ppc/hash64: Fix support for LPCR:ISL
        ppc/hash64: Add proper real mode translation support
        target-ppc: Return page shift from PTEG search
        target-ppc: Simplify HPTE matching
        target-ppc: Correct page size decoding in ppc_hash64_pteg_search()
        ppc: simplify ppc_hash64_hpte_page_shift_noslb()
        spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
        vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)
        vfio: Add host side DMA window capabilities
        vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)
        spapr_iommu: Realloc guest visible TCE table when starting/stopping listening
        ppc: simplify max_smt initialization in ppc_cpu_realizefn()
        spapr: Ensure thread0 of CPU core is always realized first
        ppc: Fix xsrdpi, xvrdpi and xvrspi rounding
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • ppc/hash64: Fix support for LPCR:ISL · 2c7ad804
      Committed by Benjamin Herrenschmidt
      We need to ignore the segment page size and essentially treat
      all pages as coming from a 4K segment.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [dwg: Adjusted for differences in my version of the prereq patches]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc/hash64: Add proper real mode translation support · 912acdf4
      Committed by Benjamin Herrenschmidt
      This adds proper support for translating real mode addresses based
      on the combination of HV and LPCR bits. This handles HRMOR offset
      for hypervisor real mode, and both RMA and VRMA modes for guest
      real mode. PAPR mode adjusts the offsets appropriately to match the
      RMA used in TCG, but we need to limit to the max supported by the
      implementation (16G).
      
      This includes some fixes by Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [dwg: Adjusted for differences in my version of the prereq patches]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • target-ppc: Return page shift from PTEG search · 94986863
      Committed by David Gibson
      ppc_hash64_pteg_search() now decodes a PTE's page size encoding, which it
      didn't previously do.  This means we're now double decoding the page size
      because we check it in the fault path after ppc_hash64_htab_lookup()
      returns.
      
      To avoid this duplication have ppc_hash64_pteg_search() and
      ppc_hash64_htab_lookup() return the page size from the PTE and use that in
      the callers instead of decoding again.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • target-ppc: Simplify HPTE matching · 073de86a
      Committed by David Gibson
      ppc_hash64_pteg_search() explicitly checks each HPTE's VALID and
      SECONDARY bits, then uses the HPTE64_V_COMPARE() macro to check the B field
      and AVPN.  However, a small tweak to HPTE64_V_COMPARE() means we can check
      all of these bits at once with a suitable ptem value.  So, consolidate all
      the comparisons for simplicity.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • target-ppc: Correct page size decoding in ppc_hash64_pteg_search() · 651060ab
      Committed by David Gibson
      The architecture specifies that when searching a PTEG for PTEs, entries
      with a page size encoding that's not valid for the current segment should
      be ignored, continuing the search.
      
      The current implementation does this with ppc_hash64_pte_size_decode()
      which is a very incomplete implementation of this check.  We already have
      code to do a full and correct page size decode in hpte_page_shift().
      
      This patch moves hpte_page_shift() so it can be used in
      ppc_hash64_pteg_search() and adjusts the latter's parameters to include
      a full SLBE instead of just a segment page shift.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • ppc: simplify ppc_hash64_hpte_page_shift_noslb() · 1f0252e6
      Committed by Cédric Le Goater
      The segment page shift parameter is never used. Let's remove it.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) · ae4de14c
      Committed by Alexey Kardashevskiy
      This adds support for the Dynamic DMA Windows (DDW) option defined by
      the SPAPR specification, which allows additional DMA window(s) to be used.
      
      The "ddw" property is enabled by default on a PHB but for compatibility
      the pseries-2.6 machine and older disable it.
      This also creates a single DMA window for the older machines to
      maintain backward migration.
      
      This implements DDW for PHB with emulated and VFIO devices. The host
      kernel support is required. The advertised IOMMU page sizes are 4K and
      64K; 16M pages are supported but not advertised by default, in order to
      enable them, the user has to specify "pgsz" property for PHB and
      enable huge pages for RAM.
      
      The existing Linux guests try creating one additional huge DMA window
      with 64K or 16MB pages and map the entire guest RAM to it. If this succeeds,
      the guest switches to dma_direct_ops and never calls TCE hypercalls
      (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
      and not waste time on map/unmap later. This adds a "dma64_win_addr"
      property which is a bus address for the 64bit window and by default
      set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
      uses and this allows having emulated and VFIO devices on the same bus.
      
      This adds 4 RTAS handlers:
      * ibm,query-pe-dma-window
      * ibm,create-pe-dma-window
      * ibm,remove-pe-dma-window
      * ibm,reset-pe-dma-window
      These are registered from type_init() callback.
      
      These RTAS handlers are implemented in a separate file to avoid polluting
      spapr_iommu.c with PCI.
      
      This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
      and updates all references to dma_liobn. However this does not add
      64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
      rather pointless there (as it is a PHB property and the management
      software can/should pass LIOBNs via CLI) but we keep it for the backward
      migration support.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) · 2e4109de
      Committed by Alexey Kardashevskiy
      New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
      This adds ability to VFIO common code to dynamically allocate/remove
      DMA windows in the host kernel when new VFIO container is added/removed.
      
      This adds a helper to vfio_listener_region_add which makes
      VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into
      the host IOMMU list; the opposite action is taken in
      vfio_listener_region_del.
      
      When creating a new window, this uses a heuristic to decide on the number
      of TCE table levels.
      
      This should cause no guest visible change in behavior.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Added some casts to prevent printf() warnings on certain targets
       where the kernel headers' __u64 doesn't match uint64_t or PRIx64]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • vfio: Add host side DMA window capabilities · f4ec5e26
      Committed by Alexey Kardashevskiy
      There are going to be multiple IOMMUs per container. This moves
      the single host IOMMU parameter set to a list of VFIOHostDMAWindow.
      
      This should cause no behavioral change and will be used later by
      the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) · 318f67ce
      Committed by Alexey Kardashevskiy
      This makes use of the new "memory registering" feature. The idea is
      to provide the userspace ability to notify the host kernel about pages
      which are going to be used for DMA. Having this information, the host
      kernel can pin them all once per user process, do locked pages
      accounting (once) and not spend time on doing that in real time with
      possible failures which cannot be handled nicely in some cases.
      
      This adds a prereg memory listener which listens on address_space_memory
      and notifies a VFIO container about memory which needs to be
      pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.
      
      The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
      are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
      not call it when v2 is detected and enabled.
      
      This enforces guest RAM blocks to be host page size aligned; however
      this is not new as KVM already requires memory slots to be host page
      size aligned.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [dwg: Fix compile error on 32-bit host]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr_iommu: Realloc guest visible TCE table when starting/stopping listening · 606b5498
      Committed by Alexey Kardashevskiy
      The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU -
      a guest view of the table and a hardware TCE table. If there is no VFIO
      presence in the address space, then just the guest view is used; in that
      case, it is allocated in KVM. However, since there is no
      support yet for VFIO in KVM TCE hypercalls, when we start using VFIO,
      we need to move the guest view from KVM to the userspace; and we need
      to do this for every IOMMU on a bus with VFIO devices.
      
      This implements the callbacks for the sPAPR IOMMU - notify_started()
      reallocates the guest view to the user space, notify_stopped() does
      the opposite.
      
      This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug
      path as the new callbacks do this better - they notify IOMMU at
      the exact moment when the configuration is changed, and this also
      includes the case of PCI hot unplug.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc: simplify max_smt initialization in ppc_cpu_realizefn() · c4e6c423
      Committed by Greg Kurz
      kvmppc_smt_threads() returns 1 if KVM is not enabled.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: Ensure thread0 of CPU core is always realized first · 7093645a
      Committed by Bharata B Rao
      During CPU core realization, we create all the thread objects and parent
      them to the core object in a loop. However, the realization of thread
      objects is done separately by walking the threads of a core using
      object_child_foreach(). With this, there is no guarantee of the order
      in which the child thread objects get realized. Since CPU device tree
      properties are currently derived from the CPU thread object, we assume
      thread0 of the core to be the representative thread of the core when
      creating device tree properties for the core. If thread0 is not the
      first thread that gets realized, then we would end up having an
      incorrect dt_id for the core and this causes hotplug failures from
      the guest.
      
      Fix this by realizing each thread object by walking the core's thread
      object list, thereby ensuring that thread0 and the other threads are
      always realized in the correct order.
      
      Future TODO: CPU DT nodes are per-core properties and we should
      ideally base the creation of CPU DT nodes on core objects rather than
      the thread objects.
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      7093645a
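      The ordering problem this patch fixes is generic: iterating children through an unordered container gives no realize order, while walking the core's own thread array does. A minimal sketch under illustrative names (Core, Thread, core_realize - not actual QOM code):

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define NR_THREADS 4

      typedef struct Thread {
          int index;
          int realized;
      } Thread;

      typedef struct Core {
          Thread threads[NR_THREADS];   /* thread0 is threads[0] */
          int realize_log[NR_THREADS];  /* order in which threads were realized */
          int realized_count;
      } Core;

      static void thread_realize(Core *core, Thread *t)
      {
          t->realized = 1;
          core->realize_log[core->realized_count++] = t->index;
      }

      /* The fix: walk the core's own thread list in index order, so thread0
       * (whose properties seed the device tree node) is realized first. */
      static void core_realize(Core *core)
      {
          for (int i = 0; i < NR_THREADS; i++) {
              thread_realize(core, &core->threads[i]);
          }
      }

      int main(void)
      {
          Core core = {0};
          for (int i = 0; i < NR_THREADS; i++) {
              core.threads[i].index = i;
          }
          core_realize(&core);
          assert(core.realize_log[0] == 0);  /* thread0 realized first */
          assert(core.realized_count == NR_THREADS);
          return 0;
      }
      ```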
    • A
      ppc: Fix xsrdpi, xvrdpi and xvrspi rounding · 158c87e5
      Authored by Anton Blanchard
      xsrdpi, xvrdpi and xvrspi use the round-ties-away rounding mode, not
      round-to-nearest-even.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      158c87e5
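      The distinction the fix is about can be seen in standard C (assuming the default FE_TONEAREST floating-point environment): round() ties away from zero, which is what these instructions need, while rint() honors the current round-to-nearest-even mode.

      ```c
      #include <assert.h>
      #include <math.h>
      #include <stdio.h>

      int main(void)
      {
          /* round(): ties away from zero - the mode xsrdpi/xvrdpi/xvrspi need */
          printf("round(0.5) = %g, round(2.5) = %g\n", round(0.5), round(2.5));
          assert(round(0.5) == 1.0 && round(2.5) == 3.0);

          /* rint() under the default FE_TONEAREST mode: ties go to even */
          printf("rint(0.5)  = %g, rint(2.5)  = %g\n", rint(0.5), rint(2.5));
          assert(rint(0.5) == 0.0 && rint(2.5) == 2.0);

          /* Negative ties also move away from zero with round() */
          assert(round(-1.5) == -2.0 && rint(-1.5) == -2.0);
          return 0;
      }
      ```

      The two modes only disagree on exact halfway cases, which is why such bugs can survive for a long time before a tie-heavy workload exposes them.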
    • P
      Merge remote-tracking branch 'remotes/kraxel/tags/pull-seabios-20160704-3' into staging · 11659423
      Authored by Peter Maydell
      Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."
      
      # gpg: Signature made Mon 04 Jul 2016 16:24:55 BST
      # gpg:                using RSA key 0x4CB6D8EED3E87138
      # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
      # gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
      # gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
      # Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138
      
      * remotes/kraxel/tags/pull-seabios-20160704-3:
        Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      11659423
  2. 04 Jul, 2016 (3 commits)
    • P
      Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-07-04-1' into staging · 0d7e96c9
      Authored by Peter Maydell
      Merge qcrypto 2016/07/04 v1
      
      # gpg: Signature made Mon 04 Jul 2016 15:54:26 BST
      # gpg:                using RSA key 0xBE86EBB415104FDF
      # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
      # gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
      # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF
      
      * remotes/berrange/tags/pull-qcrypto-2016-07-04-1:
        crypto: allow default TLS priority to be chosen at build time
        crypto: add support for TLS priority string override
        crypto: implement sha224, sha384, sha512 and ripemd160 hashes
        crypto: switch hash code to use nettle/gcrypt directly
        crypto: rename OUT to out in xts test to avoid clash on MinGW
        crypto: fix handling of iv generator hash defaults
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      0d7e96c9
    • G
      Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86." · 3b1154ff
      Authored by Gerd Hoffmann
      This reverts commit 4e04ab6a.
      
      Also remove pc-bios/bios-fast.bin.
      
      The commit was merged by mistake.
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      3b1154ff
    • D
      crypto: allow default TLS priority to be chosen at build time · a1c5e949
      Authored by Daniel P. Berrange
      Modern gnutls can use a global config file to control the
      crypto priority settings for TLS connections. For example
      the priority string "@SYSTEM" instructs gnutls to find the
      priority setting named "SYSTEM" in the global config file.
      
      The latest gnutls git codebase gained the ability to reference
      multiple priority strings in the config file, with the first
      one found to exist winning. This means it is now
      possible to configure QEMU out of the box with a default
      priority of "@QEMU,SYSTEM", which says to look for the
      settings "QEMU" first and, if not found, use the "SYSTEM"
      settings.
      
      To make use of this facility, we introduce the ability to
      set the QEMU default priority at build time via a new
      configure argument.  It is anticipated that distro vendors
      will set this when building QEMU to a value suitable for
      use with the distro's crypto policy setup, e.g. current Fedora
      would run
      
       ./configure --tls-priority=@SYSTEM
      
      while future Fedora would run
      
       ./configure --tls-priority=@QEMU,SYSTEM
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      a1c5e949
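      The fallback semantics of a priority string such as "@QEMU,SYSTEM" can be sketched as a first-match lookup. This is illustrative only - the real resolution happens inside gnutls against its global config file, not in QEMU, and lookup_config/resolve_priority and the priority value below are made up for the example:

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /* Pretend config file: only a "SYSTEM" setting is defined. */
      static const char *lookup_config(const char *name)
      {
          if (strcmp(name, "SYSTEM") == 0) {
              return "NORMAL:-VERS-TLS1.0";  /* made-up priority value */
          }
          return NULL;
      }

      /* Resolve "@A,B,...": try each name in order; the first that exists wins. */
      static const char *resolve_priority(const char *prio, char *buf, size_t len)
      {
          if (prio[0] != '@') {
              return prio;                   /* literal priority string */
          }
          snprintf(buf, len, "%s", prio + 1);
          for (char *name = strtok(buf, ","); name; name = strtok(NULL, ",")) {
              const char *val = lookup_config(name);
              if (val) {
                  return val;
              }
          }
          return NULL;
      }

      int main(void)
      {
          char buf[64];
          /* "QEMU" is not defined, so the lookup falls back to "SYSTEM". */
          const char *val = resolve_priority("@QEMU,SYSTEM", buf, sizeof(buf));
          assert(val && strcmp(val, "NORMAL:-VERS-TLS1.0") == 0);
          return 0;
      }
      ```

      This first-match behavior is what lets a distro build with --tls-priority=@QEMU,SYSTEM work both on hosts where an admin has defined a "QEMU" setting and on hosts that only carry the system-wide default.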