1. July 5, 2016 (37 commits)
    • block: Split bdrv_merge_limits() from bdrv_refresh_limits() · d9e0dfa2
      Committed by Eric Blake
      During bdrv_merge_limits(), we were computing initial limits
      based on another BDS in two places.  At first glance, the two
      computations are not identical (one is doing straight copying,
      the other is doing merging towards or away from zero) - but
      when you realize that the first round is starting with all-0
      memory, all of the merging happens to work.  Factoring out the
      merging makes it easier to track how two BDS limits are merged,
      in case we have future reasons to merge in even more limits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Drop raw_refresh_limits() · ad82be2f
      Committed by Eric Blake
      The raw block driver was blindly copying all limits from bs->file,
      even though: 1. the main bdrv_refresh_limits() already does this
      for many of the limits, and 2. blindly copying from the children
      can weaken any stricter limits that were already inherited from
      the backing chain during the main bdrv_refresh_limits().  Also,
      a future patch is about to move .request_alignment into
      BlockLimits, and that is a limit that should NOT be copied from
      other layers in the BDS chain.
      
      Thus, we can completely drop raw_refresh_limits(), and rely on
      the block layer setting up the proper limits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Switch discard length bounds to byte-based · b9f7855a
      Committed by Eric Blake
      Sector-based limits are awkward to think about; in our on-going
      quest to move to byte-based interfaces, convert max_discard and
      discard_alignment.  Rename them, using 'pdiscard' as an aid to
      track which remaining discard interfaces need conversion, and so
      that the compiler will help us catch the change in semantics
      across any rebased code.  The BlockLimits type is now completely
      byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
      longer needed.
      
      pdiscard_alignment is made unsigned (we use power-of-2 alignments
      as bitmasks, where unsigned is easier to think about) while
      leaving max_pdiscard signed (since we still have an 'int'
      interface); this is comparable to what commit cf081fca did for
      write zeroes limits.  We may later want to make everything an
      unsigned 64-bit limit - but that requires a bigger code audit.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Wording tweaks to write zeroes limits · 29cc6a68
      Committed by Eric Blake
      Improve the documentation of the write zeroes limits, to mention
      additional constraints that drivers should observe.  Worth squashing
      into commit cf081fca, if that hadn't been pushed already :)
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Switch transfer length bounds to byte-based · 5def6b80
      Committed by Eric Blake
      Sector-based limits are awkward to think about; in our on-going
      quest to move to byte-based interfaces, convert max_transfer_length
      and opt_transfer_length.  Rename them (dropping the _length suffix)
      so that the compiler will help us catch the change in semantics
      across any rebased code, and improve the documentation.  Use unsigned
      values, so that we don't have to worry about negative values and
      so that bit-twiddling is easier; however, we are still constrained
      by 2^31 of signed int in most APIs.
      
      When a value comes from an external source (iscsi and raw-posix),
      sanitize the results to ensure that opt_transfer is a power of 2.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Set default request_alignment during bdrv_refresh_limits() · 79ba8c98
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      Now that all drivers have been updated to supply an override
      of request_alignment during their .bdrv_refresh_limits(), as
      needed, the block layer itself can defer setting the default
      alignment until part of the overall bdrv_refresh_limits().
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Set request_alignment during .bdrv_refresh_limits() · a6506481
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      Add a .bdrv_refresh_limits() to all four of our legacy devices
      that will always be sector-only (bochs, cloop, dmg, vvfat), in
      spite of their recent conversion to expose a byte interface.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • raw-win32: Set request_alignment during .bdrv_refresh_limits() · 2914a1de
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      In this case, raw_probe_alignment() already did what we needed,
      so just fix its signature and wire it in correctly.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • qcow2: Set request_alignment during .bdrv_refresh_limits() · a84178cc
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • iscsi: Set request_alignment during .bdrv_refresh_limits() · c8b3b998
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • blkdebug: Set request_alignment during .bdrv_refresh_limits() · 835db3ee
      Committed by Eric Blake
      We want to eventually stick request_alignment alongside other
      BlockLimits, but first, we must ensure it is populated at the
      same time as all other limits, rather than being a special case
      that is set only when a block is first opened.
      
      Note that when the user does not provide "align", then we were
      defaulting to bs->request_alignment - but at this stage in the
      initialization, that was always 512.  We were also rejecting an
      explicit "align":0 from the user; this patch now allows that,
      as an explicit request for the default alignment (which may not
      always be 512 in the future).
      
      qemu-iotests 77 is particularly sensitive to the fact that we
      can specify an artificial alignment override in blkdebug, and
      that override must continue to work even when limits are
      refreshed on an already open device.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Give nonzero result to blk_get_max_transfer_length() · 24ce9a20
      Committed by Eric Blake
      Making all callers special-case 0 as unlimited is awkward,
      and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
      our current block layer API limits.
      
      In the case of scsi, this means that we now always advertise a
      limit to the guest, even in cases where the underlying layers
       previously used 0 for no inherent limit beyond the block layer.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • scsi: Advertise limits by blocksize, not 512 · efaf4781
      Committed by Eric Blake
      s->blocksize may be larger than 512, in which case our
      tweaks to max_xfer_len and opt_xfer_len must be scaled
      appropriately.
      
      CC: qemu-stable@nongnu.org
      Reported-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • iscsi: Advertise realistic limits to block layer · f9e95af0
      Committed by Eric Blake
      The function sector_limits_lun2qemu() returns a value in units of
      the block layer's 512-byte sector, and can be as large as
      0x40000000, which is much larger than the block layer's inherent
      limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
      handles '0' as a synonym to the inherent limit, and it is nicer
      to return this value than it is to calculate an arbitrary
      maximum, for two reasons: we want to ensure that the block layer
      continues to special-case '0' as 'no limit beyond the inherent
      limits'; and we want to be able to someday expand the block
      layer to allow 64-bit limits, where auditing for uses of
      BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
      artificially constraining iscsi to old block layer limits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • nbd: Advertise realistic limits to block layer · 20220471
      Committed by Eric Blake
      We were basing the advertisement of maximum discard and transfer
      length off of UINT32_MAX, but since the rest of the block layer
      has signed int limits on a transaction, nothing could ever reach
      that maximum, and we risk overflowing an int once things are
      converted to byte-based rather than sector-based limits.  What's
      more, we DO have a much smaller limit: both the current kernel
      and qemu-nbd have a hard limit of 32M on a read or write
      transaction, and while they may also permit up to a full 32 bits
      on a discard transaction, the upstream NBD protocol is proposing
      wording that without any explicit advertisement otherwise,
      clients should limit ALL requests to the same limits as read and
      write, even though the other requests do not actually require as
      many bytes across the wire.  So the better limit to tell the
      block layer is 32M for both values.
      
      Behavior doesn't actually change with this patch (the block layer
      is currently ignoring the max_transfer advertisements); but when
      that problem is fixed in a later series, this patch will prevent
      the exposure of a latent bug.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • nbd: Allow larger requests · 476b923c
      Committed by Eric Blake
      The NBD layer was breaking up requests at a limit of 2040 sectors
      (just under 1M) to cater to old qemu-nbd. But the server limit
      was raised to 32M in commit 2d821488 to match the kernel, more
      than three years ago; and the upstream NBD Protocol is proposing
      documentation that without any explicit communication to state
      otherwise, a client should be able to safely assume that a 32M
      transaction will work.  It is time to rely on the larger sizing,
      and any downstream distro that cares about maximum
      interoperability with older qemu-nbd servers can just tweak the
      value of #define NBD_MAX_SECTORS.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: qemu-stable@nongnu.org
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Fix harmless off-by-one in bdrv_aligned_preadv() · 82524274
      Committed by Eric Blake
      If the amount of data to read ends exactly on the total size
      of the bs, then we were wasting time creating a local qiov
      to read the data in preparation for what would normally be
      appending zeroes beyond the end, even though this corner case
      has nothing further to do.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Document supported flags during bdrv_aligned_preadv() · a604fa2b
      Committed by Eric Blake
      We don't pass any flags on to drivers to handle.  Tighten an
      assert to explain why we pass 0 to bdrv_driver_preadv(), and add
      some comments on things to be aware of if we want to turn on
      per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
      document that we may want to consider using unmap during
      copy-on-read operations where the read is all zeroes.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Tighter assertions on bdrv_aligned_pwritev() · cff86b38
      Committed by Eric Blake
      For symmetry with bdrv_aligned_preadv(), assert that the caller
      really has aligned things properly. This requires adding an align
      parameter, which is used now only in the new asserts, but will
      come in handy in a later patch that adds auto-fragmentation to the
      max transfer size, since that value need not always be a multiple
      of the alignment, and therefore must be rounded down.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • qemu-img: fix failed autotests · cfef6a45
      Committed by Denis V. Lunev
      9 iotests currently fail on Ubuntu 15.10.
      The problem is that options parsing in qemu-img is broken by the
      following commit:
          commit 10985131
          Author: Denis V. Lunev <den@openvz.org>
          Date:   Fri Jun 17 17:44:13 2016 +0300
          qemu-img: move common options parsing before commands processing
      
      This strange command line reports an error
        ./qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1024
        qemu-img: Invalid image size specified!
      while original code parses it successfully.
      
      The problem is that getopt_long state should be reset. This could be done
      using this assignment according to the manual:
          optind = 0
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Eric Blake <eblake@redhat.com>
      CC: Kevin Wolf <kwolf@redhat.com>
      CC: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • Merge remote-tracking branch 'remotes/kraxel/tags/pull-ipxe-20160704-1' into staging · 60a0f1af
      Committed by Peter Maydell
      ipxe: update submodule from 4e03af8ec to 041863191
      e1000e+vmxnet3: add boot rom
      
      # gpg: Signature made Mon 04 Jul 2016 07:25:46 BST
      # gpg:                using RSA key 0x4CB6D8EED3E87138
      # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
      # gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
      # gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
      # Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138
      
      * remotes/kraxel/tags/pull-ipxe-20160704-1:
        build: add pc-bios to config-host.mak deps
        ipxe: add new roms to BLOBS
        ipxe: update prebuilt binaries
        vmxnet3: add boot rom
        e1000e: add boot rom
        ipxe: add vmxnet3 rom
        ipxe: add e1000e rom
        ipxe: update submodule from 4e03af8ec to 041863191
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160705' into staging · 8662d7db
      Committed by Peter Maydell
      ppc patch queue for 2016-07-05
      
      Here's the current ppc, sPAPR and related drivers patch queue.
      
        * The big addition is dynamic DMA window support (this includes some
          core VFIO changes)
        * There are also several fixes to the MMU emulation for bugs
          introduced with the HV mode patches
        * Several other bugfixes and cleanups
      
      Changes in v2:
        I messed up and forgot to make a fix in the last patch which BenH
        pointed out (introduced by my rebasing).  That's fixed in this
        version, and I'm replacing the tag in place with the revised
        version.
      
      # gpg: Signature made Tue 05 Jul 2016 06:28:58 BST
      # gpg:                using RSA key 0x6C38CACA20D9B392
      # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
      # gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
      # gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
      # gpg: WARNING: This key is not certified with sufficiently trusted signatures!
      # gpg:          It is not certain that the signature belongs to the owner.
      # Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392
      
      * remotes/dgibson/tags/ppc-for-2.7-20160705:
        ppc/hash64: Fix support for LPCR:ISL
        ppc/hash64: Add proper real mode translation support
        target-ppc: Return page shift from PTEG search
        target-ppc: Simplify HPTE matching
        target-ppc: Correct page size decoding in ppc_hash64_pteg_search()
        ppc: simplify ppc_hash64_hpte_page_shift_noslb()
        spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
        vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)
        vfio: Add host side DMA window capabilities
        vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)
        spapr_iommu: Realloc guest visible TCE table when starting/stopping listening
        ppc: simplify max_smt initialization in ppc_cpu_realizefn()
        spapr: Ensure thread0 of CPU core is always realized first
        ppc: Fix xsrdpi, xvrdpi and xvrspi rounding
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • ppc/hash64: Fix support for LPCR:ISL · 2c7ad804
      Committed by Benjamin Herrenschmidt
      We need to ignore the segment page size and essentially treat
      all pages as coming from a 4K segment.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [dwg: Adjusted for differences in my version of the prereq patches]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc/hash64: Add proper real mode translation support · 912acdf4
      Committed by Benjamin Herrenschmidt
      This adds proper support for translating real mode addresses based
      on the combination of HV and LPCR bits. This handles HRMOR offset
      for hypervisor real mode, and both RMA and VRMA modes for guest
      real mode. PAPR mode adjusts the offsets appropriately to match the
      RMA used in TCG, but we need to limit to the max supported by the
      implementation (16G).
      
      This includes some fixes by Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [dwg: Adjusted for differences in my version of the prereq patches]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • target-ppc: Return page shift from PTEG search · 94986863
      Committed by David Gibson
      ppc_hash64_pteg_search() now decodes a PTE's page size encoding, which it
      didn't previously do.  This means we're now double decoding the page size
      because we check it in the fault path after ppc_hash64_htab_lookup()
      returns.
      
      To avoid this duplication have ppc_hash64_pteg_search() and
      ppc_hash64_htab_lookup() return the page size from the PTE and use that in
      the callers instead of decoding again.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • target-ppc: Simplify HPTE matching · 073de86a
      Committed by David Gibson
      ppc_hash64_pteg_search() explicitly checks each HPTE's VALID and
      SECONDARY bits, then uses the HPTE64_V_COMPARE() macro to check the B field
      and AVPN.  However, a small tweak to HPTE64_V_COMPARE() means we can check
      all of these bits at once with a suitable ptem value.  So, consolidate all
      the comparisons for simplicity.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • target-ppc: Correct page size decoding in ppc_hash64_pteg_search() · 651060ab
      Committed by David Gibson
      The architecture specifies that when searching a PTEG for PTEs, entries
      with a page size encoding that's not valid for the current segment should
      be ignored, continuing the search.
      
      The current implementation does this with ppc_hash64_pte_size_decode()
      which is a very incomplete implementation of this check.  We already have
      code to do a full and correct page size decode in hpte_page_shift().
      
      This patch moves hpte_page_shift() so it can be used in
      ppc_hash64_pteg_search() and adjusts the latter's parameters to include
      a full SLBE instead of just a segment page shift.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • ppc: simplify ppc_hash64_hpte_page_shift_noslb() · 1f0252e6
      Committed by Cédric Le Goater
      The segment page shift parameter is never used. Let's remove it.
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) · ae4de14c
      Committed by Alexey Kardashevskiy
      This adds support for the Dynamic DMA Windows (DDW) option defined by
      the SPAPR specification, which allows additional DMA window(s) to be used.
      
      The "ddw" property is enabled by default on a PHB but for compatibility
      the pseries-2.6 machine and older disable it.
      This also creates a single DMA window for the older machines to
      maintain backward migration.
      
      This implements DDW for PHB with emulated and VFIO devices. The host
      kernel support is required. The advertised IOMMU page sizes are 4K and
      64K; 16M pages are supported but not advertised by default, in order to
      enable them, the user has to specify "pgsz" property for PHB and
      enable huge pages for RAM.
      
      The existing Linux guests try creating one additional huge DMA window
      with 64K or 16MB pages and map the entire guest RAM to it. If this succeeds,
      the guest switches to dma_direct_ops and never calls TCE hypercalls
      (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
      and not waste time on map/unmap later. This adds a "dma64_win_addr"
      property which is a bus address for the 64bit window and by default
      set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
      uses and this allows having emulated and VFIO devices on the same bus.
      
      This adds 4 RTAS handlers:
      * ibm,query-pe-dma-window
      * ibm,create-pe-dma-window
      * ibm,remove-pe-dma-window
      * ibm,reset-pe-dma-window
      These are registered from type_init() callback.
      
      These RTAS handlers are implemented in a separate file to avoid polluting
      spapr_iommu.c with PCI.
      
      This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
      and updates all references to dma_liobn. However this does not add
      64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
      rather pointless there (as it is a PHB property and the management
      software can/should pass LIOBNs via CLI) but we keep it for the backward
      migration support.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) · 2e4109de
      Committed by Alexey Kardashevskiy
      New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
      This adds ability to VFIO common code to dynamically allocate/remove
      DMA windows in the host kernel when new VFIO container is added/removed.
      
      This adds a helper to vfio_listener_region_add which makes
      VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into
      the host IOMMU list; the opposite action is taken in
      vfio_listener_region_del.
      
      When creating a new window, this uses a heuristic to decide on the number
      of TCE table levels.
      
      This should cause no guest visible change in behavior.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Added some casts to prevent printf() warnings on certain targets
       where the kernel headers' __u64 doesn't match uint64_t or PRIx64]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • vfio: Add host side DMA window capabilities · f4ec5e26
      Committed by Alexey Kardashevskiy
      There are going to be multiple IOMMUs per container. This moves
      the single host IOMMU parameter set to a list of VFIOHostDMAWindow.
      
      This should cause no behavioral change and will be used later by
      the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) · 318f67ce
      Committed by Alexey Kardashevskiy
      This makes use of the new "memory registering" feature. The idea is
      to provide the userspace ability to notify the host kernel about pages
      which are going to be used for DMA. Having this information, the host
      kernel can pin them all once per user process, do locked pages
      accounting (once) and not spend time on doing that in real time with
      possible failures which cannot be handled nicely in some cases.
      
      This adds a prereg memory listener which listens on address_space_memory
      and notifies a VFIO container about memory which needs to be
      pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.
      
      The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
      are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
      not call it when v2 is detected and enabled.
      
      This enforces guest RAM blocks to be host page size aligned; however
      this is not new as KVM already requires memory slots to be host page
      size aligned.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [dwg: Fix compile error on 32-bit host]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr_iommu: Realloc guest visible TCE table when starting/stopping listening · 606b5498
      Committed by Alexey Kardashevskiy
      The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU -
      a guest view of the table and a hardware TCE table. If there is no VFIO
      presence in the address space, then just the guest view is used; in that
      case, it is allocated in KVM. However, since there is no
      support yet for VFIO in KVM TCE hypercalls, when we start using VFIO,
      we need to move the guest view from KVM to the userspace; and we need
      to do this for every IOMMU on a bus with VFIO devices.
      
      This implements the callbacks for the sPAPR IOMMU - notify_started()
      reallocates the guest view to the user space, notify_stopped() does
      the opposite.
      
      This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug
      path as the new callbacks do this better - they notify IOMMU at
      the exact moment when the configuration is changed, and this also
      includes the case of PCI hot unplug.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • ppc: simplify max_smt initialization in ppc_cpu_realizefn() · c4e6c423
      Committed by Greg Kurz
      kvmppc_smt_threads() returns 1 if KVM is not enabled.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
    • spapr: Ensure thread0 of CPU core is always realized first · 7093645a
      Committed by Bharata B Rao
      During CPU core realization, we create all the thread objects and parent
      them to the core object in a loop. However, the realization of thread
      objects is done separately by walking the threads of a core using
      object_child_foreach(). With this, there is no guarantee of the order
      in which the child thread objects get realized. Since CPU device tree
      properties are currently derived from the CPU thread object, we assume
      thread0 of the core to be the representative thread of the core when
      creating device tree properties for the core. If thread0 is not the
      first thread that gets realized, then we would end up having an
      incorrect dt_id for the core and this causes hotplug failures from
      the guest.
      
      Fix this by realizing each thread object by walking the core's thread
      object list, thereby ensuring that thread0 and the other threads are
      always realized in the correct order.
      
      Future TODO: CPU DT nodes are per-core properties and we should
      ideally base the creation of CPU DT nodes on core objects rather than
      the thread objects.
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      7093645a
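      The ordering problem this patch fixes is generic: iterating children through an unordered container gives no realize order, while walking the core's own thread array does. A minimal sketch under illustrative names (Core, Thread, core_realize - not actual QOM code):

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define NR_THREADS 4

      typedef struct Thread {
          int index;
          int realized;
      } Thread;

      typedef struct Core {
          Thread threads[NR_THREADS];   /* thread0 is threads[0] */
          int realize_log[NR_THREADS];  /* order in which threads were realized */
          int realized_count;
      } Core;

      static void thread_realize(Core *core, Thread *t)
      {
          t->realized = 1;
          core->realize_log[core->realized_count++] = t->index;
      }

      /* The fix: walk the core's own thread list in index order, so thread0
       * (whose properties seed the device tree node) is realized first. */
      static void core_realize(Core *core)
      {
          for (int i = 0; i < NR_THREADS; i++) {
              thread_realize(core, &core->threads[i]);
          }
      }

      int main(void)
      {
          Core core = {0};
          for (int i = 0; i < NR_THREADS; i++) {
              core.threads[i].index = i;
          }
          core_realize(&core);
          assert(core.realize_log[0] == 0);  /* thread0 realized first */
          assert(core.realized_count == NR_THREADS);
          return 0;
      }
      ```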
    • A
      ppc: Fix xsrdpi, xvrdpi and xvrspi rounding · 158c87e5
      Authored by Anton Blanchard
      xsrdpi, xvrdpi and xvrspi use the round-ties-away rounding mode, not
      round-to-nearest-even.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      158c87e5
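      The distinction the fix is about can be seen in standard C (assuming the default FE_TONEAREST floating-point environment): round() ties away from zero, which is what these instructions need, while rint() honors the current round-to-nearest-even mode.

      ```c
      #include <assert.h>
      #include <math.h>
      #include <stdio.h>

      int main(void)
      {
          /* round(): ties away from zero - the mode xsrdpi/xvrdpi/xvrspi need */
          printf("round(0.5) = %g, round(2.5) = %g\n", round(0.5), round(2.5));
          assert(round(0.5) == 1.0 && round(2.5) == 3.0);

          /* rint() under the default FE_TONEAREST mode: ties go to even */
          printf("rint(0.5)  = %g, rint(2.5)  = %g\n", rint(0.5), rint(2.5));
          assert(rint(0.5) == 0.0 && rint(2.5) == 2.0);

          /* Negative ties also move away from zero with round() */
          assert(round(-1.5) == -2.0 && rint(-1.5) == -2.0);
          return 0;
      }
      ```

      The two modes only disagree on exact halfway cases, which is why such bugs can survive for a long time before a tie-heavy workload exposes them.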
    • P
      Merge remote-tracking branch 'remotes/kraxel/tags/pull-seabios-20160704-3' into staging · 11659423
      Authored by Peter Maydell
      Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."
      
      # gpg: Signature made Mon 04 Jul 2016 16:24:55 BST
      # gpg:                using RSA key 0x4CB6D8EED3E87138
      # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
      # gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
      # gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
      # Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138
      
      * remotes/kraxel/tags/pull-seabios-20160704-3:
        Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      11659423
  2. 04 Jul, 2016 (3 commits)
    • P
      Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-07-04-1' into staging · 0d7e96c9
      Authored by Peter Maydell
      Merge qcrypto 2016/07/04 v1
      
      # gpg: Signature made Mon 04 Jul 2016 15:54:26 BST
      # gpg:                using RSA key 0xBE86EBB415104FDF
      # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
      # gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
      # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF
      
      * remotes/berrange/tags/pull-qcrypto-2016-07-04-1:
        crypto: allow default TLS priority to be chosen at build time
        crypto: add support for TLS priority string override
        crypto: implement sha224, sha384, sha512 and ripemd160 hashes
        crypto: switch hash code to use nettle/gcrypt directly
        crypto: rename OUT to out in xts test to avoid clash on MinGW
        crypto: fix handling of iv generator hash defaults
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      0d7e96c9
    • G
      Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86." · 3b1154ff
      Authored by Gerd Hoffmann
      This reverts commit 4e04ab6a.
      
      Also remove pc-bios/bios-fast.bin.
      
      The commit was merged by mistake.
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      3b1154ff
    • D
      crypto: allow default TLS priority to be chosen at build time · a1c5e949
      Authored by Daniel P. Berrange
      Modern gnutls can use a global config file to control the
      crypto priority settings for TLS connections. For example
      the priority string "@SYSTEM" instructs gnutls to find the
      priority setting named "SYSTEM" in the global config file.
      
      The latest gnutls git codebase gained the ability to reference
      multiple priority strings in the config file, with the first
      one found to exist winning. This means it is now
      possible to configure QEMU out of the box with a default
      priority of "@QEMU,SYSTEM", which says to look for the
      settings "QEMU" first and, if not found, use the "SYSTEM"
      settings.
      
      To make use of this facility, we introduce the ability to
      set the QEMU default priority at build time via a new
      configure argument.  It is anticipated that distro vendors
      will set this when building QEMU to a value suitable for
      use with the distro's crypto policy setup, e.g. current Fedora
      would run
      
       ./configure --tls-priority=@SYSTEM
      
      while future Fedora would run
      
       ./configure --tls-priority=@QEMU,SYSTEM
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      a1c5e949
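      The fallback semantics of a priority string such as "@QEMU,SYSTEM" can be sketched as a first-match lookup. This is illustrative only - the real resolution happens inside gnutls against its global config file, not in QEMU, and lookup_config/resolve_priority and the priority value below are made up for the example:

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /* Pretend config file: only a "SYSTEM" setting is defined. */
      static const char *lookup_config(const char *name)
      {
          if (strcmp(name, "SYSTEM") == 0) {
              return "NORMAL:-VERS-TLS1.0";  /* made-up priority value */
          }
          return NULL;
      }

      /* Resolve "@A,B,...": try each name in order; the first that exists wins. */
      static const char *resolve_priority(const char *prio, char *buf, size_t len)
      {
          if (prio[0] != '@') {
              return prio;                   /* literal priority string */
          }
          snprintf(buf, len, "%s", prio + 1);
          for (char *name = strtok(buf, ","); name; name = strtok(NULL, ",")) {
              const char *val = lookup_config(name);
              if (val) {
                  return val;
              }
          }
          return NULL;
      }

      int main(void)
      {
          char buf[64];
          /* "QEMU" is not defined, so the lookup falls back to "SYSTEM". */
          const char *val = resolve_priority("@QEMU,SYSTEM", buf, sizeof(buf));
          assert(val && strcmp(val, "NORMAL:-VERS-TLS1.0") == 0);
          return 0;
      }
      ```

      This first-match behavior is what lets a distro build with --tls-priority=@QEMU,SYSTEM work both on hosts where an admin has defined a "QEMU" setting and on hosts that only carry the system-wide default.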