1. 01 Jul, 2019 2 commits
    • P
      Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20190624' into staging · ab67678a
      Committed by Peter Maydell
      Xen queue
      
      * Fix build
      * xen-block: support feature-large-sector-size
      * xen-block: Support IOThread polling for PV shared rings
      * Avoid usage of a VLA
      * Cleanup Xen headers usage
      
      # gpg: Signature made Mon 24 Jun 2019 16:30:32 BST
      # gpg:                using RSA key F80C006308E22CFD8A92E7980CF5572FD7FB55AF
      # gpg:                issuer "anthony.perard@citrix.com"
      # gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>" [marginal]
      # gpg:                 aka "Anthony PERARD <anthony.perard@citrix.com>" [marginal]
      # gpg: WARNING: This key is not certified with sufficiently trusted signatures!
      # gpg:          It is not certain that the signature belongs to the owner.
      # Primary key fingerprint: 5379 2F71 024C 600F 778A  7161 D8D5 7199 DF83 42C8
      #      Subkey fingerprint: F80C 0063 08E2 2CFD 8A92  E798 0CF5 572F D7FB 55AF
      
      * remotes/aperard/tags/pull-xen-20190624:
        xen: Import other xen/io/*.h
        Revert xen/io/ring.h of "Clean up a few header guard symbols"
        xen: Drop includes of xen/hvm/params.h
        xen: Avoid VLA
        xen-bus / xen-block: add support for event channel polling
        xen-bus: allow AioContext to be specified for each event channel
        xen-bus: use a separate fd for each event channel
        xen-block: support feature-large-sector-size
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      ab67678a
    • P
      Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-06-24' into staging · 7fec76a0
      Committed by Peter Maydell
      Block patches:
      - The SSH block driver now uses libssh instead of libssh2
      - The VMDK block driver gets read-only support for the seSparse
        subformat
      - Various fixes
      
      # gpg: Signature made Mon 24 Jun 2019 15:42:56 BST
      # gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
      # gpg:                issuer "mreitz@redhat.com"
      # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
      # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
      
      * remotes/maxreitz/tags/pull-block-2019-06-24:
        iotests: Fix 205 for concurrent runs
        ssh: switch from libssh2 to libssh
        vmdk: Add read-only support for seSparse snapshots
        vmdk: Reduce the max bound for L1 table size
        vmdk: Fix comment regarding max l1_size coverage
        iotest 134: test cluster-misaligned encrypted write
        blockdev: enable non-root nodes for transaction drive-backup source
        nvme: do not advertise support for unsupported arbitration mechanism
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      7fec76a0
  2. 24 Jun, 2019 16 commits
    • M
      iotests: Fix 205 for concurrent runs · ab5d4a30
      Committed by Max Reitz
      Tests should place their files into the test directory.  This includes
      Unix sockets.  205 currently fails to do so, which prevents it from
      being run concurrently.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20190618210238.9524-1-mreitz@redhat.com
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      ab5d4a30
    • P
      ssh: switch from libssh2 to libssh · b10d49d7
      Committed by Pino Toscano
      Rewrite the implementation of the ssh block driver to use libssh instead
      of libssh2.  The libssh library has various advantages over libssh2:
      - easier API for authentication (for example for using ssh-agent)
      - easier API for known_hosts handling
      - supports newer types of keys in known_hosts
      
      Use APIs/features available in libssh 0.8 conditionally, to support
      older versions (which are not recommended though).
      
      Adjust the iotest 207 according to the different error message, and to
      find the default key type for localhost (to properly compare the
      fingerprint with).
      Contributed-by: Max Reitz <mreitz@redhat.com>
      
      Adjust the various Docker/Travis scripts to use libssh when available
      instead of libssh2. The mingw/mxe testing is dropped for now, as there
      are no packages for it.
      Signed-off-by: Pino Toscano <ptoscano@redhat.com>
      Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Acked-by: Alex Bennée <alex.bennee@linaro.org>
      Message-id: 20190620200840.17655-1-ptoscano@redhat.com
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      b10d49d7
    • S
      vmdk: Add read-only support for seSparse snapshots · 98eb9733
      Committed by Sam Eiderman
      Until ESXi 6.5, VMware used the vmfsSparse format for snapshots (VMDK3 in
      QEMU).

      This format had the following shortcomings:
      
          * Grain directory (L1) and grain table (L2) entries were 32-bit,
            allowing access to only 2TB (slightly less) of data.
          * The grain size (default) was 512 bytes - leading to data
            fragmentation and many grain tables.
          * For space reclamation purposes, it was necessary to find all the
            grains which are not pointed to by any grain table - so a reverse
            mapping of "offset of grain in vmdk" to "grain table" must be
            constructed - which takes large amounts of CPU/RAM.
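
The 2TB ceiling in the first point can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each 32-bit entry holds an offset counted in 512-byte sectors; the function name is made up for illustration and is not part of QEMU.

```python
SECTOR_BYTES = 512  # assumption: entries count 512-byte sectors


def max_addressable_bytes(entry_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Upper bound on the data reachable through an entry of the given width."""
    return (1 << entry_bits) * sector_bytes


TIB = 1024 ** 4
limit = max_addressable_bytes(32)
print(limit // TIB, "TiB")
```

The real limit is slightly lower because some sectors hold metadata rather than guest data.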
      
      The format specification can be found in VMware's documentation:
      https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf
      
      In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
      introduced: SESparse (Space Efficient).
      
      This format fixes the above issues:
      
          * All entries are now 64-bit.
          * The grain size (default) is 4KB.
          * Grain directory and grain tables are now located at the beginning
            of the file.
            + seSparse format reserves space for all grain tables.
            + Grain tables can be addressed using an index.
            + Grains are located at the end of the file and can also be
              addressed with an index.
            - seSparse vmdks of large disks (64TB) have huge preallocated
              headers - mainly due to L2 tables, even for empty snapshots.
          * The header contains a reverse mapping ("backmap") of "offset of
            grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
            specifies for each grain - whether it is allocated or not.
            Using these data structures we can implement space reclamation
            efficiently.
          * Due to the fact that the header now maintains two mappings:
              * The regular one (grain directory & grain tables)
              * A reverse one (backmap and free bitmap)
            These data structures can lose consistency upon crash and result
            in a corrupted VMDK.
            Therefore, a journal is also added to the VMDK and is replayed
            when VMware reopens the file after a crash.
      
      Since ESXi 6.7 - SESparse is the only snapshot format available.
      
      Unfortunately, VMware does not provide documentation regarding the new
      seSparse format.
      
      This commit is based on black-box research of the seSparse format.
      Various in-guest block operations and their effect on the snapshot file
      were tested.
      
      The only VMware provided source of information (regarding the underlying
      implementation) was a log file on the ESXi:
      
          /var/log/hostd.log
      
      Whenever an seSparse snapshot is created, the log is populated
      with seSparse records.
      
      Relevant log records are of the form:
      
      [...] Const Header:
      [...]  constMagic     = 0xcafebabe
      [...]  version        = 2.1
      [...]  capacity       = 204800
      [...]  grainSize      = 8
      [...]  grainTableSize = 64
      [...]  flags          = 0
      [...] Extents:
      [...]  Header         : <1 : 1>
      [...]  JournalHdr     : <2 : 2>
      [...]  Journal        : <2048 : 2048>
      [...]  GrainDirectory : <4096 : 2048>
      [...]  GrainTables    : <6144 : 2048>
      [...]  FreeBitmap     : <8192 : 2048>
      [...]  BackMap        : <10240 : 2048>
      [...]  Grain          : <12288 : 204800>
      [...] Volatile Header:
      [...] volatileMagic     = 0xcafecafe
      [...] FreeGTNumber      = 0
      [...] nextTxnSeqNumber  = 0
      [...] replayJournal     = 0
      
      The sizes that are seen in the log file are in sectors.
      Extents are of the following format: <offset : size>
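
To read these records in bytes rather than sectors, the extent lines can be converted as in the sketch below. It assumes 512-byte sectors (consistent with a grain size of 8 sectors being 4KB) and the `Name : <offset : size>` layout shown above; the helper name and regular expression are illustrative only.

```python
import re

SECTOR_BYTES = 512  # 8 sectors == 4KB per the commit message

# Matches e.g. "[...]  GrainDirectory : <4096 : 2048>"
EXTENT_RE = re.compile(r"(\w+)\s*:\s*<\s*(\d+)\s*:\s*(\d+)\s*>")


def parse_extent(line: str):
    """Return (name, offset_bytes, size_bytes) for one extent log line."""
    match = EXTENT_RE.search(line)
    if match is None:
        raise ValueError(f"not an extent record: {line!r}")
    name, offset, size = match.group(1), int(match.group(2)), int(match.group(3))
    return name, offset * SECTOR_BYTES, size * SECTOR_BYTES


print(parse_extent("[...]  Grain          : <12288 : 204800>"))
```

For the sample log, the grain extent starts 6 MiB into the file and spans 100 MiB, which matches the logged capacity of 204800 sectors.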
      
      This commit is a strict implementation which enforces:
          * magics
          * version number 2.1
          * grain size of 8 sectors  (4KB)
          * grain table size of 64 sectors
          * zero flags
          * extent locations
      
      Additionally, this commit provides only a subset of the functionality
      offered by seSparse's format:
          * Read-only
          * No journal replay
          * No space reclamation
          * No unmap support
      
      Hence, journal header, journal, free bitmap and backmap extents are
      unused, only the "classic" (L1 -> L2 -> data) grain access is
      implemented.
      
      However, there are several differences in the grain access itself.
      Grain directory (L1):
          * Grain directory entries are indexes (not offsets) to grain
            tables.
          * Valid grain directory entries have their highest nibble set to
            0x1.
          * Since grain tables are always located in the beginning of the
            file - the index can fit into 32 bits - so we can use its low
            part if it's valid.
      Grain table (L2):
          * Grain table entries are indexes (not offsets) to grains.
          * If the highest nibble of the entry is:
              0x0:
                  The grain is not allocated.
                  The rest of the bytes are 0.
              0x1:
                  The grain is unmapped - guest sees a zero grain.
                  The rest of the bits point to the previously mapped grain,
                  see 0x3 case.
              0x2:
                  The grain is zero.
              0x3:
                  The grain is allocated - to get the index calculate:
                  ((entry & 0x0fff000000000000) >> 48) |
                  ((entry & 0x0000ffffffffffff) << 12)
          * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
            grain which results from the guest using sg_unmap to unmap the
            grain - but the grain itself still exists in the grain extent - a
            space reclamation procedure should delete it.
            Unmapping a zero grain has no effect (0x2 will not change to 0x1)
            but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
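
The entry layout described above can be summarized in a short decoder. This is a sketch of the rules as stated in this commit message, not the code QEMU ships; the enum and function names are hypothetical, and rejecting unknown nibbles with `ValueError` is a choice made here.

```python
from enum import Enum
from typing import Optional, Tuple


class GrainState(Enum):
    UNALLOCATED = 0x0  # rest of the entry is zero
    UNMAPPED = 0x1     # guest sees zeroes; entry still names the old grain
    ZERO = 0x2
    ALLOCATED = 0x3


def _grain_index(entry: int) -> int:
    # Index formula quoted in the commit message for the 0x3 case.
    return ((entry & 0x0FFF000000000000) >> 48) | ((entry & 0x0000FFFFFFFFFFFF) << 12)


def decode_l1_entry(entry: int) -> int:
    """Return the grain table index stored in a valid grain directory entry."""
    if (entry >> 60) & 0xF != 0x1:
        raise ValueError(f"invalid grain directory entry: {entry:#018x}")
    return entry & 0xFFFFFFFF  # the index fits in the low 32 bits


def decode_l2_entry(entry: int) -> Tuple[GrainState, Optional[int]]:
    """Return (state, grain index or None) for a grain table entry."""
    state = GrainState((entry >> 60) & 0xF)  # ValueError on unknown nibbles
    if state in (GrainState.ALLOCATED, GrainState.UNMAPPED):
        return state, _grain_index(entry)
    return state, None
```

A read-only reader only needs the index for the 0x3 case; for 0x1 the index is returned purely for inspection, since the guest must see zeroes.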
      
      In order to implement seSparse some fields had to be changed to support
      both 32-bit and 64-bit entry sizes.
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
      Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      98eb9733
    • S
      vmdk: Reduce the max bound for L1 table size · 59d6ee48
      Committed by Sam Eiderman
      512M of L1 entries is a very loose bound, only 32M are required to store
      the maximal supported VMDK file size of 2TB.
      
      Fixed qemu-iotest 59 - the failure now occurs earlier, on the
      impossible L1 table size.
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
      Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      59d6ee48
    • S
      vmdk: Fix comment regarding max l1_size coverage · 940a2cd5
      Committed by Sam Eiderman
      Commit b0651b8c ("vmdk: Move l1_size check into vmdk_add_extent")
      extended the l1_size check from VMDK4 to VMDK3 but did not update the
      default coverage in the moved comment.
      
      The previous vmdk4 calculation:
      
          (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB
      
      The added vmdk3 calculation:
      
          (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB
      
      Adding the calculation of vmdk3 to the comment.
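
Both coverage figures can be reproduced directly. In this sketch "PB" is read as a binary unit (2^50 bytes), which is what makes the numbers come out exact; the function name is illustrative.

```python
PIB = 1024 ** 5  # the commit message's "PB", taken as 2**50 bytes

MAX_L1_ENTRIES = 512 * 1024 * 1024  # the l1_size bound being discussed


def l1_coverage_bytes(l1_entries: int, l2_entries: int, grain_bytes: int) -> int:
    """Bytes of guest data reachable through a full L1 table."""
    return l1_entries * l2_entries * grain_bytes


vmdk4 = l1_coverage_bytes(MAX_L1_ENTRIES, 512, 65536)
vmdk3 = l1_coverage_bytes(MAX_L1_ENTRIES, 4096, 512)
print(vmdk4 // PIB, vmdk3 // PIB)
```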
      
      In any case, VMware does not offer virtual disks larger than 2TB for
      vmdk4/vmdk3, or 64TB for the new undocumented seSparse format, which is
      not implemented yet in qemu.
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
      Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
      Reviewed-by: yuchenlin <yuchenlin@synology.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      940a2cd5
    • A
      iotest 134: test cluster-misaligned encrypted write · 6ec889eb
      Committed by Anton Nefedov
      COW (even empty/zero) areas require encryption too
      Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      6ec889eb
    • V
      blockdev: enable non-root nodes for transaction drive-backup source · 85c9d133
      Committed by Vladimir Sementsov-Ogievskiy
      We forgot to enable it for transaction .prepare, while it is already
      enabled in do_drive_backup since commit a2d665c1
          "blockdev: loosen restrictions on drive-backup source node"
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
      Reviewed-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      85c9d133
    • K
      nvme: do not advertise support for unsupported arbitration mechanism · 1cc354ac
      Committed by Klaus Birkelund Jensen
      The device mistakenly reports that the Weighted Round Robin with Urgent
      Priority Class arbitration mechanism is supported.
      
      It is not.
      Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
      Message-id: 20190606092530.14206-1-klaus@birkelund.eu
      Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      1cc354ac
    • A
      xen: Import other xen/io/*.h · a3434a2d
      Committed by Anthony PERARD
      A Xen public header has been imported into QEMU (by
      f65eadb6 "xen: import ring.h from xen"), but other headers that
      depend on ring.h still come from the system when building QEMU.

      This patch resolves the issue of system headers importing a
      different copy of ring.h.

      This patch was prompted by the build issue described in the previous
      patch: 'Revert xen/io/ring.h of "Clean up a few header guard symbols"'
      
      ring.h and the new imported headers are moved to
      "include/hw/xen/interface" as those describe interfaces with a guest.
      
      The imported headers are cleaned up a bit while importing them: some
      parts of the files that QEMU doesn't use are removed (the description
      of how to make hypercalls in grant_table.h has been removed).
      
      Other cleanup:
      - xen-mapcache.c and xen-legacy-backend.c don't need grant_table.h.
      - xenfb.c doesn't need event_channel.h.
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
      Message-Id: <20190621105441.3025-3-anthony.perard@citrix.com>
      a3434a2d
    • A
      Revert xen/io/ring.h of "Clean up a few header guard symbols" · d1744bd3
      Committed by Anthony PERARD
      This reverts changes to include/hw/xen/io/ring.h from commit
      37677d7d.
      
      Following 37677d7d "Clean up a few header guard symbols", QEMU started
      failing to build:
      
      In file included from ~/xen/tools/../tools/include/xen/io/blkif.h:31:0,
                       from ~/xen/tools/qemu-xen-dir/hw/block/xen_blkif.h:5,
                       from ~/xen/tools/qemu-xen-dir/hw/block/xen-block.c:22:
      ~/xen/tools/../tools/include/xen/io/ring.h:68:0: error: "__CONST_RING_SIZE" redefined [-Werror]
       #define __CONST_RING_SIZE(_s, _sz) \
      
      In file included from ~/xen/tools/qemu-xen-dir/hw/block/xen_blkif.h:4:0,
                       from ~/xen/tools/qemu-xen-dir/hw/block/xen-block.c:22:
      ~/xen/tools/qemu-xen-dir/include/hw/xen/io/ring.h:66:0: note: this is the location of the previous definition
       #define __CONST_RING_SIZE(_s, _sz) \
      
      The issue is that some public Xen headers have been imported (by
      f65eadb6 "xen: import ring.h from xen") but not all. With the change
      in the guard symbols, the ring.h header started to be included twice.
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
      Message-Id: <20190621105441.3025-2-anthony.perard@citrix.com>
      d1744bd3
    • A
      xen: Drop includes of xen/hvm/params.h · 6e8d4593
      Committed by Anthony PERARD
      xen-mapcache.c doesn't need params.h.

      xen-hvm.c uses defines available in params.h, but so does xen_common.h,
      which is included before it. HVM_PARAM_* flags are only needed to make
      xc_hvm_param_{get,set} calls, so including only xenctrl.h, which is
      where the functions are declared, should be enough.
      (xenctrl.h does include params.h)
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
      Message-Id: <20190618112341.513-4-anthony.perard@citrix.com>
      6e8d4593
    • A
      xen: Avoid VLA · 34fbbc16
      Committed by Anthony PERARD
      Avoid using a variable length array.
      
      We allocate the `dirty_bitmap' buffer only once when we start tracking
      for dirty bits.
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-Id: <20190618112341.513-5-anthony.perard@citrix.com>
      34fbbc16
    • P
      xen-bus / xen-block: add support for event channel polling · 345f42b4
      Committed by Paul Durrant
      This patch introduces a poll callback for event channel fd-s and uses
      this to invoke the channel callback function.
      
      To properly support polling, it is necessary for the event channel callback
      function to return a boolean saying whether it has done any useful work or
      not. Thus xen_block_dataplane_event() is modified to directly invoke
      xen_block_handle_requests() and the latter only returns true if it actually
      processes any requests. This also means that the call to qemu_bh_schedule()
      is moved into xen_block_complete_aio(), which is more intuitive since the
      only reason for doing a deferred poll of the shared ring should be because
      there were previously insufficient resources to fully complete a previous
      poll.
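
The contract described above, a handler that reports whether it did useful work so that a poller knows when to stop spinning, can be illustrated outside QEMU. The sketch below is a generic model with made-up names (`Ring`, `poll`); it does not use or mirror QEMU's AioContext API.

```python
from collections import deque


class Ring:
    """Toy stand-in for a shared ring with pending requests."""

    def __init__(self):
        self.pending = deque()
        self.completed = []

    def handle_requests(self) -> bool:
        """Process everything pending; return True only if work was done."""
        did_work = False
        while self.pending:
            self.completed.append(self.pending.popleft())
            did_work = True
        return did_work


def poll(ring: Ring, max_spins: int) -> bool:
    """Call the handler until it reports no work or the spin budget runs out."""
    progress = False
    for _ in range(max_spins):
        if not ring.handle_requests():
            break
        progress = True
    return progress
```

Returning False on an idle ring is what lets the caller fall back from busy polling to waiting on the file descriptor.
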
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
      Message-Id: <20190408151617.13025-4-paul.durrant@citrix.com>
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      345f42b4
    • P
      xen-bus: allow AioContext to be specified for each event channel · 83361a8a
      Committed by Paul Durrant
      This patch adds an AioContext parameter to xen_device_bind_event_channel()
      and then uses aio_set_fd_handler() to set the callback rather than
      qemu_set_fd_handler().
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
      Message-Id: <20190408151617.13025-3-paul.durrant@citrix.com>
      [Call aio_set_fd_handler() with is_external=true]
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      83361a8a
    • P
      xen-bus: use a separate fd for each event channel · c0b336ea
      Committed by Paul Durrant
      To better support use of IOThread-s it will be necessary to be able to set
      the AioContext for each XenEventChannel and hence it is necessary to open a
      separate handle to libxenevtchn for each channel.
      
      This patch stops using NotifierList for event channel callbacks, replacing
      that construct by a list of complete XenEventChannel structures. Each of
      these now has a xenevtchn_handle pointer in place of the single pointer
      previously held in the XenDevice structure. The individual handles are
      opened/closed in xen_device_bind/unbind_event_channel(), replacing the
      single open/close in xen_device_realize/unrealize().
      
      NOTE: This patch does not add an AioContext parameter to
            xen_device_bind_event_channel(). That will be done in a subsequent
            patch.
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
      Message-Id: <20190408151617.13025-2-paul.durrant@citrix.com>
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      c0b336ea
    • P
      xen-block: support feature-large-sector-size · 5feeb718
      Committed by Paul Durrant
      A recent Xen commit [1] clarified the semantics of sector based quantities
      used in the blkif protocol such that it is now safe to create a xen-block
      device with a logical_block_size != 512, as long as the device only
      connects to a frontend advertising 'feature-large-sector-size'.
      
      This patch modifies xen-block accordingly. It also uses a stack variable
      for the BlockBackend in xen_block_realize() to avoid repeated dereferencing
      of the BlockConf pointer, and changes the parameters of
      xen_block_dataplane_create() so that the BlockBackend pointer and sector
      size are passed explicitly rather than implicitly via the BlockConf.
      
      These modifications have been tested against a recent Windows PV XENVBD
      driver [2] using a xen-disk device with a 4kB logical block size.
      
      [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=67e1c050e36b2c9900cca83618e56189effbad98
      [2] https://winpvdrvbuild.xenproject.org:8080/job/XENVBD-master/126
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
      Message-Id: <20190409164038.25484-1-paul.durrant@citrix.com>
      [Edited error message]
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      5feeb718
  3. 21 Jun, 2019 22 commits