1. 14 10月, 2019 18 次提交
    • M
      qcow2: Limit total allocation range to INT_MAX · d1b9d19f
      Max Reitz 提交于
      When the COW areas are included, the size of an allocation can exceed
      INT_MAX.  This is kind of limited by handle_alloc() in that it already
      caps avail_bytes at INT_MAX, but the number of clusters still reflects
      the original length.
      
      This can have all sorts of effects, ranging from the storage layer write
      call failing to image corruption.  (If there were no image corruption,
      then I suppose there would be data loss because the .cow_end area is
      forced to be empty, even though there might be something we need to
      COW.)
      
      Fix all of it by limiting nb_clusters so the equivalent number of bytes
      will not exceed INT_MAX.
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      Reviewed-by: NEric Blake <eblake@redhat.com>
      Reviewed-by: NPhilippe Mathieu-Daudé <philmd@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      d1b9d19f
    • K
      qemu-nbd: Support help options for --object · 495bf893
      Kevin Wolf 提交于
      Instead of parsing help options as normal object properties and
      returning an error, provide the same help functionality as the system
      emulator in qemu-nbd, too.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NEric Blake <eblake@redhat.com>
      495bf893
    • K
      qemu-img: Support help options for --object · c6e5cdfd
      Kevin Wolf 提交于
      Instead of parsing help options as normal object properties and
      returning an error, provide the same help functionality as the system
      emulator in qemu-img, too.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NEric Blake <eblake@redhat.com>
      c6e5cdfd
    • K
      qemu-io: Support help options for --object · 4fa1f0dc
      Kevin Wolf 提交于
      Instead of parsing help options as normal object properties and
      returning an error, provide the same help functionality as the system
      emulator in qemu-io, too.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NEric Blake <eblake@redhat.com>
      4fa1f0dc
    • K
      vl: Split off user_creatable_print_help() · 3e9297f3
      Kevin Wolf 提交于
      Printing help for --object is something that we not only want in the
      system emulator, but also in tools that support --object. Move it into a
      separate function in qom/object_interfaces.c to make the code accessible
      for tools.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NEric Blake <eblake@redhat.com>
      3e9297f3
    • M
      iotests/028: Fix for long $TEST_DIRs · 48c8d3ce
      Max Reitz 提交于
      For long test image paths, the order of the "Formatting" line and the
      "(qemu)" prompt after a drive_backup HMP command may be reversed.  In
      fact, the interaction between the prompt and the line may lead to the
      "Formatting" to being greppable at all after "read"-ing it (if the
      prompt injects an IFS character into the "Formatting" string).
      
      So just wait until we get a prompt.  At that point, the block job must
      have been started, so "info block-jobs" will only return "No active
      jobs" once it is done.
      Reported-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      Reviewed-by: NJohn Snow <jsnow@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      48c8d3ce
    • A
      block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK · f2208fdc
      Alberto Garcia 提交于
      The BDRV_REQ_NO_FALLBACK flag means that an operation should only be
      performed if it can be offloaded or otherwise performed efficiently.
      
      However a misaligned write request requires a RMW so we should return
      an error and let the caller decide how to proceed.
      
      This hits an assertion since commit c8bb23cb if the required
      alignment is larger than the cluster size:
      
      qemu-img create -f qcow2 -o cluster_size=2k img.qcow2 4G
      qemu-io -c "open -o driver=qcow2,file.align=4k blkdebug::img.qcow2" \
              -c 'write 0 512'
      qemu-io: block/io.c:1127: bdrv_driver_pwritev: Assertion `!(flags & BDRV_REQ_NO_FALLBACK)' failed.
      Aborted
      
      The reason is that when writing to an unallocated cluster we try to
      skip the copy-on-write part and zeroize it using BDRV_REQ_NO_FALLBACK
      instead, resulting in a write request that is too small (2KB cluster
      size vs 4KB required alignment).
      Signed-off-by: NAlberto Garcia <berto@igalia.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      f2208fdc
    • P
      replay: add BH oneshot event for block layer · e4ec5ad4
      Pavel Dovgalyuk 提交于
      Replay is capable of recording normal BH events, but sometimes
      there are single use callbacks scheduled with aio_bh_schedule_oneshot
      function. This patch enables recording and replaying such callbacks.
      Block layer uses these events for calling the completion function.
      Replaying these calls makes the execution deterministic.
      Signed-off-by: NPavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
      Acked-by: NKevin Wolf <kwolf@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      e4ec5ad4
    • P
      replay: finish record/replay before closing the disks · ae25dccb
      Pavel Dovgalyuk 提交于
      After recent updates block devices cannot be closed on qemu exit.
      This happens due to the block request polling when replay is not finished.
      Therefore now we stop execution recording before closing the block devices.
      Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      ae25dccb
    • P
      replay: don't drain/flush bdrv queue while RR is working · c8aa7895
      Pavel Dovgalyuk 提交于
      In record/replay mode bdrv queue is controlled by replay mechanism.
      It does not allow saving or loading the snapshots
      when bdrv queue is not empty. Stopping the VM is not blocked by nonempty
      queue, but flushing the queue is still impossible there,
      because it may cause deadlocks in replay mode.
      This patch disables bdrv_drain_all and bdrv_flush_all in
      record/replay mode.
      
      Stopping the machine when the IO requests are not finished is needed
      for the debugging. E.g., breakpoint may be set at the specified step,
      and forcing the IO requests to finish may break the determinism
      of the execution.
      Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
      Acked-by: NKevin Wolf <kwolf@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      c8aa7895
    • P
      replay: update docs for record/replay with block devices · de499eb6
      Pavel Dovgalyuk 提交于
      This patch updates the description of the command lines for using
      record/replay with attached block devices.
      Signed-off-by: NPavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      de499eb6
    • P
      replay: disable default snapshot for record/replay · 25863975
      Pavel Dovgalyuk 提交于
      This patch disables setting '-snapshot' option on by default
      in record/replay mode. This is needed for creating vmstates in record
      and replay modes.
      Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
      Acked-by: NKevin Wolf <kwolf@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      25863975
    • P
      block: implement bdrv_snapshot_goto for blkreplay · 3c6c4348
      Pavel Dovgalyuk 提交于
      This patch enables making snapshots with blkreplay used in
      block devices.
      This function is required to make bdrv_snapshot_goto without
      calling .bdrv_open which is not implemented.
      Signed-off-by: NPavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
      Acked-by: NKevin Wolf <kwolf@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      3c6c4348
    • P
      block/vhdx: add check for truncated image files · 6caaad46
      Peter Lieven 提交于
      qemu is currently not able to detect truncated vhdx image files.
      Add a basic check if all allocated blocks are reachable at open and
      report all errors during bdrv_co_check.
      Signed-off-by: NPeter Lieven <pl@kamp.de>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      6caaad46
    • P
      Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20191010.0' into staging · 22dbfdec
      Peter Maydell 提交于
      VFIO update 2019-10-10
      
       - Fix MSI error path double free (Evgeny Yakovlev)
      
      # gpg: Signature made Thu 10 Oct 2019 20:07:39 BST
      # gpg:                using RSA key 239B9B6E3BB08B22
      # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full]
      # gpg:                 aka "Alex Williamson <alex@shazbot.org>" [full]
      # gpg:                 aka "Alex Williamson <alwillia@redhat.com>" [full]
      # gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>" [full]
      # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22
      
      * remotes/awilliam/tags/vfio-update-20191010.0:
        hw/vfio/pci: fix double free in vfio_msi_disable
      Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
      22dbfdec
    • P
      Merge remote-tracking branch 'remotes/gkurz/tags/9p-next-2019-10-10' into staging · c8b2bc51
      Peter Maydell 提交于
      The most notable change is that we now detect cross-device setups in the
      host since it may cause inode number collision and mayhem in the guest.
      A new fsdev property is added for the user to choose the appropriate
      policy to handle that: either remap all inode numbers or fail I/Os to
      another host device or just print out a warning (default behaviour).
      
      This is also my last PR as _active_ maintainer of 9pfs.
      
      # gpg: Signature made Thu 10 Oct 2019 12:14:07 BST
      # gpg:                using RSA key B4828BAF943140CEF2A3491071D4D5E5822F73D6
      # gpg: Good signature from "Greg Kurz <groug@kaod.org>" [full]
      # gpg:                 aka "Gregory Kurz <gregory.kurz@free.fr>" [full]
      # gpg:                 aka "[jpeg image of size 3330]" [full]
      # Primary key fingerprint: B482 8BAF 9431 40CE F2A3  4910 71D4 D5E5 822F 73D6
      
      * remotes/gkurz/tags/9p-next-2019-10-10:
        MAINTAINERS: Downgrade status of virtio-9p to "Odd Fixes"
        9p: Use variable length suffixes for inode remapping
        9p: stat_to_qid: implement slow path
        9p: Added virtfs option 'multidevs=remap|forbid|warn'
        9p: Treat multiple devices on one export as an error
        fsdev: Add return value to fsdev_throttle_parse_opts()
        9p: Simplify error path of v9fs_device_realize_common()
        9p: unsigned type for type, version, path
      Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
      c8b2bc51
    • P
      Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-10-10' into staging · 088d6709
      Peter Maydell 提交于
      Block patches:
      - Parallelized request handling for qcow2
      - Backup job refactoring to use a filter node instead of before-write
        notifiers
      - Add discard accounting information to file-posix nodes
      - Allow trivial reopening of nbd nodes
      - Some iotest fixes
      
      # gpg: Signature made Thu 10 Oct 2019 12:40:34 BST
      # gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
      # gpg:                issuer "mreitz@redhat.com"
      # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
      # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
      
      * remotes/maxreitz/tags/pull-block-2019-10-10: (36 commits)
        iotests/162: Fix for newer Linux 5.3+
        tests: fix I/O test for hosts defaulting to LUKSv2
        nbd: add empty .bdrv_reopen_prepare
        block/backup: use backup-top instead of write notifiers
        block: introduce backup-top filter driver
        block/block-copy: split block_copy_set_callbacks function
        block/backup: move write_flags calculation inside backup_job_create
        block/backup: move in-flight requests handling from backup to block-copy
        iotests: Use stat -c %b in 125
        iotests: Disable 125 on broken XFS versions
        iotests: Fix 125 for growth_mode = metadata
        qapi: query-blockstat: add driver specific file-posix stats
        file-posix: account discard operations
        scsi: account unmap operations
        scsi: move unmap error checking to the complete callback
        scsi: store unmap offset and nb_sectors in request struct
        ide: account UNMAP (TRIM) operations
        block: add empty account cookie type
        qapi: add unmap to BlockDeviceStats
        qapi: group BlockDeviceStats fields
        ...
      Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
      088d6709
    • P
      Merge remote-tracking branch 'remotes/davidhildenbrand/tags/s390x-tcg-2019-10-10' into staging · cdfc44ac
      Peter Maydell 提交于
      - MMU DAT translation rewrite and cleanup
      - Implement more TCG CPU features related to the MMU (e.g., IEP)
      - Add the current instruction length to unwind data and clean up
      - Resolve one TODO for the MVCL instruction
      
      # gpg: Signature made Thu 10 Oct 2019 12:25:06 BST
      # gpg:                using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
      # gpg:                issuer "david@redhat.com"
      # gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
      # gpg:                 aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
      # Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D  FCCA 4DDE 10F7 00FF 835A
      
      * remotes/davidhildenbrand/tags/s390x-tcg-2019-10-10: (31 commits)
        s390x/tcg: MVCL: Exit to main loop if requested
        target/s390x: Remove ILEN_UNWIND
        target/s390x: Remove ilen argument from trigger_pgm_exception
        target/s390x: Remove ilen argument from trigger_access_exception
        target/s390x: Remove ILEN_AUTO
        target/s390x: Rely on unwinding in s390_cpu_virt_mem_rw
        target/s390x: Rely on unwinding in s390_cpu_tlb_fill
        target/s390x: Simplify helper_lra
        target/s390x: Remove fail variable from s390_cpu_tlb_fill
        target/s390x: Return exception from translate_pages
        target/s390x: Return exception from mmu_translate
        target/s390x: Remove exc argument to mmu_translate_asce
        target/s390x: Return exception from mmu_translate_real
        target/s390x: Handle tec in s390_cpu_tlb_fill
        target/s390x: Push trigger_pgm_exception lower in s390_cpu_tlb_fill
        target/s390x: Use tcg_s390_program_interrupt in TCG helpers
        target/s390x: Remove ilen parameter from s390_program_interrupt
        target/s390x: Remove ilen parameter from tcg_s390_program_interrupt
        target/s390x: Add ilen to unwind data
        s390x/cpumodel: Add new TCG features to QEMU cpu model
        ...
      Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
      cdfc44ac
  2. 11 10月, 2019 1 次提交
    • E
      hw/vfio/pci: fix double free in vfio_msi_disable · d964d3b5
      Evgeny Yakovlev 提交于
      The following guest behaviour patter leads to double free in VFIO PCI:
      
      1. Guest enables MSI interrupts
      vfio_msi_enable is called, but fails in vfio_enable_vectors.
      In our case this was because VFIO GPU device was in D3 state.
      Unhappy path in vfio_msi_enable will g_free(vdev->msi_vectors) but not
      set this pointer to NULL
      
      2. Guest still sees MSI an enabled after that because emulated config
      write is done in vfio_pci_write_config unconditionally before calling
      vfio_msi_enable
      
      3. Guest disables MSI interrupts
      vfio_msi_disable is called and tries to g_free(vdev->msi_vectors)
      in vfio_msi_disable_common => double free
      Signed-off-by: NEvgeny Yakovlev <wrfsh@yandex-team.ru>
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      d964d3b5
  3. 10 10月, 2019 21 次提交
    • G
      MAINTAINERS: Downgrade status of virtio-9p to "Odd Fixes" · e410bbc5
      Greg Kurz 提交于
      Latest submissions for 9pfs made me realize that I no longer have time
      and motivation to actively support it. I'll stay around for odd fixes
      though.
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      e410bbc5
    • D
      s390x/tcg: MVCL: Exit to main loop if requested · 1f6493be
      David Hildenbrand 提交于
      MVCL is interruptible and we should check for interrupts and process
      them after writing back the variables to the registers. Let's check
      for any exit requests and exit to the main loop. Introduce a new helper
      function for that: cpu_loop_exit_requested().
      
      When booting Fedora 30, I can see a handful of these exits and it seems
      to work reliable. Also, Richard explained why this works correctly even
      when MVCL is called via EXECUTE:
      
          (1) TB with EXECUTE runs, at address Ae
              - env->psw_addr stored with Ae.
              - helper_ex() runs, memory address Am computed
                from D2a(X2a,B2a) or from psw.addr+RI2.
              - env->ex_value stored with memory value modified by R1a
      
          (2) TB of executee runs,
              - env->ex_value stored with 0.
              - helper_mvcl() runs, using and updating R1b, R1b+1, R2b, R2b+1.
      
          (3a) helper_mvcl() completes,
               - TB of executee continues, psw.addr += ilen.
               - Next instruction is the one following EXECUTE.
      
          (3b) helper_mvcl() exits to main loop,
               - cpu_loop_exit_restore() unwinds psw.addr = Ae.
               - Next instruction is the EXECUTE itself...
               - goto 1.
      
      As the PoP mentiones that an interruptible instruction called via EXECUTE
      should avoid modifying storage/registers that are used by EXECUTE itself,
      it is fine to retrigger EXECUTE.
      
      Cc: Alex Bennée <alex.bennee@linaro.org>
      Cc: Peter Maydell <peter.maydell@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Suggested-by: NRichard Henderson <richard.henderson@linaro.org>
      Reviewed-by: NRichard Henderson <richard.henderson@linaro.org>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      1f6493be
    • M
      iotests/162: Fix for newer Linux 5.3+ · 35f05b2e
      Max Reitz 提交于
      Linux 5.3 has made 0.0.0.0/8 a working IPv4 subnet.  As such, "42" is
      now a valid host, and the connection to it will (hopefully) time out
      over a long period rather than quickly return with EINVAL.
      
      So let us use a negative integer for testing that NBD will not crash
      when it receives integer hosts.  This way, the connection will again
      fail quickly and reliably.
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20191002174052.5773-1-mreitz@redhat.com
      Reviewed-by: NEric Blake <eblake@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      35f05b2e
    • C
      9p: Use variable length suffixes for inode remapping · 6b6aa828
      Christian Schoenebeck 提交于
      Use variable length suffixes for inode remapping instead of the fixed
      16 bit size prefixes before. With this change the inode numbers on guest
      will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48
      with the previous fixed size inode remapping.
      
      Additionally this solution is more efficient, since inode numbers in
      practice can take almost their entire 64 bit range on guest as well, so
      there is less likely a need for generating and tracking additional suffixes,
      which might also be beneficial for nested virtualization where each level of
      virtualization would shift up the inode bits and increase the chance of
      expensive remapping actions.
      
      The "Exponential Golomb" algorithm is used as basis for generating the
      variable length suffixes. The algorithm has a parameter k which controls the
      distribution of bits on increasing indeces (minimum bits at low index vs.
      maximum bits at high index). With k=0 the generated suffixes look like:
      
      Index Dec/Bin -> Generated Suffix Bin
      1 [1] -> [1] (1 bits)
      2 [10] -> [010] (3 bits)
      3 [11] -> [110] (3 bits)
      4 [100] -> [00100] (5 bits)
      5 [101] -> [10100] (5 bits)
      6 [110] -> [01100] (5 bits)
      7 [111] -> [11100] (5 bits)
      8 [1000] -> [0001000] (7 bits)
      9 [1001] -> [1001000] (7 bits)
      10 [1010] -> [0101000] (7 bits)
      11 [1011] -> [1101000] (7 bits)
      12 [1100] -> [0011000] (7 bits)
      ...
      65533 [1111111111111101] ->  [1011111111111111000000000000000] (31 bits)
      65534 [1111111111111110] ->  [0111111111111111000000000000000] (31 bits)
      65535 [1111111111111111] ->  [1111111111111111000000000000000] (31 bits)
      Hence minBits=1 maxBits=31
      
      And with k=5 they would look like:
      
      Index Dec/Bin -> Generated Suffix Bin
      1 [1] -> [000001] (6 bits)
      2 [10] -> [100001] (6 bits)
      3 [11] -> [010001] (6 bits)
      4 [100] -> [110001] (6 bits)
      5 [101] -> [001001] (6 bits)
      6 [110] -> [101001] (6 bits)
      7 [111] -> [011001] (6 bits)
      8 [1000] -> [111001] (6 bits)
      9 [1001] -> [000101] (6 bits)
      10 [1010] -> [100101] (6 bits)
      11 [1011] -> [010101] (6 bits)
      12 [1100] -> [110101] (6 bits)
      ...
      65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits)
      65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits)
      65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits)
      Hence minBits=6 maxBits=28
      Signed-off-by: NChristian Schoenebeck <qemu_oss@crudebyte.com>
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      6b6aa828
    • A
      9p: stat_to_qid: implement slow path · f3fe4a2d
      Antonios Motakis 提交于
      stat_to_qid attempts via qid_path_prefixmap to map unique files (which are
      identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value.
      However this implementation makes some assumptions about inode number
      generation on the host.
      
      If qid_path_prefixmap fails, we still have 48 bits available in the QID
      path to fall back to a less memory efficient full mapping.
      Signed-off-by: NAntonios Motakis <antonios.motakis@huawei.com>
      [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
             (SHA1 7fc4c49e91).
           - Updated hash calls to new xxhash API.
           - Removed unnecessary parantheses in qpf_lookup_func().
           - Removed unnecessary g_malloc0() result checks.
           - Log error message when running out of prefixes in
             qid_path_fullmap().
           - Log warning message about potential degraded performance in
             qid_path_prefixmap().
           - Wrapped qpf_table initialization to dedicated qpf_table_init()
             function.
           - Fixed typo in comment. ]
      Signed-off-by: NChristian Schoenebeck <qemu_oss@crudebyte.com>
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      f3fe4a2d
    • A
      9p: Added virtfs option 'multidevs=remap|forbid|warn' · 1a6ed33c
      Antonios Motakis 提交于
      'warn' (default): Only log an error message (once) on host if more than one
      device is shared by same export, except of that just ignore this config
      error though. This is the default behaviour for not breaking existing
      installations implying that they really know what they are doing.
      
      'forbid': Like 'warn', but except of just logging an error this
      also denies access of guest to additional devices.
      
      'remap': Allows to share more than one device per export by remapping
      inodes from host to guest appropriately. To support multiple devices on the
      9p share, and avoid qid path collisions we take the device id as input to
      generate a unique QID path. The lowest 48 bits of the path will be set
      equal to the file inode, and the top bits will be uniquely assigned based
      on the top 16 bits of the inode and the device id.
      Signed-off-by: NAntonios Motakis <antonios.motakis@huawei.com>
      [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
             (SHA1 7fc4c49e91).
           - Added virtfs option 'multidevs', original patch simply did the inode
             remapping without being asked.
           - Updated hash calls to new xxhash API.
           - Updated docs for new option 'multidevs'.
           - Fixed v9fs_do_readdir() not having remapped inodes.
           - Log error message when running out of prefixes in
             qid_path_prefixmap().
           - Fixed definition of QPATH_INO_MASK.
           - Wrapped qpp_table initialization to dedicated qpp_table_init()
             function.
           - Dropped unnecessary parantheses in qpp_lookup_func().
           - Dropped unnecessary g_malloc0() result checks. ]
      Signed-off-by: NChristian Schoenebeck <qemu_oss@crudebyte.com>
      [groug: - Moved "multidevs" parsing to the local backend.
              - Added hint to invalid multidevs option error.
      	- Turn "remap" into "x-remap". ]
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      1a6ed33c
    • A
      9p: Treat multiple devices on one export as an error · 3b5ee9e8
      Antonios Motakis 提交于
      The QID path should uniquely identify a file. However, the
      inode of a file is currently used as the QID path, which
      on its own only uniquely identifies files within a device.
      Here we track the device hosting the 9pfs share, in order
      to prevent security issues with QID path collisions from
      other devices.
      
      We only print a warning for now but a subsequent patch will
      allow users to have finer control over the desired behaviour.
      Failing the I/O will be one the proposed behaviour, so we
      also change stat_to_qid() to return an error here in order to
      keep other patches simpler.
      Signed-off-by: NAntonios Motakis <antonios.motakis@huawei.com>
      [CS: - Assign dev_id to export root's device already in
             v9fs_device_realize_common(), not postponed in
             stat_to_qid().
           - error_report_once() if more than one device was
             shared by export.
           - Return -ENODEV instead of -ENOSYS in stat_to_qid().
           - Fixed typo in log comment. ]
      Signed-off-by: NChristian Schoenebeck <qemu_oss@crudebyte.com>
      [groug, changed to warning, updated message and changelog]
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      3b5ee9e8
    • G
      fsdev: Add return value to fsdev_throttle_parse_opts() · ea52cdd4
      Greg Kurz 提交于
      It is more convenient to use the return value of the function to notify
      errors, rather than to be tied up setting up the &local_err boilerplate.
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      ea52cdd4
    • G
      9p: Simplify error path of v9fs_device_realize_common() · c0da0cb7
      Greg Kurz 提交于
      Make v9fs_device_unrealize_common() idempotent and use it for rollback,
      in order to reduce code duplication.
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      c0da0cb7
    • A
      9p: unsigned type for type, version, path · 87032833
      Antonios Motakis 提交于
      There is no need for signedness on these QID fields for 9p.
      Signed-off-by: NAntonios Motakis <antonios.motakis@huawei.com>
      [CS: - Also make QID type unsigned.
           - Adjust donttouch_stat() to new types.
           - Adjust trace-events to new types. ]
      Signed-off-by: NChristian Schoenebeck <qemu_oss@crudebyte.com>
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      87032833
    • D
      tests: fix I/O test for hosts defaulting to LUKSv2 · da51e998
      Daniel P. Berrangé 提交于
      Some distros are now defaulting to LUKS version 2 which QEMU cannot
      process. For our I/O test that validates interoperability between the
      kernel/cryptsetup and QEMU, we need to explicitly ask for version 1
      of the LUKS format.
      Signed-off-by: NDaniel P. Berrangé <berrange@redhat.com>
      Message-id: 20190927101155.25896-1-berrange@redhat.com
      Tested-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      da51e998
    • M
      nbd: add empty .bdrv_reopen_prepare · e99754b4
      Maxim Levitsky 提交于
      Fixes commit job / qemu-img commit, when
      commiting qcow2 file which is based on nbd export.
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1718727Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Message-id: 20190930213820.29777-2-mlevitsk@redhat.com
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      e99754b4
    • V
      block/backup: use backup-top instead of write notifiers · 00e30f05
      Vladimir Sementsov-Ogievskiy 提交于
      Drop write notifiers and use filter node instead.
      
      = Changes =
      
      1. Add filter-node-name argument for backup qmp api. We have to do it
      in this commit, as 257 needs to be fixed.
      
      2. There are no more write notifiers here, so is_write_notifier
      parameter is dropped from block-copy paths.
      
      3. To sync with in-flight requests at job finish we now have drained
      removing of the filter, we don't need rw-lock.
      
      4. Block-copy is now using BdrvChildren instead of BlockBackends
      
      5. As backup-top owns these children, we also move block-copy state
      into backup-top's ownership.
      
      = Iotest changes =
      
      56: op-blocker doesn't shoot now, as we set it on source, but then
      check on filter, when trying to start second backup.
      To keep the test we instead can catch another collision: both jobs will
      get 'drive0' job-id, as job-id parameter is unspecified. To prevent
      interleaving with file-posix locks (as they are dependent on config)
      let's use another target for second backup.
      
      Also, it's obvious now that we'd like to drop this op-blocker at all
      and add a test-case for two backups from one node (to different
      destinations) actually works. But not in these series.
      
      141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk
      check inside qmp_blockdev_del. But we've dropped block-copy blk
      objects, so no more blk objects on source bs (job blk is on backup-top
      filter bs). New message is from op-blocker, which is the next check in
      qmp_blockdev_add.
      
      257: The test wants to emulate guest write during backup. They should
      go to filter node, not to original source node, of course. Therefore we
      need to specify filter node name and use it.
      Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      00e30f05
    • V
      block: introduce backup-top filter driver · 7df7868b
      Vladimir Sementsov-Ogievskiy 提交于
      Backup-top filter caches write operations and does copy-before-write
      operations.
      
      The driver will be used in backup instead of write-notifiers.
      Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-id: 20191001131409.14202-5-vsementsov@virtuozzo.com
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      7df7868b
    • V
      block/block-copy: split block_copy_set_callbacks function · 0f4b02b7
      Vladimir Sementsov-Ogievskiy 提交于
      Split block_copy_set_callbacks out of block_copy_state_new. It's needed
      for further commit: block-copy will use BdrvChildren of backup-top
      filter, so it will be created from backup-top filter creation function.
      But callbacks will still belong to backup job and will be set in
      separate.
      Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-id: 20191001131409.14202-4-vsementsov@virtuozzo.com
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      0f4b02b7
    • V
      block/backup: move write_flags calculation inside backup_job_create · 843670f3
      Vladimir Sementsov-Ogievskiy 提交于
      This is logic-less refactoring, which simplifies further patch, as
      we'll need write_flags for backup-top filter creation and backup-top
      should be created before block job creation.
      Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-id: 20191001131409.14202-3-vsementsov@virtuozzo.com
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      843670f3
    • V
      block/backup: move in-flight requests handling from backup to block-copy · a6ffe199
      Vladimir Sementsov-Ogievskiy 提交于
      Move synchronization mechanism to block-copy, to be able to use one
      block-copy instance from backup job and backup-top filter in parallel.
      Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-id: 20191001131409.14202-2-vsementsov@virtuozzo.com
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      a6ffe199
    • M
      iotests: Use stat -c %b in 125 · f2d86ade
      Max Reitz 提交于
      125 should not use qemu-img to get the on-disk image size, because that
      reports it in a human-readable format that is useless to us.  Just use
      stat instead (like we do to get the image file length).
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20190925183231.11196-4-mreitz@redhat.com
      Reviewed-by: NEric Blake <eblake@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      f2d86ade
    • M
      iotests: Disable 125 on broken XFS versions · 285f595d
      Max Reitz 提交于
      And by that I mean all XFS versions, as far as I can tell.  All details
      are in the comment below.
      
      We never noticed this problem because we only read the first number from
      qemu-img info's "disk size" output -- and that is effectively useless,
      because qemu-img prints a human-readable value (which generally includes
      a decimal point).  That will be fixed in the next patch.
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20190925183231.11196-3-mreitz@redhat.com
      Reviewed-by: NEric Blake <eblake@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      285f595d
    • M
      iotests: Fix 125 for growth_mode = metadata · e6e8db03
      Max Reitz 提交于
      If we use growth_mode = metadata, it is very much possible that the file
      uses more disk space after we have written something to the added area.
      We did indeed want to test for this case, but unfortunately we evidently
      just copied the code from the "Test creation preallocation" section and
      forgot to replace "$create_mode" by "$growth_mode".
      
      We never noticed because we only read the first number from qemu-img
      info's "disk size" output -- and that is effectively useless, because
      qemu-img prints a human-readable value (which generally includes a
      decimal point).  That will be fixed in the patch after the next one.
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20190925183231.11196-2-mreitz@redhat.com
      Reviewed-by: NEric Blake <eblake@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      e6e8db03
    • A
      qapi: query-blockstat: add driver specific file-posix stats · d9245599
      Anton Nefedov 提交于
      A block driver can provide a callback to report driver-specific
      statistics.
      
      file-posix driver now reports discard statistics
      Signed-off-by: NAnton Nefedov <anton.nefedov@virtuozzo.com>
      Reviewed-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Acked-by: NMarkus Armbruster <armbru@redhat.com>
      Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      d9245599