= Block driver correctness testing with blkverify =

== Introduction ==

This document describes how to use the blkverify protocol to test that a block
driver is operating correctly.

It is difficult to test and debug block drivers against real guests.  Often,
processes inside the guest crash because corrupt sectors were read as part of
an executable.  At other times, a program inside the guest raises obscure
errors.  Either way, these symptoms are extremely hard to trace back to a bug
in the block driver.

Blkverify solves this problem by catching data corruption inside QEMU the first
time bad data is read and reporting the disk sector that is corrupted.

== How it works ==

The blkverify protocol has two child block devices, the "test" device and the
"raw" device.  Read/write operations are mirrored to both devices, so their
state should always be in sync.
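The mirroring idea can be illustrated with a minimal sketch.  It assumes two
in-memory disks of a fixed size; `struct mem_disk`, `DISK_SECTORS` and
`mirror_write` are hypothetical names made up for this illustration and are
not QEMU's block layer API.  Every write is applied to both children, so as
long as both behave correctly their contents stay identical.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512   /* bytes per sector, as in the qemu-io examples */
#define DISK_SECTORS 8    /* hypothetical, tiny disk size for illustration */

/* Hypothetical in-memory stand-in for a block device (not a QEMU type). */
struct mem_disk {
    uint8_t data[DISK_SECTORS * SECTOR_SIZE];
};

/* Apply the same write to the "test" and "raw" children so that their
 * contents stay in sync.  Returns 0 on success, or -1 if the request
 * falls outside the disk. */
int mirror_write(struct mem_disk *test, struct mem_disk *raw,
                 int64_t sector_num, int nb_sectors, const uint8_t *buf)
{
    if (sector_num < 0 || nb_sectors < 0 ||
        sector_num + nb_sectors > DISK_SECTORS) {
        return -1;
    }
    size_t off = (size_t)sector_num * SECTOR_SIZE;
    size_t len = (size_t)nb_sectors * SECTOR_SIZE;
    memcpy(test->data + off, buf, len);
    memcpy(raw->data + off, buf, len);
    return 0;
}
```

A real driver may of course fail to write what it was given; the point of
blkverify is that such a failure shows up as a difference on a later read.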

The "raw" device is a raw image (a flat file) whose starting contents are
identical to those of the "test" image.  The assumption is that the "raw"
device handles read/write operations correctly and does not corrupt data, so
it can serve as a reference against which the "test" device is compared.

After a mirrored read operation completes, blkverify compares the data
returned by both devices and raises an error if it is not identical.  This
makes it possible to catch the first instance of corrupt data being read.
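The comparison step can be sketched as follows.  This is an illustration
under stated assumptions, not blkverify's actual source: `find_mismatch` and
`verify_read` are hypothetical helpers, sectors are assumed to be 512 bytes,
and the message format is only modelled on the blkverify output shown in the
example below.  The buffers from the two mirrored reads are compared sector
by sector, and the first sector that differs is reported.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512   /* assumed sector size */

/* Compare the buffers returned by mirrored reads of the "test" and "raw"
 * devices.  Returns the number of the first mismatching sector, or -1 if
 * the buffers are identical. */
int64_t find_mismatch(const uint8_t *test_buf, const uint8_t *raw_buf,
                      int64_t sector_num, int nb_sectors)
{
    for (int i = 0; i < nb_sectors; i++) {
        size_t off = (size_t)i * SECTOR_SIZE;
        if (memcmp(test_buf + off, raw_buf + off, SECTOR_SIZE) != 0) {
            return sector_num + i;
        }
    }
    return -1;
}

/* Report a mismatch on stderr in a format modelled on blkverify's message.
 * Returns 1 if corruption was detected, 0 otherwise. */
int verify_read(const uint8_t *test_buf, const uint8_t *raw_buf,
                int64_t sector_num, int nb_sectors)
{
    int64_t bad = find_mismatch(test_buf, raw_buf, sector_num, nb_sectors);
    if (bad < 0) {
        return 0;
    }
    fprintf(stderr, "blkverify: read sector_num=%lld nb_sectors=%d "
            "contents mismatch in sector %lld\n",
            (long long)sector_num, nb_sectors, (long long)bad);
    return 1;
}
```

Reporting the first differing sector, rather than just "mismatch", is what
lets the corruption be traced to a specific location in the image.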

== Example ==

Imagine raw.img has 0xcd repeated throughout its first sector:

    $ ./qemu-io -c 'read -v 0 512' raw.img
    00000000:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
    00000010:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
    [...]
    000001e0:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
    000001f0:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
    read 512/512 bytes at offset 0
    512.000000 bytes, 1 ops; 0.0000 sec (97.656 MiB/sec and 200000.0000 ops/sec)

Suppose test.img is corrupt and its first sector has been zeroed when it
should contain 0xcd:

    $ ./qemu-io -c 'read -v 0 512' test.img
    00000000:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000010:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    [...]
    000001e0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    000001f0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    read 512/512 bytes at offset 0
    512.000000 bytes, 1 ops; 0.0000 sec (81.380 MiB/sec and 166666.6667 ops/sec)

This error is caught by blkverify:

    $ ./qemu-io -c 'read 0 512' blkverify:raw.img:test.img
    blkverify: read sector_num=0 nb_sectors=4 contents mismatch in sector 0

A more realistic scenario is verifying the installation of a guest OS:

    $ ./qemu-img create raw.img 16G
    $ ./qemu-img create -f qcow2 test.qcow2 16G
    $ x86_64-softmmu/qemu-system-x86_64 -cdrom debian.iso \
                                        -drive file=blkverify:raw.img:test.qcow2

If the installation is aborted because blkverify has detected corruption, use
qemu-io to examine the contents of the disk image at the reported sector.
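Because qemu-io's `read` command takes a byte offset and a length, as in the
`read -v 0 512` examples above, the sector number reported by blkverify has to
be converted to a byte offset first.  The sketch below assumes 512-byte
sectors; `qemu_io_read_cmd` is a hypothetical helper that only builds the
command string and does not run qemu-io.

```c
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512   /* assumed sector size */

/* Build a qemu-io "read -v" command that dumps one sector of an image.
 * The byte offset is the sector number multiplied by the sector size.
 * Returns the value of snprintf (length of the full command string). */
int qemu_io_read_cmd(char *out, size_t out_len, int64_t sector)
{
    return snprintf(out, out_len, "read -v %lld %d",
                    (long long)(sector * SECTOR_SIZE), SECTOR_SIZE);
}
```

For example, a mismatch in sector 3 corresponds to `read -v 1536 512`, which
can be passed to qemu-io with `-c` in the same way as in the examples above.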