Commit a821ce59 authored by Michael S. Tsirkin

virtio: order index/descriptor reads

virtio has the equivalent of:

	if (vq->last_avail_index != vring_avail_idx(vq)) {
		read descriptor head at vq->last_avail_index;
	}

In theory, the processor can reorder the descriptor head
read to happen speculatively before the index read.
This would trigger the following race:

	host descriptor head read <- reads invalid head from ring
		guest writes valid descriptor head
		guest writes avail index
	host avail index read <- observes valid index

As a result, the host will use an invalid head value.
I have not observed this in the field, but after
the experience with the previous two races
I think it is prudent to address this theoretical race condition.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Parent 92045d80
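
To make the ordering requirement concrete before the diff, here is a compilable sketch of the fixed consumer path. It is illustrative only: toy_vq, avail_idx and avail_ring are stand-ins, not QEMU's real definitions, and __sync_synchronize() stands in for smp_rmb().

    #include <stdint.h>

    #define smp_rmb() __sync_synchronize()   /* stand-in for the barrier header */

    struct toy_vq {
        volatile uint16_t *avail_idx;    /* index published by the guest */
        volatile uint16_t *avail_ring;   /* descriptor heads written by the guest */
        uint16_t last_avail_idx;         /* host-private progress counter */
        uint16_t num;                    /* ring size */
    };

    /* Return the next descriptor head, or -1 if the ring is empty. */
    static int toy_pop_head(struct toy_vq *vq)
    {
        if (vq->last_avail_idx == *vq->avail_idx) {   /* index read */
            return -1;                                /* ring is empty */
        }
        /* Without this barrier, the head read below could be satisfied
         * speculatively before the index read above and return a stale head. */
        smp_rmb();
        return vq->avail_ring[vq->last_avail_idx++ % vq->num];   /* head read */
    }
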
@@ -287,6 +287,11 @@ static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
                      idx, vring_avail_idx(vq));
         exit(1);
     }
+    /* On success, callers read a descriptor at vq->last_avail_idx.
+     * Make sure descriptor read does not bypass avail index read. */
+    if (num_heads) {
+        smp_rmb();
+    }
 
     return num_heads;
 }
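
Note the shape of the change: the barrier is issued only when num_heads is non-zero, because a descriptor head is read only on that path; an empty ring needs no ordering, so the (potentially expensive) barrier is skipped. Callers then consume heads roughly as follows; this is a simplified sketch of the pop path, not QEMU's literal loop:

    while (virtqueue_num_heads(vq, vq->last_avail_idx)) {
        /* The smp_rmb() inside virtqueue_num_heads() guarantees that this
         * read sees the head the guest wrote before bumping the index. */
        uint16_t head = vring_avail_ring(vq, vq->last_avail_idx % vq->vring.num);
        vq->last_avail_idx++;
        /* ... walk the descriptor chain starting at head ... */
    }

The remaining hunks add the matching smp_rmb() definitions to QEMU's per-architecture barrier header.
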
@@ -7,12 +7,13 @@
 #if defined(__i386__)
 /*
- * Because of the strongly ordered x86 storage model, wmb() is a nop
+ * Because of the strongly ordered x86 storage model, wmb() and rmb() are nops
  * on x86(well, a compiler barrier only). Well, at least as long as
  * qemu doesn't do accesses to write-combining memory or non-temporal
  * load/stores from C code.
  */
 #define smp_wmb() barrier()
+#define smp_rmb() barrier()
 
 /*
  * We use GCC builtin if it's available, as that can use
  * mfence on 32 bit as well, e.g. if built with -march=pentium-m.
@@ -27,6 +28,7 @@
 #elif defined(__x86_64__)
 
 #define smp_wmb() barrier()
+#define smp_rmb() barrier()
 #define smp_mb() asm volatile("mfence" ::: "memory")
 
 #elif defined(_ARCH_PPC)
@@ -37,6 +39,13 @@
  * each other
  */
 #define smp_wmb() asm volatile("eieio" ::: "memory")
+
+#if defined(__powerpc64__)
+#define smp_rmb() asm volatile("lwsync" ::: "memory")
+#else
+#define smp_rmb() asm volatile("sync" ::: "memory")
+#endif
+
 #define smp_mb() asm volatile("sync" ::: "memory")
 
 #else
@@ -45,10 +54,11 @@
 /*
  * For (host) platforms we don't have explicit barrier definitions
  * for, we use the gcc __sync_synchronize() primitive to generate a
  * full barrier. This should be safe on all platforms, though it may
- * be overkill for wmb().
+ * be overkill for wmb() and rmb().
  */
 #define smp_wmb() __sync_synchronize()
 #define smp_mb() __sync_synchronize()
+#define smp_rmb() __sync_synchronize()
 
 #endif
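
The read barrier is only half of the contract: it pairs with the write barrier on the guest side. The producer writes the descriptor head, issues smp_wmb(), then publishes the index; the consumer reads the index, issues smp_rmb(), then reads the head. The following self-contained toy models that pairing; all names are illustrative and __sync_synchronize() stands in for both barriers:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 256                      /* one pass, so no wraparound */
    #define smp_wmb() __sync_synchronize()     /* stand-ins for the barrier header */
    #define smp_rmb() __sync_synchronize()

    static volatile uint16_t avail_idx;        /* index published by the "guest" */
    static volatile uint16_t ring[RING_SIZE];  /* descriptor heads */

    static void *guest(void *arg)              /* producer */
    {
        for (uint16_t i = 0; i < RING_SIZE; i++) {
            ring[i] = i;            /* 1. write the descriptor head */
            smp_wmb();              /* 2. order head write before index write */
            avail_idx = i + 1;      /* 3. publish the new index */
        }
        return NULL;
    }

    static void *host(void *arg)               /* consumer */
    {
        uint16_t last = 0;
        while (last < RING_SIZE) {
            if (last == avail_idx) {
                continue;           /* ring empty, poll again */
            }
            smp_rmb();              /* order index read before head read */
            uint16_t head = ring[last];
            if (head != last) {     /* the stale read the commit describes */
                fprintf(stderr, "stale head %u at slot %u\n", head, last);
            }
            last++;
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t g, h;
        pthread_create(&g, NULL, guest, NULL);
        pthread_create(&h, NULL, host, NULL);
        pthread_join(g, NULL);
        pthread_join(h, NULL);
        return 0;
    }

On x86 the strongly ordered memory model already forbids this reordering, which is why smp_rmb() there is only a compiler barrier; on PPC the lwsync/sync instructions added above enforce the ordering in hardware.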