qcow2 L2/refcount cache configuration
=====================================
Copyright (C) 2015, 2018 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
The QEMU qcow2 driver has two caches that can improve the I/O
performance significantly. However, setting the right cache sizes is
not a straightforward operation.

This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.

Please refer to the docs/interop/qcow2.txt file for an in-depth
technical description of the qcow2 file format.


Clusters
--------
A qcow2 file is organized in units of constant size called clusters.

The cluster size is configurable, but it must be a power of two and
at least 512 bytes. QEMU currently defaults to 64 KB clusters, and it
does not support sizes larger than 2 MB.

The 'qemu-img create' command supports specifying the size using the
cluster_size option:

   qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G
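
The constraints above can be checked before creating an image. The
following is a minimal sketch in Python; the function and constant
names are hypothetical and not part of QEMU:

```python
MIN_CLUSTER_SIZE = 512               # bytes, lower bound stated above
MAX_CLUSTER_SIZE = 2 * 1024 * 1024   # bytes, largest size QEMU currently supports

def is_valid_cluster_size(size: int) -> bool:
    """Return True if size is a power of two within the supported range."""
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and MIN_CLUSTER_SIZE <= size <= MAX_CLUSTER_SIZE
```

For instance, is_valid_cluster_size(128 * 1024) accepts the 128K value
used in the command above, while a value such as 100000 is rejected
because it is not a power of two.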


The L2 tables
-------------
The qcow2 format uses a two-level structure to map the virtual disk as
seen by the guest to the disk image in the host. These structures are
called the L1 and L2 tables.

There is one single L1 table per disk image. The table is small and is
always kept in memory.

There can be many L2 tables, depending on how much space has been
allocated in the image. Each table is one cluster in size. In order to
read or write data from the virtual disk, QEMU needs to read its
corresponding L2 table to find out where that data is located. Since
reading the table for each I/O operation can be expensive, QEMU keeps
an L2 cache in memory to speed up disk access.

The size of the L2 cache can be configured, and setting the right
value can improve the I/O performance significantly.
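
To make the two-level lookup concrete, here is a rough sketch of how a
guest offset could be split into an L1 index and an L2 index. It
assumes 8-byte L2 entries, which is what the "/ 8" in the cache size
formulas later in this document implies. The function name is
hypothetical and this is not QEMU's actual code:

```python
L2_ENTRY_SIZE = 8  # bytes per L2 entry (assumption, see the text above)

def locate_l2_entry(guest_offset: int, cluster_size: int = 65536):
    """Return (l1_index, l2_index) for a byte offset in the virtual disk."""
    # Each L2 table is one cluster in size, so this is its entry count.
    entries_per_table = cluster_size // L2_ENTRY_SIZE
    # Index of the guest cluster that contains the offset.
    cluster_index = guest_offset // cluster_size
    # The L1 table selects the L2 table, which selects the cluster.
    l1_index = cluster_index // entries_per_table
    l2_index = cluster_index % entries_per_table
    return l1_index, l2_index
```

Under these assumptions a single L2 table with 64 KB clusters maps
8192 * 64 KB = 512 MB of the virtual disk, so guest accesses spread
over a large disk touch many different L2 tables.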


The refcount blocks
-------------------
The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots. The data is stored in a two-level structure similar to the
L1/L2 tables described above.

The second-level structures are called refcount blocks. Like L2
tables, each one is one cluster in size, and their number varies with
the amount of allocated space.

Each block contains a number of refcount entries. Their size (in bits)
is a power of two and must not be higher than 64. It defaults to 16
bits, but a different value can be set using the refcount_bits option:

   qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G

QEMU keeps a refcount cache to speed up I/O much like the
aforementioned L2 cache, and its size can also be configured.
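
The number of refcount entries that fit in one block follows directly
from the cluster size and the entry width. A small sketch of that
arithmetic, with a hypothetical function name:

```python
def refcount_entries_per_block(cluster_size: int = 65536,
                               refcount_bits: int = 16) -> int:
    """Return how many refcount entries fit in one refcount block."""
    is_power_of_two = refcount_bits > 0 and (refcount_bits & (refcount_bits - 1)) == 0
    if not is_power_of_two or refcount_bits > 64:
        raise ValueError("refcount_bits must be a power of two, at most 64")
    # A block is one cluster in size: convert to bits, divide by entry width.
    return cluster_size * 8 // refcount_bits
```

With the defaults each block holds 32768 entries; halving
refcount_bits to 8 doubles that to 65536 entries per block.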


Choosing the right cache sizes
------------------------------
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.

The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:

   disk_size = l2_cache_size * cluster_size / 8
   disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits

With the default values for cluster_size (64KB) and refcount_bits
(16), this becomes:

   disk_size = l2_cache_size * 8192
   disk_size = refcount_cache_size * 32768
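
The same formulas can be written as two small Python helpers, which
makes it easy to try other cluster sizes or refcount widths. The
function names are hypothetical:

```python
def l2_cache_coverage(l2_cache_size: int, cluster_size: int = 65536) -> int:
    """Bytes of virtual disk that an L2 cache of the given size can map."""
    return l2_cache_size * cluster_size // 8

def refcount_cache_coverage(refcount_cache_size: int,
                            cluster_size: int = 65536,
                            refcount_bits: int = 16) -> int:
    """Bytes of virtual disk that a refcount cache of the given size can map."""
    return refcount_cache_size * cluster_size * 8 // refcount_bits
```

With the defaults, each byte of L2 cache covers 8192 bytes of disk and
each byte of refcount cache covers 32768 bytes, matching the
multipliers above.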

So in order to cover n GB of disk space with the default values we
need:

   l2_cache_size = disk_size_GB * 131072
   refcount_cache_size = disk_size_GB * 32768

For example, with the default cluster size, 1 MB of L2 cache is needed
for every 8 GB of virtual disk size:

   8 GB / 8192 = 1 MB

The refcount cache is 4 times the cluster size by default. With the default
cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for
8 GB of image size:

   262144 * 32768 = 8 GB
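
Going in the other direction, the cache sizes needed to cover a given
disk size can be estimated as follows. This is only a sketch with
hypothetical names. It rounds each result up to a whole number of
clusters, because cache sizes must be multiples of the cluster size,
and it does not apply the default or minimum sizes that QEMU enforces:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)

def required_cache_sizes(disk_size: int, cluster_size: int = 65536,
                         refcount_bits: int = 16):
    """Return (l2_cache_size, refcount_cache_size) in bytes for disk_size bytes."""
    # Invert: disk_size = l2_cache_size * cluster_size / 8
    l2_bytes = ceil_div(disk_size * 8, cluster_size)
    # Invert: disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
    refcount_bytes = ceil_div(disk_size * refcount_bits, cluster_size * 8)
    # Round both up to a multiple of the cluster size.
    l2_size = ceil_div(l2_bytes, cluster_size) * cluster_size
    refcount_size = ceil_div(refcount_bytes, cluster_size) * cluster_size
    return l2_size, refcount_size
```

For an 8 GB image with the default values this gives 1048576 bytes of
L2 cache and 262144 bytes of refcount cache, in line with the figures
above.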


How to configure the cache sizes
--------------------------------
Cache sizes can be configured using the -drive option on the
command line, or the 'blockdev-add' QMP command.

There are three options available, and all of them take a size in bytes:

"l2-cache-size":         maximum size of the L2 table cache
"refcount-cache-size":   maximum size of the refcount block cache
"cache-size":            maximum size of both caches combined

There are a few things that need to be taken into account:

 - Both caches must have a size that is a multiple of the cluster size
   (or the cache entry size: see "Using smaller cache entries" below).

 - The default L2 cache size is 8 clusters or 1 MB (whichever is larger),
   and the minimum is 2 clusters (or 2 cache entries, see below).

 - The default (and minimum) refcount cache size is 4 clusters.

 - If only "cache-size" is specified then QEMU will assign as much
   memory as possible to the L2 cache before increasing the refcount
   cache size.

 - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
   can be set simultaneously.
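
Two of these rules lend themselves to a quick check before starting
QEMU. The sketch below uses a hypothetical helper, not a QEMU tool. It
builds a -drive argument and rejects combinations that break the
"at most two options" and "multiple of the cluster size" rules. It does
not model the default or minimum sizes, nor smaller L2 cache entries:

```python
def build_drive_option(filename, cluster_size=65536, l2_cache_size=None,
                       refcount_cache_size=None, cache_size=None):
    """Return a -drive argument string after checking the rules above."""
    options = {
        "l2-cache-size": l2_cache_size,
        "refcount-cache-size": refcount_cache_size,
        "cache-size": cache_size,
    }
    given = {name: value for name, value in options.items() if value is not None}

    # At most two of the three options can be set simultaneously.
    if len(given) == 3:
        raise ValueError("set at most two of the three cache size options")

    # Each individual cache must be a multiple of the cluster size.
    for name in ("l2-cache-size", "refcount-cache-size"):
        if name in given and given[name] % cluster_size != 0:
            raise ValueError(f"{name} must be a multiple of {cluster_size}")

    parts = [f"file={filename}"] + [f"{n}={v}" for n, v in given.items()]
    return ",".join(parts)
```

For example, build_drive_option("hd.qcow2", l2_cache_size=2097152)
yields a string that can be passed after -drive on the QEMU command
line.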

Unlike L2 tables, refcount blocks are not used during normal I/O but
only during allocations and internal snapshots. In most cases they are
accessed sequentially (even during random guest I/O), so increasing the
refcount cache size won't have any measurable effect on performance.
This can change if you use internal snapshots heavily, in which case
you may want to increase the refcount cache size.

Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
L2 cache size. This resulted in unnecessarily large caches, so now the
refcount cache is as small as possible unless overridden by the user.


Using smaller cache entries
---------------------------
The qcow2 L2 cache stores complete tables by default. This means that
if QEMU needs an entry from an L2 table then the whole table is read
from disk and is kept in the cache. If the cache is full then a
complete table needs to be evicted first.

This can be inefficient with large cluster sizes since it results in
more disk I/O and wastes more cache memory.

Since QEMU 2.12 you can change the size of the L2 cache entry and make
it smaller than the cluster size. This can be configured using the
"l2-cache-entry-size" parameter:

   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096

Some things to take into account:

 - The L2 cache entry size has the same restrictions as the cluster
   size (power of two, at least 512 bytes).

 - Smaller entry sizes generally improve the cache efficiency and make
   disk I/O faster. This is particularly true with solid state drives
   so it's a good idea to reduce the entry size in those cases. With
   rotating hard drives the situation is a bit more complicated so you
   should test it first and stay with the default size if unsure.

 - Try different entry sizes to see which one gives the best
   performance in your case. The block size of the host filesystem is
   generally a good default (usually 4096 bytes in the case of ext4).

 - Only the L2 cache can be configured this way. The refcount cache
   always uses the cluster size as the entry size.

 - If the L2 cache is big enough to hold all of the image's L2 tables
   (as explained in the "Choosing the right cache sizes" section
   earlier in this document) then none of this is necessary and you
   can omit the "l2-cache-entry-size" parameter altogether.
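
The effect of the entry size can be illustrated with a little
arithmetic. The sketch below uses a hypothetical function and again
assumes 8-byte L2 entries. It also assumes that the entry size cannot
exceed the cluster size. It shows how many entries a cache holds and
how much of the virtual disk each entry maps:

```python
def l2_cache_layout(l2_cache_size: int, entry_size: int,
                    cluster_size: int = 65536):
    """Return (number_of_entries, disk_bytes_mapped_per_entry)."""
    is_power_of_two = entry_size > 0 and (entry_size & (entry_size - 1)) == 0
    if not is_power_of_two or entry_size < 512 or entry_size > cluster_size:
        raise ValueError("entry size must be a power of two between "
                         "512 bytes and the cluster size")
    # How many independent slices of L2 tables fit in the cache.
    entries = l2_cache_size // entry_size
    # Each 8-byte L2 entry maps one cluster of the virtual disk.
    mapped_per_entry = (entry_size // 8) * cluster_size
    return entries, mapped_per_entry
```

With the values from the example above (2097152 bytes of cache,
4096-byte entries, 64 KB clusters) the cache holds 512 entries that map
32 MB each. With full-cluster entries it would hold 32 entries that map
512 MB each. The total coverage is 16 GB in both cases; only the
granularity of reads and evictions changes.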


Reducing the memory usage
-------------------------
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.

The parameter "cache-clean-interval" defines an interval (in seconds).
All cache entries that haven't been accessed during that interval are
removed from memory.

This example removes all unused cache entries every 15 minutes:

   -drive file=hd.qcow2,cache-clean-interval=900

If unset, this parameter defaults to 0, which disables the feature.

Note that this functionality currently relies on the MADV_DONTNEED
argument for madvise() to actually free the memory. This is a
Linux-specific feature, so cache-clean-interval is not supported on
other systems.
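
The cleaning rule described in this section can be modelled with a toy
cache that remembers when each entry was last accessed. This is purely
illustrative: the class is hypothetical, timestamps are passed in
explicitly to keep the example deterministic, and QEMU's real
implementation differs:

```python
class ToyCache:
    """Toy model of a cache with a cache-clean-interval setting."""

    def __init__(self, clean_interval: int):
        self.clean_interval = clean_interval  # seconds, 0 disables cleaning
        self.last_access = {}                 # entry key -> last access time

    def access(self, key, now: int) -> None:
        """Record that an entry was used at time 'now' (in seconds)."""
        self.last_access[key] = now

    def clean(self, now: int):
        """Drop entries not accessed during the last interval; return their keys."""
        if self.clean_interval == 0:
            return []
        stale = [key for key, t in self.last_access.items()
                 if now - t >= self.clean_interval]
        for key in stale:
            del self.last_access[key]
        return stale
```

With clean_interval=900, an entry last used at t=0 is dropped by a
cleaning pass at t=900, whereas one used at t=600 survives that pass.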