dm-raid
=======

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.


Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters:

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:
  raid1		RAID1 mirroring
  raid4		RAID4 dedicated parity disk
  raid5_la	RAID5 left asymmetric
		- rotating parity 0 with data continuation
  raid5_ra	RAID5 right asymmetric
		- rotating parity N with data continuation
  raid5_ls	RAID5 left symmetric
		- rotating parity 0 with data restart
  raid5_rs 	RAID5 right symmetric
		- rotating parity N with data restart
  raid6_zr	RAID6 zero restart
		- rotating parity zero (left-to-right) with data restart
  raid6_nr	RAID6 N restart
		- rotating parity N (right-to-left) with data restart
  raid6_nc	RAID6 N continue
		- rotating parity N (right-to-left) with data continuation
  raid10        Various RAID10 inspired algorithms chosen by additional params
		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
		- RAID1E: Integrated Adjacent Stripe Mirroring
		- RAID1E: Integrated Offset Stripe Mirroring
		-  and other similar RAID10 variants

  Reference: Chapter 4 of
  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of
    Mandatory parameters:
        <chunk_size>: Chunk size in sectors.  This parameter is often known as
		      "stripe size".  It is the only mandatory parameter and
		      is placed first.

    followed by optional parameters (in any order):
	[sync|nosync]   Force or prevent RAID initialization.

	[rebuild <idx>]	Rebuild drive number 'idx' (first drive is 0).

	[daemon_sleep <ms>]
		Interval between runs of the bitmap daemon that
		clear bits.  A longer interval means less bitmap I/O but
		resyncing after a failure is likely to take longer.

	[min_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
	[max_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
	[write_mostly <idx>]		   Mark drive index 'idx' write-mostly.
	[max_write_behind <sectors>]       See '--write-behind=' (man mdadm)
	[stripe_cache <sectors>]           Stripe cache size (RAID 4/5/6 only)
	[region_size <sectors>]
		The region_size multiplied by the number of regions is the
		logical size of the array.  The bitmap records the device
		synchronisation state for each region.

        [raid10_copies   <# copies>]
        [raid10_format   <near|far|offset>]
		These two options are used to alter the default layout of
		a RAID10 configuration.  The number of copies can be
		specified, but the default is 2.  There are also three
		variations to how the copies are laid down - the default
		is "near".  Near copies are what most people think of with
		respect to mirroring.  If these options are left unspecified,
		or 'raid10_copies 2' and/or 'raid10_format near' are given,
		then the layouts for 2, 3 and 4 devices	are:
		2 drives         3 drives          4 drives
		--------         ----------        --------------
		A1  A1           A1  A1  A2        A1  A1  A2  A2
		A2  A2           A2  A3  A3        A3  A3  A4  A4
		A3  A3           A4  A4  A5        A5  A5  A6  A6
		A4  A4           A5  A6  A6        A7  A7  A8  A8
		..  ..           ..  ..  ..        ..  ..  ..  ..
		The 2-device layout is equivalent to 2-way RAID1.  The 4-device
		layout is what a traditional RAID10 would look like.  The
		3-device layout is what might be called a 'RAID1E - Integrated
		Adjacent Stripe Mirroring'.
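
The "near" tables above can be reproduced with a few lines of Python.
This is only an illustrative sketch, not the kernel's implementation;
the function name and its arguments are assumptions made for the example.
Each chunk is written 'copies' times in consecutive slots, and the slots
are then dealt out across the devices row by row.

```python
def near_layout(devices, rows, copies=2):
    """Return `rows` rows of chunk labels for a 'near' RAID10 layout."""
    layout = []
    for row in range(rows):
        cells = []
        for dev in range(devices):
            # Position of this cell in the flat, row-major slot sequence.
            slot = row * devices + dev
            # Every `copies` consecutive slots hold the same chunk.
            cells.append("A%d" % (slot // copies + 1))
        layout.append(cells)
    return layout


if __name__ == "__main__":
    # Print the 3-drive table (RAID1E-style adjacent stripe mirroring).
    for cells in near_layout(devices=3, rows=4):
        print("  ".join(cells))
```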

		If 'raid10_copies 2' and 'raid10_format far', then the layouts
		for 2, 3 and 4 devices are:
		2 drives             3 drives             4 drives
		--------             --------------       --------------------
		A1  A2               A1   A2   A3         A1   A2   A3   A4
		A3  A4               A4   A5   A6         A5   A6   A7   A8
		A5  A6               A7   A8   A9         A9   A10  A11  A12
		..  ..               ..   ..   ..         ..   ..   ..   ..
		A2  A1               A3   A1   A2         A2   A1   A4   A3
		A4  A3               A6   A4   A5         A6   A5   A8   A7
		A6  A5               A9   A7   A8         A10  A9   A12  A11
		..  ..               ..   ..   ..         ..   ..   ..   ..

		If 'raid10_copies 2' and 'raid10_format offset', then the
		layouts for 2, 3 and 4 devices are:
		2 drives       3 drives           4 drives
		--------       ------------       -----------------
		A1  A2         A1  A2  A3         A1  A2  A3  A4
		A2  A1         A3  A1  A2         A2  A1  A4  A3
		A3  A4         A4  A5  A6         A5  A6  A7  A8
		A4  A3         A6  A4  A5         A6  A5  A8  A7
		A5  A6         A7  A8  A9         A9  A10 A11 A12
		A6  A5         A9  A7  A8         A10 A9  A12 A11
		..  ..         ..  ..  ..         ..  ..  ..  ..
		Here we see layouts closely akin to 'RAID1E - Integrated
		Offset Stripe Mirroring'.
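
The "far" and "offset" tables above differ only in where the second copy
of each stripe is stored: "far" places all second copies after the first
copies, while "offset" interleaves them.  The sketch below reproduces the
2-, 3- and 4-device tables for 'raid10_copies 2'.  The rotation rule it
uses (rotate by one within sets of two devices, with a leftover device
joining the last set) is inferred from those tables; its behaviour for
other device counts or copy counts is an assumption, and all function
names are hypothetical.

```python
def mirror_device(dev, devices, set_size=2):
    """Device holding the second copy of the chunk first stored on `dev`."""
    full_sets = max(devices // set_size, 1)
    last_start = (full_sets - 1) * set_size
    if dev >= last_start:
        # The last set absorbs any leftover devices (e.g. 3 devices -> one set of 3).
        size = devices - last_start
        return last_start + (dev - last_start + 1) % size
    start = (dev // set_size) * set_size
    return start + (dev - start + 1) % set_size


def stripe_rows(devices, stripes):
    """Return (first-copy rows, second-copy rows) for `stripes` stripes."""
    primary, mirror = [], []
    for s in range(stripes):
        first = ["A%d" % (s * devices + d + 1) for d in range(devices)]
        second = [None] * devices
        for d in range(devices):
            second[mirror_device(d, devices)] = first[d]
        primary.append(first)
        mirror.append(second)
    return primary, mirror


def far_layout(devices, stripes):
    # All first copies, then all second copies (the ".." gap is omitted).
    primary, mirror = stripe_rows(devices, stripes)
    return primary + mirror


def offset_layout(devices, stripes):
    # Each stripe is immediately followed by its rotated copy.
    primary, mirror = stripe_rows(devices, stripes)
    rows = []
    for first, second in zip(primary, mirror):
        rows += [first, second]
    return rows


if __name__ == "__main__":
    for cells in offset_layout(devices=3, stripes=3):
        print("  ".join(cells))
```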

<#raid_devs>: The number of devices composing the array.
	Each device consists of two entries.  The first is the device
	containing the metadata (if any); the second is the one containing the
	data.

	If a drive has failed or is missing at creation time, a '-' can be
	given for both the metadata and data drives for a given position.


Example Tables
--------------
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)

0 1960893648 raid \
        raid4 1 2048 \
        5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

# RAID4 - 4 data drives, 1 parity (with metadata devices)
# Chunk size of 1MiB, force RAID initialization,
#       min recovery rate at 20 kiB/sec/disk

0 1960893648 raid \
        raid4 4 2048 sync min_recovery_rate 20 \
        5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
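
The two example tables above can also be assembled programmatically.
The following Python sketch is one possible helper, not part of dm-raid
or dmsetup; the function name and its arguments are assumptions.  It
converts the chunk size from bytes to 512-byte sectors (1 MiB gives the
2048 used above) and counts <#raid_params> and <#raid_devs> itself, so
the counts cannot drift out of step with the arguments.

```python
SECTOR_BYTES = 512  # device-mapper expresses sizes in 512-byte sectors


def raid_table(start, length, raid_type, chunk_bytes, devices, options=()):
    """Build a single-line dm-raid table.

    devices: list of (metadata_dev, data_dev) pairs; "-" marks an absent device.
    options: flat sequence of optional raid_params,
             e.g. ("sync", "min_recovery_rate", 20).
    """
    chunk_sectors = chunk_bytes // SECTOR_BYTES
    # The chunk size is the only mandatory raid_param and always comes first.
    raid_params = [str(chunk_sectors)] + [str(opt) for opt in options]
    fields = [str(start), str(length), "raid", raid_type, str(len(raid_params))]
    fields += raid_params
    fields.append(str(len(devices)))
    for meta, data in devices:
        fields += [meta, data]
    return " ".join(fields)


if __name__ == "__main__":
    data_devs = ["8:17", "8:33", "8:49", "8:65", "8:81"]
    # First example: no metadata devices.
    print(raid_table(0, 1960893648, "raid4", 1024 * 1024,
                     [("-", dev) for dev in data_devs]))
```

The resulting string is what would be passed to 'dmsetup create' as the
table, on a single line.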


Status Output
-------------
'dmsetup table' displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the table.
Arguments that can be repeated are ordered by value.
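
To compare a table that was loaded with the one 'dmsetup table' reports,
it can help to put the optional parameters into the order described
above first.  The Python sketch below models that ordering only as this
document states it (listed order, "sync"/"nosync" first, repeated
arguments sorted by value); it is not taken from the kernel source, and
the names are hypothetical.  The mandatory chunk size is not part of the
input and stays in first position.

```python
# Optional parameters in the order they are listed in this document.
PARAM_ORDER = [
    "sync", "nosync", "rebuild", "daemon_sleep",
    "min_recovery_rate", "max_recovery_rate", "write_mostly",
    "max_write_behind", "stripe_cache", "region_size",
    "raid10_copies", "raid10_format",
]
FLAGS = {"sync", "nosync"}  # these take no value


def normalize(params):
    """Reorder a flat list of optional raid_params tokens."""
    pairs = []
    i = 0
    while i < len(params):
        name = params[i]
        if name in FLAGS:
            pairs.append((name, None))
            i += 1
        else:
            pairs.append((name, params[i + 1]))
            i += 2

    def sort_key(pair):
        name, value = pair
        # Repeated arguments (e.g. write_mostly) are ordered by numeric value.
        number = int(value) if value is not None and value.isdigit() else 0
        return (PARAM_ORDER.index(name), number)

    ordered = []
    for name, value in sorted(pairs, key=sort_key):
        ordered.append(name)
        if value is not None:
            ordered.append(value)
    return ordered


if __name__ == "__main__":
    loaded = ["write_mostly", "2", "min_recovery_rate", "20",
              "rebuild", "1", "write_mostly", "0", "sync"]
    print(" ".join(normalize(loaded)))
```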


'dmsetup status' yields information on the state and health of the array.
The output is as follows (normally a single line, but expanded here for
clarity):
1: <s> <l> raid \
2:      <raid_type> <#devices> <health_chars> \
3:      <sync_ratio> <sync_action> <mismatch_cnt>

Line 1 is the standard output produced by device-mapper.
Lines 2 & 3 are produced by the raid target and are best explained by example:
        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with its initial
recovery.  Here is a fuller description of the individual fields:
	<raid_type>     Same as the <raid_type> used to create the array.
	<health_chars>  One char for each device, indicating: 'A' = alive and
			in-sync, 'a' = alive but not in-sync, 'D' = dead/failed.
	<sync_ratio>    The ratio indicating how much of the array has undergone
			the process described by 'sync_action'.  If the
			'sync_action' is "check" or "repair", then the process
			of "resync" or "recover" can be considered complete.
	<sync_action>   One of the following possible states:
			idle    - No synchronization action is being performed.
			frozen  - The current action has been halted.
			resync  - Array is undergoing its initial synchronization
				  or is resynchronizing after an unclean shutdown
				  (possibly aided by a bitmap).
			recover - A device in the array is being rebuilt or
				  replaced.
			check   - A user-initiated full check of the array is
				  being performed.  All blocks are read and
				  checked for consistency.  The number of
				  discrepancies found is recorded in
				  <mismatch_cnt>.  No changes are made to the
				  array by this action.
			repair  - The same as "check", but discrepancies are
				  corrected.
			reshape - The array is undergoing a reshape.
	<mismatch_cnt>  The number of discrepancies found between mirror copies
			in RAID1/10 or wrong parity values found in RAID4/5/6.
			This value is valid only after a "check" of the array
			is performed.  A healthy array has a 'mismatch_cnt' of 0.
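
A monitoring script might split the status line into the fields described
above.  The Python sketch below assumes the status format shown in this
document (with the sync_action and mismatch_cnt fields present); the
function and key names are hypothetical.

```python
def parse_raid_status(line):
    """Split one 'dmsetup status' line for a raid target into named fields."""
    fields = line.split()
    if len(fields) < 9 or fields[2] != "raid":
        raise ValueError("not a dm-raid status line: %r" % line)
    done, total = (int(part) for part in fields[6].split("/"))
    health = fields[5]
    return {
        "start": int(fields[0]),
        "length": int(fields[1]),
        "raid_type": fields[3],
        "devices": int(fields[4]),
        "health": health,
        # Indices of devices marked 'D' (dead/failed).
        "failed": [i for i, char in enumerate(health) if char == "D"],
        # Indices of devices marked 'a' (alive but not in-sync).
        "out_of_sync": [i for i, char in enumerate(health) if char == "a"],
        "sync_done": done,
        "sync_total": total,
        "sync_action": fields[7],
        "mismatch_cnt": int(fields[8]),
    }


if __name__ == "__main__":
    status = parse_raid_status(
        "0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0")
    print(status["raid_type"], status["devices"], status["failed"])
```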

Message Interface
-----------------
The dm-raid target will accept certain actions through the 'message' interface.
('man dmsetup' for more information on the message interface.)  These actions
include:
	"idle"   - Halt the current sync action.
	"frozen" - Freeze the current sync action.
	"resync" - Initiate/continue a resync.
	"recover"- Initiate/continue a recover process.
	"check"  - Initiate a check (i.e. a "scrub") of the array.
	"repair" - Initiate a repair of the array.
	"reshape"- Currently unsupported (-EINVAL).
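
As a sketch of how a script could send these actions, the Python helper
below builds the argument list for 'dmsetup message' and rejects actions
the target does not accept.  It only constructs the command and does not
run it.  The device name "my_raid" is hypothetical, and the use of
sector 0 is an assumption of this example.

```python
# Actions listed above as accepted by the dm-raid message interface.
ACTIONS = ("idle", "frozen", "resync", "recover", "check", "repair")


def message_argv(device, action):
    """Return the argv for: dmsetup message <device> <sector> <action>."""
    if action == "reshape":
        # Listed in this document as currently unsupported (-EINVAL).
        raise ValueError("'reshape' is currently unsupported by dm-raid")
    if action not in ACTIONS:
        raise ValueError("unknown sync action: %r" % action)
    # Sector 0 is assumed here.
    return ["dmsetup", "message", device, "0", action]


if __name__ == "__main__":
    # Start a scrub of the (hypothetical) device 'my_raid'.
    print(" ".join(message_argv("my_raid", "check")))
```

The returned list can be handed to subprocess.run() by a caller with
sufficient privileges.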

Version History
---------------
1.0.0	Initial version.  Support for RAID 4/5/6
1.1.0	Added support for RAID 1
1.2.0	Handle creation of arrays that contain failed devices.
1.3.0	Added support for RAID 10
1.3.1	Allow device replacement/rebuild for RAID 10
1.3.2   Fix/improve redundancy checking for RAID10
1.4.0	Non-functional change.  Removes arg from mapping function.
1.4.1   RAID10 fix redundancy validation checks (commit 55ebbb5).
1.4.2   Add RAID10 "far" and "offset" algorithm support.
1.5.0   Add message interface to allow manipulation of the sync_action.
	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
1.5.1   Add ability to restore transiently failed devices on resume.
1.5.2   'mismatch_cnt' is zero unless [last_]sync_action is "check".