=======
dm-raid
=======

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.


Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters::

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:

  ============= ===============================================================
  raid0		RAID0 striping (no resilience)
  raid1		RAID1 mirroring
  raid4		RAID4 with dedicated last parity disk
  raid5_n	RAID5 with dedicated last parity disk supporting takeover
		Same as raid4

		- Transitory layout
  raid5_la	RAID5 left asymmetric

		- rotating parity 0 with data continuation
  raid5_ra	RAID5 right asymmetric

		- rotating parity N with data continuation
  raid5_ls	RAID5 left symmetric

		- rotating parity 0 with data restart
  raid5_rs	RAID5 right symmetric

		- rotating parity N with data restart
  raid6_zr	RAID6 zero restart

		- rotating parity zero (left-to-right) with data restart
  raid6_nr	RAID6 N restart

		- rotating parity N (right-to-left) with data restart
  raid6_nc	RAID6 N continue

		- rotating parity N (right-to-left) with data continuation
  raid6_n_6	RAID6 with dedicated parity disks

		- parity and Q-syndrome on the last 2 disks;
		  layout for takeover from/to raid4/raid5_n
  raid6_la_6	Same as "raid5_la" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_la from/to raid6
  raid6_ra_6	Same as "raid5_ra" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_ra from/to raid6
  raid6_ls_6	Same as "raid5_ls" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_ls from/to raid6
  raid6_rs_6	Same as "raid5_rs" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_rs from/to raid6
  raid10	Various RAID10-inspired algorithms chosen by additional params
		(see raid10_format and raid10_copies below)

		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
		- RAID1E: Integrated Adjacent Stripe Mirroring
		- RAID1E: Integrated Offset Stripe Mirroring
		- and other similar RAID10 variants
  ============= ===============================================================

  Reference: Chapter 4 of
  https://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of

    Mandatory parameters:
        <chunk_size>:
		      Chunk size in sectors.  This parameter is often known as
		      "stripe size".  It is the only mandatory parameter and
		      is placed first.

    followed by optional parameters (in any order):
	[sync|nosync]
		Force or prevent RAID initialization.

	[rebuild <idx>]
		Rebuild drive number 'idx' (first drive is 0).

	[daemon_sleep <ms>]
		Interval between runs of the bitmap daemon that
		clears bits.  A longer interval means less bitmap I/O, but
		resyncing after a failure is likely to take longer.

	[min_recovery_rate <kB/sec/disk>]
		Throttle RAID initialization.
	[max_recovery_rate <kB/sec/disk>]
		Throttle RAID initialization.
	[write_mostly <idx>]
		Mark drive index 'idx' write-mostly.
	[max_write_behind <sectors>]
		See '--write-behind=' (man mdadm).
	[stripe_cache <sectors>]
		Stripe cache size (RAID 4/5/6 only).
	[region_size <sectors>]
		The region_size multiplied by the number of regions is the
		logical size of the array.  The bitmap records the device
		synchronisation state for each region.

        [raid10_copies   <# copies>], [raid10_format   <near|far|offset>]
		These two options are used to alter the default layout of
		a RAID10 configuration.  The number of copies can be
		specified, but the default is 2.  There are also three
		variations to how the copies are laid down - the default
		is "near".  Near copies are what most people think of with
		respect to mirroring.  If these options are left unspecified,
		or 'raid10_copies 2' and/or 'raid10_format near' are given,
		then the layouts for 2, 3 and 4 devices are:

		========         ==========        ==============
		2 drives         3 drives          4 drives
		========         ==========        ==============
		A1  A1           A1  A1  A2        A1  A1  A2  A2
		A2  A2           A2  A3  A3        A3  A3  A4  A4
		A3  A3           A4  A4  A5        A5  A5  A6  A6
		A4  A4           A5  A6  A6        A7  A7  A8  A8
		..  ..           ..  ..  ..        ..  ..  ..  ..
		========         ==========        ==============

		The 2-device layout is equivalent to a 2-way RAID1.  The
		4-device layout is what a traditional RAID10 would look like.
		The 3-device layout is what might be called a 'RAID1E -
		Integrated Adjacent Stripe Mirroring'.

		If 'raid10_copies 2' and 'raid10_format far', then the layouts
		for 2, 3 and 4 devices are:

		========             ============          ===================
		2 drives             3 drives              4 drives
		========             ============          ===================
		A1  A2               A1   A2   A3          A1   A2   A3   A4
		A3  A4               A4   A5   A6          A5   A6   A7   A8
		A5  A6               A7   A8   A9          A9   A10  A11  A12
		..  ..               ..   ..   ..          ..   ..   ..   ..
		A2  A1               A3   A1   A2          A2   A1   A4   A3
		A4  A3               A6   A4   A5          A6   A5   A8   A7
		A6  A5               A9   A7   A8          A10  A9   A12  A11
		..  ..               ..   ..   ..          ..   ..   ..   ..
		========             ============          ===================

		If 'raid10_copies 2' and 'raid10_format offset', then the
		layouts for 2, 3 and 4 devices are:

		========       ==========         ================
		2 drives       3 drives           4 drives
		========       ==========         ================
		A1  A2         A1  A2  A3         A1  A2  A3  A4
		A2  A1         A3  A1  A2         A2  A1  A4  A3
		A3  A4         A4  A5  A6         A5  A6  A7  A8
		A4  A3         A6  A4  A5         A6  A5  A8  A7
		A5  A6         A7  A8  A9         A9  A10 A11 A12
		A6  A5         A9  A7  A8         A10 A9  A12 A11
		..  ..         ..  ..  ..         ..  ..  ..  ..
		========       ==========         ================

		Here we see layouts closely akin to 'RAID1E - Integrated
		Offset Stripe Mirroring'.  An example table line using these
		options appears under "Example Tables" below.

        [delta_disks <N>]
		The delta_disks option value (-251 < N < +251) triggers
		device removal (negative value) or device addition (positive
		value) for any of the reshape-supporting raid levels 4/5/6
		and 10.  RAID levels 4/5/6 allow for addition of devices
		(metadata and data device tuple); raid10_near and
		raid10_offset only allow for device addition.  raid10_far
		does not support any reshaping at all.
		A minimum number of devices has to be kept to enforce
		resilience: 3 devices for raid4/5 and 4 devices for raid6.

        [data_offset <sectors>]
		This option value defines the offset into each data device
		where the data starts.  This is used to provide out-of-place
		reshaping space to avoid writing over data while
		changing the layout of stripes, hence an interruption/crash
		may happen at any time without the risk of losing data.
		E.g. when adding devices to an existing raid set during
		forward reshaping, the out-of-place space will be allocated
		at the beginning of each raid device.  The kernel raid4/5/6/10
		MD personalities supporting such device addition will read the
		data from the existing first stripes (those with the smaller
		number of stripes) starting at data_offset to fill up a new
		stripe with the larger number of stripes, calculate the
		redundancy blocks (CRC/Q-syndrome) and write that new stripe
		to offset 0.  The same is applied to all N-1 other new
		stripes.  This out-of-place scheme is also used to change the
		RAID type (i.e. the allocation algorithm), e.g. changing from
		raid5_ls to raid5_n.

	[journal_dev <dev>]
		This option adds a journal device to raid4/5/6 raid sets and
		uses it to close the 'write hole' caused by the non-atomic
		updates to the component devices, which can cause data loss
		during recovery.  The journal device is used as writethrough,
		thus causing writes to be throttled versus non-journaled
		raid4/5/6 sets.
		Takeover/reshape is not possible with a raid4/5/6 journal
		device; it has to be deconfigured before requesting these.

	[journal_mode <mode>]
		This option sets the caching mode on journaled raid4/5/6 raid
		sets (see 'journal_dev <dev>' above) to 'writethrough' or
		'writeback'.  If 'writeback' is selected, the journal device
		has to be resilient and must not suffer from the 'write hole'
		problem itself (e.g. use raid1 or raid10) to avoid a single
		point of failure.  A journaled raid5 example table appears
		under "Example Tables" below.

<#raid_devs>: The number of devices composing the array.
	Each device consists of two entries.  The first is the device
	containing the metadata (if any); the second is the one containing the
	data.  A maximum of 64 metadata/data device entries are supported
	up to target version 1.8.0.
	1.9.0 supports up to 253, which is enforced by the MD kernel runtime.

	If a drive has failed or is missing at creation time, a '-' can be
	given for both the metadata and data drives for a given position.


Example Tables
--------------

::

  # RAID4 - 4 data drives, 1 parity (no metadata devices)
  # No metadata devices specified to hold superblock/bitmap info
  # Chunk size of 1MiB
  # (Lines separated for easy reading)

  0 1960893648 raid \
          raid4 1 2048 \
          5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

  # RAID4 - 4 data drives, 1 parity (with metadata devices)
  # Chunk size of 1MiB, force RAID initialization,
  #       min recovery rate at 20 kiB/sec/disk

  0 1960893648 raid \
          raid4 4 2048 sync min_recovery_rate 20 \
          5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82

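The following sketches are illustrative only; the device numbers and the
logical sizes are placeholders (the table length must match the usable
size of the array).  They show the raid10 and journal options described
above, and the use of '- -' for a drive missing at creation time::

  # RAID10 - 4 data drives, 2 'near' copies (with metadata devices)
  # Chunk size of 1MiB

  0 980307968 raid \
          raid10 5 2048 raid10_copies 2 raid10_format near \
          4 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66

  # RAID5 (left symmetric) - 3 data drives, 1 parity,
  # with a write-back journal device
  # Chunk size of 1MiB

  0 1470615552 raid \
          raid5_ls 5 2048 journal_dev 8:97 journal_mode writeback \
          4 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66

  # RAID4 - 4 data drives, 1 parity, with the device at index 2
  # failed or missing at creation time ('- -')

  0 1960893648 raid \
          raid4 1 2048 \
          5 8:17 8:18 8:33 8:34 - - 8:65 8:66 8:81 8:82
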

Status Output
-------------
'dmsetup table' displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above, with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the
table.  Arguments that can be repeated are ordered by value.


'dmsetup status' yields information on the state and health of the array.
The output is as follows (normally a single line, but expanded here for
clarity)::

  1: <s> <l> raid \
  2:      <raid_type> <#devices> <health_chars> \
  3:      <sync_ratio> <sync_action> <mismatch_cnt> <data_offset> <journal_char>

Line 1 is the standard output produced by device-mapper.

Lines 2 & 3 are produced by the raid target and are best explained by
example::

        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0

Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with its initial
recovery.  Here is a fuller description of the individual fields:
	=============== =========================================================
	<raid_type>     Same as the <raid_type> used to create the array.
	<health_chars>  One char for each device, indicating:

			- 'A' = alive and in-sync
			- 'a' = alive but not in-sync
			- 'D' = dead/failed.
	<sync_ratio>    The ratio indicating how much of the array has undergone
			the process described by 'sync_action'.  If the
			'sync_action' is "check" or "repair", then the process
			of "resync" or "recover" can be considered complete.
	<sync_action>   One of the following possible states:

			idle
				- No synchronization action is being performed.
			frozen
				- The current action has been halted.
			resync
				- Array is undergoing its initial synchronization
				  or is resynchronizing after an unclean shutdown
				  (possibly aided by a bitmap).
			recover
				- A device in the array is being rebuilt or
				  replaced.
			check
				- A user-initiated full check of the array is
				  being performed.  All blocks are read and
				  checked for consistency.  The number of
				  discrepancies found is recorded in
				  <mismatch_cnt>.  No changes are made to the
				  array by this action.
			repair
				- The same as "check", but discrepancies are
				  corrected.
			reshape
				- The array is undergoing a reshape.
	<mismatch_cnt>  The number of discrepancies found between mirror copies
			in RAID1/10 or wrong parity values found in RAID4/5/6.
			This value is valid only after a "check" of the array
			is performed.  A healthy array has a 'mismatch_cnt' of 0.
	<data_offset>   The current data offset to the start of the user data on
			each component device of a raid set (see the
			'data_offset <sectors>' raid parameter above, which
			supports out-of-place reshaping).
	<journal_char>	- 'A' - active write-through journal device.
			- 'a' - active write-back journal device.
			- 'D' - dead journal device.
			- '-' - no journal device.
	=============== =========================================================

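As a sketch of how these fields might be consumed from a script (the
device name 'my_raid' is a placeholder, and the field positions assume
the synopsis above)::

  # Print <sync_action> and <sync_ratio> for a mapped raid device
  dmsetup status my_raid | awk '{ print $8, $7 }'

For the example status line shown earlier, this prints
"init 2/490221568".
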

Message Interface
-----------------
The dm-raid target will accept certain actions through the 'message' interface.
('man dmsetup' for more information on the message interface.)  These actions
include:

	========= ================================================
	"idle"    Halt the current sync action.
	"frozen"  Freeze the current sync action.
	"resync"  Initiate/continue a resync.
	"recover" Initiate/continue a recover process.
	"check"   Initiate a check (i.e. a "scrub") of the array.
	"repair"  Initiate a repair of the array.
	========= ================================================

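For example, to start a scrub of a raid set mapped as 'my_raid' (a
placeholder name), the message is sent to sector 0 of the mapped
device::

  dmsetup message my_raid 0 check

The action taken and its progress can then be observed in the
<sync_action> and <sync_ratio> status fields described above.
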

Discard Support
---------------
The implementation of discard support among hardware vendors varies.
When a block is discarded, some storage devices will return zeroes when
the block is read.  These devices set the 'discard_zeroes_data'
attribute.  Other devices will return random data.  Confusingly, some
devices that advertise 'discard_zeroes_data' will not reliably return
zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
from a number of devices to calculate parity blocks and (for performance
reasons) relies on 'discard_zeroes_data' being reliable, it is important
that the devices be consistent.  Blocks may be discarded in the middle
of a RAID 4/5/6 stripe, and if subsequent read results are not
consistent, the parity blocks may be calculated differently at any time,
making the parity blocks useless for redundancy.  It is important to
understand how your hardware behaves with discards if you are going to
enable discards with RAID 4/5/6.

Since the behavior of storage devices is unreliable in this respect,
even when reporting 'discard_zeroes_data', by default RAID 4/5/6
discard support is disabled -- this ensures data integrity at the
expense of losing some performance.

Storage devices that properly support 'discard_zeroes_data' are
increasingly whitelisted in the kernel and can thus be trusted.

For trusted devices, the following dm-raid module parameter can be set
to safely enable discard support for RAID 4/5/6:

    'devices_handle_discard_safely'
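
For example (a sketch, assuming the boolean parameter is exposed
writable under sysfs as is usual for module parameters)::

  # At module load time
  modprobe dm-raid devices_handle_discard_safely=Y

  # Or at runtime, before activating the raid set
  echo Y > /sys/module/dm_raid/parameters/devices_handle_discard_safely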


Version History
---------------

::

 1.0.0	Initial version.  Support for RAID 4/5/6
 1.1.0	Added support for RAID 1
 1.2.0	Handle creation of arrays that contain failed devices.
 1.3.0	Added support for RAID 10
 1.3.1	Allow device replacement/rebuild for RAID 10
 1.3.2	Fix/improve redundancy checking for RAID10
 1.4.0	Non-functional change.  Removes arg from mapping function.
 1.4.1	RAID10 fix redundancy validation checks (commit 55ebbb5).
 1.4.2	Add RAID10 "far" and "offset" algorithm support.
 1.5.0	Add message interface to allow manipulation of the sync_action.
	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
 1.5.1	Add ability to restore transiently failed devices on resume.
 1.5.2	'mismatch_cnt' is zero unless [last_]sync_action is "check".
 1.6.0	Add discard support (and devices_handle_discard_safely module param).
 1.7.0	Add support for MD RAID0 mappings.
 1.8.0	Explicitly check for compatible flags in the superblock metadata
	and reject to start the raid set if any are set by a newer
	target version, thus avoiding data corruption on a raid set
	with a reshape in progress.
 1.9.0	Add support for RAID level takeover/reshape/region size
	and set size reduction.
 1.9.1	Fix activation of existing RAID 4/10 mapped devices
 1.9.2	Don't emit '- -' on the status table line in case the constructor
	fails reading a superblock.  Correctly emit 'maj:min1 maj:min2' and
	'D' on the status line.  If '- -' is passed into the constructor, emit
	'- -' on the table line and '-' as the status line health character.
 1.10.0	Add support for raid4/5/6 journal device
 1.10.1	Fix data corruption on reshape request
 1.11.0	Fix table line argument order
	(wrong raid10_copies/raid10_format sequence)
 1.11.1	Add raid4/5/6 journal write-back support via journal_mode option
 1.12.1	Fix for MD deadlock between mddev_suspend() and md_write_start() available
 1.13.0	Fix dev_health status at end of "recover" (was 'a', now 'A')
 1.13.1	Fix deadlock caused by early md_stop_writes().  Also fix size and
	state races.
 1.13.2	Fix raid redundancy validation and avoid keeping raid set frozen
 1.14.0	Fix reshape race on small devices.  Fix stripe adding reshape
	deadlock/potential data corruption.  Update superblock when
	specific devices are requested via rebuild.  Fix RAID leg
	rebuild errors.
 1.15.0 Fix size extensions not being synchronized in case of new MD bitmap
	pages allocated; also fix those not occurring after previous reductions
 1.15.1 Fix argument count and arguments for rebuild/write_mostly/journal_(dev|mode)
	on the status line.