============
dm-integrity
============

The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
a crash, either both the sector and the integrity tag are written or
neither of them is.

To guarantee write atomicity, the dm-integrity target uses a journal: it
writes sector data and integrity tags into the journal, commits the journal
and then copies the data and integrity tags to their respective locations.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes them
to the dm-integrity target via bio_integrity_payload attached to the bio.
In this mode, the dm-crypt and dm-integrity targets provide authenticated
disk encryption - if the attacker modifies the encrypted device, an I/O
error is returned instead of random data.

The dm-integrity target can also be used as a standalone target. In this
mode it calculates and verifies the integrity tag internally, so it can
be used to detect silent data corruption on the disk or in the I/O path.

There's an alternate mode of operation where dm-integrity uses a bitmap
instead of a journal. If a bit in the bitmap is 1, the corresponding
region's data and integrity tags are not synchronized - if the machine
crashes, the unsynchronized regions will be recalculated. The bitmap mode
is faster than the journal mode, because we don't have to write the data
twice, but it is also less reliable, because if data corruption happens
when the machine crashes, it may not be detected.
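
As a rough illustration of the bitmap mode described above, the following
Python sketch models which bitmap bits a write would mark as unsynchronized.
The class name, its methods and the region size used here are assumptions
made for illustration only; this is a toy model, not the kernel's data
structure.

```python
class DirtyBitmap:
    """Toy model of a dirty-region bitmap (illustrative, not kernel code)."""

    def __init__(self, total_sectors, sectors_per_bit):
        self.sectors_per_bit = sectors_per_bit
        nbits = -(-total_sectors // sectors_per_bit)  # ceiling division
        self.bits = [0] * nbits

    def regions(self, sector, count):
        """Bit indexes covered by a write of `count` sectors at `sector`."""
        first = sector // self.sectors_per_bit
        last = (sector + count - 1) // self.sectors_per_bit
        return range(first, last + 1)

    def mark_dirty(self, sector, count):
        """Set the bits before data and tags are written separately."""
        for bit in self.regions(sector, count):
            self.bits[bit] = 1

    def mark_clean(self, bit):
        """Clear a bit once data and tags of the region are in sync again."""
        self.bits[bit] = 0

    def needs_recalculation(self):
        """Regions whose tags would be recalculated after a crash."""
        return [i for i, b in enumerate(self.bits) if b]
```

A write that straddles a region boundary marks both regions, so after a
crash both would be recalculated in this model.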

When loading the target for the first time, the kernel driver will format
the device. But it will only format the device if the superblock contains
zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
target can't be loaded.

To use the target for the first time:

1. overwrite the superblock with zeroes
2. load the dm-integrity target with a size of one sector; the kernel
   driver will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
5. load the dm-integrity target with the target size
   "provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
   with the size "provided_data_sectors"
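
The two loads in steps 2 and 5 differ only in the target size. A minimal
Python sketch of the corresponding device-mapper table lines follows. It
assumes the usual "start length target arguments" table layout; the helper
names and the device path used in the usage note are hypothetical, and the
value of "provided_data_sectors" is expected to come from step 4.

```python
def integrity_table(device, length, tag_size="-", mode="J", reserved=0, extra=()):
    """Build one table line: start sector, length, target name, arguments."""
    args = [device, str(reserved), str(tag_size), mode, str(len(extra)), *extra]
    return " ".join(["0", str(length), "integrity", *args])


def first_use_tables(device, provided_data_sectors, **kwargs):
    """Return the table for step 2 (format) and the table for step 5."""
    format_table = integrity_table(device, 1, **kwargs)
    final_table = integrity_table(device, provided_data_sectors, **kwargs)
    return format_table, final_table
```

For example, ``first_use_tables("/dev/sdX", 1000,
extra=("internal_hash:crc32",))`` yields one table of length 1 and one of
length 1000 with otherwise identical arguments.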


Target arguments:

1. the underlying block device

2. the number of reserved sectors at the beginning of the device - the
   dm-integrity target won't read or write these sectors

3. the size of the integrity tag (if "-" is used, the size is taken from
   the internal-hash algorithm)

4. mode:

	D - direct writes (without journal)
		in this mode, journaling is
		not used and data sectors and integrity tags are written
		separately. In case of crash, it is possible that the data
		and integrity tag don't match.
	J - journaled writes
		data and integrity tags are written to the
		journal and atomicity is guaranteed. In case of crash,
		either both data and tag or none of them are written. The
		journaled mode reduces write throughput by about half
		because the data have to be written twice.
	B - bitmap mode - data and metadata are written without any
		synchronization, the driver maintains a bitmap of dirty
		regions where data and metadata don't match. This mode can
		only be used with internal hash.
	R - recovery mode - in this mode, the journal is not replayed,
		checksums are not checked and writes to the device are not
		allowed. This mode is useful for data recovery if the
		device cannot be activated in any of the other standard
		modes.

5. the number of additional arguments

Additional arguments:

journal_sectors:number
	The size of the journal. This argument is used only when formatting
	the device. If the device is already formatted, the value from the
	superblock is used.

interleave_sectors:number
	The number of interleaved sectors. This value is rounded down to
	a power of two. If the device is already formatted, the value from
	the superblock is used.

meta_device:device
	Don't interleave the data and metadata on the device. Use a
	separate device for metadata.

buffer_sectors:number
	The number of sectors in one buffer. The value is rounded down to
	a power of two.

	The tag area is accessed using buffers, the buffer size is
	configurable. A larger buffer size means that the I/O size will
	be larger, but fewer I/Os may be issued.

journal_watermark:number
	The journal watermark in percent. When the size of the journal
	exceeds this watermark, the thread that flushes the journal will
	be started.

commit_time:number
	Commit time in milliseconds. When this time passes, the journal is
	written. The journal is also written immediately if a FLUSH
	request is received.

internal_hash:algorithm(:key)	(the key is optional)
	Use internal hash or crc.
	When this argument is used, the dm-integrity target won't accept
	integrity tags from the upper target, but it will automatically
	generate and verify the integrity tags.

	You can use a crc algorithm (such as crc32), then the integrity
	target will protect the data against accidental corruption.
	You can also use an hmac algorithm (for example
	"hmac(sha256):0123456789abcdef"), in this mode it will provide
	cryptographic authentication of the data without encryption.

	When this argument is not used, the integrity tags are accepted
	from an upper layer target, such as dm-crypt. The upper layer
	target should check the validity of the integrity tags.

recalculate
	Recalculate the integrity tags automatically. It is only valid
	when using internal hash.

journal_crypt:algorithm(:key)	(the key is optional)
	Encrypt the journal using the given algorithm to make sure that the
	attacker can't read the journal. You can use a block cipher here
	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
	"salsa20" or "ctr(aes)").

	The journal contains a history of the last writes to the block
	device, an attacker reading the journal could see the last sector
	numbers that were written. From the sector numbers, the attacker
	can infer the size of files that were written. To protect against
	this situation, you can encrypt the journal.

journal_mac:algorithm(:key)	(the key is optional)
	Protect sector numbers in the journal from accidental or malicious
	modification. To protect against accidental modification, use a
	crc algorithm; to protect against malicious modification, use an
	hmac algorithm with a key.

	This option is not needed when using internal-hash because in this
	mode, the integrity of journal entries is checked when replaying
	the journal. Thus, a modified sector number would be detected at
	this stage.

block_size:number
	The size of a data block in bytes.  The larger the block size the
	less overhead there is for per-block integrity metadata.
	Supported values are 512, 1024, 2048 and 4096 bytes.  If not
	specified the default block size is 512 bytes.

sectors_per_bit:number
	In the bitmap mode, this parameter specifies the number of
	512-byte sectors that correspond to one bitmap bit.

bitmap_flush_interval:number
	The bitmap flush interval in milliseconds. The metadata buffers
	are synchronized when this interval expires.

allow_discards
	Allow block discard requests (a.k.a. TRIM) for the integrity device.
	Discards are only allowed to devices using internal hash.

fix_padding
	Use a smaller padding of the tag area that is more
	space-efficient. If this option is not present, large padding is
	used - that is for compatibility with older kernels.

legacy_recalculate
	Allow recalculating of volumes with HMAC keys. This is disabled by
	default for security reasons - an attacker could modify the volume,
	set recalc_sector to zero, and the kernel would not detect the
	modification.
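
Several of the arguments above are adjusted or restricted: interleave_sectors
and buffer_sectors are rounded down to a power of two, and block_size accepts
only four values. The following Python sketch mirrors those documented rules
so that the effective values can be predicted before a table is loaded. The
function names and the dictionary representation of the arguments are
assumptions for illustration; the kernel performs its own checks.

```python
SUPPORTED_BLOCK_SIZES = (512, 1024, 2048, 4096)


def round_down_pow2(n):
    """Largest power of two that is less than or equal to n (n >= 1)."""
    if n < 1:
        raise ValueError("value must be at least 1")
    return 1 << (n.bit_length() - 1)


def effective_arguments(args):
    """Apply the documented rounding and block_size rules to a dict of args."""
    out = dict(args)
    for key in ("interleave_sectors", "buffer_sectors"):
        if key in out:
            out[key] = round_down_pow2(out[key])
    block_size = out.get("block_size", 512)  # documented default
    if block_size not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(f"unsupported block_size: {block_size}")
    out["block_size"] = block_size
    return out
```

With this sketch, a requested buffer_sectors of 100 becomes 64, and a
block_size of 8192 is rejected.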

The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
table and swap the tables with suspend and resume). The other arguments
should not be changed when reloading the target because the layout of disk
data depends on them and the reloaded target would be non-functional.
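
The reload rule above can be checked mechanically before swapping tables.
The sketch below is one possible way to do it in Python; the function names
are hypothetical, and it treats every argument not named in the paragraph
above as unsafe to change.

```python
# Arguments the text above lists as changeable on reload.
RELOAD_SAFE = {"buffer_sectors", "journal_watermark", "commit_time", "allow_discards"}


def parse_extra(args):
    """Turn ["key:value", "flag"] into {"key": "value", "flag": True}."""
    parsed = {}
    for arg in args:
        key, sep, value = arg.partition(":")
        parsed[key] = value if sep else True
    return parsed


def unsafe_changes(old_mode, old_args, new_mode, new_args):
    """Return the names of settings that differ and should not be changed."""
    problems = []
    # Only switching between D and J is described as safe.
    if old_mode != new_mode and not {old_mode, new_mode} <= {"D", "J"}:
        problems.append("mode")
    old, new = parse_extra(old_args), parse_extra(new_args)
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key) and key not in RELOAD_SAFE:
            problems.append(key)
    return problems
```

An empty result means the new table only touches settings that the text
describes as safe to change.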


Status line:

1. the number of integrity mismatches
2. provided data sectors - that is the number of sectors that the user
   could use
3. the current recalculating position (or '-' if we didn't recalculate)
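
A small Python sketch of parsing these three fields is shown below. It
assumes the input is only the target-specific part of the status line, i.e.
that any leading start sector, length and target name have already been
removed; the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class IntegrityStatus(NamedTuple):
    mismatches: int
    provided_data_sectors: int
    recalc_position: Optional[int]  # None when the field is '-'


def parse_status(fields: str) -> IntegrityStatus:
    """Parse "<mismatches> <provided data sectors> <recalc position or ->"."""
    parts = fields.split()
    if len(parts) < 3:
        raise ValueError("expected at least three status fields")
    mismatches, provided, recalc = parts[:3]
    position = None if recalc == "-" else int(recalc)
    return IntegrityStatus(int(mismatches), int(provided), position)
```

A non-zero ``mismatches`` value indicates that integrity errors were
detected on the device.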


The layout of the formatted block device:

* reserved sectors
    (they are not used by this target, they can be used for
    storing LUKS metadata or for other purposes), the size of the reserved
    area is specified in the target arguments

* superblock (4KiB)
	* magic string - identifies that the device was formatted
	* version
	* log2(interleave sectors)
	* integrity tag size
	* the number of journal sections
	* provided data sectors - the number of sectors that this target
	  provides (i.e. the size of the device minus the size of all
	  metadata and padding). The user of this target should not send
	  bios that access data beyond the "provided data sectors" limit.
	* flags
	    SB_FLAG_HAVE_JOURNAL_MAC
		- the flag is set if journal_mac is used
	    SB_FLAG_RECALCULATING
		- recalculating is in progress
	    SB_FLAG_DIRTY_BITMAP
		- journal area contains the bitmap of dirty
		  blocks
	* log2(sectors per block)
	* a position where recalculating finished
* journal
	The journal is divided into sections, each section contains:

	* metadata area (4KiB), it contains journal entries

	  - every journal entry contains:

		* logical sector (specifies where the data and tag should
		  be written)
		* last 8 bytes of data
		* integrity tag (the size is specified in the superblock)

	  - every metadata sector ends with

		* mac (8 bytes), all the macs in 8 metadata sectors form a
		  64-byte value. It is used to store the hmac of sector
		  numbers in the journal section, to protect against the
		  possibility that the attacker tampers with sector
		  numbers in the journal.
		* commit id

	* data area (the size is variable; it depends on how many journal
	  entries fit into the metadata area)

	    - every sector in the data area contains:

		* data (504 bytes of data, the last 8 bytes are stored in
		  the journal entry)
		* commit id

	To test if the whole journal section was written correctly, every
	512-byte sector of the journal ends with an 8-byte commit id. If the
	commit id matches on all sectors in a journal section, then it is
	assumed that the section was written correctly. If the commit id
	doesn't match, the section was written partially and it should not
	be replayed.

* one or more runs of interleaved tags and data.
    Each run contains:

	* tag area - it contains integrity tags. There is one tag for each
	  sector in the data area
	* data area - it contains data sectors. The number of data sectors
	  in one run must be a power of two. log2 of this value is stored
	  in the superblock.
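
The superblock fields listed above can be decoded with a short Python
sketch. The field order follows the list in this document, but the exact
byte widths, the little-endian encoding, the three padding bytes and the
flag bit values are assumptions that should be verified against the kernel
source (drivers/md/dm-integrity.c) before relying on them. The names
``SB_FORMAT``, ``Superblock`` and ``parse_superblock`` are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumed layout: 8-byte magic, two u8, u16, u32, u64, u32, u8, 3 pad bytes, u64.
SB_FORMAT = "<8sBBHIQIBxxxQ"

# Assumed bit values for the flags named in the layout description.
SB_FLAGS = {
    0x1: "SB_FLAG_HAVE_JOURNAL_MAC",
    0x2: "SB_FLAG_RECALCULATING",
    0x4: "SB_FLAG_DIRTY_BITMAP",
}


class Superblock(NamedTuple):
    magic: bytes
    version: int
    log2_interleave_sectors: int
    integrity_tag_size: int
    journal_sections: int
    provided_data_sectors: int
    flags: int
    log2_sectors_per_block: int
    recalc_sector: int

    def flag_names(self):
        """Names of the known flag bits that are set."""
        return [name for bit, name in SB_FLAGS.items() if self.flags & bit]

    def data_sectors_per_run(self):
        """Number of data sectors in one interleaved run."""
        return 1 << self.log2_interleave_sectors


def parse_superblock(buf: bytes) -> Superblock:
    """Decode the start of a superblock read from after the reserved sectors."""
    return Superblock(*struct.unpack_from(SB_FORMAT, buf, 0))
```

In the first-use procedure, ``parse_superblock(...).provided_data_sectors``
would supply the target size for the second load, provided the assumed
offsets match the running kernel.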