============
dm-integrity
============

The dm-integrity target emulates a block device that has additional
per-sector tags that can be used for storing integrity information.

A general problem with storing integrity tags with every sector is that
writing the sector and the integrity tag must be atomic - i.e. in case of
a crash, either both the sector and the integrity tag are written, or
neither of them is.

To guarantee write atomicity, the dm-integrity target uses a journal. It
writes sector data and integrity tags into the journal, commits the journal
and then copies the data and integrity tags to their respective locations.

The dm-integrity target can be used with the dm-crypt target - in this
situation the dm-crypt target creates the integrity data and passes them
to the dm-integrity target via bio_integrity_payload attached to the bio.
In this mode, the dm-crypt and dm-integrity targets provide authenticated
disk encryption - if the attacker modifies the encrypted device, an I/O
error is returned instead of random data.

The dm-integrity target can also be used as a standalone target. In this
mode it calculates and verifies the integrity tag internally, so it can
be used to detect silent data corruption on the disk or in the I/O path.

There's an alternate mode of operation where dm-integrity uses a bitmap
instead of a journal.
If a bit in the bitmap is 1, the corresponding
region's data and integrity tags are not synchronized - if the machine
crashes, the unsynchronized regions will be recalculated. The bitmap mode
is faster than the journal mode, because we don't have to write the data
twice, but it is also less reliable, because if data corruption happens
when the machine crashes, it may not be detected.

When loading the target for the first time, the kernel driver will format
the device. But it will only format the device if the superblock contains
zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
target can't be loaded.

To use the target for the first time:

1. overwrite the superblock with zeroes
2. load the dm-integrity target with one-sector size, the kernel driver
   will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
5. load the dm-integrity target with the target size
   "provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
   with the size "provided_data_sectors"


Target arguments:

1. the underlying block device

2. the number of reserved sectors at the beginning of the device - the
   dm-integrity target won't read or write these sectors

3. the size of the integrity tag (if "-" is used, the size is taken from
   the internal-hash algorithm)

4. mode:

        D - direct writes (without journal)
                in this mode, journaling is
                not used and data sectors and integrity tags are written
                separately. In case of a crash, it is possible that the
                data and integrity tag don't match.
        J - journaled writes
                data and integrity tags are written to the
                journal and atomicity is guaranteed. In case of a crash,
                either both data and tag or none of them are written. The
                journaled mode halves write throughput because the
                data have to be written twice.
        B - bitmap mode - data and metadata are written without any
            synchronization, the driver maintains a bitmap of dirty
            regions where data and metadata don't match. This mode can
            only be used with internal hash.
        R - recovery mode - in this mode, the journal is not replayed,
            checksums are not checked and writes to the device are not
            allowed. This mode is useful for data recovery if the
            device cannot be activated in any of the other standard
            modes.

5. the number of additional arguments

Additional arguments:

journal_sectors:number
        The size of the journal. This argument is used only when formatting
        the device. If the device is already formatted, the value from the
        superblock is used.

interleave_sectors:number
        The number of interleaved sectors. This value is rounded down to
        a power of two. If the device is already formatted, the value from
        the superblock is used.

meta_device:device
        Don't interleave the data and metadata on the device. Use a
        separate device for metadata.

buffer_sectors:number
        The number of sectors in one buffer. The value is rounded down to
        a power of two.

        The tag area is accessed using buffers, the buffer size is
        configurable. A larger buffer size means that each I/O will
        be larger, but fewer I/Os may be issued.

journal_watermark:number
        The journal watermark as a percentage. When the size of the journal
        exceeds this watermark, the thread that flushes the journal will
        be started.

commit_time:number
        Commit time in milliseconds. When this time passes, the journal is
        written. The journal is also written immediately if a FLUSH
        request is received.

internal_hash:algorithm(:key) (the key is optional)
        Use internal hash or crc.
        When this argument is used, the dm-integrity target won't accept
        integrity tags from the upper target, but it will automatically
        generate and verify the integrity tags.

        You can use a crc algorithm (such as crc32), then the integrity
        target will protect the data against accidental corruption.
        You can also use a hmac algorithm (for example
        "hmac(sha256):0123456789abcdef"), in this mode it will provide
        cryptographic authentication of the data without encryption.

        When this argument is not used, the integrity tags are accepted
        from an upper layer target, such as dm-crypt. The upper layer
        target should check the validity of the integrity tags.

recalculate
        Recalculate the integrity tags automatically. It is only valid
        when using internal hash.

journal_crypt:algorithm(:key) (the key is optional)
        Encrypt the journal using the given algorithm to make sure that the
        attacker can't read the journal. You can use a block cipher here
        (such as "cbc(aes)") or a stream cipher (for example "chacha20",
        "salsa20" or "ctr(aes)").

        The journal contains a history of the last writes to the block
        device, so an attacker reading the journal could see the last
        sector numbers that were written. From the sector numbers, the
        attacker can infer the size of files that were written. To protect
        against this situation, you can encrypt the journal.

journal_mac:algorithm(:key) (the key is optional)
        Protect sector numbers in the journal from accidental or malicious
        modification.
        To protect against accidental modification, use a
        crc algorithm, to protect against malicious modification, use a
        hmac algorithm with a key.

        This option is not needed when using internal-hash because in this
        mode, the integrity of journal entries is checked when replaying
        the journal. Thus, a modified sector number would be detected at
        this stage.

block_size:number
        The size of a data block in bytes. The larger the block size the
        less overhead there is for per-block integrity metadata.
        Supported values are 512, 1024, 2048 and 4096 bytes. If not
        specified the default block size is 512 bytes.

sectors_per_bit:number
        In the bitmap mode, this parameter specifies the number of
        512-byte sectors that correspond to one bitmap bit.

bitmap_flush_interval:number
        The bitmap flush interval in milliseconds. The metadata buffers
        are synchronized when this interval expires.

allow_discards
        Allow block discard requests (a.k.a. TRIM) for the integrity device.
        Discards are only allowed to devices using internal hash.

fix_padding
        Use a smaller padding of the tag area that is more
        space-efficient. If this option is not present, large padding is
        used - that is for compatibility with older kernels.

legacy_recalculate
        Allow recalculating of volumes with HMAC keys.
        This is disabled by
        default for security reasons - an attacker could modify the volume,
        set recalc_sector to zero, and the kernel would not detect the
        modification.

The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
table and swap the tables with suspend and resume). The other arguments
should not be changed when reloading the target because the layout of disk
data depends on them and the reloaded target would be non-functional.


Status line:

1. the number of integrity mismatches
2. provided data sectors - that is the number of sectors that the user
   could use
3. the current recalculating position (or '-' if we didn't recalculate)


The layout of the formatted block device:

* reserved sectors
    (they are not used by this target, they can be used for
    storing LUKS metadata or for other purposes), the size of the reserved
    area is specified in the target arguments

* superblock (4kiB)
        * magic string - identifies that the device was formatted
        * version
        * log2(interleave sectors)
        * integrity tag size
        * the number of journal sections
        * provided data sectors - the number of sectors that this target
          provides (i.e.
          the size of the device minus the size of all
          metadata and padding). The user of this target should not send
          bios that access data beyond the "provided data sectors" limit.
        * flags
            SB_FLAG_HAVE_JOURNAL_MAC
                - a flag is set if journal_mac is used
            SB_FLAG_RECALCULATING
                - recalculating is in progress
            SB_FLAG_DIRTY_BITMAP
                - journal area contains the bitmap of dirty
                  blocks
        * log2(sectors per block)
        * a position where recalculating finished
* journal
        The journal is divided into sections, each section contains:

        * metadata area (4kiB), it contains journal entries

          - every journal entry contains:

                * logical sector (specifies where the data and tag should
                  be written)
                * last 8 bytes of data
                * integrity tag (the size is specified in the superblock)

          - every metadata sector ends with

                * mac (8-bytes), all the macs in 8 metadata sectors form a
                  64-byte value. It is used to store hmac of sector
                  numbers in the journal section, to protect against a
                  possibility that the attacker tampers with sector
                  numbers in the journal.
                * commit id

        * data area (the size is variable; it depends on how many journal
          entries fit into the metadata area)

          - every sector in the data area contains:

                * data (504 bytes of data, the last 8 bytes are stored in
                  the journal entry)
                * commit id

        To test if the whole journal section was written correctly, every
        512-byte sector of the journal ends with an 8-byte commit id. If
        the commit id matches on all sectors in a journal section, then it
        is assumed that the section was written correctly. If the commit id
        doesn't match, the section was written partially and it should not
        be replayed.

* one or more runs of interleaved tags and data.
    Each run contains:

        * tag area - it contains integrity tags. There is one tag for each
          sector in the data area
        * data area - it contains data sectors. The number of data sectors
          in one run must be a power of two. log2 of this value is stored
          in the superblock.
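
Because the number of data sectors in one run is a power of two, finding
the run that holds a given data sector needs only a shift and a mask. The
following Python sketch illustrates that arithmetic. It is not the driver's
code: the function name ``locate_in_run`` and its parameters are
hypothetical, and it assumes 512-byte blocks (one tag per sector) with tags
packed back to back inside the tag area, ignoring any padding.

```python
def locate_in_run(data_sector, log2_interleave_sectors, tag_size):
    """Map a logical data sector to its position inside the interleaved runs.

    Hypothetical helper for illustration only. Assumes one integrity tag
    per 512-byte sector and no padding between tags.

    Returns (run_index, offset_in_run, tag_byte_offset), where
    tag_byte_offset is relative to the start of that run's tag area.
    """
    if data_sector < 0 or log2_interleave_sectors < 0 or tag_size <= 0:
        raise ValueError("arguments must be non-negative, tag_size positive")

    # Number of data sectors in one run; a power of two by construction.
    sectors_per_run = 1 << log2_interleave_sectors

    # High bits select the run, low bits select the sector inside the run.
    run_index = data_sector >> log2_interleave_sectors
    offset_in_run = data_sector & (sectors_per_run - 1)

    # With one tag per sector, the tag sits at a fixed stride in the tag area.
    tag_byte_offset = offset_in_run * tag_size

    return run_index, offset_in_run, tag_byte_offset


if __name__ == "__main__":
    # Example: 32768 data sectors per run (log2 = 15), 4-byte tags.
    print(locate_in_run(40000, 15, 4))
```

With ``log2_interleave_sectors`` set to 15 and a 4-byte tag, data sector
40000 falls into the second run (index 1) at sector offset 7232, and its tag
would start 28928 bytes into that run's tag area under the assumptions
above. Turning these values into an absolute device offset additionally
requires the sizes of the reserved area, superblock, journal and padded tag
area, which this sketch does not model.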