.. SPDX-License-Identifier: GPL-2.0

Journal (jbd2)
--------------

Introduced in ext3, the ext4 filesystem employs a journal to protect the
filesystem against corruption in the case of a system crash. A small
contiguous region of disk (default 128MiB) is reserved inside the
filesystem as a place to land “important” data writes on-disk as quickly
as possible. Once the important data transaction is fully written to the
disk and flushed from the disk write cache, a record of the data being
committed is also written to the journal. At some later point in time,
the journal code writes the transactions to their final locations on
disk (this could involve a lot of seeking or a lot of small
read-write-erases) before erasing the commit record. Should the system
crash during the second slow write, the journal can be replayed all the
way to the latest commit record, guaranteeing the atomicity of whatever
gets written through the journal to the disk. The effect of this is to
guarantee that the filesystem does not become stuck midway through a
metadata update.

For performance reasons, ext4 by default only writes filesystem metadata
through the journal. This means that file data blocks are *not*
guaranteed to be in any consistent state after a crash. If this default
guarantee level (``data=ordered``) is not satisfactory, there is a mount
option to control journal behavior. If ``data=journal``, all data and
metadata are written to disk through the journal.
This is slower but safest. If ``data=writeback``, dirty data blocks are
not flushed to the disk before the metadata are written to disk through
the journal.

In ``data=ordered`` mode, ext4 also supports fast commits, which
help reduce commit latency significantly. The default ``data=ordered``
mode works by logging metadata blocks to the journal. In fast commit
mode, ext4 only stores the minimal delta needed to recreate the
affected metadata in fast commit space that is shared with JBD2.
Once the fast commit area fills up, or if a fast commit is not possible,
or if the JBD2 commit timer goes off, ext4 performs a traditional full
commit. A full commit invalidates all the fast commits that happened
before it and thus empties the fast commit area for further fast
commits. This feature needs to be enabled at mkfs time.

The journal inode is typically inode 8. The first 68 bytes of the
journal inode are replicated in the ext4 superblock. The journal itself
is a normal (but hidden) file within the filesystem. The file usually
consumes an entire block group, though mke2fs tries to put it in the
middle of the disk.

All fields in jbd2 are written to disk in big-endian order. This is the
opposite of ext4.

NOTE: Both ext4 and ocfs2 use jbd2.

The maximum size of a journal embedded in an ext4 filesystem is 2^32
blocks. jbd2 itself does not seem to care.
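
The byte-order and size statements above can be illustrated with a
minimal sketch. The helper names (``decode_be32``, ``decode_le32``,
``max_embedded_journal_bytes``) are hypothetical and exist only for this
example; the 4096-byte block size is an assumed, typical value.

```python
import struct


def decode_be32(raw: bytes) -> int:
    """Decode a 32-bit field the way jbd2 stores it (big-endian)."""
    return struct.unpack(">I", raw)[0]


def decode_le32(raw: bytes) -> int:
    """Decode a 32-bit field the way ext4 stores it (little-endian)."""
    return struct.unpack("<I", raw)[0]


def max_embedded_journal_bytes(block_size: int) -> int:
    """Upper bound on an embedded journal: 2^32 blocks of block_size bytes."""
    return (2 ** 32) * block_size


# The same four on-disk bytes mean different numbers in each convention.
raw = bytes([0x00, 0x00, 0x10, 0x00])
print(decode_be32(raw))   # 4096
print(decode_le32(raw))   # 1048576

# With an assumed 4096-byte block size the limit works out to 16 TiB.
print(max_embedded_journal_bytes(4096) // 2 ** 40)   # 16
```

Reading a jbd2 field with the ext4 convention (or the reverse) silently
yields a wrong value rather than an error, so a parser should fix the
byte order per structure rather than per file.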

Layout
~~~~~~

Generally speaking, the journal has this format:

.. list-table::
   :widths: 16 48 16
   :header-rows: 1

   * - Superblock
     - descriptor\_block (data\_blocks or revocation\_block) [more data or
       revocations] commit\_block
     - [more transactions...]
   * -
     - One transaction
     -

Notice that a transaction begins with either a descriptor and some data,
or a block revocation list. A finished transaction always ends with a
commit. If there is no commit record (or the checksums don't match), the
transaction will be discarded during replay.

External Journal
~~~~~~~~~~~~~~~~

Optionally, an ext4 filesystem can be created with an external journal
device (as opposed to an internal journal, which uses a reserved inode).
In this case, on the filesystem device, ``s_journal_inum`` should be
zero and ``s_journal_uuid`` should be set. On the journal device there
will be an ext4 super block in the usual place, with a matching UUID.
The journal superblock will be in the next full block after the
superblock.

.. list-table::
   :widths: 12 12 12 32 12
   :header-rows: 1

   * - 1024 bytes of padding
     - ext4 Superblock
     - Journal Superblock
     - descriptor\_block (data\_blocks or revocation\_block) [more data or
       revocations] commit\_block
     - [more transactions...]
   * -
     -
     -
     - One transaction
     -

Block Header
~~~~~~~~~~~~

Every block in the journal starts with a common 12-byte header
``struct journal_header_s``:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - h\_magic
     - jbd2 magic number, 0xC03B3998.
   * - 0x4
     - \_\_be32
     - h\_blocktype
     - Description of what this block contains. See the jbd2_blocktype_ table
       below.
   * - 0x8
     - \_\_be32
     - h\_sequence
     - The transaction ID that goes with this block.

.. _jbd2_blocktype:

The journal block type can be any one of:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 1
     - Descriptor. This block precedes a series of data blocks that were
       written through the journal during a transaction.
   * - 2
     - Block commit record. This block signifies the completion of a
       transaction.
   * - 3
     - Journal superblock, v1.
   * - 4
     - Journal superblock, v2.
   * - 5
     - Block revocation records. This speeds up recovery by enabling the
       journal to skip writing blocks that were subsequently rewritten.

Super Block
~~~~~~~~~~~

The super block for the journal is much simpler than ext4's.
The key data kept within are the size of the journal and where to find
the start of the log of transactions.

The journal superblock is recorded as ``struct journal_superblock_s``,
which is 1024 bytes long:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * -
     -
     -
     - Static information describing the journal.
   * - 0x0
     - journal\_header\_t (12 bytes)
     - s\_header
     - Common header identifying this as a superblock.
   * - 0xC
     - \_\_be32
     - s\_blocksize
     - Journal device block size.
   * - 0x10
     - \_\_be32
     - s\_maxlen
     - Total number of blocks in this journal.
   * - 0x14
     - \_\_be32
     - s\_first
     - First block of log information.
   * -
     -
     -
     - Dynamic information describing the current state of the log.
   * - 0x18
     - \_\_be32
     - s\_sequence
     - First commit ID expected in log.
   * - 0x1C
     - \_\_be32
     - s\_start
     - Block number of the start of log. Contrary to the comments, this field
       being zero does not imply that the journal is clean!
   * - 0x20
     - \_\_be32
     - s\_errno
     - Error value, as set by jbd2\_journal\_abort().
   * -
     -
     -
     - The remaining fields are only valid in a v2 superblock.
   * - 0x24
     - \_\_be32
     - s\_feature\_compat
     - Compatible feature set. See the table jbd2_compat_ below.
   * - 0x28
     - \_\_be32
     - s\_feature\_incompat
     - Incompatible feature set. See the table jbd2_incompat_ below.
   * - 0x2C
     - \_\_be32
     - s\_feature\_ro\_compat
     - Read-only compatible feature set. There aren't any of these currently.
   * - 0x30
     - \_\_u8
     - s\_uuid[16]
     - 128-bit uuid for journal. This is compared against the copy in the ext4
       super block at mount time.
   * - 0x40
     - \_\_be32
     - s\_nr\_users
     - Number of file systems sharing this journal.
   * - 0x44
     - \_\_be32
     - s\_dynsuper
     - Location of dynamic super block copy. (Not used?)
   * - 0x48
     - \_\_be32
     - s\_max\_transaction
     - Limit of journal blocks per transaction. (Not used?)
   * - 0x4C
     - \_\_be32
     - s\_max\_trans\_data
     - Limit of data blocks per transaction. (Not used?)
   * - 0x50
     - \_\_u8
     - s\_checksum\_type
     - Checksum algorithm used for the journal. See jbd2_checksum_type_ for
       more info.
   * - 0x51
     - \_\_u8[3]
     - s\_padding2
     -
   * - 0x54
     - \_\_be32
     - s\_num\_fc\_blocks
     - Number of fast commit blocks in the journal.
   * - 0x58
     - \_\_u32
     - s\_padding[41]
     -
   * - 0xFC
     - \_\_be32
     - s\_checksum
     - Checksum of the entire superblock, with this field set to zero.
   * - 0x100
     - \_\_u8
     - s\_users[16\*48]
     - ids of all file systems sharing the log. e2fsprogs/Linux don't allow
       shared external journals, but I imagine Lustre (or ocfs2?), which use
       the jbd2 code, might.

.. _jbd2_compat:

The journal compat features are any combination of the following:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - Journal maintains checksums on the data blocks.
       (JBD2\_FEATURE\_COMPAT\_CHECKSUM)

.. _jbd2_incompat:

The journal incompat features are any combination of the following:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - Journal has block revocation records. (JBD2\_FEATURE\_INCOMPAT\_REVOKE)
   * - 0x2
     - Journal can deal with 64-bit block numbers.
       (JBD2\_FEATURE\_INCOMPAT\_64BIT)
   * - 0x4
     - Journal commits asynchronously. (JBD2\_FEATURE\_INCOMPAT\_ASYNC\_COMMIT)
   * - 0x8
     - This journal uses v2 of the checksum on-disk format. Each journal
       metadata block gets its own checksum, and the block tags in the
       descriptor table contain checksums for each of the data blocks in the
       journal. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2)
   * - 0x10
     - This journal uses v3 of the checksum on-disk format. This is the same as
       v2, but the journal block tag size is fixed regardless of the size of
       block numbers. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3)
   * - 0x20
     - Journal has fast commit blocks. (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT)

.. _jbd2_checksum_type:

Journal checksum type codes are one of the following. crc32 or crc32c are the
most likely choices.

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 1
     - CRC32
   * - 2
     - MD5
   * - 3
     - SHA1
   * - 4
     - CRC32C

Descriptor Block
~~~~~~~~~~~~~~~~

The descriptor block contains an array of journal block tags that
describe the final locations of the data blocks that follow in the
journal. Descriptor blocks are open-coded instead of being completely
described by a data structure, but here is the block structure anyway.
Descriptor blocks consume at least 36 bytes, but use a full block:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - journal\_header\_t
     - (open coded)
     - Common block header.
   * - 0xC
     - struct journal\_block\_tag\_s
     - open coded array[]
     - Enough tags either to fill up the block or to describe all the data
       blocks that follow this descriptor block.

Journal block tags have any of the following formats, depending on which
journal feature and block tag flags are set.

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is set, the journal block tag is
defined as ``struct journal_block_tag3_s``, which looks like the
following. The size is 16 or 32 bytes.

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - t\_blocknr
     - Lower 32-bits of the location of where the corresponding data block
       should end up on disk.
   * - 0x4
     - \_\_be32
     - t\_flags
     - Flags that go with the descriptor. See the table jbd2_tag_flags_ for
       more info.
   * - 0x8
     - \_\_be32
     - t\_blocknr\_high
     - Upper 32-bits of the location of where the corresponding data block
       should end up on disk. This is zero if JBD2\_FEATURE\_INCOMPAT\_64BIT is
       not enabled.
   * - 0xC
     - \_\_be32
     - t\_checksum
     - Checksum of the journal UUID, the sequence number, and the data block.
   * -
     -
     -
     - This field appears to be open coded. It always comes at the end of the
       tag, after t_checksum. This field is not present if the "same UUID" flag
       is set.
   * - 0x10
     - char
     - uuid[16]
     - A UUID to go with this tag. This field appears to be copied from the
       ``j_uuid`` field in ``struct journal_s``, but only tune2fs touches that
       field.

.. _jbd2_tag_flags:

The journal tag flags are any combination of the following:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - On-disk block is escaped. The first four bytes of the data block just
       happened to match the jbd2 magic number.
   * - 0x2
     - This block has the same UUID as previous, therefore the UUID field is
       omitted.
   * - 0x4
     - The data block was deleted by the transaction. (Not used?)
   * - 0x8
     - This is the last tag in this descriptor block.

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is NOT set, the journal block tag
is defined as ``struct journal_block_tag_s``, which looks like the
following. The size is 8, 12, 24, or 28 bytes:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - t\_blocknr
     - Lower 32-bits of the location of where the corresponding data block
       should end up on disk.
   * - 0x4
     - \_\_be16
     - t\_checksum
     - Checksum of the journal UUID, the sequence number, and the data block.
       Note that only the lower 16 bits are stored.
   * - 0x6
     - \_\_be16
     - t\_flags
     - Flags that go with the descriptor. See the table jbd2_tag_flags_ for
       more info.
   * -
     -
     -
     - This next field is only present if the super block indicates support for
       64-bit block numbers.
   * - 0x8
     - \_\_be32
     - t\_blocknr\_high
     - Upper 32-bits of the location of where the corresponding data block
       should end up on disk.
   * -
     -
     -
     - This field appears to be open coded. It always comes at the end of the
       tag, after t_flags or t_blocknr_high. This field is not present if the
       "same UUID" flag is set.
   * - 0x8 or 0xC
     - char
     - uuid[16]
     - A UUID to go with this tag. This field appears to be copied from the
       ``j_uuid`` field in ``struct journal_s``, but only tune2fs touches that
       field.

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or
JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is set, the end of the block is a
``struct jbd2_journal_block_tail``, which looks like this:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - t\_checksum
     - Checksum of the journal UUID + the descriptor block, with this field set
       to zero.

Data Block
~~~~~~~~~~

In general, the data blocks being written to disk through the journal
are written verbatim into the journal file after the descriptor block.
However, if the first four bytes of the block match the jbd2 magic
number then those four bytes are replaced with zeroes and the “escaped”
flag is set in the descriptor block tag.
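
The escaping rule can be sketched as a pair of helpers, one for the
write side and one for replay. This is only an illustration of the rule
as described here, not the kernel's code; the function names and the
constant name ``TAG_FLAG_ESCAPED`` are hypothetical, while the magic
number and the flag value 0x1 are taken from the tables in this
document.

```python
import struct

JBD2_MAGIC = 0xC03B3998                         # h_magic from the block header table
TAG_FLAG_ESCAPED = 0x1                          # "escaped" bit (constant name is hypothetical)
MAGIC_BYTES = struct.pack(">I", JBD2_MAGIC)     # on-disk form is big-endian


def escape_block(data: bytes, flags: int = 0):
    """Prepare a data block for the journal.

    If the block starts with the jbd2 magic, zero those four bytes and set
    the escaped flag so replay cannot mistake it for a journal metadata block.
    """
    if data[:4] == MAGIC_BYTES:
        return b"\x00\x00\x00\x00" + data[4:], flags | TAG_FLAG_ESCAPED
    return data, flags


def unescape_block(data: bytes, flags: int) -> bytes:
    """Undo escape_block() during replay, guided by the tag flags."""
    if flags & TAG_FLAG_ESCAPED:
        return MAGIC_BYTES + data[4:]
    return data


# Round trip for a block that happens to start with the magic number.
original = MAGIC_BYTES + b"\xaa" * 12
stored, tag_flags = escape_block(original)
restored = unescape_block(stored, tag_flags)
```

Because the replaced bytes are always the magic number, the flag alone
is enough to restore the block; nothing else needs to be saved.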

Revocation Block
~~~~~~~~~~~~~~~~

A revocation block is used to prevent replay of a block in an earlier
transaction. This is used to mark blocks that were journalled at one
time but are no longer journalled. Typically this happens if a metadata
block is freed and re-allocated as a file data block; in this case, a
journal replay after the file block was written to disk will cause
corruption.

**NOTE**: This mechanism is NOT used to express “this journal block is
superseded by this other journal block”, as the author (djwong)
mistakenly thought. Any block being added to a transaction will cause
the removal of all existing revocation records for that block.

Revocation blocks are described in
``struct jbd2_journal_revoke_header_s``, are at least 16 bytes in
length, but use a full block:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - journal\_header\_t
     - r\_header
     - Common block header.
   * - 0xC
     - \_\_be32
     - r\_count
     - Number of bytes used in this block.
   * - 0x10
     - \_\_be32 or \_\_be64
     - blocks[0]
     - Blocks to revoke.
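
As a rough sketch, the header layout in the table above can be decoded
as follows. The function name ``parse_revoke_block`` is hypothetical.
The code assumes that ``r_count`` counts bytes from the start of the
block, including the 16-byte header, and that the caller passes
``is_64bit`` according to the journal superblock's feature flags; both
points are assumptions of this example.

```python
import struct

JBD2_MAGIC = 0xC03B3998
BLOCKTYPE_REVOKE = 5                     # value 5 in the block type table
HEADER = struct.Struct(">III")           # h_magic, h_blocktype, h_sequence
REVOKE_HEADER_SIZE = 16                  # common header (12) + r_count (4)


def parse_revoke_block(block: bytes, is_64bit: bool):
    """Return (transaction ID, list of revoked block numbers)."""
    magic, blocktype, sequence = HEADER.unpack_from(block, 0)
    if magic != JBD2_MAGIC or blocktype != BLOCKTYPE_REVOKE:
        raise ValueError("not a revocation block")

    (r_count,) = struct.unpack_from(">I", block, 12)
    # ASSUMPTION: r_count includes the header, so it must lie in this range.
    if not REVOKE_HEADER_SIZE <= r_count <= len(block):
        raise ValueError("r_count out of range")

    fmt, width = (">Q", 8) if is_64bit else (">I", 4)
    revoked = []
    offset = REVOKE_HEADER_SIZE
    while offset + width <= r_count:
        revoked.append(struct.unpack_from(fmt, block, offset)[0])
        offset += width
    return sequence, revoked


# Build a small 32-bit revocation block by hand and parse it back.
sample = struct.pack(">IIII", JBD2_MAGIC, BLOCKTYPE_REVOKE, 7, 16 + 8)
sample += struct.pack(">II", 100, 200)
sample = sample.ljust(1024, b"\x00")
print(parse_revoke_block(sample, is_64bit=False))   # (7, [100, 200])
```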

After r\_count is a linear array of block numbers that are effectively
revoked by this transaction. The size of each block number is 8 bytes if
the superblock advertises 64-bit block number support, or 4 bytes
otherwise.

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or
JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is set, the end of the revocation
block is a ``struct jbd2_journal_revoke_tail``, which has this format:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - r\_checksum
     - Checksum of the journal UUID + revocation block.

Commit Block
~~~~~~~~~~~~

The commit block is a sentry that indicates that a transaction has been
completely written to the journal. Once this commit block reaches the
journal, the data stored with this transaction can be written to their
final locations on disk.

The commit block is described by ``struct commit_header``, which is 60
bytes long (but uses a full block):

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - journal\_header\_s
     - (open coded)
     - Common block header.
   * - 0xC
     - unsigned char
     - h\_chksum\_type
     - The type of checksum to use to verify the integrity of the data blocks
       in the transaction. See jbd2_checksum_type_ for more info.
   * - 0xD
     - unsigned char
     - h\_chksum\_size
     - The number of bytes used by the checksum. Most likely 4.
   * - 0xE
     - unsigned char
     - h\_padding[2]
     -
   * - 0x10
     - \_\_be32
     - h\_chksum[JBD2\_CHECKSUM\_BYTES]
     - 32 bytes of space to store checksums. If
       JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3
       is set, the first ``__be32`` is the checksum of the journal UUID and
       the entire commit block, with this field zeroed. If
       JBD2\_FEATURE\_COMPAT\_CHECKSUM is set, the first ``__be32`` is the
       crc32 of all the blocks already written to the transaction.
   * - 0x30
     - \_\_be64
     - h\_commit\_sec
     - The time that the transaction was committed, in seconds since the epoch.
   * - 0x38
     - \_\_be32
     - h\_commit\_nsec
     - Nanoseconds component of the above timestamp.
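
The offsets in the table above map directly onto a packed big-endian
layout, which the following sketch decodes. The function name
``parse_commit_block`` and the dictionary keys are choices made for this
example, not kernel identifiers; the format string simply follows the
offsets listed (0x0 through 0x38), which add up to 60 bytes.

```python
import struct

JBD2_MAGIC = 0xC03B3998
BLOCKTYPE_COMMIT = 2                     # value 2 in the block type table
CHECKSUM_TYPES = {1: "CRC32", 2: "MD5", 3: "SHA1", 4: "CRC32C"}

# 3 x be32 header, chksum_type, chksum_size, 2 pad bytes,
# 8 x be32 checksum space, be64 seconds, be32 nanoseconds.
COMMIT_HEADER = struct.Struct(">IIIBB2x8IQI")


def parse_commit_block(block: bytes) -> dict:
    """Decode the fixed-size fields at the start of a commit block."""
    fields = COMMIT_HEADER.unpack_from(block, 0)
    magic, blocktype, sequence, chksum_type, chksum_size = fields[:5]
    if magic != JBD2_MAGIC or blocktype != BLOCKTYPE_COMMIT:
        raise ValueError("not a commit block")
    return {
        "sequence": sequence,
        "chksum_type": CHECKSUM_TYPES.get(chksum_type, "unknown"),
        "chksum_size": chksum_size,
        "chksum": fields[5:13],          # eight 32-bit words at offset 0x10
        "commit_sec": fields[13],        # offset 0x30
        "commit_nsec": fields[14],       # offset 0x38
    }


# Hand-built commit block for transaction 42 with made-up checksum and time.
sample = COMMIT_HEADER.pack(JBD2_MAGIC, BLOCKTYPE_COMMIT, 42, 4, 4,
                            0xDEADBEEF, 0, 0, 0, 0, 0, 0, 0,
                            1700000000, 500).ljust(4096, b"\x00")
info = parse_commit_block(sample)
print(info["sequence"], info["chksum_type"], hex(info["chksum"][0]))
```

The sketch only decodes fields; verifying the checksum would also need
the journal UUID and the feature flags from the journal superblock.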

Fast commits
~~~~~~~~~~~~

The fast commit area is organized as a log of tag-length-value (TLV)
records. Each TLV begins with a ``struct ext4_fc_tl``, which stores the
tag and the length of the entire field. It is followed by a
variable-length, tag-specific value. Here is the list of supported tags
and their meanings:

.. list-table::
   :widths: 8 20 20 32
   :header-rows: 1

   * - Tag
     - Meaning
     - Value struct
     - Description
   * - EXT4_FC_TAG_HEAD
     - Fast commit area header
     - ``struct ext4_fc_head``
     - Stores the TID of the transaction after which these fast commits should
       be applied.
   * - EXT4_FC_TAG_ADD_RANGE
     - Add extent to inode
     - ``struct ext4_fc_add_range``
     - Stores the inode number and the extent to be added to this inode.
   * - EXT4_FC_TAG_DEL_RANGE
     - Remove logical offsets from inode
     - ``struct ext4_fc_del_range``
     - Stores the inode number and the logical offset range that needs to be
       removed.
   * - EXT4_FC_TAG_CREAT
     - Create directory entry for a newly created file
     - ``struct ext4_fc_dentry_info``
     - Stores the parent inode number, inode number and directory entry of the
       newly created file.
   * - EXT4_FC_TAG_LINK
     - Link a directory entry to an inode
     - ``struct ext4_fc_dentry_info``
     - Stores the parent inode number, inode number and directory entry.
   * - EXT4_FC_TAG_UNLINK
     - Unlink a directory entry of an inode
     - ``struct ext4_fc_dentry_info``
     - Stores the parent inode number, inode number and directory entry.
   * - EXT4_FC_TAG_PAD
     - Padding (unused area)
     - None
     - Unused bytes in the fast commit area.
   * - EXT4_FC_TAG_TAIL
     - Mark the end of a fast commit
     - ``struct ext4_fc_tail``
     - Stores the TID of the commit and the CRC of the fast commit that this
       tag terminates.
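
The TLV organization described above can be walked with a short loop.
This is a sketch under stated assumptions, since this document does not
spell out the layout of ``struct ext4_fc_tl``: the header is assumed to
be a 16-bit tag followed by a 16-bit length, both little-endian like the
rest of ext4, and the length is assumed to count only the value bytes.
The name ``walk_tlvs`` and the numeric tag codes in the sample are
placeholders, not the kernel's values.

```python
import struct

# ASSUMPTION: tag and length are both 16-bit little-endian integers.
TL_HEADER = struct.Struct("<HH")


def walk_tlvs(area: bytes):
    """Yield (tag, value) pairs from a buffer of back-to-back TLV records."""
    offset = 0
    while offset + TL_HEADER.size <= len(area):
        tag, length = TL_HEADER.unpack_from(area, offset)
        start = offset + TL_HEADER.size
        end = start + length             # ASSUMPTION: length excludes the header
        if end > len(area):
            raise ValueError("truncated TLV at offset %d" % offset)
        yield tag, area[start:end]
        offset = end


# Three records with placeholder tag codes 9, 7 and 8.
sample = (TL_HEADER.pack(9, 4) + b"\x01\x00\x00\x00"
          + TL_HEADER.pack(7, 0)
          + TL_HEADER.pack(8, 2) + b"ab")
for tag, value in walk_tlvs(sample):
    print(tag, value)
```

A real replay path would additionally stop at the tail tag and check its
CRC and TID before applying any of the preceding records.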