.. SPDX-License-Identifier: GPL-2.0

Journal (jbd2)
--------------

The ext4 filesystem employs a journal, first introduced in ext3, to
protect the filesystem against corruption in the case of a system crash.
A small contiguous region of disk (default 128MiB) is reserved inside the
filesystem as a place to land “important” data writes on-disk as quickly
as possible. Once the important data transaction is fully written to the
disk and flushed from the disk write cache, a record of the data being
committed is also written to the journal. At some later point in time,
the journal code writes the transactions to their final locations on
disk (this could involve a lot of seeking or a lot of small
read-write-erases) before erasing the commit record. Should the system
crash during the second slow write, the journal can be replayed all the
way to the latest commit record, guaranteeing the atomicity of whatever
gets written through the journal to the disk. The effect of this is to
guarantee that the filesystem does not become stuck midway through a
metadata update.

For performance reasons, ext4 by default only writes filesystem metadata
through the journal. This means that file data blocks are *not*
guaranteed to be in any consistent state after a crash. If this default
guarantee level (``data=ordered``) is not satisfactory, there is a mount
option to control journal behavior. If ``data=journal``, all data and
metadata are written to disk through the journal. This is slower but
safest. If ``data=writeback``, dirty data blocks are not flushed to the
disk before the metadata are written to disk through the journal.

In ``data=ordered`` mode, ext4 also supports fast commits, which
help reduce commit latency significantly. The default ``data=ordered``
mode works by logging metadata blocks to the journal. In fast commit
mode, ext4 only stores the minimal delta needed to recreate the
affected metadata in fast commit space that is shared with JBD2.
Once the fast commit area fills up, or if a fast commit is not possible,
or if the JBD2 commit timer goes off, ext4 performs a traditional full
commit. A full commit invalidates all the fast commits that happened
before it and thus empties the fast commit area for further fast
commits. This feature needs to be enabled at mkfs time.

The journal inode is typically inode 8. The first 68 bytes of the
journal inode are replicated in the ext4 superblock. The journal itself
is a normal (but hidden) file within the filesystem. The file usually
consumes an entire block group, though mke2fs tries to put it in the
middle of the disk.

All fields in jbd2 are written to disk in big-endian order. This is the
opposite of ext4.

NOTE: Both ext4 and ocfs2 use jbd2.

The maximum size of a journal embedded in an ext4 filesystem is 2^32
blocks. jbd2 itself does not seem to care.

Layout
~~~~~~

Generally speaking, the journal has this format:

.. list-table::
   :widths: 16 48 16
   :header-rows: 1

   * - Superblock
     - descriptor\_block (data\_blocks or revocation\_block) [more data or
       revocations] commit\_block
     - [more transactions...]
   * -
     - One transaction
     -

Notice that a transaction begins with either a descriptor and some data,
or a block revocation list. A finished transaction always ends with a
commit. If there is no commit record (or the checksums don't match), the
transaction will be discarded during replay.
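
The discard rule can be illustrated with a small sketch. It assumes the
journal has already been reduced to a list of ``(h_sequence, h_blocktype)``
pairs in log order; the function name and input shape are hypothetical, and
checksum verification is left out entirely.

```python
# Block type codes as documented for h_blocktype.
DESCRIPTOR, COMMIT, REVOKE = 1, 2, 5


def committed_transactions(blocks, first_seq):
    """Return the sequence numbers that a replay would apply.

    blocks is a hypothetical, already-parsed view of the journal:
    a list of (h_sequence, h_blocktype) pairs in log order.
    """
    done = []
    expected = first_seq
    for seq, blocktype in blocks:
        if seq != expected:
            break  # the log ends where the expected sequence stops
        if blocktype == COMMIT:
            done.append(seq)  # only a commit record completes a transaction
            expected += 1
    return done


# Transaction 7 is complete; transaction 8 never received its commit block.
log = [(7, DESCRIPTOR), (7, COMMIT), (8, REVOKE), (8, DESCRIPTOR)]
replayable = committed_transactions(log, first_seq=7)
```

Transaction 8 is absent from ``replayable`` because no commit record follows
its descriptor and revocation blocks.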

External Journal
~~~~~~~~~~~~~~~~

Optionally, an ext4 filesystem can be created with an external journal
device (as opposed to an internal journal, which uses a reserved inode).
In this case, on the filesystem device, ``s_journal_inum`` should be
zero and ``s_journal_uuid`` should be set. On the journal device there
will be an ext4 superblock in the usual place, with a matching UUID.
The journal superblock will be in the next full block after the
superblock.

.. list-table::
   :widths: 12 12 12 32 12
   :header-rows: 1

   * - 1024 bytes of padding
     - ext4 Superblock
     - Journal Superblock
     - descriptor\_block (data\_blocks or revocation\_block) [more data or
       revocations] commit\_block
     - [more transactions...]
   * -
     -
     -
     - One transaction
     -
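
As a rough illustration of "the next full block after the superblock", the
sketch below computes a byte offset for the journal superblock on an external
journal device. It assumes the ext4 superblock occupies 1024 bytes directly
after the 1024 bytes of padding; the helper name is hypothetical.

```python
EXT4_SB_OFFSET = 1024  # padding before the ext4 superblock
EXT4_SB_SIZE = 1024    # size of the ext4 superblock (assumption of this sketch)


def journal_sb_offset(block_size):
    """Byte offset of the first full block after the ext4 superblock."""
    end_of_sb = EXT4_SB_OFFSET + EXT4_SB_SIZE
    blocks = -(-end_of_sb // block_size)  # ceiling division
    return blocks * block_size


offsets = {bs: journal_sb_offset(bs) for bs in (1024, 2048, 4096)}
```

With 1 KiB blocks this lands on block 2; with larger block sizes it lands on
block 1.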

Block Header
~~~~~~~~~~~~

Every block in the journal starts with a common 12-byte header
``struct journal_header_s``:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - h\_magic
     - jbd2 magic number, 0xC03B3998.
   * - 0x4
     - \_\_be32
     - h\_blocktype
     - Description of what this block contains. See the jbd2_blocktype_ table
       below.
   * - 0x8
     - \_\_be32
     - h\_sequence
     - The transaction ID that goes with this block.

.. _jbd2_blocktype:

The journal block type can be any one of:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 1
     - Descriptor. This block precedes a series of data blocks that were
       written through the journal during a transaction.
   * - 2
     - Block commit record. This block signifies the completion of a
       transaction.
   * - 3
     - Journal superblock, v1.
   * - 4
     - Journal superblock, v2.
   * - 5
     - Block revocation records. This speeds up recovery by enabling the
       journal to skip writing blocks that were subsequently rewritten.
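
A minimal sketch of decoding the common header with Python's ``struct``
module follows. The big-endian format string mirrors the three ``__be32``
fields above; the function name and the short type labels are illustrative
choices, not kernel identifiers.

```python
import struct

JBD2_MAGIC = 0xC03B3998
BLOCK_TYPES = {
    1: "descriptor",
    2: "commit",
    3: "superblock v1",
    4: "superblock v2",
    5: "revoke",
}


def parse_header(block):
    """Decode the 12-byte common header; return None if the magic is wrong."""
    magic, blocktype, sequence = struct.unpack_from(">III", block, 0)
    if magic != JBD2_MAGIC:
        return None  # not a journal metadata block (for example, a data block)
    return BLOCK_TYPES.get(blocktype, "unknown"), sequence


# Synthetic commit block for transaction 41, padded to 1024 bytes.
sample = struct.pack(">III", JBD2_MAGIC, 2, 41) + bytes(1012)
header = parse_header(sample)
```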

Super Block
~~~~~~~~~~~

The super block for the journal is much simpler than ext4's.
The key data kept within are the size of the journal, and where to find
the start of the log of transactions.

The journal superblock is recorded as ``struct journal_superblock_s``,
which is 1024 bytes long:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * -
     -
     -
     - Static information describing the journal.
   * - 0x0
     - journal\_header\_t (12 bytes)
     - s\_header
     - Common header identifying this as a superblock.
   * - 0xC
     - \_\_be32
     - s\_blocksize
     - Journal device block size.
   * - 0x10
     - \_\_be32
     - s\_maxlen
     - Total number of blocks in this journal.
   * - 0x14
     - \_\_be32
     - s\_first
     - First block of log information.
   * -
     -
     -
     - Dynamic information describing the current state of the log.
   * - 0x18
     - \_\_be32
     - s\_sequence
     - First commit ID expected in log.
   * - 0x1C
     - \_\_be32
     - s\_start
     - Block number of the start of log. Contrary to the comments, this field
       being zero does not imply that the journal is clean!
   * - 0x20
     - \_\_be32
     - s\_errno
     - Error value, as set by jbd2\_journal\_abort().
   * -
     -
     -
     - The remaining fields are only valid in a v2 superblock.
   * - 0x24
     - \_\_be32
     - s\_feature\_compat
     - Compatible feature set. See the table jbd2_compat_ below.
   * - 0x28
     - \_\_be32
     - s\_feature\_incompat
     - Incompatible feature set. See the table jbd2_incompat_ below.
   * - 0x2C
     - \_\_be32
     - s\_feature\_ro\_compat
     - Read-only compatible feature set. There aren't any of these currently.
   * - 0x30
     - \_\_u8
     - s\_uuid[16]
     - 128-bit uuid for journal. This is compared against the copy in the ext4
       super block at mount time.
   * - 0x40
     - \_\_be32
     - s\_nr\_users
     - Number of file systems sharing this journal.
   * - 0x44
     - \_\_be32
     - s\_dynsuper
     - Location of dynamic super block copy. (Not used?)
   * - 0x48
     - \_\_be32
     - s\_max\_transaction
     - Limit of journal blocks per transaction. (Not used?)
   * - 0x4C
     - \_\_be32
     - s\_max\_trans\_data
     - Limit of data blocks per transaction. (Not used?)
   * - 0x50
     - \_\_u8
     - s\_checksum\_type
     - Checksum algorithm used for the journal.  See jbd2_checksum_type_ for
       more info.
   * - 0x51
     - \_\_u8[3]
     - s\_padding2
     -
   * - 0x54
     - \_\_be32
     - s\_num\_fc\_blocks
     - Number of fast commit blocks in the journal.
   * - 0x58
     - \_\_u32
     - s\_padding[41]
     -
   * - 0xFC
     - \_\_be32
     - s\_checksum
     - Checksum of the entire superblock, with this field set to zero.
   * - 0x100
     - \_\_u8
     - s\_users[16\*48]
     - ids of all file systems sharing the log. e2fsprogs/Linux don't allow
       shared external journals, but I imagine Lustre (or ocfs2?), which use
       the jbd2 code, might.
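
The offsets in the table translate fairly directly into ``struct`` calls. The
sketch below decodes a handful of fields from a 1024-byte buffer; the function
name and dictionary keys are made up for illustration, and the checksum is
read but not verified.

```python
import struct


def parse_journal_superblock(buf):
    """Decode selected fields of a 1024-byte journal superblock."""
    if len(buf) < 1024:
        raise ValueError("journal superblock is 1024 bytes long")
    magic, blocktype, _seq = struct.unpack_from(">III", buf, 0x0)
    blocksize, maxlen, first = struct.unpack_from(">III", buf, 0xC)
    sequence, start, errno = struct.unpack_from(">III", buf, 0x18)
    sb = {
        "magic": magic, "blocktype": blocktype,
        "blocksize": blocksize, "maxlen": maxlen, "first": first,
        "sequence": sequence, "start": start, "errno": errno,
    }
    if blocktype == 4:  # v2 superblock: the remaining fields are valid
        compat, incompat, ro_compat = struct.unpack_from(">III", buf, 0x24)
        sb.update(compat=compat, incompat=incompat, ro_compat=ro_compat)
        sb["uuid"] = bytes(buf[0x30:0x40])
        sb["nr_users"] = struct.unpack_from(">I", buf, 0x40)[0]
        sb["checksum_type"] = buf[0x50]
        sb["checksum"] = struct.unpack_from(">I", buf, 0xFC)[0]
    return sb


# Synthetic v2 superblock: 4 KiB blocks, 32768 blocks, log starts at block 1.
raw = bytearray(1024)
struct.pack_into(">III", raw, 0x0, 0xC03B3998, 4, 0)
struct.pack_into(">III", raw, 0xC, 4096, 32768, 1)
struct.pack_into(">I", raw, 0x28, 0x12)  # 64BIT | CSUM_V3
raw[0x50] = 4                            # CRC32C
sb = parse_journal_superblock(raw)
```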

.. _jbd2_compat:

The journal compat features are any combination of the following:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - Journal maintains checksums on the data blocks.
       (JBD2\_FEATURE\_COMPAT\_CHECKSUM)

.. _jbd2_incompat:

The journal incompat features are any combination of the following:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - Journal has block revocation records. (JBD2\_FEATURE\_INCOMPAT\_REVOKE)
   * - 0x2
     - Journal can deal with 64-bit block numbers.
       (JBD2\_FEATURE\_INCOMPAT\_64BIT)
   * - 0x4
     - Journal commits asynchronously. (JBD2\_FEATURE\_INCOMPAT\_ASYNC\_COMMIT)
   * - 0x8
     - This journal uses v2 of the checksum on-disk format. Each journal
       metadata block gets its own checksum, and the block tags in the
       descriptor table contain checksums for each of the data blocks in the
       journal. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2)
   * - 0x10
     - This journal uses v3 of the checksum on-disk format. This is the same as
       v2, but the journal block tag size is fixed regardless of the size of
       block numbers. (JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3)
   * - 0x20
     - Journal has fast commit blocks. (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT)
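
Because the incompat field is a bitmask, a reader typically splits it into the
bits it understands and the bits it does not. The sketch below does this for
the values listed above; the shortened names and the helper are illustrative
only.

```python
INCOMPAT_FLAGS = {
    0x1: "REVOKE",
    0x2: "64BIT",
    0x4: "ASYNC_COMMIT",
    0x8: "CSUM_V2",
    0x10: "CSUM_V3",
    0x20: "FAST_COMMIT",
}


def decode_incompat(mask):
    """Split an s_feature_incompat value into known names and unknown bits."""
    names = [name for bit, name in INCOMPAT_FLAGS.items() if mask & bit]
    known = 0
    for bit in INCOMPAT_FLAGS:
        known |= bit
    return names, mask & ~known


names, unknown = decode_incompat(0x53)
```

A nonzero ``unknown`` value means the journal uses an incompatible feature
that this table does not describe, so a cautious tool would stop there rather
than interpret the journal.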

.. _jbd2_checksum_type:

Journal checksum type codes are one of the following.  crc32 or crc32c are the
most likely choices.

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 1
     - CRC32
   * - 2
     - MD5
   * - 3
     - SHA1
   * - 4
     - CRC32C

Descriptor Block
~~~~~~~~~~~~~~~~

The descriptor block contains an array of journal block tags that
describe the final locations of the data blocks that follow in the
journal. Descriptor blocks are open-coded instead of being completely
described by a data structure, but here is the block structure anyway.
Descriptor blocks consume at least 36 bytes, but use a full block:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - journal\_header\_t
     - (open coded)
     - Common block header.
   * - 0xC
     - struct journal\_block\_tag\_s
     - open coded array[]
     - Enough tags either to fill up the block or to describe all the data
       blocks that follow this descriptor block.

Journal block tags have any of the following formats, depending on which
journal feature and block tag flags are set.

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is set, the journal block tag is
defined as ``struct journal_block_tag3_s``, which looks like the
following. The size is 16 or 32 bytes.

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - t\_blocknr
     - Lower 32-bits of the location of where the corresponding data block
       should end up on disk.
   * - 0x4
     - \_\_be32
     - t\_flags
     - Flags that go with the descriptor. See the table jbd2_tag_flags_ for
       more info.
   * - 0x8
     - \_\_be32
     - t\_blocknr\_high
     - Upper 32-bits of the location of where the corresponding data block
       should end up on disk. This is zero if JBD2\_FEATURE\_INCOMPAT\_64BIT is
       not enabled.
   * - 0xC
     - \_\_be32
     - t\_checksum
     - Checksum of the journal UUID, the sequence number, and the data block.
   * -
     -
     -
     - This field appears to be open coded. It always comes at the end of the
       tag, after t_checksum. This field is not present if the "same UUID" flag
       is set.
   * - 0x10
     - char
     - uuid[16]
     - A UUID to go with this tag. This field appears to be copied from the
       ``j_uuid`` field in ``struct journal_s``, but only tune2fs touches that
       field.

.. _jbd2_tag_flags:

The journal tag flags are any combination of the following:

.. list-table::
   :widths: 16 64
   :header-rows: 1

   * - Value
     - Description
   * - 0x1
     - On-disk block is escaped. The first four bytes of the data block just
       happened to match the jbd2 magic number.
   * - 0x2
     - This block has the same UUID as previous, therefore the UUID field is
       omitted.
   * - 0x4
     - The data block was deleted by the transaction. (Not used?)
   * - 0x8
     - This is the last tag in this descriptor block.
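
To show how the v3 tag layout and the tag flags interact, here is a sketch of
a tag walker for a descriptor block. It assumes the tag array starts right
after the 12-byte header and that the last four bytes of the block are
reserved for a block tail checksum; the constant and function names are
illustrative.

```python
import struct

FLAG_ESCAPE, FLAG_SAME_UUID, FLAG_DELETED, FLAG_LAST_TAG = 0x1, 0x2, 0x4, 0x8


def parse_tags_v3(block, offset=12):
    """Yield (blocknr, flags, checksum) for each v3 tag in a descriptor block."""
    limit = len(block) - 4  # leave room for the 4-byte block tail
    while offset + 16 <= limit:
        lo, flags, hi, csum = struct.unpack_from(">IIII", block, offset)
        offset += 16
        if not flags & FLAG_SAME_UUID:
            offset += 16  # skip the 16-byte UUID that follows this tag
        yield (hi << 32) | lo, flags, csum
        if flags & FLAG_LAST_TAG:
            break


# Synthetic block: one 32-byte tag (with UUID), then one 16-byte final tag.
block = bytes(12)
block += struct.pack(">IIII", 100, 0, 0, 0xAAAA) + bytes(16)
block += struct.pack(">IIII", 5, FLAG_SAME_UUID | FLAG_LAST_TAG, 1, 0xBBBB)
block += bytes(128 - len(block))
tags = list(parse_tags_v3(block))
```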

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is NOT set, the journal block tag
is defined as ``struct journal_block_tag_s``, which looks like the
following. The size is 8, 12, 24, or 28 bytes:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - t\_blocknr
     - Lower 32-bits of the location of where the corresponding data block
       should end up on disk.
   * - 0x4
     - \_\_be16
     - t\_checksum
     - Checksum of the journal UUID, the sequence number, and the data block.
       Note that only the lower 16 bits are stored.
   * - 0x6
     - \_\_be16
     - t\_flags
     - Flags that go with the descriptor. See the table jbd2_tag_flags_ for
       more info.
   * -
     -
     -
     - This next field is only present if the super block indicates support for
       64-bit block numbers.
   * - 0x8
     - \_\_be32
     - t\_blocknr\_high
     - Upper 32-bits of the location of where the corresponding data block
       should end up on disk.
   * -
     -
     -
     - This field appears to be open coded. It always comes at the end of the
       tag, after t_flags or t_blocknr_high. This field is not present if the
       "same UUID" flag is set.
   * - 0x8 or 0xC
     - char
     - uuid[16]
     - A UUID to go with this tag. This field appears to be copied from the
       ``j_uuid`` field in ``struct journal_s``, but only tune2fs touches that
       field.
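
The four possible sizes follow from two independent choices: whether
``t_blocknr_high`` is present and whether the UUID is present. The sketch
below only reproduces that arithmetic as stated in the table; it does not
account for any extra per-tag bytes a particular implementation might add,
and the helper name is hypothetical.

```python
INCOMPAT_64BIT = 0x2
FLAG_SAME_UUID = 0x2


def tag_size(incompat, tag_flags):
    """Size in bytes of a non-v3 block tag, per the sizes listed above."""
    size = 8                      # t_blocknr + t_checksum + t_flags
    if incompat & INCOMPAT_64BIT:
        size += 4                 # t_blocknr_high
    if not tag_flags & FLAG_SAME_UUID:
        size += 16                # trailing uuid[16]
    return size


sizes = sorted(
    {tag_size(i, f) for i in (0, INCOMPAT_64BIT) for f in (0, FLAG_SAME_UUID)}
)
```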

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or
JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is set, the end of the block is a
``struct jbd2_journal_block_tail``, which looks like this:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - t\_checksum
     - Checksum of the journal UUID + the descriptor block, with this field set
       to zero.

Data Block
~~~~~~~~~~

In general, the data blocks being written to disk through the journal
are written verbatim into the journal file after the descriptor block.
However, if the first four bytes of the block match the jbd2 magic
number then those four bytes are replaced with zeroes and the “escaped”
flag is set in the descriptor block tag.
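
The escaping rule is symmetric, which a short sketch makes easy to see. The
two helpers below are hypothetical names for the write-side and replay-side
halves; the magic is compared in its big-endian on-disk form.

```python
JBD2_MAGIC_BYTES = bytes.fromhex("c03b3998")  # magic number, big-endian
FLAG_ESCAPE = 0x1


def escape_block(data):
    """Return (journal_copy, tag_flags) for a block about to be journalled."""
    if data[:4] == JBD2_MAGIC_BYTES:
        return bytes(4) + data[4:], FLAG_ESCAPE
    return data, 0


def unescape_block(journal_copy, tag_flags):
    """Undo escape_block() when the block is replayed."""
    if tag_flags & FLAG_ESCAPE:
        return JBD2_MAGIC_BYTES + journal_copy[4:]
    return journal_copy


original = JBD2_MAGIC_BYTES + b"payload"
stored, flags = escape_block(original)
restored = unescape_block(stored, flags)
```

Without the escape, a scan of the journal could mistake such a data block for
a journal metadata block, since both would start with the magic number.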

Revocation Block
~~~~~~~~~~~~~~~~

A revocation block is used to prevent replay of a block in an earlier
transaction. This is used to mark blocks that were journalled at one
time but are no longer journalled. Typically this happens if a metadata
block is freed and re-allocated as a file data block; in this case, a
journal replay after the file block was written to disk will cause
corruption.

**NOTE**: This mechanism is NOT used to express “this journal block is
superseded by this other journal block”, as the author (djwong)
mistakenly thought. Any block being added to a transaction will cause
the removal of all existing revocation records for that block.

Revocation blocks are described in
``struct jbd2_journal_revoke_header_s``, are at least 16 bytes in
length, but use a full block:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - journal\_header\_t
     - r\_header
     - Common block header.
   * - 0xC
     - \_\_be32
     - r\_count
     - Number of bytes used in this block.
   * - 0x10
     - \_\_be32 or \_\_be64
     - blocks[0]
     - Blocks to revoke.

After r\_count is a linear array of block numbers that are effectively
revoked by this transaction. The size of each block number is 8 bytes if
the superblock advertises 64-bit block number support, or 4 bytes
otherwise.
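
A sketch of reading that array is shown below. It assumes ``r_count`` is
measured from the start of the block, so that it includes the 16 bytes of
header and count; the function name and the ``is_64bit`` parameter are
illustrative.

```python
import struct


def parse_revoke_block(block, is_64bit):
    """Return the block numbers listed in one revocation block."""
    # r_count: bytes used in the block (assumed to include the 16-byte header)
    r_count = struct.unpack_from(">I", block, 0xC)[0]
    fmt, width = (">Q", 8) if is_64bit else (">I", 4)
    return [
        struct.unpack_from(fmt, block, off)[0]
        for off in range(0x10, r_count - width + 1, width)
    ]


# Three 32-bit entries: r_count = 16 + 3 * 4 = 28.
blk32 = bytes(12) + struct.pack(">I", 28) + struct.pack(">III", 10, 20, 30) + bytes(100)
# Two 64-bit entries: r_count = 16 + 2 * 8 = 32.
blk64 = bytes(12) + struct.pack(">I", 32) + struct.pack(">QQ", 1 << 33, 7) + bytes(96)
revoked32 = parse_revoke_block(blk32, is_64bit=False)
revoked64 = parse_revoke_block(blk64, is_64bit=True)
```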

If JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or
JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 is set, the end of the revocation
block is a ``struct jbd2_journal_revoke_tail``, which has this format:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - \_\_be32
     - r\_checksum
     - Checksum of the journal UUID + revocation block.

Commit Block
~~~~~~~~~~~~

The commit block is a sentinel that indicates that a transaction has been
completely written to the journal. Once this commit block reaches the
journal, the data stored with this transaction can be written to their
final locations on disk.

The commit block is described by ``struct commit_header``, which is 60
bytes long (but uses a full block):

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - journal\_header\_s
     - (open coded)
     - Common block header.
   * - 0xC
     - unsigned char
     - h\_chksum\_type
     - The type of checksum to use to verify the integrity of the data blocks
       in the transaction. See jbd2_checksum_type_ for more info.
   * - 0xD
     - unsigned char
     - h\_chksum\_size
     - The number of bytes used by the checksum. Most likely 4.
   * - 0xE
     - unsigned char
     - h\_padding[2]
     -
   * - 0x10
     - \_\_be32
     - h\_chksum[JBD2\_CHECKSUM\_BYTES]
     - 32 bytes of space to store checksums. If
       JBD2\_FEATURE\_INCOMPAT\_CSUM\_V2 or JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3
       are set, the first ``__be32`` is the checksum of the journal UUID and
       the entire commit block, with this field zeroed. If
       JBD2\_FEATURE\_COMPAT\_CHECKSUM is set, the first ``__be32`` is the
       crc32 of all the blocks already written to the transaction.
   * - 0x30
     - \_\_be64
     - h\_commit\_sec
     - The time that the transaction was committed, in seconds since the epoch.
   * - 0x38
     - \_\_be32
     - h\_commit\_nsec
     - Nanoseconds component of the above timestamp.
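
The commit header fields can be pulled out at the offsets given in the table,
as in the sketch below. Only the first checksum word is read, no checksum is
verified, and the function name and dictionary keys are illustrative.

```python
import struct


def parse_commit_block(block):
    """Decode the commit header fields at their documented offsets."""
    chksum_type, chksum_size = struct.unpack_from(">BB", block, 0xC)
    first_chksum = struct.unpack_from(">I", block, 0x10)[0]
    commit_sec, commit_nsec = struct.unpack_from(">QI", block, 0x30)
    return {
        "chksum_type": chksum_type,
        "chksum_size": chksum_size,
        "chksum": first_chksum,
        "commit_sec": commit_sec,
        "commit_nsec": commit_nsec,
    }


# Synthetic commit block: CRC32C, 4-byte checksum, made-up timestamp.
raw = bytearray(64)
struct.pack_into(">BB", raw, 0xC, 4, 4)
struct.pack_into(">I", raw, 0x10, 0xDEADBEEF)
struct.pack_into(">QI", raw, 0x30, 1700000000, 500)
commit = parse_commit_block(raw)
```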

Fast commits
~~~~~~~~~~~~

The fast commit area is organized as a log of tag-length-value (TLV)
records. Each TLV begins with a ``struct ext4_fc_tl``, which stores the tag
and the length of the entire field, followed by a variable-length,
tag-specific value. Here is the list of supported tags and their meanings:

.. list-table::
   :widths: 8 20 20 32
   :header-rows: 1

   * - Tag
     - Meaning
     - Value struct
     - Description
   * - EXT4_FC_TAG_HEAD
     - Fast commit area header
     - ``struct ext4_fc_head``
     - Stores the TID of the transaction after which these fast commits should
       be applied.
   * - EXT4_FC_TAG_ADD_RANGE
     - Add extent to inode
     - ``struct ext4_fc_add_range``
     - Stores the inode number and the extent to be added to this inode.
   * - EXT4_FC_TAG_DEL_RANGE
     - Remove logical offsets from inode
     - ``struct ext4_fc_del_range``
     - Stores the inode number and the logical offset range that needs to be
       removed.
   * - EXT4_FC_TAG_CREAT
     - Create directory entry for a newly created file
     - ``struct ext4_fc_dentry_info``
     - Stores the parent inode number, inode number and directory entry of the
       newly created file.
   * - EXT4_FC_TAG_LINK
     - Link a directory entry to an inode
     - ``struct ext4_fc_dentry_info``
     - Stores the parent inode number, inode number and directory entry.
   * - EXT4_FC_TAG_UNLINK
     - Unlink a directory entry of an inode
     - ``struct ext4_fc_dentry_info``
     - Stores the parent inode number, inode number and directory entry.
   * - EXT4_FC_TAG_PAD
     - Padding (unused area)
     - None
     - Unused bytes in the fast commit area.
   * - EXT4_FC_TAG_TAIL
     - Mark the end of a fast commit
     - ``struct ext4_fc_tail``
     - Stores the TID of the commit and the CRC of the fast commit that this
       tag terminates.
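
A generic TLV walker over such an area might look like the sketch below. The
field widths are not given in this document, so the sketch assumes each
``struct ext4_fc_tl`` header is two little-endian 16-bit fields (tag, then
length) and that the length counts only the value bytes after the header;
both points should be checked against the kernel headers. The numeric tag
values in the sample are placeholders, not the real ``EXT4_FC_TAG_*`` values.

```python
import struct


def walk_tlvs(area):
    """Yield (tag, value) pairs from a fast commit area.

    Assumes a 4-byte header of two little-endian 16-bit fields
    (tag, length) and that length counts only the value bytes.
    """
    offset = 0
    while offset + 4 <= len(area):
        tag, length = struct.unpack_from("<HH", area, offset)
        offset += 4
        if offset + length > len(area):
            break  # truncated record; stop rather than read past the end
        yield tag, area[offset:offset + length]
        offset += length


# Two synthetic records with placeholder tag numbers 9 and 7.
area = struct.pack("<HH", 9, 8) + bytes(8) + struct.pack("<HH", 7, 0)
records = list(walk_tlvs(area))
```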