.. SPDX-License-Identifier: GPL-2.0

============================
XFS Self Describing Metadata
============================

Introduction
============

The largest scalability problem facing XFS is not one of algorithmic
scalability, but of verification of the filesystem structure. Scalability of the
structures and indexes on disk and the algorithms for iterating them are
adequate for supporting PB scale filesystems with billions of inodes; however, it
is this very scalability that causes the verification problem.

Almost all metadata on XFS is dynamically allocated. The only fixed location
metadata is the allocation group headers (SB, AGF, AGFL and AGI), while all
other metadata structures need to be discovered by walking the filesystem
structure in different ways. While this is already done by userspace tools for
validating and repairing the structure, there are limits to what they can
verify, and this in turn limits the supportable size of an XFS filesystem.

For example, it is entirely possible to manually use xfs_db and a bit of
scripting to analyse the structure of a 100TB filesystem when trying to
determine the root cause of a corruption problem, but it is still mainly a
manual task of verifying that things like single bit errors or misplaced writes
weren't the ultimate cause of a corruption event.
It may take a few hours to a
few days to perform such forensic analysis, so at this scale root cause
analysis is entirely possible.

However, if we scale the filesystem up to 1PB, we now have 10x as much metadata
to analyse and so that analysis blows out towards weeks/months of forensic work.
Most of the analysis work is slow and tedious, so the more analysis is needed,
the more likely it is that the cause will be lost in the noise. Hence the primary
concern for supporting PB scale filesystems is minimising the time and effort
required for basic forensic analysis of the filesystem structure.


Self Describing Metadata
========================

One of the problems with the current metadata format is that apart from the
magic number in the metadata block, we have no other way of identifying what it
is supposed to be. We can't even identify if it is in the right place. Put simply,
you can't look at a single metadata block in isolation and say "yes, it is
supposed to be there and the contents are valid".

Hence most of the time spent on forensic analysis is spent doing basic
verification of metadata values, looking for values that are in range (and hence
not detected by automated verification checks) but are not correct. Finding and
understanding how things like cross linked block lists come about (e.g. how
sibling pointers in a btree end up with loops in them) is the key to understanding
what went wrong, but it is impossible to tell after the fact in what order the
blocks were linked into each other or written to disk.

Hence we need to record more information into the metadata to allow us to
quickly determine if the metadata is intact and can be ignored for the purpose
of analysis. We can't protect against every possible type of error, but we can
ensure that common types of errors are easily detectable. Hence the concept of
self describing metadata.

The first, fundamental requirement of self describing metadata is that the
metadata object contains some form of unique identifier in a well known
location. This allows us to identify the expected contents of the block and
hence parse and verify the metadata object. If we can't independently identify
the type of metadata in the object, then the metadata doesn't describe itself
very well at all!

Luckily, almost all XFS metadata has magic numbers embedded already - only the
AGFL, remote symlinks and remote attribute blocks do not contain identifying
magic numbers. Hence we can change the on-disk format of all these objects to
add more identifying information and detect this simply by changing the magic
numbers in the metadata objects. That is, if it has the current magic number,
the metadata isn't self identifying.
If it contains a new magic number, it is
self identifying and we can do much more expansive automated verification of the
metadata object at runtime, during forensic analysis or repair.

As a primary concern, self describing metadata needs some form of overall
integrity checking. We cannot trust the metadata if we cannot verify that it has
not been changed as a result of external influences. Hence we need some form of
integrity check, and this is done by adding CRC32c validation to the metadata
block. If we can verify the block contains the metadata it was intended to
contain, a large amount of the manual verification work can be skipped.

CRC32c was selected as metadata cannot be more than 64k in length in XFS and
hence a 32 bit CRC is more than sufficient to detect multi-bit errors in
metadata blocks. CRC32c is also now hardware accelerated on common CPUs so it is
fast. So while CRC32c is not the strongest of possible integrity checks that
could be used, it is more than sufficient for our needs and has relatively
little overhead. Adding support for larger integrity fields and/or algorithms
does not really provide any extra value over CRC32c, but it does add a lot of
complexity and so there is no provision for changing the integrity checking
mechanism.

Self describing metadata needs to contain enough information so that the
metadata block can be verified as being in the correct place without needing to
look at any other metadata. This means it needs to contain location information.
Just adding a block number to the metadata is not sufficient to protect against
misdirected writes - a write might be misdirected to the wrong LUN and so be
written to the "correct block" of the wrong filesystem. Hence location
information must contain a filesystem identifier as well as a block number.

Another key information point in forensic analysis is knowing who the metadata
block belongs to. We already know the type, the location, that it is valid
and/or corrupted, and how long ago it was last modified. Knowing the owner
of the block is important as it allows us to find other related metadata to
determine the scope of the corruption. For example, if we have an extent btree
object, we don't know what inode it belongs to and hence have to walk the entire
filesystem to find the owner of the block. Worse, the corruption could mean that
no owner can be found (i.e. it's an orphan block), and so without an owner field
in the metadata we have no idea of the scope of the corruption. If we have an
owner field in the metadata object, we can immediately do top down validation to
determine the scope of the problem.

Different types of metadata have different owner identifiers. For example,
directory, attribute and extent tree blocks are all owned by an inode, while
freespace btree blocks are owned by an allocation group. Hence the size and
contents of the owner field are determined by the type of metadata object we are
looking at. The owner information can also identify misplaced writes (e.g. a
freespace btree block written to the wrong AG).

Self describing metadata also needs to contain some indication of when it was
written to the filesystem. One of the key information points when doing forensic
analysis is how recently the block was modified. Correlation of a set of corrupted
metadata blocks based on modification times is important as it can indicate
whether the corruptions are related, whether there have been multiple corruption
events that led to the eventual failure, and even whether there are corruptions
present that the run-time verification is not detecting.

For example, if a metadata object is still referenced by its owner, we can
determine whether it is supposed to be free space or still allocated by
comparing when the free space btree block that contains the block was last
written with when the metadata object itself was last written. If the free space
block is more recent than the object and the object's owner, then there is a
very good chance that the block should have been removed from the owner.

To provide this "written timestamp", each metadata block gets the Log Sequence
Number (LSN) of the most recent transaction in which it was modified written into
it. This number will always increase over the life of the filesystem, and the only
thing that resets it is running xfs_repair on the filesystem.
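
As an illustration of how LSNs might be compared during forensic analysis, here
is a small userspace sketch. It assumes, as the kernel's CYCLE_LSN() and
BLOCK_LSN() macros do, that an LSN packs a log cycle number into its upper 32
bits and a log block number into its lower 32 bits. The helper names and the
likely_should_be_free() heuristic (a direct encoding of the free space example
above) are hypothetical, not part of XFS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed LSN layout: cycle in the upper 32 bits, log block in the lower 32. */
static inline uint32_t lsn_cycle(uint64_t lsn) { return (uint32_t)(lsn >> 32); }
static inline uint32_t lsn_block(uint64_t lsn) { return (uint32_t)lsn; }

static inline uint64_t make_lsn(uint32_t cycle, uint32_t block)
{
	return ((uint64_t)cycle << 32) | block;
}

/* Order two LSNs: by cycle first, then by block within the cycle.
 * Returns a negative value, zero or a positive value. */
static int lsn_cmp(uint64_t a, uint64_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}

/* Hypothetical forensic heuristic: a block still referenced by its owner
 * probably should have been removed from that owner if the free space
 * btree block recording it was written after both the block and the owner. */
static bool likely_should_be_free(uint64_t freesp_lsn, uint64_t obj_lsn,
				  uint64_t owner_lsn)
{
	return lsn_cmp(freesp_lsn, obj_lsn) > 0 &&
	       lsn_cmp(freesp_lsn, owner_lsn) > 0;
}
```

Because the cycle occupies the upper 32 bits, this ordering matches a plain
unsigned 64-bit comparison; the explicit form just mirrors the cycle/block split.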

Further, by using the LSN we can tell whether the corrupted metadata all belonged
to the same log checkpoint and hence have some idea of how much modification
occurred between the first and last instance of corrupt metadata on disk and, in
turn, how much modification occurred between the corruption being written and
when it was detected.

Runtime Validation
==================

Validation of self-describing metadata takes place at runtime in two places:

 - immediately after a successful read from disk
 - immediately prior to write IO submission

The verification is completely stateless - it is done independently of the
modification process, and seeks only to check that the metadata is what it says
it is and that the metadata fields are within bounds and internally consistent.
As such, we cannot catch all types of corruption that can occur within a block
as there may be certain constraints that operational state enforces on the
metadata, or there may be corruption of interblock relationships (e.g. corrupted
sibling pointer lists). Hence we still need stateful checking in the main code
body, but in general most of the per-field validation is handled by the
verifiers.

For read verification, the caller needs to specify the expected type of metadata
that it should see, and the IO completion process verifies that the metadata
object matches what was expected.
If the verification process fails, then it
marks the object being read as EFSCORRUPTED. The caller needs to catch this
error (same as for IO errors), and if it needs to take special action due to a
verification error it can do so by catching the EFSCORRUPTED error value. If we
need more discrimination of error type at higher levels, we can define new
error numbers for different errors as necessary.

The first step in read verification is checking the magic number and determining
whether CRC validation is necessary. If it is, the CRC32c is calculated and
compared against the value stored in the object itself. Once this is validated,
further checks are made against the location information, followed by extensive
object specific metadata validation. If any of these checks fail, then the
buffer is considered corrupt and the EFSCORRUPTED error is set appropriately.

Write verification is the opposite of the read verification - first the object
is extensively verified and if it is OK we then update the LSN from the last
modification made to the object. After this, we calculate the CRC and insert it
into the object. Once this is done the write IO is allowed to continue. If any
error occurs during this process, the buffer is again marked with an EFSCORRUPTED
error for the higher layers to catch.
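
The CRC steps of read and write verification can be modelled in portable
userspace C. This sketch assumes a bitwise (table-free) CRC32c using the
reflected Castagnoli polynomial 0x82F63B78, seeded with all ones and inverted at
the end, computed over the whole block with the 4-byte CRC field treated as zero
and stored little-endian. That is our reading of what helpers such as
xfs_update_cksum() and xfs_verify_cksum() do, not a copy of them; the meta_*
function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78.
 * The caller passes in the running CRC so a block can be fed in pieces. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* CRC of a metadata block with the 4-byte CRC field at crc_off treated as
 * zero, so the stored value never feeds into itself.
 * Assumes len >= crc_off + 4. */
static uint32_t meta_cksum(const uint8_t *buf, size_t len, size_t crc_off)
{
	static const uint8_t zero[4];
	uint32_t crc = 0xFFFFFFFFu;

	crc = crc32c_update(crc, buf, crc_off);
	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, buf + crc_off + 4, len - crc_off - 4);
	return ~crc;
}

/* Write side: compute the CRC and store it little-endian in the block. */
static void meta_update_cksum(uint8_t *buf, size_t len, size_t crc_off)
{
	uint32_t crc = meta_cksum(buf, len, crc_off);

	for (int i = 0; i < 4; i++)
		buf[crc_off + i] = (uint8_t)(crc >> (8 * i));
}

/* Read side: recompute the CRC and compare it with the stored value. */
static bool meta_verify_cksum(const uint8_t *buf, size_t len, size_t crc_off)
{
	uint32_t stored = 0;

	for (int i = 0; i < 4; i++)
		stored |= (uint32_t)buf[crc_off + i] << (8 * i);
	return stored == meta_cksum(buf, len, crc_off);
}
```

A real implementation would use a table-driven or hardware-accelerated CRC32c;
the bitwise loop only keeps the sketch short. Any single flipped bit in the block
makes meta_verify_cksum() fail.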

Structures
==========

A typical on-disk structure needs to contain the following information::

    struct xfs_ondisk_hdr {
            __be32  magic;          /* magic number */
            __le32  crc;            /* CRC, not logged */
            uuid_t  uuid;           /* filesystem identifier */
            __be64  owner;          /* parent object */
            __be64  blkno;          /* location on disk */
            __be64  lsn;            /* last modification in log, not logged */
    };

Depending on the metadata, this information may be part of a header structure
separate from the metadata contents, or may be distributed through an existing
structure. The latter occurs with metadata that already contains some of this
information, such as the superblock and AG headers.

Other metadata may have different formats for the information, but the same
level of information is generally provided. For example:

  - short btree blocks have a 32 bit owner (AG number) and a 32 bit block
    number for location. The two of these combined provide the same
    information as @owner and @blkno in the above structure, but using 8
    bytes less space on disk.

  - directory/attribute node blocks have a 16 bit magic number, and the
    header that contains the magic number has other information in it as
    well. Hence the additional metadata headers change the overall format
    of the metadata.
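
To make the byte layout of the generic header concrete, the following userspace
sketch mirrors it with fixed-width types and decodes the big-endian fields byte
by byte, so it works on any host. The struct and function names are
hypothetical, and the offsets and 48-byte size are properties of this mirror
(4 + 4 + 16 + 8 + 8 + 8 bytes with no implicit padding), not a claim about any
particular XFS structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of struct xfs_ondisk_hdr. Integer fields
 * hold on-disk byte order and must be decoded before use. */
struct ondisk_hdr {
	uint32_t magic;    /* big-endian magic number */
	uint32_t crc;      /* CRC field, filled in by the checksum code */
	uint8_t  uuid[16]; /* filesystem identifier */
	uint64_t owner;    /* big-endian parent object */
	uint64_t blkno;    /* big-endian location on disk */
	uint64_t lsn;      /* big-endian last modification in log */
};

/* Every field is naturally aligned, so there is no padding. */
_Static_assert(offsetof(struct ondisk_hdr, crc) == 4, "crc offset");
_Static_assert(offsetof(struct ondisk_hdr, owner) == 24, "owner offset");
_Static_assert(sizeof(struct ondisk_hdr) == 48, "header size");

/* Host-endian copy of the fields a verifier needs to inspect. */
struct hdr_cpu {
	uint32_t magic;
	uint8_t  uuid[16];
	uint64_t owner;
	uint64_t blkno;
	uint64_t lsn;
};

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Decode a raw header at the start of a metadata buffer. */
static void hdr_decode(const uint8_t *buf, struct hdr_cpu *out)
{
	out->magic = get_be32(buf + offsetof(struct ondisk_hdr, magic));
	memcpy(out->uuid, buf + offsetof(struct ondisk_hdr, uuid),
	       sizeof(out->uuid));
	out->owner = get_be64(buf + offsetof(struct ondisk_hdr, owner));
	out->blkno = get_be64(buf + offsetof(struct ondisk_hdr, blkno));
	out->lsn = get_be64(buf + offsetof(struct ondisk_hdr, lsn));
}
```

The offset of the CRC field computed this way is the kind of value that the
XFS_FOO_CRC_OFF macro in the verifier examples feeds to the checksum helpers.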
A typical buffer read verifier is structured as follows::

    #define XFS_FOO_CRC_OFF         offsetof(struct xfs_ondisk_hdr, crc)

    static void
    xfs_foo_read_verify(
            struct xfs_buf  *bp)
    {
            struct xfs_mount *mp = bp->b_mount;

            if ((xfs_sb_version_hascrc(&mp->m_sb) &&
                 !xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
                                   XFS_FOO_CRC_OFF)) ||
                !xfs_foo_verify(bp)) {
                    XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
                    xfs_buf_ioerror(bp, -EFSCORRUPTED);
            }
    }

The code ensures that the CRC is only checked if the filesystem has CRCs enabled
by checking the superblock for the feature bit, and then if the CRC verifies OK
(or is not needed) it verifies the actual contents of the block.

The verifier function will take a couple of different forms, depending on
whether the magic number can be used to determine the format of the block.
Where it can't, the code is structured as follows::

    static bool
    xfs_foo_verify(
            struct xfs_buf          *bp)
    {
            struct xfs_mount        *mp = bp->b_mount;
            struct xfs_ondisk_hdr   *hdr = bp->b_addr;

            if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
                    return false;

            /* only CRC-enabled filesystems carry the self-describing fields */
            if (xfs_sb_version_hascrc(&mp->m_sb)) {
                    if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
                            return false;
                    if (bp->b_bn != be64_to_cpu(hdr->blkno))
                            return false;
                    if (hdr->owner == 0)
                            return false;
            }

            /* object specific verification checks here */

            return true;
    }

If there are different magic numbers for the different formats, the verifier
will look like::

    static bool
    xfs_foo_verify(
            struct xfs_buf          *bp)
    {
            struct xfs_mount        *mp = bp->b_mount;
            struct xfs_ondisk_hdr   *hdr = bp->b_addr;

            if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) {
                    /* new magic number: self-describing fields are present */
                    if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
                            return false;
                    if (bp->b_bn != be64_to_cpu(hdr->blkno))
                            return false;
                    if (hdr->owner == 0)
                            return false;
            } else if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
                    return false;

            /* object specific verification checks here */

            return true;
    }

Write verifiers are very similar to the read verifiers; they just do things in
the opposite order. A typical write verifier::

    static void
    xfs_foo_write_verify(
            struct xfs_buf          *bp)
    {
            struct xfs_mount        *mp = bp->b_mount;
            struct xfs_buf_log_item *bip = bp->b_log_item;

            if (!xfs_foo_verify(bp)) {
                    XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
                    xfs_buf_ioerror(bp, -EFSCORRUPTED);
                    return;
            }

            if (!xfs_sb_version_hascrc(&mp->m_sb))
                    return;

            /* stamp the LSN of the last modification if the buffer was logged */
            if (bip) {
                    struct xfs_ondisk_hdr *hdr = bp->b_addr;
                    hdr->lsn = cpu_to_be64(bip->bli_item.li_lsn);
            }
            xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length), XFS_FOO_CRC_OFF);
    }

This will verify the internal structure of the metadata before we go any
further, detecting corruptions that have occurred as the metadata has been
modified in memory. If the metadata verifies OK, and CRCs are enabled, we then
update the LSN field (when it was last modified) and calculate the CRC on the
metadata. Once this is done, we can issue the IO.
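
To see why the UUID, block number and owner checks in the verifiers catch
misplaced writes, the following userspace sketch models the second
xfs_foo_verify() form. It works on a host-endian copy of the header to stay
short; the FOO_* magic values, struct foo_hdr and foo_verify() are hypothetical
stand-ins, not XFS definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic values standing in for XFS_FOO_MAGIC and
 * XFS_FOO_CRC_MAGIC. */
#define FOO_MAGIC	0x464f4f31u	/* "FOO1": old, not self describing */
#define FOO_CRC_MAGIC	0x464f4f33u	/* "FOO3": self describing */

/* Host-endian view of the header fields (on-disk decoding omitted). */
struct foo_hdr {
	uint32_t magic;
	uint8_t  uuid[16];
	uint64_t owner;
	uint64_t blkno;
};

/* Model of the verifier that keys the format off the magic number.
 * fs_uuid and read_blkno describe where the block was actually read from. */
static bool foo_verify(const struct foo_hdr *hdr, const uint8_t fs_uuid[16],
		       uint64_t read_blkno)
{
	if (hdr->magic == FOO_CRC_MAGIC) {
		if (memcmp(hdr->uuid, fs_uuid, 16) != 0)
			return false;	/* written by another filesystem */
		if (hdr->blkno != read_blkno)
			return false;	/* misdirected within this filesystem */
		if (hdr->owner == 0)
			return false;	/* no owner recorded */
	} else if (hdr->magic != FOO_MAGIC) {
		return false;		/* not a foo block at all */
	}

	/* object specific verification checks would follow here */
	return true;
}
```

For example, a self-describing block stamped for block 100 of one filesystem
fails if it is found at block 200, or at block 100 of a filesystem with a
different UUID (the wrong-LUN case), while an old-format block with the original
magic number only gets the basic checks.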

Inodes and Dquots
=================

Inodes and dquots are special snowflakes. They have per-object CRC and
self-identifiers, but they are packed so that there are multiple objects per
buffer. Hence we do not use per-buffer verifiers to do the work of per-object
verification and CRC calculations. The per-buffer verifiers simply perform basic
identification of the buffer - that they contain inodes or dquots, and that
there are magic numbers in all the expected spots. All further CRC and
verification checks are done when each inode or dquot is read from or written
back to the buffer.

The structure of the verifiers and the identifier checks is very similar to the
buffer code described above. The only difference is where they are called. For
example, inode read verification is done in xfs_inode_from_disk() when the inode
is first read out of the buffer and the struct xfs_inode is instantiated. The
inode is already extensively verified during writeback in xfs_iflush_int, so the
only addition here is to add the LSN and CRC to the inode as it is copied back
into the buffer.

XXX: inode unlinked list modification doesn't recalculate the inode CRC! None of
the unlinked list modifications check or update CRCs, neither during unlink nor
log recovery. So, it's gone unnoticed until now. This won't matter immediately -
repair will probably complain about it - but it needs to be fixed.
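
The per-buffer identification step for packed inode or dquot buffers can be
sketched as follows. This is a simplified userspace model, not the kernel's inode
or dquot buffer verifier: it assumes fixed-size objects that each start with a
16-bit big-endian magic number, and packed_buf_verify() is a hypothetical name.
Per-object CRC and field checks are deliberately left out, since they are
deferred until each object is copied out of or back into the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check that a buffer holds a whole number of obj_size objects and that
 * every object slot starts with the expected 16-bit big-endian magic. */
static bool packed_buf_verify(const uint8_t *buf, size_t buf_len,
			      size_t obj_size, uint16_t magic)
{
	if (obj_size < 2 || buf_len % obj_size != 0)
		return false;

	for (size_t off = 0; off < buf_len; off += obj_size) {
		uint16_t m = (uint16_t)((buf[off] << 8) | buf[off + 1]);

		if (m != magic)
			return false;	/* a slot without the expected magic */
	}
	return true;
}
```

For instance, a buffer of four 256-byte inodes passes with magic 0x494e (ASCII
"IN", the XFS on-disk inode magic) only if every slot starts with that value.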