.. SPDX-License-Identifier: GPL-2.0

.. _fsverity:

=======================================================
fs-verity: read-only file-based authenticity protection
=======================================================

Introduction
============

fs-verity (``fs/verity/``) is a support layer that filesystems can
hook into to support transparent integrity and authenticity protection
of read-only files. Currently, it is supported by the ext4 and f2fs
filesystems. Like fscrypt, not too much filesystem-specific code is
needed to support fs-verity.

fs-verity is similar to `dm-verity
<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
but works on files rather than block devices. On regular files on
filesystems supporting fs-verity, userspace can execute an ioctl that
causes the filesystem to build a Merkle tree for the file and persist
it to a filesystem-specific location associated with the file.

After this, the file is made readonly, and all reads from the file are
automatically verified against the file's Merkle tree. Reads of any
corrupted data, including mmap reads, will fail.

Userspace can use another ioctl to retrieve the root hash (actually
the "fs-verity file digest", which is a hash that includes the Merkle
tree root hash) that fs-verity is enforcing for the file.
This ioctl executes in constant time, regardless of the file size.

fs-verity is essentially a way to hash a file in constant time,
subject to the caveat that reads which would violate the hash will
fail at runtime.

Use cases
=========

By itself, the base fs-verity feature only provides integrity
protection, i.e. detection of accidental (non-malicious) corruption.

However, because fs-verity makes retrieving the file hash extremely
efficient, it's primarily meant to be used as a tool to support
authentication (detection of malicious modifications) or auditing
(logging file hashes before use).

Trusted userspace code (e.g. operating system code running on a
read-only partition that is itself authenticated by dm-verity) can
authenticate the contents of an fs-verity file by using the
`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
digital signature of it.

A standard file hash could be used instead of fs-verity. However,
this is inefficient if the file is large and only a small portion may
be accessed. This is often the case for Android application package
(APK) files, for example. These typically contain many translations,
classes, and other resources that are infrequently or even never
accessed on a particular device. It would be slow and wasteful to
read and hash the entire file before starting the application.

Unlike an ahead-of-time hash, fs-verity also re-verifies data each
time it's paged in. This ensures that malicious disk firmware can't
undetectably change the contents of the file at runtime.

fs-verity does not replace or obsolete dm-verity. dm-verity should
still be used on read-only filesystems. fs-verity is for files that
must live on a read-write filesystem because they are independently
updated and potentially user-installed, so dm-verity cannot be used.

The base fs-verity feature is a hashing mechanism only; actually
authenticating the files is up to userspace. However, to meet some
users' needs, fs-verity optionally supports a simple signature
verification mechanism where users can configure the kernel to require
that all fs-verity files be signed by a key loaded into a keyring; see
`Built-in signature verification`_. Support for fs-verity file hashes
in IMA (Integrity Measurement Architecture) policies is also planned.

User API
========

FS_IOC_ENABLE_VERITY
--------------------

The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file.
It takes in a pointer to a struct fsverity_enable_arg, defined as
follows::

    struct fsverity_enable_arg {
            __u32 version;
            __u32 hash_algorithm;
            __u32 block_size;
            __u32 salt_size;
            __u64 salt_ptr;
            __u32 sig_size;
            __u32 __reserved1;
            __u64 sig_ptr;
            __u64 __reserved2[11];
    };

This structure contains the parameters of the Merkle tree to build for
the file, and optionally contains a signature. It must be initialized
as follows:

- ``version`` must be 1.
- ``hash_algorithm`` must be the identifier for the hash algorithm to
  use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See
  ``include/uapi/linux/fsverity.h`` for the list of possible values.
- ``block_size`` must be the Merkle tree block size. Currently, this
  must be equal to the system page size, which is usually 4096 bytes.
  Other sizes may be supported in the future. This value is not
  necessarily the same as the filesystem block size.
- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
  provided. The salt is a value that is prepended to every hashed
  block; it can be used to personalize the hashing for a particular
  file or device. Currently the maximum salt size is 32 bytes.
- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
  provided.
- ``sig_size`` is the size of the signature in bytes, or 0 if no
  signature is provided. Currently the signature is (somewhat
  arbitrarily) limited to 16128 bytes. See `Built-in signature
  verification`_ for more information.
- ``sig_ptr`` is the pointer to the signature, or NULL if no
  signature is provided.
- All reserved fields must be zeroed.

FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
the file and persist it to a filesystem-specific location associated
with the file, then mark the file as a verity file. This ioctl may
take a long time to execute on large files, and it is interruptible by
fatal signals.

FS_IOC_ENABLE_VERITY checks for write access to the inode. However,
it must be executed on an O_RDONLY file descriptor and no processes
can have the file open for writing. Attempts to open the file for
writing while this ioctl is executing will fail with ETXTBSY. (This
is necessary to guarantee that no writable file descriptors will exist
after verity is enabled, and to guarantee that the file's contents are
stable while the Merkle tree is being built over it.)

On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
verity file. On failure (including the case of interruption by a
fatal signal), no changes are made to the file.
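
As an illustration of the initialization rules above, the following is a
minimal userspace sketch, not a reference implementation. The helper name
``enable_verity_sha256()`` is hypothetical, and the sketch assumes that
``<linux/fsverity.h>`` from a v5.4 or later kernel is installed, that no
salt or signature is wanted, and that the Merkle tree block size should be
the system page size:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/*
 * Hypothetical helper (not part of any library): enable fs-verity on
 * 'path' using SHA-256, no salt and no signature.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int enable_verity_sha256(const char *path)
{
	struct fsverity_enable_arg arg;
	int fd, ret, saved_errno;

	/* The ioctl must be issued on a read-only file descriptor. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Zeroing the struct also zeroes all reserved fields. */
	memset(&arg, 0, sizeof(arg));
	arg.version = 1;
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	/* Assumption: the block size must equal the system page size. */
	arg.block_size = (__u32)sysconf(_SC_PAGESIZE);

	ret = ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);

	/* Preserve the ioctl's errno across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret == 0 ? 0 : -1;
}
```

A caller would typically inspect ``errno`` on failure and compare it
against the error list for this ioctl, e.g. treating ``EEXIST`` as
"already enabled".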

FS_IOC_ENABLE_VERITY can fail with the following errors:

- ``EACCES``: the process does not have write access to the file
- ``EBADMSG``: the signature is malformed
- ``EBUSY``: this ioctl is already running on the file
- ``EEXIST``: the file already has verity enabled
- ``EFAULT``: the caller provided inaccessible memory
- ``EINTR``: the operation was interrupted by a fatal signal
- ``EINVAL``: unsupported version, hash algorithm, or block size; or
  reserved bits are set; or the file descriptor refers to neither a
  regular file nor a directory.
- ``EISDIR``: the file descriptor refers to a directory
- ``EKEYREJECTED``: the signature doesn't match the file
- ``EMSGSIZE``: the salt or signature is too long
- ``ENOKEY``: the fs-verity keyring doesn't contain the certificate
  needed to verify the signature
- ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
  available in the kernel's crypto API as currently configured (e.g.
  for SHA-512, missing CONFIG_CRYPTO_SHA512).
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support; or the filesystem superblock has not had the 'verity'
  feature enabled on it; or the filesystem does not support fs-verity
  on this file. (See `Filesystem support`_.)
- ``EPERM``: the file is append-only; or, a signature is required and
  one was not provided.
- ``EROFS``: the filesystem is read-only
- ``ETXTBSY``: someone has the file open for writing. This can be the
  caller's file descriptor, another open file descriptor, or the file
  reference held by a writable memory map.

FS_IOC_MEASURE_VERITY
---------------------

The FS_IOC_MEASURE_VERITY ioctl retrieves the digest of a verity file.
The fs-verity file digest is a cryptographic digest that identifies
the file contents that are being enforced on reads; it is computed via
a Merkle tree and is different from a traditional full-file digest.

This ioctl takes in a pointer to a variable-length structure::

    struct fsverity_digest {
            __u16 digest_algorithm;
            __u16 digest_size; /* input/output */
            __u8 digest[];
    };

``digest_size`` is an input/output field. On input, it must be
initialized to the number of bytes allocated for the variable-length
``digest`` field.

On success, 0 is returned and the kernel fills in the structure as
follows:

- ``digest_algorithm`` will be the hash algorithm used for the file
  digest. It will match ``fsverity_enable_arg::hash_algorithm``.
- ``digest_size`` will be the size of the digest in bytes, e.g. 32
  for SHA-256. (This can be redundant with ``digest_algorithm``.)
- ``digest`` will be the actual bytes of the digest.
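
The input/output handling of ``digest_size`` can be sketched as follows.
This is only an illustrative sketch: the helper name ``measure_verity()``
and the constant ``EXAMPLE_MAX_DIGEST`` are hypothetical, and the 64-byte
buffer is an assumption chosen to be large enough for a SHA-512 digest:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/* Assumption: 64 bytes is enough for the largest digest (SHA-512). */
#define EXAMPLE_MAX_DIGEST 64

/*
 * Hypothetical helper: copy the fs-verity file digest of 'path' into
 * 'out' (capacity 'cap') and store the algorithm number in '*alg'.
 * Returns the digest length in bytes, or -1 with errno set.
 */
int measure_verity(const char *path, unsigned char *out, size_t cap,
		   unsigned int *alg)
{
	struct fsverity_digest *d;
	int fd, ret = -1, saved_errno;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Allocate the header plus room for the variable-length digest. */
	d = calloc(1, sizeof(*d) + EXAMPLE_MAX_DIGEST);
	if (!d) {
		close(fd);
		errno = ENOMEM;
		return -1;
	}

	/* On input, digest_size is the space available in digest[]. */
	d->digest_size = EXAMPLE_MAX_DIGEST;

	if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) == 0) {
		/* On output, digest_size is the actual digest length. */
		if (d->digest_size <= cap) {
			memcpy(out, d->digest, d->digest_size);
			*alg = d->digest_algorithm;
			ret = d->digest_size;
		} else {
			errno = EOVERFLOW;
		}
	}

	saved_errno = errno;
	free(d);
	close(fd);
	errno = saved_errno;
	return ret;
}
```

Trusted userspace code would then compare the returned bytes against an
expected digest, or verify a signature over them.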

FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
regardless of the size of the file.

FS_IOC_MEASURE_VERITY can fail with the following errors:

- ``EFAULT``: the caller provided inaccessible memory
- ``ENODATA``: the file is not a verity file
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support, or the filesystem superblock has not had the 'verity'
  feature enabled on it. (See `Filesystem support`_.)
- ``EOVERFLOW``: the digest is longer than the specified
  ``digest_size`` bytes. Try providing a larger buffer.

FS_IOC_READ_VERITY_METADATA
---------------------------

The FS_IOC_READ_VERITY_METADATA ioctl reads verity metadata from a
verity file. This ioctl is available since Linux v5.12.

This ioctl allows writing a server program that takes a verity file
and serves it to a client program, such that the client can do its own
fs-verity compatible verification of the file. This only makes sense
if the client doesn't trust the server and if the server needs to
provide the storage for the client.

This is a fairly specialized use case, and most fs-verity users won't
need this ioctl.

This ioctl takes in a pointer to the following structure::

    #define FS_VERITY_METADATA_TYPE_MERKLE_TREE     1
    #define FS_VERITY_METADATA_TYPE_DESCRIPTOR      2
    #define FS_VERITY_METADATA_TYPE_SIGNATURE       3

    struct fsverity_read_metadata_arg {
            __u64 metadata_type;
            __u64 offset;
            __u64 length;
            __u64 buf_ptr;
            __u64 __reserved;
    };

``metadata_type`` specifies the type of metadata to read:

- ``FS_VERITY_METADATA_TYPE_MERKLE_TREE`` reads the blocks of the
  Merkle tree. The blocks are returned in order from the root level
  to the leaf level. Within each level, the blocks are returned in
  the same order that their hashes are themselves hashed.
  See `Merkle tree`_ for more information.

- ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` reads the fs-verity
  descriptor. See `fs-verity descriptor`_.

- ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the signature which was
  passed to FS_IOC_ENABLE_VERITY, if any. See `Built-in signature
  verification`_.

The semantics are similar to those of ``pread()``. ``offset``
specifies the offset in bytes into the metadata item to read from, and
``length`` specifies the maximum number of bytes to read from the
metadata item. ``buf_ptr`` is the pointer to the buffer to read into,
cast to a 64-bit integer. ``__reserved`` must be 0.
On success, the number of bytes read is returned. 0 is returned at the
end of the metadata item. The returned length may be less than
``length``, for example if the ioctl is interrupted.

The metadata returned by FS_IOC_READ_VERITY_METADATA isn't guaranteed
to be authenticated against the file digest that would be returned by
`FS_IOC_MEASURE_VERITY`_, as the metadata is expected to be used to
implement fs-verity compatible verification anyway (though absent a
malicious disk, the metadata will indeed match). E.g. to implement
this ioctl, the filesystem is allowed to just read the Merkle tree
blocks from disk without actually verifying the path to the root node.

FS_IOC_READ_VERITY_METADATA can fail with the following errors:

- ``EFAULT``: the caller provided inaccessible memory
- ``EINTR``: the ioctl was interrupted before any data was read
- ``EINVAL``: reserved fields were set, or ``offset + length``
  overflowed
- ``ENODATA``: the file is not a verity file, or
  FS_VERITY_METADATA_TYPE_SIGNATURE was requested but the file doesn't
  have a built-in signature
- ``ENOTTY``: this type of filesystem does not implement fs-verity, or
  this ioctl is not yet implemented on it
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support, or the filesystem superblock has not had the 'verity'
  feature enabled on it. (See `Filesystem support`_.)
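
Because FS_IOC_READ_VERITY_METADATA may return fewer bytes than
requested, callers generally need a ``pread()``-style loop. The following
is a rough sketch under stated assumptions: the helper name
``read_verity_metadata()`` is hypothetical, and the installed
``<linux/fsverity.h>`` is assumed to come from a v5.12 or later kernel so
that ``struct fsverity_read_metadata_arg`` is defined:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fsverity.h>

/*
 * Hypothetical helper: read up to 'cap' bytes of the metadata item
 * 'type' from the start of the item into 'buf'.
 * Returns the number of bytes read (possibly less than 'cap' if the
 * item is shorter), or -1 with errno set on failure.
 */
ssize_t read_verity_metadata(int fd, __u64 type, void *buf, size_t cap)
{
	struct fsverity_read_metadata_arg arg;
	size_t done = 0;

	while (done < cap) {
		int n;

		/* Zeroing the struct also clears __reserved. */
		memset(&arg, 0, sizeof(arg));
		arg.metadata_type = type;
		arg.offset = done;
		arg.length = cap - done;
		/* The buffer pointer is passed as a 64-bit integer. */
		arg.buf_ptr = (__u64)(uintptr_t)((char *)buf + done);

		n = ioctl(fd, FS_IOC_READ_VERITY_METADATA, &arg);
		if (n < 0)
			return -1;
		if (n == 0)	/* end of the metadata item */
			break;
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

For example, passing ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` with a
256-byte buffer would be one way to fetch the descriptor. As noted above,
the returned bytes are not guaranteed to be authenticated, so the caller
must verify them itself.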

FS_IOC_GETFLAGS
---------------

The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
can also be used to check whether a file has fs-verity enabled or not.
To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.

The verity flag is not settable via FS_IOC_SETFLAGS. You must use
FS_IOC_ENABLE_VERITY instead, since parameters must be provided.

statx
-----

Since Linux v5.5, the statx() system call sets STATX_ATTR_VERITY if
the file has fs-verity enabled. This can perform better than
FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
opening the file, and opening verity files can be expensive.

Accessing verity files
======================

Applications can transparently access a verity file just like a
non-verity one, with the following exceptions:

- Verity files are readonly. They cannot be opened for writing or
  truncate()d, even if the file mode bits allow it. Attempts to do
  one of these things will fail with EPERM. However, changes to
  metadata such as owner, mode, timestamps, and xattrs are still
  allowed, since these are not measured by fs-verity. Verity files
  can also still be renamed, deleted, and linked to.

- Direct I/O is not supported on verity files.
  Attempts to use direct I/O on such files will fall back to buffered
  I/O.

- DAX (Direct Access) is not supported on verity files, because this
  would circumvent the data verification.

- Reads of data that doesn't match the verity Merkle tree will fail
  with EIO (for read()) or SIGBUS (for mmap() reads).

- If the sysctl "fs.verity.require_signatures" is set to 1 and the
  file is not signed by a key in the fs-verity keyring, then opening
  the file will fail. See `Built-in signature verification`_.

Direct access to the Merkle tree is not supported. Therefore, if a
verity file is copied, or is backed up and restored, then it will lose
its "verity"-ness. fs-verity is primarily meant for files like
executables that are managed by a package manager.

File digest computation
=======================

This section describes how fs-verity hashes the file contents using a
Merkle tree to produce the digest which cryptographically identifies
the file contents. This algorithm is the same for all filesystems
that support fs-verity.

Userspace only needs to be aware of this algorithm if it needs to
compute fs-verity file digests itself, e.g. in order to sign files.

.. _fsverity_merkle_tree:

Merkle tree
-----------

The file contents are divided into blocks, where the block size is
configurable but is usually 4096 bytes. The end of the last block is
zero-padded if needed. Each block is then hashed, producing the first
level of hashes. Then, the hashes in this first level are grouped
into 'blocksize'-byte blocks (zero-padding the ends as needed) and
these blocks are hashed, producing the second level of hashes. This
proceeds up the tree until only a single block remains. The hash of
this block is the "Merkle tree root hash".

If the file fits in one block and is nonempty, then the "Merkle tree
root hash" is simply the hash of the single data block. If the file
is empty, then the "Merkle tree root hash" is all zeroes.

The "blocks" here are not necessarily the same as "filesystem blocks".

If a salt was specified, then it's zero-padded to the closest multiple
of the input size of the hash algorithm's compression function, e.g.
64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is
prepended to every data or Merkle tree block that is hashed.

The purpose of the block padding is to cause every hash to be taken
over the same amount of data, which simplifies the implementation and
keeps open more possibilities for hardware acceleration.
The purpose of the salt padding is to make the salting "free" when the
salted hash state is precomputed, then imported for each hash.

Example: in the recommended configuration of SHA-256 and 4K blocks,
128 hash values fit in each block. Thus, each level of the Merkle
tree is approximately 128 times smaller than the previous, and for
large files the Merkle tree's size converges to approximately 1/127 of
the original file size. However, for small files, the padding is
significant, making the space overhead proportionally more.

.. _fsverity_descriptor:

fs-verity descriptor
--------------------

By itself, the Merkle tree root hash is ambiguous. For example, it
can't distinguish a large file from a small second file whose data
is exactly the top-level hash block of the first file. Ambiguities
also arise from the convention of padding to the next block boundary.

To solve this problem, the fs-verity file digest is actually computed
as a hash of the following structure, which contains the Merkle tree
root hash as well as other fields such as the file size::

    struct fsverity_descriptor {
            __u8 version;           /* must be 1 */
            __u8 hash_algorithm;    /* Merkle tree hash algorithm */
            __u8 log_blocksize;     /* log2 of size of data and tree blocks */
            __u8 salt_size;         /* size of salt in bytes; 0 if none */
            __le32 __reserved_0x04; /* must be 0 */
            __le64 data_size;       /* size of file the Merkle tree is built over */
            __u8 root_hash[64];     /* Merkle tree root hash */
            __u8 salt[32];          /* salt prepended to each hashed block */
            __u8 __reserved[144];   /* must be 0's */
    };

Built-in signature verification
===============================

With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
a portion of an authentication policy (see `Use cases`_) in the
kernel. Specifically, it adds support for:

1. At fs-verity module initialization time, a keyring ".fs-verity" is
   created. The root user can add trusted X.509 certificates to this
   keyring using the add_key() system call, then (when done)
   optionally use keyctl_restrict_keyring() to prevent additional
   certificates from being added.

2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
   detached signature in DER format of the file's fs-verity digest.
   On success, this signature is persisted alongside the Merkle tree.
   Then, any time the file is opened, the kernel will verify the
   file's actual digest against this signature, using the certificates
   in the ".fs-verity" keyring.

3. A new sysctl "fs.verity.require_signatures" is made available.
   When set to 1, the kernel requires that all verity files have a
   correctly signed digest as described in (2).

fs-verity file digests must be signed in the following format, which
is similar to the structure used by `FS_IOC_MEASURE_VERITY`_::

    struct fsverity_formatted_digest {
            char magic[8];                  /* must be "FSVerity" */
            __le16 digest_algorithm;
            __le16 digest_size;
            __u8 digest[];
    };

fs-verity's built-in signature verification support is meant as a
relatively simple mechanism that can be used to provide some level of
authenticity protection for verity files, as an alternative to doing
the signature verification in userspace or using IMA-appraisal.
However, with this mechanism, userspace programs still need to check
that the verity bit is set, and there is no protection against verity
files being swapped around.
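
For signing tools, the byte layout of ``struct fsverity_formatted_digest``
shown above can be produced portably by serializing each field by hand.
The sketch below is only an illustration: the helper name
``format_fsverity_digest()`` is hypothetical, and it covers only the
buffer that gets signed, not the PKCS#7 signing step itself:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: serialize a formatted digest into 'buf':
 * 8-byte magic "FSVerity", then digest_algorithm and digest_size as
 * little-endian 16-bit values, then the raw digest bytes.
 * Returns the number of bytes written, or 0 if 'cap' is too small.
 */
size_t format_fsverity_digest(uint16_t alg, const uint8_t *digest,
			      uint16_t digest_size,
			      uint8_t *buf, size_t cap)
{
	size_t total = 8 + 2 + 2 + (size_t)digest_size;

	if (cap < total)
		return 0;

	memcpy(buf, "FSVerity", 8);		/* magic, no NUL terminator */
	buf[8] = (uint8_t)(alg & 0xff);		/* __le16 digest_algorithm */
	buf[9] = (uint8_t)(alg >> 8);
	buf[10] = (uint8_t)(digest_size & 0xff);	/* __le16 digest_size */
	buf[11] = (uint8_t)(digest_size >> 8);
	memcpy(buf + 12, digest, digest_size);	/* digest[] */
	return total;
}
```

Writing the 16-bit fields byte by byte keeps the output little-endian
regardless of host endianness. The digest itself would typically be the
value returned by `FS_IOC_MEASURE_VERITY`_ or computed offline as
described in `File digest computation`_.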

Filesystem support
==================

fs-verity is currently supported by the ext4 and f2fs filesystems.
The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
on either filesystem.

``include/linux/fsverity.h`` declares the interface between the
``fs/verity/`` support layer and filesystems. Briefly, filesystems
must provide an ``fsverity_operations`` structure that provides
methods to read and write the verity metadata to a filesystem-specific
location, including the Merkle tree blocks and
``fsverity_descriptor``. Filesystems must also call functions in
``fs/verity/`` at certain times, such as when a file is opened or when
pages have been read into the pagecache. (See `Verifying data`_.)

ext4
----

ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2.

To create verity files on an ext4 filesystem, the filesystem must have
been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
it. "verity" is an RO_COMPAT filesystem feature, so once set, old
kernels will only be able to mount the filesystem readonly, and old
versions of e2fsck will be unable to check the filesystem. Moreover,
currently ext4 only supports mounting a filesystem with the "verity"
feature when its block size is equal to PAGE_SIZE (often 4096 bytes).

ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files.
It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.

ext4 also supports encryption, which can be used simultaneously with
fs-verity. In this case, the plaintext data is verified rather than
the ciphertext. This is necessary in order to make the fs-verity file
digest meaningful, since every file is encrypted differently.

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size. This approach works because (a) verity files are readonly,
and (b) pages fully beyond i_size aren't visible to userspace but can
be read/written internally by ext4 with only some relatively small
changes to ext4. This approach avoids having to depend on the
EA_INODE feature and on rearchitecturing ext4's xattr support to
support paging multi-gigabyte xattrs into memory, and to support
encrypting xattrs. Note that the verity metadata *must* be encrypted
when the file is, since it contains hashes of the plaintext data.

Currently, ext4 verity only supports the case where the Merkle tree
block size, filesystem block size, and page size are all the same. It
also only supports extent-based files.

f2fs
----

f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0.

To create verity files on an f2fs filesystem, the filesystem must have
been formatted with ``-O verity``.

f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
cleared.

Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first
64K boundary beyond i_size. See explanation for ext4 above.
Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
which wouldn't be enough for even a single Merkle tree block.

Currently, f2fs verity only supports a Merkle tree block size of 4096.
Also, f2fs doesn't support enabling verity on files that currently
have atomic or volatile writes pending.

Implementation details
======================

Verifying data
--------------

fs-verity ensures that all reads of a verity file's data are verified,
regardless of which syscall is used to do the read (e.g. mmap(),
read(), pread()) and regardless of whether it's the first read or a
later read (unless the later read can return cached data that was
already verified). Below, we describe how filesystems implement this.

Pagecache
~~~~~~~~~

For filesystems using Linux's pagecache, the ``->readpage()`` and
``->readpages()`` methods must be modified to verify pages before they
are marked Uptodate.
Merely hooking ``->read_iter()`` would be
insufficient, since ``->read_iter()`` is not used for memory maps.

Therefore, fs/verity/ provides a function fsverity_verify_page() which
verifies a page that has been read into the pagecache of a verity
inode, but is still locked and not Uptodate, so it's not yet readable
by userspace. As needed to do the verification,
fsverity_verify_page() will call back into the filesystem to read
Merkle tree pages via fsverity_operations::read_merkle_tree_page().

fsverity_verify_page() returns false if verification failed; in this
case, the filesystem must not set the page Uptodate. Following this,
as per the usual Linux pagecache behavior, attempts by userspace to
read() from the part of the file containing the page will fail with
EIO, and accesses to the page within a memory map will raise SIGBUS.

fsverity_verify_page() currently only supports the case where the
Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).

In principle, fsverity_verify_page() verifies the entire path in the
Merkle tree from the data page to the root hash. However, for
efficiency the filesystem may cache the hash pages. Therefore,
fsverity_verify_page() only ascends the tree reading hash pages until
an already-verified hash page is seen, as indicated by the PageChecked
bit being set. It then verifies the path to that page.
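
The ascent described above can be modelled in userspace. The following
is a minimal Python sketch, not the kernel implementation. It assumes
SHA-256 and 4096-byte blocks, and it ignores the salt, the
``fsverity_descriptor``, and the on-disk layout, so its root hash is
not an fs-verity file digest. The names ``build_tree()``,
``verify_data_block()``, and the ``verified`` set (standing in for the
PageChecked bit) are hypothetical and exist only for illustration::

    import hashlib

    BLOCK_SIZE = 4096                              # assumed tree block size
    DIGEST_SIZE = hashlib.sha256().digest_size     # 32 bytes for SHA-256
    HASHES_PER_BLOCK = BLOCK_SIZE // DIGEST_SIZE   # 128 hashes per hash block


    def sha256(data):
        return hashlib.sha256(data).digest()


    def build_tree(data_blocks):
        """Return (levels, root_hash); levels[0] is the bottom hash level."""
        if not data_blocks:
            raise ValueError("this model needs at least one data block")
        levels = []
        hashes = [sha256(block) for block in data_blocks]
        while True:
            # Pack the child hashes into zero-padded hash blocks.
            blocks = [
                b"".join(hashes[i:i + HASHES_PER_BLOCK]).ljust(BLOCK_SIZE, b"\0")
                for i in range(0, len(hashes), HASHES_PER_BLOCK)
            ]
            levels.append(blocks)
            if len(blocks) == 1:
                return levels, sha256(blocks[0])
            hashes = [sha256(block) for block in blocks]


    def verify_data_block(index, data, levels, root_hash, verified):
        """Verify one data block, stopping at the first trusted hash block."""
        want = sha256(data)      # hash that the parent block must contain
        path = []                # hash blocks newly checked by this call
        for level, blocks in enumerate(levels):
            index, slot = divmod(index, HASHES_PER_BLOCK)
            block = blocks[index]
            if block[slot * DIGEST_SIZE:(slot + 1) * DIGEST_SIZE] != want:
                return False
            if (level, index) in verified:
                break            # trusted ancestor reached; stop ascending
            path.append((level, index))
            want = sha256(block)  # this block must match its own parent
        else:
            # No cached block on the path: compare against the root hash.
            if want != root_hash:
                return False
        verified.update(path)    # mark blocks only after the path checked out
        return True


    data = [bytes([i % 256]) * BLOCK_SIZE for i in range(300)]
    levels, root_hash = build_tree(data)
    verified = set()
    # First read: ascends both levels and compares with the root hash.
    assert verify_data_block(0, data[0], levels, root_hash, verified)
    # Second read: stops at the already-verified bottom-level hash block.
    assert verify_data_block(1, data[1], levels, root_hash, verified)
    # Corrupted data is rejected.
    assert not verify_data_block(2, b"x" * BLOCK_SIZE, levels, root_hash, verified)

In this model, a read whose bottom-level hash block is already in
``verified`` costs one data hash and one comparison, which is the
effect that caching verified hash pages is meant to have.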

This optimization, which is also used by dm-verity, results in
excellent sequential read performance. This is because usually (e.g.
127 in 128 times for 4K blocks and SHA-256) the hash page from the
bottom level of the tree will already be cached and checked from
reading a previous data page. However, random reads perform worse.

Block device based filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
the pagecache, so the above subsection applies too. However, they
also usually read many pages from a file at once, grouped into a
structure called a "bio". To make it easier for these types of
filesystems to support fs-verity, fs/verity/ also provides a function
fsverity_verify_bio() which verifies all pages in a bio.

ext4 and f2fs also support encryption. If a verity file is also
encrypted, the pages must be decrypted before being verified. To
support this, these filesystems allocate a "post-read context" for
each bio and store it in ``->bi_private``::

    struct bio_post_read_ctx {
        struct bio *bio;
        struct work_struct work;
        unsigned int cur_step;
        unsigned int enabled_steps;
    };

``enabled_steps`` is a bitmask that specifies whether decryption,
verity, or both is enabled.
After the bio completes, for each needed
postprocessing step the filesystem enqueues the bio_post_read_ctx on a
workqueue, and then the workqueue work does the decryption or
verification. Finally, pages where no decryption or verity error
occurred are marked Uptodate, and the pages are unlocked.

Files on ext4 and f2fs may contain holes. Normally, ``->readpages()``
simply zeroes holes and sets the corresponding pages Uptodate; no bios
are issued. To prevent this case from bypassing fs-verity, these
filesystems use fsverity_verify_page() to verify hole pages.

ext4 and f2fs disable direct I/O on verity files, since otherwise
direct I/O would bypass fs-verity. (They also do the same for
encrypted files.)

Userspace utility
=================

This document focuses on the kernel, but a userspace utility for
fs-verity can be found at:

    https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git

See the README.md file in the fsverity-utils source tree for details,
including examples of setting up fs-verity protected files.

Tests
=====

To test fs-verity, use xfstests.
For example, using `kvm-xfstests
<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::

    kvm-xfstests -c ext4,f2fs -g verity

FAQ
===

This section answers frequently asked questions about fs-verity that
weren't already directly answered in other parts of this document.

:Q: Why isn't fs-verity part of IMA?
:A: fs-verity and IMA (Integrity Measurement Architecture) have
    different focuses. fs-verity is a filesystem-level mechanism for
    hashing individual files using a Merkle tree. In contrast, IMA
    specifies a system-wide policy that specifies which files are
    hashed and what to do with those hashes, such as log them,
    authenticate them, or add them to a measurement list.

    IMA is planned to support the fs-verity hashing mechanism as an
    alternative to doing full file hashes, for people who want the
    performance and security benefits of the Merkle tree based hash.
    But it doesn't make sense to force all uses of fs-verity to be
    through IMA. As a standalone filesystem feature, fs-verity
    already meets many users' needs, and it's testable like other
    filesystem features e.g. with xfstests.

:Q: Isn't fs-verity useless because the attacker can just modify the
    hashes in the Merkle tree, which is stored on-disk?
:A: To verify the authenticity of an fs-verity file you must verify
    the authenticity of the "fs-verity file digest", which
    incorporates the root hash of the Merkle tree. See `Use cases`_.

:Q: Isn't fs-verity useless because the attacker can just replace a
    verity file with a non-verity one?
:A: See `Use cases`_. In the initial use case, it's really trusted
    userspace code that authenticates the files; fs-verity is just a
    tool to do this job efficiently and securely. The trusted
    userspace code will consider non-verity files to be inauthentic.

:Q: Why does the Merkle tree need to be stored on-disk? Couldn't you
    store just the root hash?
:A: If the Merkle tree wasn't stored on-disk, then you'd have to
    compute the entire tree when the file is first accessed, even if
    just one byte is being read. This is a fundamental consequence of
    how Merkle tree hashing works. To verify a leaf node, you need to
    verify the whole path to the root hash, including the root node
    (the thing which the root hash is a hash of). But if the root
    node isn't stored on-disk, you have to compute it by hashing its
    children, and so on until you've actually hashed the entire file.

    That defeats most of the point of doing a Merkle tree-based hash,
    since if you have to hash the whole file ahead of time anyway,
    then you could simply do sha256(file) instead. That would be much
    simpler, and a bit faster too.

    It's true that an in-memory Merkle tree could still provide the
    advantage of verification on every read rather than just on the
    first read. However, it would be inefficient because every time a
    hash page gets evicted (you can't pin the entire Merkle tree into
    memory, since it may be very large), in order to restore it you
    again need to hash everything below it in the tree. This again
    defeats most of the point of doing a Merkle tree-based hash, since
    a single block read could trigger re-hashing gigabytes of data.

:Q: But couldn't you store just the leaf nodes and compute the rest?
:A: See previous answer; this really just moves up one level, since
    one could alternatively interpret the data blocks as being the
    leaf nodes of the Merkle tree. It's true that the tree can be
    computed much faster if the leaf level is stored rather than just
    the data, but that's only because each level is less than 1% the
    size of the level below (assuming the recommended settings of
    SHA-256 and 4K blocks). For the exact same reason, by storing
    "just the leaf nodes" you'd already be storing over 99% of the
    tree, so you might as well simply store the whole tree.

:Q: Can the Merkle tree be built ahead of time, e.g. distributed as
    part of a package that is installed to many computers?
:A: This isn't currently supported.
    It was part of the original
    design, but was removed to simplify the kernel UAPI and because it
    wasn't a critical use case. Files are usually installed once and
    used many times, and cryptographic hashing is somewhat fast on
    most modern processors.

:Q: Why doesn't fs-verity support writes?
:A: Write support would be very difficult and would require a
    completely different design, so it's well outside the scope of
    fs-verity. Write support would require:

    - A way to maintain consistency between the data and hashes,
      including all levels of hashes, since corruption after a crash
      (especially of potentially the entire file!) is unacceptable.
      The main options for solving this are data journalling,
      copy-on-write, and log-structured volume. But it's very hard to
      retrofit existing filesystems with new consistency mechanisms.
      Data journalling is available on ext4, but is very slow.

    - Rebuilding the Merkle tree after every write, which would be
      extremely inefficient. Alternatively, a different authenticated
      dictionary structure such as an "authenticated skiplist" could
      be used. However, this would be far more complex.

    Compare it to dm-verity vs. dm-integrity. dm-verity is very
    simple: the kernel just verifies read-only data against a
    read-only Merkle tree.
    In contrast, dm-integrity supports writes
    but is slow, is much more complex, and doesn't actually support
    full-device authentication since it authenticates each sector
    independently, i.e. there is no "root hash". It doesn't really
    make sense for the same device-mapper target to support these two
    very different cases; the same applies to fs-verity.

:Q: Since verity files are immutable, why isn't the immutable bit set?
:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
    specific set of semantics which not only make the file contents
    read-only, but also prevent the file from being deleted, renamed,
    linked to, or having its owner or mode changed. These extra
    properties are unwanted for fs-verity, so reusing the immutable
    bit isn't appropriate.

:Q: Why does the API use ioctls instead of setxattr() and getxattr()?
:A: Abusing the xattr interface for basically arbitrary syscalls is
    heavily frowned upon by most of the Linux filesystem developers.
    An xattr should really just be an xattr on-disk, not an API to
    e.g. magically trigger construction of a Merkle tree.

:Q: Does fs-verity support remote filesystems?
:A: Only ext4 and f2fs support is implemented currently, but in
    principle any filesystem that can store per-file verity metadata
    can support fs-verity, regardless of whether it's local or remote.
    Some filesystems may have fewer options of where to store the
    verity metadata; one possibility is to store it past the end of
    the file and "hide" it from userspace by manipulating i_size. The
    data verification functions provided by ``fs/verity/`` also assume
    that the filesystem uses the Linux pagecache, but both local and
    remote filesystems normally do so.

:Q: Why is anything filesystem-specific at all? Shouldn't fs-verity
    be implemented entirely at the VFS level?
:A: There are many reasons why this is not possible or would be very
    difficult, including the following:

    - To prevent bypassing verification, pages must not be marked
      Uptodate until they've been verified. Currently, each
      filesystem is responsible for marking pages Uptodate via
      ``->readpages()``. Therefore, currently it's not possible for
      the VFS to do the verification on its own. Changing this would
      require significant changes to the VFS and all filesystems.

    - It would require defining a filesystem-independent way to store
      the verity metadata. Extended attributes don't work for this
      because (a) the Merkle tree may be gigabytes, but many
      filesystems assume that all xattrs fit into a single 4K
      filesystem block, and (b) ext4 and f2fs encryption doesn't
      encrypt xattrs, yet the Merkle tree *must* be encrypted when the
      file contents are, because it stores hashes of the plaintext
      file contents.

      So the verity metadata would have to be stored in an actual
      file. Using a separate file would be very ugly, since the
      metadata is fundamentally part of the file to be protected, and
      it could cause problems where users could delete the real file
      but not the metadata file or vice versa. On the other hand,
      having it be in the same file would break applications unless
      filesystems' notion of i_size were divorced from the VFS's,
      which would be complex and require changes to all filesystems.

    - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
      transaction mechanism so that either the file ends up with
      verity enabled, or no changes were made. Allowing intermediate
      states to occur after a crash may cause problems.