.. SPDX-License-Identifier: GPL-2.0

======================================
Enhanced Read-Only File System - EROFS
======================================

Overview
========

EROFS stands for Enhanced Read-Only File System. Unlike other read-only file
systems, it is designed for flexibility and scalability while being kept
simple and high-performance.

It is designed as a better filesystem solution for the following scenarios:

 - read-only storage media;

 - part of a fully trusted read-only solution, which means it needs to be
   immutable and bit-for-bit identical to the official golden image for
   its releases due to security and other considerations;

 - saving extra storage space with guaranteed end-to-end performance by
   using reduced metadata and transparent file compression, especially for
   embedded devices with limited memory (e.g., smartphones).

Here are the main features of EROFS:

 - Little-endian on-disk design;

 - Currently a 4KB block size (nobh) and therefore a maximum 16TB address
   space (2^32 blocks of 4KB);

 - Metadata and data can be mixed by design;

 - Two inode versions for different requirements:

   =====================  ============  =====================================
                          compact (v1)  extended (v2)
   =====================  ============  =====================================
   Inode metadata size    32 bytes      64 bytes
   Max file size          4 GB          16 EB (also limited by max. vol size)
   Max uids/gids          65536         4294967296
   File change time       no            yes (64 + 32-bit timestamp)
   Max hardlinks          65536         4294967296
   Metadata reserved      4 bytes       14 bytes
   =====================  ============  =====================================

 - Support for extended attributes (xattrs) as an option;

 - Support for inline xattrs and inline tail-end data for all files;

 - Support for POSIX.1e ACLs by using xattrs;

 - Support for transparent file compression as an option: the LZ4 algorithm
   with 4KB fixed-size output compression for high performance.

The following git tree provides the file system user-space tools under
development (e.g., the formatting tool mkfs.erofs):

- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git

Bug reports and patches are welcome; please send them to the following
linux-erofs mailing list:

- linux-erofs mailing list <linux-erofs@lists.ozlabs.org>

Mount options
=============

===================    =========================================================
(no)user_xattr         Set up extended user attributes. Note: xattr is enabled
                       by default if CONFIG_EROFS_FS_XATTR is selected.
(no)acl                Set up POSIX Access Control Lists. Note: acl is enabled
                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
cache_strategy=%s      Select a strategy for cached decompression from now on:

                       ==========  =============================================
                       disabled    In-place I/O decompression only;
                       readahead   Cache the last incomplete compressed physical
                                   cluster for further reading. It still does
                                   in-place I/O decompression for the remaining
                                   compressed physical clusters;
                       readaround  Cache both ends of incomplete compressed
                                   physical clusters for further reading.
                                   It still does in-place I/O decompression
                                   for the remaining compressed physical clusters.
                       ==========  =============================================
dax={always,never}     Use direct access (no page cache). See
                       Documentation/filesystems/dax.rst.
dax                    A legacy option which is an alias for ``dax=always``.
===================    =========================================================

On-disk details
===============

Summary
-------
Unlike other read-only file systems, an EROFS volume is designed to be as
simple as possible::

                                |-> aligned with the block size
   _____________________________________________________________
  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
  |_|__|_|_____|__________|_____|______|__________|_____|______|
  0 +1K

All data areas should be aligned with the block size, but metadata areas need
not be. All metadata can now be observed in two different spaces (views):

 1. Inode metadata space

    Each valid inode should be aligned with an inode slot, which has a fixed
    size (32 bytes) chosen to match the compact inode size.

    Each inode can be found directly with the following formula:
         inode offset = meta_blkaddr * block_size + 32 * nid

    ::

                                 |-> aligned with 8B
                                            |-> followed closely
     + meta_blkaddr blocks                                      |-> another slot
      _____________________________________________________________________
     |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
     |________|_______|(optional)|(optional)|__(optional)_|_____|__________
              |-> aligned with the inode slot size
                    .              .
                  .                    .
                .                          .
             .                                 .
           .                                       .
         .                                             .
       .____________________________________________________|-> aligned with 4B
       | xattr_ibody_header | shared xattrs | inline xattrs |
       |____________________|_______________|_______________|
       |->    12 bytes    <-|->x * 4 bytes<-|               .
                           .                .                 .
                     .                      .                   .
                .                           .                     .
            ._______________________________.______________________.
            | id | id | id | id |  ... | id | ent | ... | ent| ... |
            |____|____|____|____|______|____|_____|_____|____|_____|
                                            |-> aligned with 4B
                                                        |-> aligned with 4B

    An inode can be 32 or 64 bytes; the two versions are distinguished by a
    field common to all inode versions, i_format::

        __________________               __________________
       |     i_format     |             |     i_format     |
       |__________________|             |__________________|
       |        ...       |             |        ...       |
       |                  |             |                  |
       |__________________| 32 bytes    |                  |
                                        |                  |
                                        |__________________| 64 bytes

    Xattrs, extents and inline data follow the corresponding inode with proper
    alignment, and they may be optional for different data mappings.
    *Currently*, a total of 4 valid data mappings are supported:

    ==  ====================================================================
     0  flat file data without data inline (no extent);
     1  fixed-size output data compression (with non-compacted indexes);
     2  flat file data with tail packing data inline (no extent);
     3  fixed-size output data compression (with compacted indexes, v5.3+).
    ==  ====================================================================

    The size of the optional xattrs is indicated by i_xattr_count in the inode
    header. Large xattrs, or xattrs shared by many different files, can be
    stored in the shared xattrs metadata rather than inline right after the
    inode.

 2. Shared xattrs metadata space

    The shared xattrs space is similar to the inode space above: it starts at
    a specific block indicated by xattr_blkaddr, and entries are organized one
    by one with proper alignment.

    Each shared xattr can also be found directly with the following formula:
         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id

::

                           |-> aligned with 4 bytes
    + xattr_blkaddr blocks                     |-> aligned with 4 bytes
     _________________________________________________________________________
    |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
    |________|_____________|_____________|_____|______________|_______________

Directories
-----------
All directories are now organized in a compact on-disk format. Note that
each directory block is divided into index and name areas in order to support
random file lookup, and all directory entries are *strictly* recorded in
alphabetical order to support an improved prefix binary search algorithm
(refer to the related source code for details).

::

                  ___________________________
                 /                           |
                /              ______________|________________
               /              /              | nameoff1       | nameoffN-1
  ____________.______________._______________v________________v__________
 | dirent | dirent | ... | dirent | filename | filename | ... | filename |
 |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
      \                           ^
       \                          |                           * could have
        \                         |                             trailing '\0'
         \________________________| nameoff0
                             Directory block

Note that apart from giving the offset of the first filename, nameoff0 also
indicates the total number of directory entries in this block: the dirent
array ends exactly where the first filename begins, so nameoff0 divided by the
dirent size is the entry count, and no other on-disk field is needed.

Compression
-----------
Currently, EROFS supports 4KB fixed-size output transparent file compression,
as illustrated below::

           |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
           clusterofs                      clusterofs            clusterofs
           |                               |                     |   logical data
  _________v_______________________________v_____________________v_______________
  ... |    .        |             |        .    |             |  .          | ...
  ____|____.________|_____________|________.____|_____________|__.__________|____
      |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
           size          size          size          size          size
           .                           .                     .             .
            .                      .                  .            .
             .                 .                .           .
      _______._____________._____________._____________._____________________
         ... |             |             |             | ... physical data
      _______|_____________|_____________|_____________|_____________________
             |-> cluster <-|-> cluster <-|-> cluster <-|
                  size          size          size

Currently each on-disk physical cluster can contain at most 4KB of
(un)compressed data. For each logical cluster, there is a corresponding
on-disk index describing its cluster type, physical cluster address, etc.

See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.