.. SPDX-License-Identifier: GPL-2.0

======================================
Enhanced Read-Only File System - EROFS
======================================

Overview
========

EROFS stands for Enhanced Read-Only File System. Unlike other read-only
file systems, it aims to be flexible and scalable while staying simple and
fast.

It is designed as a better filesystem solution for the following scenarios:

 - read-only storage media or

 - part of a fully trusted read-only solution, which needs to be immutable
   and bit-for-bit identical to the official golden image for each release
   due to security and other considerations and

 - saving extra storage space with guaranteed end-to-end performance
   by using reduced metadata and transparent file compression, especially
   for embedded devices with limited memory (e.g. smartphones).

Here are the main features of EROFS:

 - Little endian on-disk design;

 - Currently 4KB block size (nobh) and therefore maximum 16TB address space;

 - Metadata & data could be mixed by design;

 - 2 inode versions for different requirements:

   =====================  ============  =====================================
                          compact (v1)  extended (v2)
   =====================  ============  =====================================
   Inode metadata size    32 bytes      64 bytes
   Max file size          4 GB          16 EB (also limited by max. vol size)
   Max uids/gids          65536         4294967296
   File change time       no            yes (64 + 32-bit timestamp)
   Max hardlinks          65536         4294967296
   Metadata reserved      4 bytes       14 bytes
   =====================  ============  =====================================

 - Support extended attributes (xattrs) as an option;

 - Support xattr inline and tail-end data inline for all files;

 - Support POSIX.1e ACLs by using xattrs;

 - Support transparent file compression as an option:
   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
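
The 16TB and 4 GB limits quoted above follow from simple arithmetic if block
addresses and the compact inode size field are both 32 bits wide. That width
is an assumption made here for illustration, and the helper names below are
hypothetical rather than taken from the kernel sources:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest addressable volume in bytes, assuming 32-bit block addresses
 * (an assumption of this sketch): 2^32 blocks of blksz bytes each.
 */
static uint64_t erofs_example_max_volume_bytes(uint32_t blksz)
{
	assert(blksz != 0);
	return ((uint64_t)1 << 32) * blksz;
}

/*
 * Largest file a compact inode could describe, assuming its size field
 * is 32 bits wide (also an assumption of this sketch).
 */
static uint64_t erofs_example_compact_max_file_bytes(void)
{
	return (uint64_t)1 << 32;
}
```

With 4KB blocks the first helper yields 2^44 bytes, i.e. 16TB, and the second
yields 2^32 bytes, i.e. 4 GB.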

The following git tree provides the file system user-space tools under
development (e.g. the formatting tool mkfs.erofs):

- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git

Bugs and patches are welcome. Please send them to the linux-erofs
mailing list:

- linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>

Mount options
=============

===================    =========================================================
(no)user_xattr         Setup Extended User Attributes. Note: xattr is enabled
                       by default if CONFIG_EROFS_FS_XATTR is selected.
(no)acl                Setup POSIX Access Control List. Note: acl is enabled
                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
cache_strategy=%s      Select a strategy for cached decompression from now on:

                       ==========  =============================================
                         disabled  In-place I/O decompression only;
                        readahead  Cache the last incomplete compressed physical
                                   cluster for further reading. It still does
                                   in-place I/O decompression for the remaining
                                   compressed physical clusters;
                       readaround  Cache both ends of incomplete compressed
                                   physical clusters for further reading.
                                   It still does in-place I/O decompression
                                   for the remaining compressed physical
                                   clusters.
                       ==========  =============================================
dax={always,never}     Use direct access (no page cache).  See
                       Documentation/filesystems/dax.rst.
dax                    A legacy option which is an alias for ``dax=always``.
===================    =========================================================
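
As a minimal sketch of how the options above combine, the following helpers
build a comma-separated option string and reject cache strategies that are not
listed in the table. The function names are hypothetical and exist only for
this example; they are not part of any EROFS tool or kernel interface:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name is one of the cache strategies from the table above. */
static int erofs_example_valid_cache_strategy(const char *name)
{
	static const char *const known[] = {
		"disabled", "readahead", "readaround"
	};
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (strcmp(name, known[i]) == 0)
			return 1;
	return 0;
}

/*
 * Writes e.g. "cache_strategy=readaround,noacl" into buf.
 * Returns the string length, or -1 if the strategy is unknown or the
 * buffer is too small.
 */
static int erofs_example_build_opts(char *buf, size_t len,
				    const char *strategy, int acl)
{
	int n;

	if (!erofs_example_valid_cache_strategy(strategy))
		return -1;
	n = snprintf(buf, len, "cache_strategy=%s,%s", strategy,
		     acl ? "acl" : "noacl");
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string could then be passed as the ``data`` argument of
mount(2), together with the filesystem type ``erofs`` and ``MS_RDONLY``.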

On-disk details
===============

Summary
-------
Different from other read-only file systems, an EROFS volume is designed
to be as simple as possible::

                                |-> aligned with the block size
   ____________________________________________________________
  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
  |_|__|_|_____|__________|_____|______|__________|_____|______|
  0 +1K

All data areas should be aligned with the block size, but metadata areas
may not. All metadata can now be observed in two different spaces (views):

 1. Inode metadata space

    Each valid inode should be aligned with an inode slot, which is a fixed
    value (32 bytes) and designed to be kept in line with compact inode size.

    Each inode can be directly found with the following formula::

         inode offset = meta_blkaddr * block_size + 32 * nid

    ::

                                 |-> aligned with 8B
                                            |-> followed closely
     + meta_blkaddr blocks                                      |-> another slot
      _____________________________________________________________________
     |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
     |________|_______|(optional)|(optional)|__(optional)_|_____|__________
              |-> aligned with the inode slot size
                   .                   .
                 .                         .
               .                              .
             .                                    .
           .                                         .
         .                                              .
       .____________________________________________________|-> aligned with 4B
       | xattr_ibody_header | shared xattrs | inline xattrs |
       |____________________|_______________|_______________|
       |->    12 bytes    <-|->x * 4 bytes<-|               .
                           .                .                 .
                     .                      .                   .
                .                           .                     .
            ._______________________________.______________________.
            | id | id | id | id |  ... | id | ent | ... | ent| ... |
            |____|____|____|____|______|____|_____|_____|____|_____|
                                            |-> aligned with 4B
                                                        |-> aligned with 4B

    An inode can be 32 or 64 bytes, which can be distinguished from a common
    field which all inode versions have -- i_format::

        __________________               __________________
       |     i_format     |             |     i_format     |
       |__________________|             |__________________|
       |        ...       |             |        ...       |
       |                  |             |                  |
       |__________________| 32 bytes    |                  |
                                        |                  |
                                        |__________________| 64 bytes

    Xattrs, extents and inline data follow the corresponding inode with
    proper alignment, and they may be optional depending on the data mapping.
    Currently, 4 valid data mappings are supported in total:

    ==  ====================================================================
     0  flat file data without data inline (no extent);
     1  fixed-sized output data compression (with non-compacted indexes);
     2  flat file data with tail packing data inline (no extent);
     3  fixed-sized output data compression (with compacted indexes, v5.3+).
    ==  ====================================================================

    The size of the optional xattrs is indicated by i_xattr_count in inode
    header. Large xattrs or xattrs shared by many different files can be
    stored in shared xattrs metadata rather than inlined right after inode.

 2. Shared xattrs metadata space

    Shared xattrs space is similar to the above inode space. It starts at
    a specific block indicated by xattr_blkaddr, and its entries are
    organized one by one with proper alignment.

    Each shared xattr can also be directly found by the following formula::

         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id

    ::

                            |-> aligned by  4 bytes
     + xattr_blkaddr blocks                     |-> aligned with 4 bytes
       _________________________________________________________________________
     |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
     |________|_____________|_____________|_____|______________|_______________
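
The two offset formulas above translate directly into code. The sketch below
uses hypothetical helper names and takes the 32-byte inode slot and the 4-byte
xattr unit straight from the text; how the kernel itself computes these values
is not shown here:

```c
#include <assert.h>
#include <stdint.h>

#define EROFS_EXAMPLE_ISLOT_SIZE 32u	/* inode slot size stated above */
#define EROFS_EXAMPLE_XATTR_UNIT  4u	/* shared xattr unit stated above */

/* inode offset = meta_blkaddr * block_size + 32 * nid */
static uint64_t erofs_example_inode_offset(uint32_t meta_blkaddr,
					   uint32_t block_size, uint64_t nid)
{
	assert(block_size != 0);
	return (uint64_t)meta_blkaddr * block_size +
	       (uint64_t)EROFS_EXAMPLE_ISLOT_SIZE * nid;
}

/* xattr offset = xattr_blkaddr * block_size + 4 * xattr_id */
static uint64_t erofs_example_xattr_offset(uint32_t xattr_blkaddr,
					   uint32_t block_size,
					   uint32_t xattr_id)
{
	assert(block_size != 0);
	return (uint64_t)xattr_blkaddr * block_size +
	       (uint64_t)EROFS_EXAMPLE_XATTR_UNIT * xattr_id;
}
```

For example, with ``meta_blkaddr = 1`` and a 4KB block size, the inode with
``nid = 36`` would sit at byte 4096 + 32 * 36 = 5248 of the volume.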

Directories
-----------
All directories are now organized in a compact on-disk format. Note that
each directory block is divided into index and name areas in order to support
random file lookup, and all directory entries are *strictly* recorded in
alphabetical order in order to support an improved prefix binary search
algorithm (see the related source code).

::

                     ___________________________
                    /                           |
                   /              ______________|________________
                  /              /              | nameoff1       | nameoffN-1
     ____________.______________._______________v________________v__________
    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
         \                           ^
          \                          |                           * could have
           \                         |                             trailing '\0'
            \________________________| nameoff0

                                Directory block

Note that apart from the offset of the first filename, nameoff0 also indicates
the total number of directory entries in this block, so there is no need to
introduce another on-disk field for it.
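
The following sketch shows how nameoff0 can yield the entry count and how the
sorted names allow a lookup inside one directory block. It makes two loud
assumptions that the text above does not state: each dirent is 12 bytes
(8-byte nid, 2-byte little-endian nameoff, 1-byte file type, 1 reserved byte),
and names compare byte by byte. It also uses a plain binary search, not the
improved prefix variant mentioned above. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed dirent size; verify against erofs_fs.h before relying on it. */
#define EX_DIRENT_SIZE 12u

/* Read the little-endian 16-bit nameoff field of dirent i. */
static unsigned ex_nameoff(const uint8_t *blk, unsigned i)
{
	const uint8_t *p = blk + (size_t)i * EX_DIRENT_SIZE + 8;

	return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* The name area starts right after the last dirent, so nameoff0 / size = N. */
static unsigned ex_dirent_count(const uint8_t *blk)
{
	return ex_nameoff(blk, 0) / EX_DIRENT_SIZE;
}

/* Byte-wise comparison of an on-disk name with a NUL-terminated one. */
static int ex_name_cmp(const uint8_t *a, size_t alen,
		       const char *b, size_t blen)
{
	int r = memcmp(a, b, alen < blen ? alen : blen);

	return r ? r : (alen > blen) - (alen < blen);
}

/* Returns the index of the matching dirent in this block, or -1. */
static int ex_dir_lookup(const uint8_t *blk, size_t blksz, const char *name)
{
	unsigned n = ex_dirent_count(blk);
	size_t want = strlen(name);
	int lo = 0, hi = (int)n - 1;

	assert(blk != NULL && name != NULL);
	if (n == 0 || (size_t)n * EX_DIRENT_SIZE > blksz)
		return -1;
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		size_t start = ex_nameoff(blk, (unsigned)mid);
		/* A name ends where the next one starts, or at block end. */
		size_t end = (unsigned)mid + 1 < n ?
			     ex_nameoff(blk, (unsigned)mid + 1) : blksz;
		size_t len = 0;
		int r;

		if (start > end || end > blksz)
			return -1;	/* inconsistent offsets */
		/* The last name could have a trailing '\0'. */
		while (start + len < end && blk[start + len] != '\0')
			len++;
		r = ex_name_cmp(blk + start, len, name, want);
		if (r == 0)
			return mid;
		if (r < 0)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return -1;
}
```

Because the entry count is derived from nameoff0, the search needs nothing
beyond the block itself and its size.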

Compression
-----------
Currently, EROFS supports 4KB fixed-sized output transparent file compression,
as illustrated below::

             |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
             clusterofs                      clusterofs            clusterofs
             |                               |                     |   logical data
    _________v_______________________________v_____________________v_______________
    ... |    .        |             |        .    |             |  .          | ...
    ____|____.________|_____________|________.____|_____________|__.__________|____
        |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
             size          size          size          size          size
              .                             .                .                   .
               .                       .               .                  .
                .                  .              .                .
          _______._____________._____________._____________._____________________
             ... |             |             |             | ... physical data
          _______|_____________|_____________|_____________|_____________________
                 |-> cluster <-|-> cluster <-|-> cluster <-|
                      size          size          size

Currently each on-disk physical cluster can contain 4KB (un)compressed data
at most. For each logical cluster, there is a corresponding on-disk index to
describe its cluster type, physical cluster address, etc.

See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
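
Since there is one index per logical cluster, finding the index that covers a
given file offset is a division by the cluster size. The sketch below assumes
the 4KB cluster size stated above; the struct and function names are
hypothetical and the on-disk index layout itself is not modelled:

```c
#include <assert.h>
#include <stdint.h>

#define EX_CLUSTER_SIZE 4096u	/* 4KB logical clusters, as stated above */

struct ex_lcluster_pos {
	uint64_t lcn;		/* logical cluster number: which index to read */
	uint32_t offset;	/* byte offset inside that logical cluster */
};

/* Split a logical file offset into cluster number and in-cluster offset. */
static struct ex_lcluster_pos ex_locate(uint64_t file_offset)
{
	struct ex_lcluster_pos pos;

	/* The split below relies on a non-zero cluster size. */
	assert(EX_CLUSTER_SIZE != 0);
	pos.lcn = file_offset / EX_CLUSTER_SIZE;
	pos.offset = (uint32_t)(file_offset % EX_CLUSTER_SIZE);
	return pos;
}
```

For instance, file offset 10000 falls into logical cluster 2 at in-cluster
offset 1808.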