.. SPDX-License-Identifier: GPL-2.0


The Second Extended Filesystem
==============================

ext2 was originally released in January 1993.  Written by Rémy Card,
Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
Extended Filesystem.  As of April 2001 it was still the predominant
filesystem in use by Linux.  There are also implementations available
for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.

Options
=======

Most defaults are determined by the filesystem superblock, and can be
set using tune2fs(8).  Kernel-determined defaults are indicated by (*).

====================    ===     ================================================
bsddf                   (*)     Makes ``df`` act like BSD.
minixdf                         Makes ``df`` act like Minix.

check=none, nocheck     (*)     Don't do extra checking of bitmaps on mount
                                (check=normal and check=strict options removed)

dax                             Use direct access (no page cache).  See
                                Documentation/filesystems/dax.txt.

debug                           Extra debugging information is sent to the
                                kernel syslog.  Useful for developers.

errors=continue                 Keep going on a filesystem error.
errors=remount-ro               Remount the filesystem read-only on an error.
errors=panic                    Panic and halt the machine if an error occurs.

grpid, bsdgroups                Give objects the same group ID as their parent.
nogrpid, sysvgroups             New objects have the group ID of their creator.

nouid32                         Use 16-bit UIDs and GIDs.

oldalloc                        Enable the old block allocator.  Orlov should
                                have better performance; feedback is welcome
                                if that is not the case for you.
orlov                   (*)     Use the Orlov block allocator.
                                (See http://lwn.net/Articles/14633/ and
                                http://lwn.net/Articles/14446/.)

resuid=n                        The user ID which may use the reserved blocks.
resgid=n                        The group ID which may use the reserved blocks.

sb=n                            Use alternate superblock at this location.

user_xattr                      Enable "user." POSIX Extended Attributes
                                (requires CONFIG_EXT2_FS_XATTR).
nouser_xattr                    Don't support "user." extended attributes.

acl                             Enable POSIX Access Control Lists support
                                (requires CONFIG_EXT2_FS_POSIX_ACL).
noacl                           Don't support POSIX ACLs.

nobh                            Do not attach buffer_heads to file pagecache.

quota, usrquota                 Enable user disk quota support
                                (requires CONFIG_QUOTA).

grpquota                        Enable group disk quota support
                                (requires CONFIG_QUOTA).
====================    ===     ================================================

The ``noquota`` option is silently ignored by ext2.
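The option string passed to mount is a comma-separated list drawn from the table above. The sketch below shows one way such a string could be interpreted; it is not the kernel's parser. The function ``parse_ext2_options``, the ``Ext2Options`` container and the recognised subset of options are hypothetical, and the rule that a later ``errors=`` value overrides an earlier one is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical subset of the ext2 mount options listed in the table above.
ERROR_MODES = {"continue", "remount-ro", "panic"}
FLAG_OPTIONS = {
    "bsddf", "minixdf", "nocheck", "dax", "debug", "grpid", "bsdgroups",
    "nogrpid", "sysvgroups", "nouid32", "oldalloc", "orlov", "user_xattr",
    "nouser_xattr", "acl", "noacl", "nobh", "quota", "usrquota", "grpquota",
}
NUMERIC_OPTIONS = {"resuid", "resgid", "sb"}
IGNORED_OPTIONS = {"noquota"}  # accepted, but has no effect on ext2


@dataclass
class Ext2Options:
    errors: Optional[str] = None
    flags: Set[str] = field(default_factory=set)
    numbers: Dict[str, int] = field(default_factory=dict)
    unknown: List[str] = field(default_factory=list)


def parse_ext2_options(opts: str) -> Ext2Options:
    """Interpret a string such as "errors=remount-ro,acl,resuid=1000"."""
    result = Ext2Options()
    for opt in filter(None, opts.split(",")):
        key, sep, value = opt.partition("=")
        if key == "errors" and value in ERROR_MODES:
            result.errors = value  # assumed: the last errors= wins
        elif key in NUMERIC_OPTIONS and sep and value.isdigit():
            result.numbers[key] = int(value)
        elif opt == "check=none":
            result.flags.add("nocheck")  # same meaning as nocheck
        elif opt in FLAG_OPTIONS:
            result.flags.add(opt)
        elif opt in IGNORED_OPTIONS:
            continue
        else:
            result.unknown.append(opt)
    return result
```

For example, ``errors=continue,acl,noquota,errors=panic`` ends up with ``errors == "panic"`` and only ``acl`` among the flags.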


Specification
=============

ext2 shares many properties with traditional Unix filesystems.  It has
the concepts of blocks, inodes and directories.  It has space in the
specification for Access Control Lists (ACLs), fragments, undeletion and
compression, though not all of these are implemented (ACL support is now
available through the ``acl`` mount option, and some of the others are
available as separate patches).  There is also a versioning mechanism to
allow new features (such as journalling) to be added in a maximally
compatible manner.

Blocks
------

The space in the device or file is split up into blocks.  These have a
fixed size of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
which is decided when the filesystem is created.  Smaller blocks mean
less wasted space per file, but require slightly more accounting overhead,
and also impose other limits on the size of files and the filesystem.
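A quick way to see the space trade-off is to compute how much of a file's last block goes unused. The sketch below (helper names are hypothetical) counts data blocks only and ignores indirect blocks and other metadata, which is a simplification.

```python
BLOCK_SIZES = (1024, 2048, 4096)


def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of data blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def wasted_bytes(file_size: int, block_size: int) -> int:
    """Unused bytes left in the final, partially filled data block."""
    return blocks_needed(file_size, block_size) * block_size - file_size
```

For a 5000-byte file this gives 120 unused bytes with 1024-byte blocks but 3192 with 4096-byte blocks.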

Block Groups
------------

Blocks are clustered into block groups in order to reduce fragmentation
and minimise the amount of head seeking when reading a large amount
of consecutive data.  Information about each block group is kept in a
descriptor table stored in the block(s) immediately after the superblock.
Two blocks near the start of each group are reserved for the block usage
bitmap and the inode usage bitmap, which show which blocks and inodes
are in use.  Since each bitmap is limited to a single block, with one bit
per block, a block group can contain at most 8 times as many blocks as
there are bytes in a block (8192 blocks for 1kB blocks, 32768 blocks for
4kB blocks).
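The bitmap rule translates directly into group sizes. The sketch below (hypothetical helper names) computes the largest block group for a given block size and, assuming every group is full-size, how many groups a filesystem of a given size needs. Real filesystems may use smaller groups, and the last group is usually partial.

```python
def max_blocks_per_group(block_size: int) -> int:
    """One bitmap block holds block_size * 8 bits, one bit per block."""
    return block_size * 8


def max_group_bytes(block_size: int) -> int:
    """Bytes covered by one full-size block group."""
    return max_blocks_per_group(block_size) * block_size


def group_count(fs_bytes: int, block_size: int) -> int:
    """Groups needed for fs_bytes, assuming full-size groups (a simplification)."""
    total_blocks = fs_bytes // block_size
    return -(-total_blocks // max_blocks_per_group(block_size))
```

With 4096-byte blocks a group spans at most 32768 blocks (128 MiB), so a 1 GiB filesystem needs 8 groups.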

The block(s) following the bitmaps in each block group are designated
as the inode table for that block group, and the remainder are the data
blocks.  The block allocation algorithm attempts to allocate data blocks
in the same block group as the inode which owns them.
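Because every group has an inode table of the same size, an inode number can be mapped to its block group and its slot in that group's table with simple arithmetic. The sketch below assumes inode numbers start at 1 (as they do in ext2) and that ``inodes_per_group`` and the inode size come from the superblock; 128 bytes, the revision 0 inode size, is used as the default. The function name is hypothetical.

```python
from typing import Tuple


def locate_inode(ino: int, inodes_per_group: int,
                 inode_size: int = 128) -> Tuple[int, int, int]:
    """Return (group, index_in_group, byte_offset_in_inode_table) for inode ino."""
    if ino < 1:
        raise ValueError("ext2 inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    return group, index, index * inode_size
```

To turn the byte offset into a disk block, add ``offset // block_size`` to the first block of the inode table recorded in that group's descriptor.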

The Superblock
--------------

The superblock contains all the information about the configuration of
the filesystem.  The primary copy of the superblock is stored at an
offset of 1024 bytes from the start of the device, and it is essential
to mounting the filesystem.  Since it is so important, backup copies of
the superblock are stored in block groups throughout the filesystem.
The first version of ext2 (revision 0) stores a copy at the start of
every block group, along with backups of the group descriptor block(s).
Because this can consume a considerable amount of space for large
filesystems, later revisions can optionally reduce the number of backup
copies by only putting backups in specific groups (this is the sparse
superblock feature).  The groups chosen are 0, 1 and powers of 3, 5 and 7.
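The sparse superblock rule is easy to state in code. The sketch below (hypothetical helper names) lists the groups that carry a superblock backup; with ``sparse=False`` it models revision 0, where every group has a copy.

```python
from typing import List


def _is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 1."""
    if n < base:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def has_superblock_backup(group: int, sparse: bool = True) -> bool:
    """Groups 0 and 1 always qualify; otherwise a power of 3, 5 or 7."""
    if not sparse or group <= 1:
        return True
    return any(_is_power_of(group, base) for base in (3, 5, 7))


def backup_groups(group_count: int, sparse: bool = True) -> List[int]:
    return [g for g in range(group_count) if has_superblock_backup(g, sparse)]
```

For a filesystem with 50 block groups, the sparse layout keeps copies in groups 0, 1, 3, 5, 7, 9, 25, 27 and 49 instead of in all 50.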

The superblock contains fields such as the total number of inodes and
blocks in the filesystem and how many are free, how many inodes and
blocks are in each block group, when the filesystem was mounted (and
whether it was cleanly unmounted), when it was modified, what revision
of the filesystem it is (see Feature Compatibility below) and which OS
created it.

If the filesystem is revision 1 or higher, then there are extra fields,
such as a volume name, a unique identification number, the inode size,
and space for optional filesystem features to store configuration info.

All fields in the superblock (as in all other ext2 structures) are stored
on the disc in little-endian format, so a filesystem is portable between
machines without having to know what machine it was created on.
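Because the layout is fixed and little-endian, superblock fields can be decoded portably with Python's ``struct`` module using the ``<`` byte-order prefix. The offsets below follow ``struct ext2_super_block``; the function name is hypothetical, only a handful of fields are read, and the block size is derived as ``1024 << s_log_block_size``. A real tool should also check the revision level before trusting any revision 1 fields.

```python
import struct
from typing import Dict

SUPERBLOCK_OFFSET = 1024  # the primary superblock starts 1024 bytes in
EXT2_MAGIC = 0xEF53

# (name, little-endian struct format, offset within the superblock)
_FIELDS = (
    ("inodes_count", "<I", 0),
    ("blocks_count", "<I", 4),
    ("free_blocks_count", "<I", 12),
    ("free_inodes_count", "<I", 16),
    ("log_block_size", "<I", 24),
    ("blocks_per_group", "<I", 32),
    ("inodes_per_group", "<I", 40),
    ("magic", "<H", 56),
    ("state", "<H", 58),
    ("rev_level", "<I", 76),
)


def parse_superblock(image: bytes) -> Dict[str, int]:
    """Decode a few superblock fields from the start of a device or image."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("image too small to contain a superblock")
    fields = {name: struct.unpack_from(fmt, sb, off)[0]
              for name, fmt, off in _FIELDS}
    if fields["magic"] != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock (bad magic number)")
    fields["block_size"] = 1024 << fields["log_block_size"]
    return fields
```

Reading the first 2048 bytes of a device or image is enough, since the primary superblock occupies bytes 1024 to 2047.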

Inodes
------

The inode (index node) is a fundamental concept in the ext2 filesystem.
Each object in the filesystem is represented by an inode.  The inode
structure contains pointers to the filesystem blocks which contain the
data held in the object and all of the metadata about an object except
its name.  The metadata about an object includes the permissions, owner,
group, flags, size, number of blocks used, access time, change time,
modification time, deletion time, number of links, fragments, version
(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).

There are some reserved fields which are currently unused in the inode
structure and several which are overloaded.  One field is reserved for the
directory ACL if the inode is a directory and alternately for the top 32
bits of the file size if the inode is a regular file (allowing file sizes
larger than 2GB).  The translator field is unused under Linux, but is used
by the HURD to reference the inode of a program which will be used to
interpret this object.  Most of the remaining reserved fields have been
used up by both Linux and the HURD for larger owner and group fields.
The HURD also has a larger mode field, so it uses another of the remaining
fields to store the extra mode bits.

The inode contains pointers to the first 12 blocks which hold the file's
data.  It also contains a pointer to an indirect block (which contains
pointers to the next set of blocks), a pointer to a doubly-indirect
block (which contains pointers to indirect blocks) and a pointer to a
trebly-indirect block (which contains pointers to doubly-indirect blocks).
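The pointer scheme bounds how much data one inode can map. Assuming 32-bit block numbers, each indirect block holds n = block_size / 4 pointers, so the total is 12 + n + n^2 + n^3 blocks. The sketch below (hypothetical helper names) computes this. For 1kB and 2kB blocks it reproduces the 16GB and 256GB figures in the Limitations table; for 4kB blocks the pointer tree could map about 4TB, and the table's 2048GB comes from a different limit (the inode's 32-bit count of 512-byte sectors), which this sketch does not model.

```python
def max_blocks_mapped(block_size: int) -> int:
    """Blocks reachable via 12 direct, 1 indirect, 1 double and 1 triple pointer."""
    per_block = block_size // 4  # each block pointer is 32 bits
    return 12 + per_block + per_block ** 2 + per_block ** 3


def max_mapped_bytes(block_size: int) -> int:
    """Bytes addressable through the pointer tree alone."""
    return max_blocks_mapped(block_size) * block_size
```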

The flags field contains some ext2-specific flags which aren't catered
for by the standard chmod permission bits.  These flags can be listed with
lsattr and changed with the chattr command, and allow specific filesystem
behaviour on a per-file basis.  There are flags for secure deletion,
undeletable, compression, synchronous updates, immutability, append-only,
no-dump, no-atime, indexed directories, and data-journaling.  Not all
of these are supported yet.
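The flags are bits in the inode's ``i_flags`` field. The sketch below decodes a subset of them into the single letters that chattr(1) uses. The bit values follow the kernel's ``EXT2_*_FL`` definitions; the helper names and the output format (which is not lsattr's exact layout) are illustrative.

```python
from typing import List

# (bit, chattr letter, meaning) for a subset of the per-inode flags.
INODE_FLAGS = (
    (0x00000001, "s", "secure deletion"),
    (0x00000002, "u", "undeletable"),
    (0x00000004, "c", "compress"),
    (0x00000008, "S", "synchronous updates"),
    (0x00000010, "i", "immutable"),
    (0x00000020, "a", "append only"),
    (0x00000040, "d", "no dump"),
    (0x00000080, "A", "no atime updates"),
    (0x00001000, "I", "indexed directory"),
    (0x00004000, "j", "data journaling"),
)


def flag_letters(i_flags: int) -> str:
    """Attribute letters set in i_flags, in table order."""
    return "".join(letter for bit, letter, _ in INODE_FLAGS if i_flags & bit)


def flag_names(i_flags: int) -> List[str]:
    """Human-readable names of the flags set in i_flags."""
    return [name for bit, _, name in INODE_FLAGS if i_flags & bit]
```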

Directories
-----------

A directory is a filesystem object and has an inode just like a file.
It is a specially formatted file containing records which associate
each name with an inode number.  Later revisions of the filesystem also
encode the type of the object (file, directory, symlink, device, fifo,
socket) to avoid the need to check the inode itself for this information
(support for taking advantage of this feature does not yet exist in
Glibc 2.2).

The inode allocation code tries to assign inodes which are in the same
block group as the directory in which they are first created.

The current implementation of ext2 stores the filenames in a directory
as a singly-linked list of variable-length records, where each record's
length leads to the next one; a pending enhancement uses hashing of the
filenames to allow lookup without the need to scan the entire directory.
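Walking a directory block means following each record's length to the next record. The sketch below assumes the revision 1 record layout with the file-type byte (inode number, ``rec_len``, ``name_len``, ``file_type``, name, padded to a multiple of 4 bytes), as in ``struct ext2_dir_entry_2``; records with inode 0 are unused. ``make_entry`` and ``list_entries`` are hypothetical helpers, and ``make_entry`` exists only to build test data.

```python
import struct
from typing import List, Tuple

FILE_TYPES = {0: "unknown", 1: "file", 2: "dir", 3: "chrdev",
              4: "blkdev", 5: "fifo", 6: "sock", 7: "symlink"}


def make_entry(inode: int, name: bytes, file_type: int, rec_len: int = 0) -> bytes:
    """Pack one directory record; rec_len defaults to the minimum size."""
    min_len = (8 + len(name) + 3) & ~3  # 8-byte header + name, rounded up to 4
    rec_len = rec_len or min_len
    record = struct.pack("<IHBB", inode, rec_len, len(name), file_type) + name
    return record.ljust(rec_len, b"\0")


def list_entries(block: bytes) -> List[Tuple[int, str, str]]:
    """Follow rec_len from record to record through one directory block."""
    entries = []
    pos = 0
    while pos + 8 <= len(block):
        inode, rec_len, name_len, ftype = struct.unpack_from("<IHBB", block, pos)
        if rec_len < 8:
            raise ValueError(f"corrupt rec_len {rec_len} at offset {pos}")
        if inode != 0:  # inode 0 marks an unused record
            name = block[pos + 8:pos + 8 + name_len].decode("utf-8", "replace")
            entries.append((inode, name, FILE_TYPES.get(ftype, "?")))
        pos += rec_len
    return entries
```

Because the only way to reach a record is through the one before it, a lookup must scan the block from the start, which is the cost the hashing enhancement aims to avoid.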

The current implementation never removes empty directory blocks once they
have been allocated to hold more files.

Special files
-------------

Symbolic links are also filesystem objects with inodes.  They deserve
special mention because the data for them is stored within the inode
itself if the symlink is less than 60 bytes long.  It uses the fields
which would normally be used to store the pointers to data blocks.
This is a worthwhile optimisation as it avoids allocating a full
block for the symlink, and most symlinks are less than 60 characters long.
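The 60-byte figure is the space taken by the inode's 15 block pointers (the 12 direct pointers plus the three indirect pointers described in the Inodes section) at 4 bytes each. The sketch below (hypothetical helper names) decides whether a link target fits and packs it, NUL-padded, into that area; it models only the length check.

```python
I_BLOCK_BYTES = 15 * 4  # 15 block pointers of 4 bytes each


def is_fast_symlink(target: str) -> bool:
    """Targets shorter than 60 bytes fit in the block-pointer area."""
    return len(target.encode()) < I_BLOCK_BYTES


def pack_fast_symlink(target: str) -> bytes:
    """Store the target in the 60-byte block-pointer area, NUL-padded."""
    if not is_fast_symlink(target):
        raise ValueError("target too long for a fast symlink; needs a data block")
    return target.encode().ljust(I_BLOCK_BYTES, b"\0")
```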

Character and block special devices never have data blocks assigned to
them.  Instead, their device number is stored in the inode, again reusing
the fields which would be used to point to the data blocks.

Reserved Space
--------------

In ext2, there is a mechanism for reserving a certain number of blocks
for a particular user (normally the super-user).  This is intended to
allow the system to continue functioning even if non-privileged users
fill up all the space available to them (this is independent of filesystem
quotas).  It also keeps the filesystem from filling up entirely, which
helps combat fragmentation.
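The reservation can be expressed as a simple allocation check. The sketch below is a simplified model, not the kernel's code: ordinary users may allocate only while more than the reserved number of blocks are free, while the reserved pool stays open to the ``resuid=``/``resgid=`` user and group from the options table and to privileged callers. The ``privileged`` argument stands in for the kernel's capability check, and the function name is hypothetical.

```python
from typing import AbstractSet


def may_allocate(free_blocks: int, reserved_blocks: int, uid: int,
                 groups: AbstractSet[int], resuid: int = 0, resgid: int = 0,
                 privileged: bool = False) -> bool:
    """Return True if the caller may take one more block."""
    if free_blocks > reserved_blocks:
        return True  # the unreserved pool still has space
    if free_blocks == 0:
        return False  # nothing left at all
    # Only the reserved pool remains: admit the reserved user, members of
    # the reserved group (a resgid of 0 is assumed to grant nothing extra)
    # and privileged callers.
    return privileged or uid == resuid or (resgid != 0 and resgid in groups)
```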

Filesystem check
----------------

At boot time, most systems run a consistency check (e2fsck) on their
filesystems.  The superblock of the ext2 filesystem contains several
fields which indicate whether fsck should actually run (since checking
the filesystem at boot can take a long time if it is large).  fsck will
run if the filesystem was not cleanly unmounted, if the maximum mount
count has been exceeded or if the maximum time between checks has been
exceeded.
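The three conditions can be combined into one decision. The sketch below takes values mirroring the superblock fields ``s_state``, ``s_mnt_count``, ``s_max_mnt_count``, ``s_lastcheck`` and ``s_checkinterval``. The function name is hypothetical, and treating a non-positive maximum mount count or a zero interval as "disabled" is an assumption of this simplified model, not a statement of e2fsck's exact rules.

```python
import time
from typing import Optional

EXT2_VALID_FS = 0x0001  # s_state bit: cleanly unmounted
EXT2_ERROR_FS = 0x0002  # s_state bit: errors detected


def needs_check(state: int, mnt_count: int, max_mnt_count: int,
                lastcheck: int, checkinterval: int,
                now: Optional[int] = None) -> bool:
    """Decide whether a boot-time check should run (simplified)."""
    now = int(time.time()) if now is None else now
    if not (state & EXT2_VALID_FS) or (state & EXT2_ERROR_FS):
        return True  # not cleanly unmounted, or errors were recorded
    if max_mnt_count > 0 and mnt_count >= max_mnt_count:
        return True  # maximum mount count reached
    if checkinterval > 0 and now >= lastcheck + checkinterval:
        return True  # maximum time between checks has elapsed
    return False
```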

Feature Compatibility
---------------------

The compatibility feature mechanism used in ext2 is sophisticated.
It safely allows features to be added to the filesystem, without
unnecessarily sacrificing compatibility with older versions of the
filesystem code.  The feature compatibility mechanism is not supported by
the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
revision 1.  There are three 32-bit fields, one for compatible features
(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
incompatible (INCOMPAT) features.

These feature flags have specific meanings for the kernel as follows:

A COMPAT flag indicates that a feature is present in the filesystem,
but the on-disk format is 100% compatible with older on-disk formats, so
a kernel which didn't know anything about this feature could read/write
the filesystem without any chance of corrupting the filesystem (or even
making it inconsistent).  This is essentially just a flag which says
"this filesystem has a (hidden) feature" that the kernel or e2fsck may
want to be aware of (more on e2fsck and feature flags later).  The ext3
HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
a regular file with data blocks in it so the kernel does not need to
take any special notice of it if it doesn't understand ext3 journaling.

An RO_COMPAT flag indicates that the on-disk format is 100% compatible
with older on-disk formats for reading (i.e. the feature does not change
the visible on-disk format).  However, an old kernel writing to such a
filesystem could corrupt the filesystem, so this is prevented.  The
most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
sparse groups allow file data blocks where superblock/group descriptor
backups used to live, and an old kernel's ext2_free_blocks() refuses to
free these blocks, which would lead to inconsistent bitmaps.  An old
kernel would also get an error if it tried to free a series of blocks
which crossed a group boundary, but this is a legitimate layout in a
SPARSE_SUPER filesystem.

An INCOMPAT flag indicates the on-disk format has changed in some
way that makes it unreadable by older kernels, or would otherwise
cause a problem if an old kernel tried to mount it.  FILETYPE is an
INCOMPAT flag because older kernels, which read the name length as a
16-bit field, would treat the new file-type byte as its high byte and
think a filename was longer than 256 characters, which would lead to
corrupt directory listings.  The COMPRESSION flag is an obvious INCOMPAT
flag: if the kernel doesn't understand compression, you would just get
garbage back from read() instead of it automatically decompressing your
data.  The ext3 RECOVER flag is needed to prevent a kernel which does not
understand the ext3 journal from mounting the filesystem without
replaying the journal.
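The three kernel rules can be summarised as a mount decision. In the sketch below the feature bit values follow the ext2 ``EXT2_FEATURE_*`` definitions, but the set of features the imaginary kernel supports, the ``MountDecision`` enum and the function name are hypothetical.

```python
from enum import Enum

COMPAT_HAS_JOURNAL = 0x0004
RO_COMPAT_SPARSE_SUPER = 0x0001
RO_COMPAT_LARGE_FILE = 0x0002
INCOMPAT_FILETYPE = 0x0002
INCOMPAT_RECOVER = 0x0004

# Assumed capabilities of this imaginary kernel.
SUPPORTED_RO_COMPAT = RO_COMPAT_SPARSE_SUPER | RO_COMPAT_LARGE_FILE
SUPPORTED_INCOMPAT = INCOMPAT_FILETYPE


class MountDecision(Enum):
    READ_WRITE = "read-write allowed"
    RO_ONLY = "read-write refused, read-only allowed"
    REFUSE = "mount refused"


def kernel_mount_decision(compat: int, ro_compat: int, incompat: int) -> MountDecision:
    """Apply the COMPAT / RO_COMPAT / INCOMPAT rules described above.

    compat is accepted but never consulted: unknown COMPAT bits do not
    affect whether the filesystem can be mounted.
    """
    if incompat & ~SUPPORTED_INCOMPAT:
        return MountDecision.REFUSE   # unknown on-disk format change
    if ro_compat & ~SUPPORTED_RO_COMPAT:
        return MountDecision.RO_ONLY  # safe to read, unsafe to write
    return MountDecision.READ_WRITE
```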

e2fsck needs to be stricter than the kernel in its handling of these
flags.  If it doesn't understand ANY of the COMPAT, RO_COMPAT, or
INCOMPAT flags it will refuse to check the filesystem, because it has no
way of verifying whether a given feature is valid or not.  Allowing
e2fsck to succeed on a filesystem with an unknown feature would give the
user a false sense of security.  Refusing to check a filesystem with
unknown features is a good incentive for the user to update to the
latest e2fsck.  This also means that anyone adding feature flags to ext2
must also update e2fsck to verify these features.
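By contrast, e2fsck's rule treats all three fields alike: any unknown bit stops the check. The sketch below uses hypothetical masks for the features a given e2fsck build knows how to verify; the bit values follow the ext2 feature definitions, and the function name is illustrative.

```python
# Hypothetical masks of the features this e2fsck build can verify.
KNOWN_COMPAT = 0x0004 | 0x0008     # e.g. HAS_JOURNAL, EXT_ATTR
KNOWN_RO_COMPAT = 0x0001 | 0x0002  # e.g. SPARSE_SUPER, LARGE_FILE
KNOWN_INCOMPAT = 0x0002 | 0x0004   # e.g. FILETYPE, RECOVER


def fsck_can_check(compat: int, ro_compat: int, incompat: int) -> bool:
    """Unlike the kernel, refuse if ANY feature bit is unknown."""
    return not (compat & ~KNOWN_COMPAT
                or ro_compat & ~KNOWN_RO_COMPAT
                or incompat & ~KNOWN_INCOMPAT)
```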

Metadata
--------

It is frequently claimed that the ext2 implementation of writing
asynchronous metadata is faster than the ffs synchronous metadata
scheme but less reliable.  Both methods are equally resolvable by their
respective fsck programs.

If you're exceptionally paranoid, there are 3 ways of making metadata
writes synchronous on ext2:

- per-file if you have the program source: use the O_SYNC flag to open()
- per-file if you don't have the source: use "chattr +S" on the file
- per-filesystem: add the "sync" option to mount (or in /etc/fstab)

The first and last are not ext2-specific but do force the metadata to
be written synchronously.  See also Journaling below.
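For the first option, a program requests synchronous writes when it opens the file. The sketch below uses Python's ``os.open`` with ``os.O_SYNC`` (a thin wrapper around open(2), available on Unix-like systems); the function name is hypothetical. The other two options correspond to the shell commands ``chattr +S file`` and ``mount -o sync``.

```python
import os


def write_synchronously(path: str, data: bytes) -> int:
    """Write data with O_SYNC, so each write() returns only once it is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        written = 0
        while written < len(data):  # os.write may write fewer bytes than asked
            written += os.write(fd, data[written:])
        return written
    finally:
        os.close(fd)
```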

Limitations
-----------

There are various limits imposed by the on-disk layout of ext2.  Other
limits are imposed by the current implementation of the kernel code.
Many of the limits are determined at the time the filesystem is first
created, and depend upon the block size chosen.  The ratio of inodes to
data blocks is fixed at filesystem creation time, so the only way to
increase the number of inodes is to increase the size of the filesystem.
No tools currently exist which can change the ratio of inodes to blocks.

Most of these limits could be overcome with slight changes in the on-disk
format and using a compatibility flag to signal the format change (at
the expense of some compatibility).

=====================  =======    =======    =======   ========
Filesystem block size      1kB        2kB        4kB        8kB
=====================  =======    =======    =======   ========
File size limit           16GB      256GB     2048GB     2048GB
Filesystem size limit   2047GB     8192GB    16384GB    32768GB
=====================  =======    =======    =======   ========

There is a 2.4 kernel limit of 2048GB for a single block device, so no
filesystem larger than that can be created at this time.  There is also
an upper limit on the block size imposed by the page size of the kernel,
so 8kB blocks are only allowed on Alpha systems (and other architectures
which support larger pages).

There is an upper limit of 32000 links to a single inode.  Since each
subdirectory's ".." entry is a link to its parent, this allows just
under 32000 subdirectories in a single directory.

There is a "soft" upper limit of about 10-15k files in a single directory
with the current linear linked-list directory implementation.  This limit
stems from performance problems when creating and deleting (and also
finding) files in such large directories.  Using a hashed directory index
(under development) allows 100k-1M+ files in a single directory without
performance problems (although RAM size becomes an issue at this point).

The (meaningless) absolute upper limit of files in a single directory
(imposed by the maximum file size; the realistic limit is obviously much
less) is over 130 trillion files.  It would be higher except there are not
enough 4-character names to make up unique directory entries, so they
have to be 8-character filenames, and even then we are fairly close to
running out of unique filenames.

Journaling
----------

A journaling extension to the ext2 code has been developed by Stephen
Tweedie.  It avoids the risks of metadata corruption and the need to
wait for e2fsck to complete after a crash, without requiring a change
to the on-disk ext2 layout.  In a nutshell, the journal is a regular
file which stores whole metadata (and optionally data) blocks that have
been modified, prior to writing them into the filesystem.  This means
it is possible to add a journal to an existing ext2 filesystem without
the need for data conversion.

Changes to the filesystem (e.g. renaming a file) are stored in a
transaction in the journal, which can be either complete or incomplete
at the time of a crash.  If a transaction is complete at the time of a
crash (or in the normal case where the system does not crash), then any
blocks in that transaction are guaranteed to represent a valid filesystem
state, and are copied into the filesystem.  If a transaction is incomplete
at the time of the crash, then there is no guarantee of consistency for
the blocks in that transaction so they are discarded (which means any
filesystem changes they represent are also lost).
See Documentation/filesystems/ext4/ to read more about ext4 and
journaling.
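The complete-versus-incomplete rule can be modelled in a few lines. This is a toy model, not the ext3/JBD on-disk journal format: a transaction is represented as a hypothetical map from block number to new contents, plus a flag recording whether its commit record reached the journal. Stopping at the first incomplete transaction assumes the journal is written sequentially, so nothing after it can be trusted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transaction:
    blocks: Dict[int, bytes] = field(default_factory=dict)  # block number -> new contents
    committed: bool = False  # was the commit record written before the crash?


def replay(journal: List[Transaction], disk: Dict[int, bytes]) -> Dict[int, bytes]:
    """Copy committed transactions into the filesystem, in journal order."""
    result = dict(disk)
    for txn in journal:
        if not txn.committed:
            break  # incomplete: discard it (and, by assumption, anything later)
        result.update(txn.blocks)
    return result
```

After replay, the filesystem reflects every committed transaction and none of the incomplete one, so it is consistent even though the last change is lost.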

References
==========

=======================  ===============================================
The kernel source        file:/usr/src/linux/fs/ext2/
e2fsprogs (e2fsck)       http://e2fsprogs.sourceforge.net/
Design & Implementation  http://e2fsprogs.sourceforge.net/ext2intro.html
Journaling (ext3)        ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
Filesystem Resizing      http://ext2resize.sourceforge.net/
Compression [1]_         http://e2compr.sourceforge.net/
=======================  ===============================================

Implementations for:

=======================  ===========================================================
Windows 95/98/NT/2000    http://www.chrysocome.net/explore2fs
Windows 95 [1]_          http://www.yipton.net/content.html#FSDEXT2
DOS client [1]_          ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
OS/2 [2]_                ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
RISC OS client           http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
=======================  ===========================================================

.. [1] no longer actively developed/supported (as of Apr 2001)
.. [2] no longer actively developed/supported (as of Mar 2009)