.. SPDX-License-Identifier: GPL-2.0

Bigalloc
--------

At the moment, the default size of a block is 4KiB, which is a commonly
supported page size on most MMU-capable hardware. This is fortunate, as
ext4 code is not prepared to handle the case where the block size
exceeds the page size. However, for a filesystem of mostly huge files,
it is desirable to be able to allocate disk blocks in units of multiple
blocks to reduce both fragmentation and metadata overhead. The
bigalloc feature provides exactly this ability.

The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to
use clustered allocation, so that each bit in the ext4 block allocation
bitmap addresses a power-of-two number of blocks. For example, if the
file system is mainly going to be storing large files in the 4-32
megabyte range, it might make sense to set a cluster size of 1 megabyte.
This means that each bit in the block allocation bitmap now addresses
256 4KiB blocks. This shrinks the total size of the block allocation
bitmaps for a 2TiB file system from 64 megabytes to 256 kilobytes. It
also means that a block group addresses 32 gigabytes instead of 128
megabytes, which further reduces the file system's metadata overhead.

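The figures in the example above can be checked with a short
calculation. The following is only a sketch: the helper names are
illustrative, and it assumes one bitmap bit per allocation unit and a
per-group bitmap that fills exactly one block.

```python
# Worked arithmetic for the 2 TiB example; all names are illustrative.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB


def bitmap_bytes(fs_bytes, unit_bytes):
    """Total allocation bitmap size, at one bit per allocation unit."""
    return fs_bytes // unit_bytes // 8


def group_bytes(block_bytes, unit_bytes):
    """Space covered by one block group.

    Assumes the group's bitmap occupies exactly one block, so it holds
    block_bytes * 8 bits, each addressing one allocation unit.
    """
    return block_bytes * 8 * unit_bytes


block = 4 * KIB      # block size
cluster = 1 * MIB    # cluster size chosen for the example
fs = 2 * TIB         # file system size

blocks_per_cluster = cluster // block

print("blocks per cluster:", blocks_per_cluster)
print("bitmaps, per block (MiB):", bitmap_bytes(fs, block) // MIB)
print("bitmaps, per cluster (KiB):", bitmap_bytes(fs, cluster) // KIB)
print("group, per block (MiB):", group_bytes(block, block) // MIB)
print("group, per cluster (GiB):", group_bytes(block, cluster) // GIB)
```

Both savings scale with the same factor, the number of blocks per
cluster: the bitmaps shrink by that factor and each block group grows
by it.
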
The administrator can set a block cluster size at mkfs time (which is
stored in the s\_log\_cluster\_size field in the superblock); from then
on, the block bitmaps track clusters, not individual blocks. This means
that block groups can be several gigabytes in size (instead of just
128MiB); however, the minimum allocation unit becomes a cluster, not a
block, even for directories. TaoBao had a patchset to extend the “use
units of clusters instead of blocks” approach to the extent tree, though
it is not clear where those patches went; they eventually morphed into
“extent tree v2”, but that code has not landed as of May 2015.

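As a rough illustration of how the stored field relates to the cluster
size, the sketch below assumes that s\_log\_cluster\_size is a base-2
logarithm relative to 1024 bytes, encoded the same way as
s\_log\_block\_size. That encoding is an assumption here and should be
checked against the superblock documentation; the function names are
hypothetical.

```python
# Assumption: both fields encode a size in bytes as 1024 << value.
BASE_BYTES = 1024


def cluster_size_bytes(s_log_cluster_size):
    """Cluster size in bytes under the assumed encoding."""
    return BASE_BYTES << s_log_cluster_size


def blocks_per_cluster(s_log_block_size, s_log_cluster_size):
    """Number of blocks tracked by one bitmap bit."""
    if s_log_cluster_size < s_log_block_size:
        raise ValueError("a cluster cannot be smaller than a block")
    return 1 << (s_log_cluster_size - s_log_block_size)


# 4KiB blocks (log value 2) with 1MiB clusters (log value 10).
print(cluster_size_bytes(10))
print(blocks_per_cluster(2, 10))
```

Under this encoding, equal values in the two fields give a ratio of
one, which corresponds to ordinary per-block allocation.
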