.. SPDX-License-Identifier: GPL-2.0

============
Fiemap Ioctl
============

The fiemap ioctl is an efficient method for userspace to get file
extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
returns a list of extents.


Request Basics
--------------

A fiemap request is encoded within struct fiemap::

  struct fiemap {
        __u64 fm_start;          /* logical offset (inclusive) at
                                  * which to start mapping (in) */
        __u64 fm_length;         /* logical length of mapping which
                                  * userspace cares about (in) */
        __u32 fm_flags;          /* FIEMAP_FLAG_* flags for request (in/out) */
        __u32 fm_mapped_extents; /* number of extents that were
                                  * mapped (out) */
        __u32 fm_extent_count;   /* size of fm_extents array (in) */
        __u32 fm_reserved;
        struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
  };


fm_start and fm_length specify the logical range within the file
which the process would like mappings for. Extents returned mirror
those on disk - that is, the logical offset of the first returned extent
may start before fm_start, and the range covered by the last returned
extent may end after fm_length. All offsets and lengths are in bytes.

Certain flags to modify the way in which mappings are looked up can be
set in fm_flags.
If the kernel doesn't understand some particular
flags, it will return EBADR and the contents of fm_flags will contain
the set of flags which caused the error. If the kernel is compatible
with all flags passed, the contents of fm_flags will be unmodified.
It is up to userspace to determine whether rejection of a particular
flag is fatal to its operation. This scheme is intended to allow the
fiemap interface to grow in the future without losing
compatibility with old software.

fm_extent_count specifies the number of elements in the fm_extents[] array
that can be used to return extents. If fm_extent_count is zero, then the
fm_extents[] array is ignored (no extents will be returned), and the
fm_mapped_extents count will hold the number of extents needed in
fm_extents[] to hold the file's current mapping. Note that there is
nothing to prevent the file from changing between calls to FIEMAP.

The following flags can be set in fm_flags:

FIEMAP_FLAG_SYNC
  If this flag is set, the kernel will sync the file before mapping extents.

FIEMAP_FLAG_XATTR
  If this flag is set, the extents returned will describe the inode's
  extended attribute lookup tree, instead of its data tree.


Extent Mapping
--------------

Extent information is returned within the embedded fm_extents array
which userspace must allocate along with the fiemap structure.
The
number of elements in the fm_extents[] array should be passed via
fm_extent_count. The number of extents mapped by the kernel will be
returned via fm_mapped_extents. If the number of fiemap_extent structures
allocated is less than would be required to map the requested range,
the maximum number of extents that can be mapped in the fm_extents[]
array will be returned and fm_mapped_extents will be equal to
fm_extent_count. In that case, the last extent in the array will not
complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the next section on extent flags).

Each extent is described by a single fiemap_extent structure as
returned in fm_extents::

  struct fiemap_extent {
        __u64 fe_logical;  /* logical offset in bytes for the start of
                            * the extent */
        __u64 fe_physical; /* physical offset in bytes for the start
                            * of the extent */
        __u64 fe_length;   /* length in bytes for the extent */
        __u64 fe_reserved64[2];
        __u32 fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
        __u32 fe_reserved[3];
  };

All offsets and lengths are in bytes and mirror those on disk. It is valid
for an extent's logical offset to start before the request or its logical
length to extend past the request. Unless FIEMAP_EXTENT_NOT_ALIGNED is
returned, fe_logical, fe_physical, and fe_length will be aligned to the
block size of the file system.
With the exception of extents flagged as
FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.

The fe_flags field contains flags which describe the extent returned.
A special flag, FIEMAP_EXTENT_LAST, is always set on the last extent in
the file so that the process making fiemap calls can determine when no
more extents are available, without having to call the ioctl again.

Some flags are intentionally vague and will always be set in the
presence of other more specific flags. This way a program looking for
a general property does not have to know all existing and future flags
which imply that property.

For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
for inline or tail-packed data can key on the specific flag. Software
which simply wants to avoid operating on non-aligned extents, however,
can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to
worry about all present and future flags which might imply unaligned
data. Note that the opposite is not true - it would be valid for
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.

FIEMAP_EXTENT_LAST
  This is generally the last extent in the file. A mapping attempt past
  this extent may return nothing. Some implementations set this flag to
  indicate this extent is the last one in the range queried by the user
  (via fiemap->fm_length).

FIEMAP_EXTENT_UNKNOWN
  The location of this extent is currently unknown. This may indicate
  the data is stored on an inaccessible volume or that no storage has
  been allocated for the file yet.

FIEMAP_EXTENT_DELALLOC
  This will also set FIEMAP_EXTENT_UNKNOWN.

  Delayed allocation - while there is data for this extent, its
  physical location has not been allocated yet.

FIEMAP_EXTENT_ENCODED
  This extent does not consist of plain filesystem blocks but is
  encoded (e.g. encrypted or compressed). Reading the data in this
  extent via I/O to the block device will have undefined results.

Note that it is *always* undefined to try to update the data
in-place by writing to the indicated location without the
assistance of the filesystem, or to access the data using the
information returned by the FIEMAP interface while the filesystem
is mounted. In other words, user applications may only read the
extent data via I/O to the block device while the filesystem is
unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
clear; user applications must not try reading or writing to the
filesystem via the block device under any other circumstances.

FIEMAP_EXTENT_DATA_ENCRYPTED
  This will also set FIEMAP_EXTENT_ENCODED.

  The data in this extent has been encrypted by the file system.

FIEMAP_EXTENT_NOT_ALIGNED
  Extent offsets and length are not guaranteed to be block aligned.

FIEMAP_EXTENT_DATA_INLINE
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.

  Data is located within a metadata block.

FIEMAP_EXTENT_DATA_TAIL
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.

  Data is packed into a block with data from other files.

FIEMAP_EXTENT_UNWRITTEN
  Unwritten extent - the extent is allocated but its data has not been
  initialized. This indicates the extent's data will be all zero if read
  through the filesystem but the contents are undefined if read directly from
  the device.

FIEMAP_EXTENT_MERGED
  This will be set when a file does not support extents, i.e., it uses a block
  based addressing scheme. Since returning an extent for each block back to
  userspace would be highly inefficient, the kernel will try to merge most
  adjacent blocks into 'extents'.


VFS -> File System Implementation
---------------------------------

File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure. The fs ->fiemap call is responsible for
defining its set of supported fiemap flags, and calling a helper function on
each discovered extent::

  struct inode_operations {
       ...

       int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
                     u64 len);

->fiemap is passed struct fiemap_extent_info which describes the
fiemap request::

  struct fiemap_extent_info {
        unsigned int fi_flags;                  /* Flags as passed from user */
        unsigned int fi_extents_mapped;         /* Number of mapped extents */
        unsigned int fi_extents_max;            /* Size of fiemap_extent array */
        struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
  };

It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant to signals and return
EINTR once a fatal signal is received.


Flag checking should be done at the beginning of the ->fiemap callback via the
fiemap_prep() helper::

  int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
                  u64 start, u64 *len, u32 supported_flags);

The struct fieinfo should be passed in as received from ioctl_fiemap(). The
set of fiemap flags which the fs understands should be passed via
supported_flags. If fiemap_prep() finds invalid user flags, it will place the
bad values in fieinfo->fi_flags and return -EBADR. If the file system gets
-EBADR from fiemap_prep(), it should immediately exit, returning that error
back to ioctl_fiemap().
Additionally, the range is validated against the supported
maximum file size.


For each extent in the request range, the file system should call
the helper function, fiemap_fill_next_extent()::

  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
                              u64 phys, u64 len, u32 flags, u32 dev);

fiemap_fill_next_extent() will use the passed values to populate the
next free extent in the fm_extents array. 'General' extent flags will
automatically be set from specific flags on behalf of the calling file
system so that the userspace API is not broken.

fiemap_fill_next_extent() returns 0 on success, and 1 when the
user-supplied fm_extents array is full. If an error is encountered
while copying the extent to user memory, -EFAULT will be returned.
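
Turning back to the userspace side described under Request Basics and Extent
Mapping, the following is a minimal sketch of how a program might walk a
file's extents in batches. It assumes the FS_IOC_FIEMAP ioctl number from
<linux/fs.h> and the definitions in <linux/fiemap.h>. The helper names
(fiemap_alloc, fiemap_next_start, fiemap_walk) and the batch size are
hypothetical and chosen only for this illustration; they are not part of any
kernel or libc API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>     /* FS_IOC_FIEMAP */
#include <linux/fiemap.h> /* struct fiemap, struct fiemap_extent, flags */

#define FIEMAP_BATCH 32 /* arbitrary batch size, an assumption of this sketch */

/* Hypothetical helper: zeroed request with room for n extents. */
static struct fiemap *fiemap_alloc(uint32_t n)
{
    struct fiemap *fm = calloc(1, sizeof(*fm) +
                                  (size_t)n * sizeof(struct fiemap_extent));
    if (fm)
        fm->fm_extent_count = n;
    return fm;
}

/* Hypothetical helper: first byte after an extent, i.e. where a
 * follow-up request should start. */
static uint64_t fiemap_next_start(const struct fiemap_extent *fe)
{
    assert(fe != NULL);
    return fe->fe_logical + fe->fe_length;
}

/* Hypothetical helper: call cb (if non-NULL) for every extent of fd.
 * Returns the number of extents seen, or a negative errno value. */
static long fiemap_walk(int fd, uint32_t flags,
                        void (*cb)(const struct fiemap_extent *, void *),
                        void *arg)
{
    struct fiemap *fm = fiemap_alloc(FIEMAP_BATCH);
    uint64_t start = 0;
    long total = 0;
    int last = 0;

    if (!fm)
        return -ENOMEM;

    while (!last) {
        uint64_t next = start;

        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = flags;
        fm->fm_extent_count = FIEMAP_BATCH;
        fm->fm_mapped_extents = 0;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            int err = errno; /* e.g. EBADR for unsupported flags */
            free(fm);
            return -err;
        }
        if (fm->fm_mapped_extents == 0)
            break; /* nothing mapped at or after start */

        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
            const struct fiemap_extent *fe = &fm->fm_extents[i];

            if (cb)
                cb(fe, arg);
            total++;
            next = fiemap_next_start(fe);
            if (fe->fe_flags & FIEMAP_EXTENT_LAST)
                last = 1;
        }
        if (next <= start)
            break; /* defensive: never loop without making progress */
        start = next;
    }
    free(fm);
    return total;
}
```

A caller would typically open the file read-only and invoke something like
fiemap_walk(fd, FIEMAP_FLAG_SYNC, callback, NULL). Because nothing prevents
the file from changing between calls, the extents seen across batches are
not guaranteed to form a consistent snapshot.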