.. SPDX-License-Identifier: GPL-2.0

============
Fiemap Ioctl
============

The fiemap ioctl is an efficient method for userspace to get file
extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
returns a list of extents.


Request Basics
--------------

A fiemap request is encoded within struct fiemap::

  struct fiemap {
	__u64	fm_start;	 /* logical offset (inclusive) at
				  * which to start mapping (in) */
	__u64	fm_length;	 /* logical length of mapping which
				  * userspace cares about (in) */
	__u32	fm_flags;	 /* FIEMAP_FLAG_* flags for request (in/out) */
	__u32	fm_mapped_extents; /* number of extents that were
				    * mapped (out) */
	__u32	fm_extent_count; /* size of fm_extents array (in) */
	__u32	fm_reserved;
	struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
  };


fm_start and fm_length specify the logical range within the file
for which the process would like mappings. Extents returned mirror
those on disk - that is, the logical offset of the first returned extent
may start before fm_start, and the range covered by the last returned
extent may end after fm_start + fm_length. All offsets and lengths are
in bytes.

Certain flags to modify the way in which mappings are looked up can be
set in fm_flags. If the kernel doesn't understand some particular
flags, it will return EBADR and the contents of fm_flags will contain
the set of flags which caused the error. If the kernel is compatible
with all flags passed, the contents of fm_flags will be unmodified.
It is up to userspace to determine whether rejection of a particular
flag is fatal to its operation. This scheme is intended to allow the
fiemap interface to grow in the future but without losing
compatibility with old software.

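The EBADR convention described above can be exercised from userspace
roughly as follows. This is a sketch, not a reference implementation: it
assumes the request is issued with the FS_IOC_FIEMAP ioctl number from
<linux/fs.h> and uses FIEMAP_MAX_OFFSET from <linux/fiemap.h> to cover the
whole file; probe_fiemap_flags() is a hypothetical helper name. Note that
probing with FIEMAP_FLAG_SYNC has the side effect of syncing the file.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FS_IOC_FIEMAP */
#include <linux/fiemap.h>

/*
 * Ask the kernel whether it accepts `flags` for a fiemap request on fd.
 * Returns 0 if the request succeeded. Returns -1 with errno set on
 * failure; when errno is EBADR, *rejected holds the flags that the
 * kernel wrote back into fm_flags.
 * probe_fiemap_flags() is a hypothetical helper name.
 */
static int probe_fiemap_flags(int fd, __u32 flags, __u32 *rejected)
{
	struct fiemap fm;

	assert(rejected != NULL);

	memset(&fm, 0, sizeof(fm));
	fm.fm_length = FIEMAP_MAX_OFFSET;	/* whole file */
	fm.fm_flags = flags;
	fm.fm_extent_count = 0;			/* count only, no extent array */

	*rejected = 0;
	if (ioctl(fd, FS_IOC_FIEMAP, &fm) == 0)
		return 0;
	if (errno == EBADR)
		*rejected = fm.fm_flags;
	return -1;
}
```

Whether a rejected flag is fatal is left to the caller; for example, a
tool could retry without the rejected bits if they were only an
optimisation.
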
fm_extent_count specifies the number of elements in the fm_extents[] array
that can be used to return extents.  If fm_extent_count is zero, then the
fm_extents[] array is ignored (no extents will be returned), and the
fm_mapped_extents count will hold the number of extents needed in
fm_extents[] to hold the file's current mapping.  Note that there is
nothing to prevent the file from changing between calls to FIEMAP.

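One common way to use the zero-count mode is a two-pass query: count
first, then allocate and fetch. The sketch below assumes FS_IOC_FIEMAP
from <linux/fs.h>; fiemap_alloc() and fiemap_fetch_all() are hypothetical
helper names. Since the file may change between the two calls, the
result of the second call is what the caller should trust.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FS_IOC_FIEMAP */
#include <linux/fiemap.h>

/* Allocate a zeroed whole-file request with room for `count` extents. */
static struct fiemap *fiemap_alloc(__u32 count)
{
	struct fiemap *fm;

	fm = calloc(1, sizeof(*fm) + (size_t)count * sizeof(struct fiemap_extent));
	if (!fm)
		return NULL;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;
	fm->fm_extent_count = count;
	return fm;
}

/*
 * Two-pass query: ask how many extents are needed, then fetch them.
 * Returns an allocated struct fiemap (caller frees) or NULL on error.
 */
static struct fiemap *fiemap_fetch_all(int fd)
{
	struct fiemap probe = { .fm_length = FIEMAP_MAX_OFFSET };
	struct fiemap *fm;

	/* Pass 1: fm_extent_count == 0, so only fm_mapped_extents is filled. */
	if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0)
		return NULL;

	/* Pass 2: fetch into an array of the size reported by pass 1. */
	fm = fiemap_alloc(probe.fm_mapped_extents);
	if (!fm)
		return NULL;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return NULL;
	}

	/* The kernel never reports more extents than the array can hold. */
	assert(fm->fm_extent_count == 0 ||
	       fm->fm_mapped_extents <= fm->fm_extent_count);
	return fm;
}
```
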
The following flags can be set in fm_flags:

FIEMAP_FLAG_SYNC
  If this flag is set, the kernel will sync the file before mapping extents.

FIEMAP_FLAG_XATTR
  If this flag is set, the extents returned will describe the inode's
  extended attribute lookup tree, instead of its data tree.


Extent Mapping
--------------

Extent information is returned within the embedded fm_extents array
which userspace must allocate along with the fiemap structure. The
number of elements in the fm_extents[] array should be passed via
fm_extent_count. The number of extents mapped by the kernel will be
returned via fm_mapped_extents. If the number of fiemap_extent
structures allocated is less than would be required to map the requested
range, the maximum number of extents that can be mapped in the
fm_extents[] array will be returned and fm_mapped_extents will be equal
to fm_extent_count. In that case, the last extent in the array will not
complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the next section on extent flags).

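When the array fills up before the range is complete, a caller can issue
another request starting at the end of the last extent it received. The
sketch below shows one way to do that with a fixed-size array; the
continuation strategy, the batch size of 32, and the helper names
fiemap_next_start() and sum_mapped_bytes() are assumptions made for
illustration, not something the interface prescribes.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* FS_IOC_FIEMAP */
#include <linux/fiemap.h>

#define BATCH 32	/* arbitrary array size chosen for this sketch */

/*
 * Decide whether a completed request needs a follow-up call.
 * Returns 1 and stores the next fm_start in *next if the array was
 * filled without reaching FIEMAP_EXTENT_LAST; returns 0 otherwise.
 */
static int fiemap_next_start(const struct fiemap *fm, __u64 *next)
{
	const struct fiemap_extent *last;
	__u64 req_end, end;

	assert(fm != NULL && next != NULL);

	/* Array not filled (or count-only call): nothing more to fetch. */
	if (fm->fm_mapped_extents == 0 ||
	    fm->fm_mapped_extents != fm->fm_extent_count)
		return 0;

	last = &fm->fm_extents[fm->fm_mapped_extents - 1];
	if (last->fe_flags & FIEMAP_EXTENT_LAST)
		return 0;

	req_end = fm->fm_length > FIEMAP_MAX_OFFSET - fm->fm_start ?
		  FIEMAP_MAX_OFFSET : fm->fm_start + fm->fm_length;
	end = last->fe_logical + last->fe_length;
	if (end <= fm->fm_start || end >= req_end)
		return 0;	/* no forward progress, or range already covered */

	*next = end;
	return 1;
}

/* Sum fe_length over all extents of fd. Returns 0 on success, -1 on error. */
static int sum_mapped_bytes(int fd, __u64 *bytes)
{
	struct fiemap *fm;
	__u64 start = 0;
	__u32 i;
	int more;

	fm = calloc(1, sizeof(*fm) + BATCH * sizeof(struct fiemap_extent));
	if (!fm)
		return -1;

	*bytes = 0;
	do {
		fm->fm_start = start;
		fm->fm_length = FIEMAP_MAX_OFFSET - start;
		fm->fm_flags = 0;
		fm->fm_extent_count = BATCH;
		fm->fm_mapped_extents = 0;
		if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
			free(fm);
			return -1;
		}
		for (i = 0; i < fm->fm_mapped_extents; i++)
			*bytes += fm->fm_extents[i].fe_length;
		more = fiemap_next_start(fm, &start);
	} while (more);

	free(fm);
	return 0;
}
```
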
Each extent is described by a single fiemap_extent structure as
returned in fm_extents::

    struct fiemap_extent {
	    __u64	fe_logical;  /* logical offset in bytes for the start of
				* the extent */
	    __u64	fe_physical; /* physical offset in bytes for the start
				* of the extent */
	    __u64	fe_length;   /* length in bytes for the extent */
	    __u64	fe_reserved64[2];
	    __u32	fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
	    __u32	fe_reserved[3];
    };

All offsets and lengths are in bytes and mirror those on disk.  It is valid
for an extent's logical offset to start before the request or its logical
length to extend past the request.  Unless FIEMAP_EXTENT_NOT_ALIGNED is
returned, fe_logical, fe_physical, and fe_length will be aligned to the
block size of the file system.  With the exception of extents flagged as
FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.

The fe_flags field contains flags which describe the extent returned.
A special flag, FIEMAP_EXTENT_LAST, is always set on the last extent in
the file so that the process making fiemap calls can determine when no
more extents are available, without having to call the ioctl again.

Some flags are intentionally vague and will always be set in the
presence of other more specific flags. This way a program looking for
a general property does not have to know all existing and future flags
which imply that property.

For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
for inline or tail-packed data can key on the specific flag. Software
which simply cares not to try operating on non-aligned extents,
however, can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to
worry about all present and future flags which might imply unaligned
data. Note that the opposite is not true - it would be valid for
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.

FIEMAP_EXTENT_LAST
  This is generally the last extent in the file. A mapping attempt past
  this extent may return nothing. Some implementations set this flag to
  indicate this extent is the last one in the range queried by the user
  (via fiemap->fm_length).

FIEMAP_EXTENT_UNKNOWN
  The location of this extent is currently unknown. This may indicate
  the data is stored on an inaccessible volume or that no storage has
  been allocated for the file yet.

FIEMAP_EXTENT_DELALLOC
  This will also set FIEMAP_EXTENT_UNKNOWN.

  Delayed allocation - while there is data for this extent, its
  physical location has not been allocated yet.

FIEMAP_EXTENT_ENCODED
  This extent does not consist of plain filesystem blocks but is
  encoded (e.g. encrypted or compressed).  Reading the data in this
  extent via I/O to the block device will have undefined results.

Note that it is *always* undefined to try to update the data
in-place by writing to the indicated location without the
assistance of the filesystem, or to access the data using the
information returned by the FIEMAP interface while the filesystem
is mounted.  In other words, user applications may only read the
extent data via I/O to the block device while the filesystem is
unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
clear; user applications must not try reading or writing to the
filesystem via the block device under any other circumstances.

FIEMAP_EXTENT_DATA_ENCRYPTED
  This will also set FIEMAP_EXTENT_ENCODED.
  The data in this extent has been encrypted by the file system.

FIEMAP_EXTENT_NOT_ALIGNED
  Extent offsets and length are not guaranteed to be block aligned.

FIEMAP_EXTENT_DATA_INLINE
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.
  Data is located within a metadata block.

FIEMAP_EXTENT_DATA_TAIL
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.
  Data is packed into a block with data from other files.

FIEMAP_EXTENT_UNWRITTEN
  Unwritten extent - the extent is allocated but its data has not been
  initialized.  This indicates the extent's data will be all zero if read
  through the filesystem but the contents are undefined if read directly from
  the device.

FIEMAP_EXTENT_MERGED
  This will be set when a file does not support extents, i.e., it uses a block
  based addressing scheme.  Since returning an extent for each block back to
  userspace would be highly inefficient, the kernel will try to merge most
  adjacent blocks into 'extents'.

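For debugging output it can be handy to turn fe_flags into text. The
following sketch maps the flags documented above to their names with the
FIEMAP_EXTENT_ prefix removed; the table layout and the helper name
fe_flags_to_str() are illustrative assumptions, and flags not listed in
this document are simply ignored.

```c
#include <assert.h>
#include <string.h>
#include <linux/fiemap.h>

/* Flag bits documented in this section, in the order they are described. */
static const struct {
	__u32 bit;
	const char *name;
} fe_flag_names[] = {
	{ FIEMAP_EXTENT_LAST,		"LAST" },
	{ FIEMAP_EXTENT_UNKNOWN,	"UNKNOWN" },
	{ FIEMAP_EXTENT_DELALLOC,	"DELALLOC" },
	{ FIEMAP_EXTENT_ENCODED,	"ENCODED" },
	{ FIEMAP_EXTENT_DATA_ENCRYPTED,	"DATA_ENCRYPTED" },
	{ FIEMAP_EXTENT_NOT_ALIGNED,	"NOT_ALIGNED" },
	{ FIEMAP_EXTENT_DATA_INLINE,	"DATA_INLINE" },
	{ FIEMAP_EXTENT_DATA_TAIL,	"DATA_TAIL" },
	{ FIEMAP_EXTENT_UNWRITTEN,	"UNWRITTEN" },
	{ FIEMAP_EXTENT_MERGED,		"MERGED" },
};

/*
 * Write a comma-separated list of the names of the set flags into buf.
 * Names that do not fit are dropped. Always NUL-terminates; returns buf.
 */
static char *fe_flags_to_str(__u32 flags, char *buf, size_t size)
{
	size_t used = 0, i;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(fe_flag_names) / sizeof(fe_flag_names[0]); i++) {
		size_t len;

		if (!(flags & fe_flag_names[i].bit))
			continue;
		len = strlen(fe_flag_names[i].name);
		if (used + len + 2 > size)	/* separator + name + NUL */
			break;
		if (used)
			buf[used++] = ',';
		memcpy(buf + used, fe_flag_names[i].name, len + 1);
		used += len;
	}
	return buf;
}
```
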

VFS -> File System Implementation
---------------------------------

File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure. The fs ->fiemap call is responsible for
defining its set of supported fiemap flags, and calling a helper function on
each discovered extent::

  struct inode_operations {
       ...

       int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
                     u64 len);
       ...
  };

->fiemap is passed struct fiemap_extent_info which describes the
fiemap request::

  struct fiemap_extent_info {
	unsigned int fi_flags;		/* Flags as passed from user */
	unsigned int fi_extents_mapped;	/* Number of mapped extents */
	unsigned int fi_extents_max;	/* Size of fiemap_extent array */
	struct fiemap_extent *fi_extents_start;	/* Start of fiemap_extent array */
  };

It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant of signals and
return EINTR once a fatal signal is received.


Flag checking should be done at the beginning of the ->fiemap callback via the
fiemap_prep() helper::

  int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
		  u64 start, u64 *len, u32 supported_flags);

The fieinfo structure should be passed in as received from ioctl_fiemap(). The
set of fiemap flags which the fs understands should be passed via
supported_flags. If fiemap_prep() finds invalid user flags, it will place the
bad values in fieinfo->fi_flags and return -EBADR. If the file system gets
-EBADR from fiemap_prep(), it should immediately exit, returning that error
back to ioctl_fiemap().  Additionally, the range is validated against the
supported maximum file size.


For each extent in the request range, the file system should call
the helper function, fiemap_fill_next_extent()::

  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
			      u64 phys, u64 len, u32 flags);

fiemap_fill_next_extent() will use the passed values to populate the
next free extent in the fm_extents array. 'General' extent flags will
automatically be set from specific flags on behalf of the calling file
system so that the userspace API is not broken.

fiemap_fill_next_extent() returns 0 on success, and 1 when the
user-supplied fm_extents array is full. If an error is encountered
while copying the extent to user memory, -EFAULT will be returned.