.. SPDX-License-Identifier: GPL-2.0

======================
The SGI XFS Filesystem
======================

XFS is a high-performance journaling filesystem that originated
on the SGI IRIX platform.  It is fully multi-threaded, supports
large files and large filesystems, extended attributes and
variable block sizes, is extent based, and makes extensive use of
Btrees (directories, extents, free space) to aid both performance
and scalability.

Refer to the documentation at https://xfs.wiki.kernel.org/
for further details.  This implementation is on-disk compatible
with the IRIX version of XFS.


Mount Options
=============

When mounting an XFS filesystem, the following options are accepted.

  allocsize=size
	Sets the buffered I/O end-of-file preallocation size when
	doing delayed allocation writeout (default size is 64KiB).
	Valid values for this option are page size (typically 4KiB)
	through to 1GiB, inclusive, in power-of-2 increments.

	The default behaviour is for dynamic end-of-file
	preallocation size, which uses a set of heuristics to
	optimise the preallocation size based on the current
	allocation patterns within the file and the access patterns
	to the file. Specifying a fixed ``allocsize`` value turns off
	the dynamic behaviour.

  attr2 or noattr2
	These options enable/disable an "opportunistic" improvement to
	the way inline extended attributes are stored on-disk.  The
	first time the new form is used while ``attr2`` is selected
	(either when setting or removing extended attributes), the
	on-disk superblock feature bit field is updated to reflect
	that this format is in use.

	The default behaviour is determined by the on-disk feature
	bit indicating that ``attr2`` behaviour is active. If either
	mount option is set, then that becomes the new default used
	by the filesystem.

	CRC-enabled filesystems always use the ``attr2`` format, and so
	will reject the ``noattr2`` mount option if it is set.

  discard or nodiscard (default)
	Enable/disable the issuing of commands to let the block
	device reclaim space freed by the filesystem.  This is
	useful for SSD devices, thinly provisioned LUNs and virtual
	machine images, but may have a performance impact.

	Note: It is currently recommended that you use the ``fstrim``
	application to ``discard`` unused blocks rather than the ``discard``
	mount option because the performance impact of this option
	is quite severe.

  grpid/bsdgroups or nogrpid/sysvgroups (default)
	These options define what group ID a newly created file
	gets.  When ``grpid`` is set, it takes the group ID of the
	directory in which it is created; otherwise it takes the
	``fsgid`` of the current process, unless the directory has the
	``setgid`` bit set, in which case it takes the ``gid`` from the
	parent directory, and also gets the ``setgid`` bit set if it is
	a directory itself.

  filestreams
	Make the data allocator use the filestreams allocation mode
	across the entire filesystem rather than just on directories
	configured to use it.

  ikeep or noikeep (default)
	When ``ikeep`` is specified, XFS does not delete empty inode
	clusters and keeps them around on disk.  When ``noikeep`` is
	specified, empty inode clusters are returned to the free
	space pool.

  inode32 or inode64 (default)
	When ``inode32`` is specified, it indicates that XFS limits
	inode creation to locations which will not result in inode
	numbers with more than 32 bits of significance.

	When ``inode64`` is specified, it indicates that XFS is allowed
	to create inodes at any location in the filesystem,
	including those which will result in inode numbers occupying
	more than 32 bits of significance.

	``inode32`` is provided for backwards compatibility with older
	systems and applications, since 64-bit inode numbers might
	cause problems for some applications that cannot handle
	large inode numbers.  If applications are in use which do
	not handle inode numbers bigger than 32 bits, the ``inode32``
	option should be specified.

  largeio or nolargeio (default)
	If ``nolargeio`` is specified, the optimal I/O reported in
	``st_blksize`` by **stat(2)** will be as small as possible to allow
	user applications to avoid inefficient read/modify/write
	I/O.  This is typically the page size of the machine, as
	this is the granularity of the page cache.

	If ``largeio`` is specified, a filesystem that was created with a
	``swidth`` specified will return the ``swidth`` value (in bytes)
	in ``st_blksize``. If the filesystem does not have a ``swidth``
	specified but does specify an ``allocsize`` then ``allocsize``
	(in bytes) will be returned instead. Otherwise the behaviour
	is the same as if ``nolargeio`` was specified.

  logbufs=value
	Set the number of in-memory log buffers.  Valid numbers
	range from 2-8 inclusive.

	The default value is 8 buffers.

	If the memory cost of 8 log buffers is too high on small
	systems, then it may be reduced at some cost to performance
	on metadata intensive workloads. The ``logbsize`` option below
	controls the size of each buffer and so is also relevant to
	this case.

  logbsize=value
	Set the size of each in-memory log buffer.  The size may be
	specified in bytes, or in kilobytes with a "k" suffix.
	Valid sizes for version 1 and version 2 logs are 16384 (16k)
	and 32768 (32k).  Valid sizes for version 2 logs also
	include 65536 (64k), 131072 (128k) and 262144 (256k). The
	``logbsize`` must be an integer multiple of the log
	stripe unit configured at **mkfs(8)** time.

	The default value for version 1 logs is 32768, while the
	default value for version 2 logs is MAX(32768, log_sunit).

  logdev=device and rtdev=device
	Use an external log (metadata journal) and/or real-time device.
	An XFS filesystem has up to three parts: a data section, a log
	section, and a real-time section.  The real-time section is
	optional, and the log section can be separate from the data
	section or contained within it.

  noalign
	Data allocations will not be aligned at stripe unit
	boundaries. This is only relevant to filesystems created
	with non-zero data alignment parameters (``sunit``, ``swidth``) by
	**mkfs(8)**.

  norecovery
	The filesystem will be mounted without running log recovery.
	If the filesystem was not cleanly unmounted, it is likely to
	be inconsistent when mounted in ``norecovery`` mode.
	Some files or directories may not be accessible because of this.
	Filesystems mounted ``norecovery`` must be mounted read-only or
	the mount will fail.

  nouuid
	Don't check for double-mounted filesystems using the filesystem
	``uuid``.  This is useful to mount LVM snapshot volumes,
	and often used in combination with ``norecovery`` for mounting
	read-only snapshots.

  noquota
	Forcibly turns off all quota accounting and enforcement
	within the filesystem.

  uquota/usrquota/uqnoenforce/quota
	User disk quota accounting enabled, and limits (optionally)
	enforced.  Refer to **xfs_quota(8)** for further details.

  gquota/grpquota/gqnoenforce
	Group disk quota accounting enabled and limits (optionally)
	enforced.  Refer to **xfs_quota(8)** for further details.

  pquota/prjquota/pqnoenforce
	Project disk quota accounting enabled and limits (optionally)
	enforced.  Refer to **xfs_quota(8)** for further details.

  sunit=value and swidth=value
	Used to specify the stripe unit and width for a RAID device
	or a stripe volume.  "value" must be specified in 512-byte
	block units. These options are only relevant to filesystems
	that were created with non-zero data alignment parameters.

	The ``sunit`` and ``swidth`` parameters specified must be compatible
	with the existing filesystem alignment characteristics.  In
	general, that means the only valid changes to ``sunit`` are
	increasing it by a power-of-2 multiple. Valid ``swidth`` values
	are any integer multiple of a valid ``sunit`` value.

	Typically, these mount options are only necessary after an
	underlying RAID device has had its geometry modified, such as
	adding a new disk to a RAID5 LUN and reshaping it.

  swalloc
	Data allocations will be rounded up to stripe width boundaries
	when the current end of file is being extended and the file
	size is larger than the stripe width size.

  wsync
	When specified, all filesystem namespace operations are
	executed synchronously. This ensures that when the namespace
	operation (create, unlink, etc.) completes, the change to the
	namespace is on stable storage. This is useful in HA setups
	where failover must not result in clients seeing
	inconsistent namespace presentation during or after a
	failover event.

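The numeric limits given above for ``allocsize``, ``logbufs``, ``logbsize`` and ``sunit``/``swidth`` can be checked before a mount is attempted. The same applies to the ``st_blksize`` value chosen under ``largeio``. The Python sketch below is a minimal example and encodes only the rules stated in this document. It is not the kernel's option parser, and every function name in it is hypothetical. It assumes a 4 KiB page size unless told otherwise, and the kernel remains the final authority on what it accepts.

```python
"""Sketch: pre-flight checks for the numeric XFS mount options above.

Only the rules stated in this document are encoded; all names are
hypothetical and the kernel may apply further checks.
"""

KIB = 1024
GIB = 1024 ** 3


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def allocsize_ok(size: int, page_size: int = 4 * KIB) -> bool:
    """allocsize: page size through 1 GiB inclusive, in power-of-2 steps."""
    return is_power_of_two(size) and page_size <= size <= GIB


def logbufs_ok(count: int) -> bool:
    """logbufs: 2 to 8 in-memory log buffers inclusive."""
    return 2 <= count <= 8


def parse_logbsize(text: str) -> int:
    """logbsize is given in bytes, or in kilobytes with a "k" suffix."""
    if text.lower().endswith("k"):
        return int(text[:-1]) * KIB
    return int(text)


def logbsize_ok(size: int, log_version: int, log_sunit: int = 0) -> bool:
    """Check a logbsize in bytes; log_sunit is the mkfs log stripe unit in bytes."""
    valid = {16 * KIB, 32 * KIB}
    if log_version == 2:
        valid |= {64 * KIB, 128 * KIB, 256 * KIB}
    if size not in valid:
        return False
    # Must also be an integer multiple of the log stripe unit, if one is set.
    return log_sunit == 0 or size % log_sunit == 0


def default_logbsize(log_version: int, log_sunit: int = 0) -> int:
    """32768 for version 1 logs, MAX(32768, log_sunit) for version 2."""
    return 32768 if log_version == 1 else max(32768, log_sunit)


def to_512b_units(nbytes: int) -> int:
    """sunit and swidth are passed in 512-byte units."""
    if nbytes % 512:
        raise ValueError("stripe geometry must be a multiple of 512 bytes")
    return nbytes // 512


def stripe_change_ok(old_sunit: int, new_sunit: int, new_swidth: int) -> bool:
    """New sunit: old sunit times a power of 2; swidth: a multiple of sunit."""
    if old_sunit <= 0 or new_sunit <= 0 or new_swidth <= 0:
        return False
    if new_sunit % old_sunit:
        return False
    return is_power_of_two(new_sunit // old_sunit) and new_swidth % new_sunit == 0


def st_blksize(largeio: bool, page_size: int,
               swidth_bytes: int = 0, allocsize: int = 0) -> int:
    """Optimal I/O size reported by stat(2) under largeio/nolargeio."""
    if not largeio:
        return page_size
    # largeio: swidth if set, else allocsize if set, else the page size.
    return swidth_bytes or allocsize or page_size
```

Here is a worked example of the 512-byte units. A 64 KiB stripe unit across four data disks would be passed as ``sunit=128,swidth=512``, because 65536 / 512 = 128 and 4 x 128 = 512.
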
Deprecation of V4 Format
========================

The V4 filesystem format lacks certain features that are supported by
the V5 format, such as metadata checksumming, strengthened metadata
verification, and the ability to store timestamps past the year 2038.
Because of this, the V4 format is deprecated.  All users should upgrade
by backing up their files, reformatting, and restoring from the backup.

Administrators and users can detect a V4 filesystem by running
``xfs_info`` against a filesystem mountpoint and checking for a string
containing "crc=".  If the string "crc=0" is found, the filesystem is a
V4 filesystem.  If no such string is found, please upgrade xfsprogs to
the latest version and try again.

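The ``xfs_info`` check is easy to script across many mountpoints. The Python sketch below is minimal and rests on two assumptions: ``xfs_info`` is on ``PATH``, and its output contains a ``crc=0`` or ``crc=1`` field. A value of ``crc=1`` marks a V5 filesystem and ``crc=0`` marks a V4 one. The function names are hypothetical.

```python
"""Sketch: tell V4 from V5 XFS filesystems using xfs_info output."""
import re
import subprocess
from typing import Optional

CRC_FIELD = re.compile(r"\bcrc=(\d+)")


def format_from_xfs_info(output: str) -> Optional[str]:
    """Return "V4" for crc=0, "V5" for crc=1, or None when crc= is absent."""
    match = CRC_FIELD.search(output)
    if match is None:
        return None  # xfsprogs too old to print crc=: upgrade and retry
    return "V4" if match.group(1) == "0" else "V5"


def filesystem_format(mountpoint: str) -> Optional[str]:
    """Run xfs_info on a mountpoint (assumed to be on PATH) and classify it."""
    result = subprocess.run(["xfs_info", mountpoint], capture_output=True,
                            text=True, check=True)
    return format_from_xfs_info(result.stdout)
```

A result of ``"V4"`` from ``filesystem_format("/mnt/data")`` marks a filesystem that should be backed up, reformatted and restored. The path ``/mnt/data`` is a placeholder.
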
The deprecation will take place in two parts.  Support for mounting V4
filesystems can now be disabled at kernel build time via a Kconfig
option (``CONFIG_XFS_SUPPORT_V4``).  The option will default to yes until
September 2025, at which time it will be changed to default to no.  In
September 2030, support will be removed from the codebase entirely.

Note: Distributors may choose to withdraw V4 format support earlier than
the dates listed above.

Deprecated Mount Options
========================

===========================     ================
Name                            Removal Schedule
===========================     ================
Mounting with V4 filesystem     September 2030
ikeep/noikeep                   September 2025
attr2/noattr2                   September 2025
===========================     ================


Removed Mount Options
=====================

===========================     =======
Name                            Removed
===========================     =======
delaylog/nodelaylog             v4.0
ihashsize                       v4.0
irixsgid                        v4.0
osyncisdsync/osyncisosync       v4.0
barrier                         v4.19
nobarrier                       v4.19
===========================     =======

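The two tables above can be turned into a quick audit of existing mount option strings, such as the options field of an ``/etc/fstab`` entry. The Python sketch below takes its data from those tables only. The function name is hypothetical. Options are matched by name, and any ``=value`` part is ignored.

```python
"""Sketch: flag deprecated or removed XFS mount options in an option string."""
from typing import List

# Mount options copied from the "Deprecated Mount Options" table above.
DEPRECATED = {opt: "September 2025"
              for opt in ("ikeep", "noikeep", "attr2", "noattr2")}

# Copied from the "Removed Mount Options" table above.
REMOVED = {
    "delaylog": "v4.0", "nodelaylog": "v4.0", "ihashsize": "v4.0",
    "irixsgid": "v4.0", "osyncisdsync": "v4.0", "osyncisosync": "v4.0",
    "barrier": "v4.19", "nobarrier": "v4.19",
}


def audit_mount_options(options: str) -> List[str]:
    """Return one warning per deprecated or removed option, in input order.

    options is a comma-separated string such as "rw,inode64,nobarrier".
    """
    warnings = []
    for opt in filter(None, options.split(",")):  # skip empty fields
        name = opt.split("=", 1)[0]  # ignore any "=value" part
        if name in REMOVED:
            warnings.append(f"{name}: removed in {REMOVED[name]}")
        elif name in DEPRECATED:
            warnings.append(f"{name}: scheduled for removal in {DEPRECATED[name]}")
    return warnings
```

The "Mounting with V4 filesystem" row is left out because it is a property of the filesystem, not a mount option. The ``xfs_info`` check described earlier covers that case.
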
sysctls
=======

The following sysctls are available for the XFS filesystem:

  fs.xfs.stats_clear		(Min: 0  Default: 0  Max: 1)
	Setting this to "1" clears accumulated XFS statistics
	in /proc/fs/xfs/stat.  It then immediately resets to "0".

  fs.xfs.xfssyncd_centisecs	(Min: 100  Default: 3000  Max: 720000)
	The interval at which the filesystem flushes metadata
	out to disk and runs internal cache cleanup routines.

  fs.xfs.filestream_centisecs	(Min: 1  Default: 3000  Max: 360000)
	The interval at which the filesystem ages filestreams cache
	references and returns timed-out AGs back to the free stream
	pool.

  fs.xfs.speculative_prealloc_lifetime
	(Units: seconds   Min: 1  Default: 300  Max: 86400)
	The interval at which the background scanning for inodes
	with unused speculative preallocation runs. The scan
	removes unused preallocation from clean inodes and releases
	the unused space back to the free pool.

  fs.xfs.error_level		(Min: 0  Default: 3  Max: 11)
	A volume knob for error reporting when internal errors occur.
	This will generate detailed messages & backtraces for filesystem
	shutdowns, for example.  Current threshold values are:

		XFS_ERRLEVEL_OFF:       0
		XFS_ERRLEVEL_LOW:       1
		XFS_ERRLEVEL_HIGH:      5

  fs.xfs.panic_mask		(Min: 0  Default: 0  Max: 256)
	Causes certain error conditions to call BUG(). Value is a bitmask;
	OR together the tags which represent errors which should cause panics:

		XFS_NO_PTAG                     0
		XFS_PTAG_IFLUSH                 0x00000001
		XFS_PTAG_LOGRES                 0x00000002
		XFS_PTAG_AILDELETE              0x00000004
		XFS_PTAG_ERROR_REPORT           0x00000008
		XFS_PTAG_SHUTDOWN_CORRUPT       0x00000010
		XFS_PTAG_SHUTDOWN_IOERROR       0x00000020
		XFS_PTAG_SHUTDOWN_LOGERROR      0x00000040
		XFS_PTAG_FSBLOCK_ZERO           0x00000080
		XFS_PTAG_VERIFIER_ERROR         0x00000100

	This option is intended for debugging only.

  fs.xfs.irix_symlink_mode	(Min: 0  Default: 0  Max: 1)
	Controls whether symlinks are created with mode 0777 (default)
	or whether their mode is affected by the umask (irix mode).

  fs.xfs.irix_sgid_inherit	(Min: 0  Default: 0  Max: 1)
	Controls files created in SGID directories.
	If the group ID of the new file does not match the effective group
	ID or one of the supplementary group IDs of the parent dir, the
	ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl
	is set.

  fs.xfs.inherit_sync		(Min: 0  Default: 1  Max: 1)
	Setting this to "1" will cause the "sync" flag set
	by the **xfs_io(8)** chattr command on a directory to be
	inherited by files in that directory.

  fs.xfs.inherit_nodump		(Min: 0  Default: 1  Max: 1)
	Setting this to "1" will cause the "nodump" flag set
	by the **xfs_io(8)** chattr command on a directory to be
	inherited by files in that directory.

  fs.xfs.inherit_noatime	(Min: 0  Default: 1  Max: 1)
	Setting this to "1" will cause the "noatime" flag set
	by the **xfs_io(8)** chattr command on a directory to be
	inherited by files in that directory.

  fs.xfs.inherit_nosymlinks	(Min: 0  Default: 1  Max: 1)
	Setting this to "1" will cause the "nosymlinks" flag set
	by the **xfs_io(8)** chattr command on a directory to be
	inherited by files in that directory.

  fs.xfs.inherit_nodefrag	(Min: 0  Default: 1  Max: 1)
	Setting this to "1" will cause the "nodefrag" flag set
	by the **xfs_io(8)** chattr command on a directory to be
	inherited by files in that directory.

  fs.xfs.rotorstep		(Min: 1  Default: 1  Max: 256)
	In "inode32" allocation mode, this option determines how many
	files the allocator attempts to allocate in the same allocation
	group before moving to the next allocation group.  The intent
	is to control the rate at which the allocator moves between
	allocation groups when allocating extents for new files.

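Each ``fs.xfs.*`` sysctl is also exposed as a file under ``/proc/sys``, with the dots in the name replaced by slashes. The Python sketch below does three things. It maps sysctl names to paths, checks a value against the Min/Max ranges listed above before writing it, and builds a ``panic_mask`` by OR-ing the tags. Only a subset of the sysctls is included. The names ``RANGES``, ``PanicTag``, ``in_range`` and ``set_sysctl`` are hypothetical, and writing the files on a live system requires root.

```python
"""Sketch: set XFS sysctls through /proc/sys with the documented range checks."""
import enum
from pathlib import Path

# (min, max) pairs copied from the list above; only a subset is included.
# Sysctls without an entry (e.g. panic_mask) are left to the kernel to validate.
RANGES = {
    "fs.xfs.stats_clear": (0, 1),
    "fs.xfs.xfssyncd_centisecs": (100, 720000),
    "fs.xfs.filestream_centisecs": (1, 360000),
    "fs.xfs.speculative_prealloc_lifetime": (1, 86400),
    "fs.xfs.error_level": (0, 11),
    "fs.xfs.rotorstep": (1, 256),
}


class PanicTag(enum.IntFlag):
    """Tags for fs.xfs.panic_mask, values as listed above."""
    IFLUSH = 0x001
    LOGRES = 0x002
    AILDELETE = 0x004
    ERROR_REPORT = 0x008
    SHUTDOWN_CORRUPT = 0x010
    SHUTDOWN_IOERROR = 0x020
    SHUTDOWN_LOGERROR = 0x040
    FSBLOCK_ZERO = 0x080
    VERIFIER_ERROR = 0x100


def sysctl_path(name: str, root: Path = Path("/")) -> Path:
    """fs.xfs.error_level -> <root>/proc/sys/fs/xfs/error_level"""
    return root / "proc" / "sys" / Path(*name.split("."))


def in_range(name: str, value: int) -> bool:
    """True if value is within the documented range, or no range is known."""
    lo, hi = RANGES.get(name, (None, None))
    return lo is None or lo <= value <= hi


def set_sysctl(name: str, value: int, root: Path = Path("/")) -> None:
    """Write value to the sysctl file; needs root on a live system."""
    if not in_range(name, value):
        raise ValueError(f"{name}={value} is outside the documented range")
    sysctl_path(name, root).write_text(f"{int(value)}\n")
```

For example, ``set_sysctl("fs.xfs.panic_mask", PanicTag.SHUTDOWN_CORRUPT | PanicTag.SHUTDOWN_IOERROR)`` writes 48 (0x30), which makes both kinds of shutdown call BUG(). Like the sysctl itself, this is meant for debugging only.
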
Deprecated Sysctls
==================

===========================     ================
Name                            Removal Schedule
===========================     ================
fs.xfs.irix_sgid_inherit        September 2025
fs.xfs.irix_symlink_mode        September 2025
===========================     ================


Removed Sysctls
===============

=============================   =======
Name                            Removed
=============================   =======
fs.xfs.xfsbufd_centisec         v4.0
fs.xfs.age_buffer_centisecs     v4.0
=============================   =======

Error handling
==============

XFS can act differently according to the type of error found during its
operation. The implementation introduces the following concepts to the error
handler:

  failure speed
	Defines how fast XFS should propagate an error upwards when a specific
	error is found during the filesystem operation. It can propagate
	immediately, after a defined number of retries, after a set time period,
	or simply retry forever.

  error classes
	Specifies the subsystem the error configuration will apply to, such as
	metadata IO or memory allocation. Different subsystems will have
	different error handlers for which behaviour can be configured.

  error handlers
	Defines the behavior for a specific error.

The filesystem behavior during an error can be set via ``sysfs`` files. Each
error handler works independently: the first condition met by an error handler
for a specific class will cause the error to be propagated rather than reset and
retried.

The action taken by the filesystem when the error is propagated is context
dependent: it may cause a shutdown in the case of an unrecoverable error,
it may be reported back to userspace, or it may even be ignored because
there's nothing useful we can do with the error or anyone we can report it
to (e.g. during unmount).

The configuration files are organized into the following hierarchy for each
mounted filesystem:

  /sys/fs/xfs/<dev>/error/<class>/<error>/

Where:
  <dev>
	The short device name of the mounted filesystem. This is the same device
	name that shows up in XFS kernel error messages as "XFS(<dev>): ..."

  <class>
	The subsystem the error configuration belongs to. As of 4.9, the defined
	classes are:

		- "metadata": applies to metadata buffer write IO

  <error>
	The individual error handler configurations.

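Scripts that tune these knobs mostly need to build paths in this hierarchy. The Python sketch below is a minimal example, and the helper names in it are hypothetical. ``sda1`` is a placeholder device name, while ``metadata``, ``ENODEV`` and ``default`` are the class and handler names mentioned in this document. Writing the files requires root.

```python
"""Sketch: locate and update XFS error-handling knobs in sysfs."""
from pathlib import Path

SYSFS_XFS = Path("/sys/fs/xfs")


def error_dir(dev: str, cls: str = "", error: str = "") -> Path:
    """Return /sys/fs/xfs/<dev>/error[/<class>[/<error>]]."""
    path = SYSFS_XFS / dev / "error"
    if cls:
        path = path / cls
        if error:
            path = path / error
    return path


def read_knob(dev: str, knob: str, cls: str = "", error: str = "") -> int:
    """Read one integer knob such as max_retries or fail_at_unmount."""
    return int((error_dir(dev, cls, error) / knob).read_text())


def write_knob(dev: str, knob: str, value: int,
               cls: str = "", error: str = "") -> None:
    """Write one integer knob; requires root on a live system."""
    (error_dir(dev, cls, error) / knob).write_text(f"{value}\n")
```

For example, ``write_knob("sda1", "max_retries", 5, "metadata", "default")`` limits the metadata class's default handler to five retries. Similarly, ``write_knob("sda1", "fail_at_unmount", 1)`` sets the per-filesystem unmount behaviour in ``/sys/fs/xfs/sda1/error/``.
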
Each filesystem has "global" error configuration options defined in its
top-level directory:

  /sys/fs/xfs/<dev>/error/

  fail_at_unmount		(Min:  0  Default:  1  Max: 1)
	Defines the filesystem error behavior at unmount time.

	If set to a value of 1, XFS will override all other error configurations
	during unmount and replace them with "immediate fail" characteristics,
	i.e. no retries and no retry timeout. This will always allow unmount to
	succeed when there are persistent errors present.

	If set to 0, the configured retry behaviour will continue until all
	retries and/or timeouts have been exhausted. This will delay unmount
	completion when there are persistent errors, and it may prevent the
	filesystem from ever unmounting fully in the case of "retry forever"
	handler configurations.

	Note: there is no guarantee that fail_at_unmount can be set while an
	unmount is in progress. It is possible that the ``sysfs`` entries are
	removed by the unmounting filesystem before a "retry forever" error
	handler configuration causes unmount to hang, and hence the filesystem
	must be configured appropriately before unmount begins to prevent
	unmount hangs.

Each filesystem has specific error class handlers that define the error
propagation behaviour for specific errors. There is also a "default" error
handler defined, which defines the behaviour for all errors that don't have
specific handlers defined. Where multiple retry constraints are configured for
a single error, the first retry configuration that expires will cause the error
to be propagated. The handler configurations are found in the directory:

  /sys/fs/xfs/<dev>/error/<class>/<error>/

  max_retries			(Min: -1  Default: Varies  Max: INTMAX)
	Defines the allowed number of retries of a specific error before
	the filesystem will propagate the error. The retry count for a given
	error context (e.g. a specific metadata buffer) is reset every time
	there is a successful completion of the operation.

	Setting the value to "-1" will cause XFS to retry forever for this
	specific error.

	Setting the value to "0" will cause XFS to fail immediately when the
	specific error is reported.

	Setting the value to "N" (where 0 < N < Max) will make XFS retry the
	operation "N" times before propagating the error.

  retry_timeout_seconds		(Min:  -1  Default:  Varies  Max: 1 day)
	Defines the amount of time (in seconds) that the filesystem is
	allowed to retry its operations when the specific error is
	found.

	Setting the value to "-1" will allow XFS to retry forever for this
	specific error.

	Setting the value to "0" will cause XFS to fail immediately when the
	specific error is reported.

	Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the
	operation for up to "N" seconds before propagating the error.

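The interaction between ``max_retries``, ``retry_timeout_seconds`` and ``fail_at_unmount`` can be summarised as a single decision. The Python sketch below is a simplified model of the rules described above, not the kernel's implementation. Each limit is checked independently, ``-1`` means retry forever, ``0`` means fail immediately, and whichever limit is reached first propagates the error. The class and function names are hypothetical.

```python
"""Simplified model of XFS error propagation; not the kernel's code."""
from dataclasses import dataclass

RETRY_FOREVER = -1


@dataclass
class HandlerConfig:
    """Per-error knobs from /sys/fs/xfs/<dev>/error/<class>/<error>/."""
    max_retries: int = RETRY_FOREVER
    retry_timeout_seconds: int = RETRY_FOREVER


def should_propagate(cfg: HandlerConfig, retries: int, elapsed: float,
                     unmounting: bool = False,
                     fail_at_unmount: bool = True) -> bool:
    """Return True when the error should be propagated instead of retried.

    retries: retries already made for this error context (reset on success).
    elapsed: seconds since this error context first failed.
    """
    # fail_at_unmount=1 turns every handler into "immediate fail" at unmount.
    if unmounting and fail_at_unmount:
        return True
    # Each limit is checked on its own; whichever is reached first wins.
    # A value of 0 fails immediately because the comparison is always true.
    if cfg.max_retries != RETRY_FOREVER and retries >= cfg.max_retries:
        return True
    if (cfg.retry_timeout_seconds != RETRY_FOREVER
            and elapsed >= cfg.retry_timeout_seconds):
        return True
    return False
```

Suppose both limits are ``-1`` and ``fail_at_unmount`` is 0. In that case ``should_propagate`` never returns true, even during unmount. This models the unmount hang that the ``fail_at_unmount`` description warns about.
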
**Note:** The default behaviour for a specific error handler is dependent on both
the class and error context. For example, the default values for
"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
to "fail immediately" behaviour. This is done because ENODEV is a fatal,
unrecoverable error no matter how many times the metadata IO is retried.