=================
Queue sysfs files
=================

This text file details the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions as a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.

Files denoted with a RO postfix are read-only and the RW postfix means
read-write.
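
As a minimal sketch of how these files can be read from user space, the
helper below returns the contents of one queue file. The function name, the
``sysfs_root`` parameter and the device name ``sda`` are illustrative
assumptions, not part of any kernel interface.

```python
from pathlib import Path


def read_queue_attr(device, name, sysfs_root="/sys/block"):
    """Return the stripped text of <sysfs_root>/<device>/queue/<name>.

    sysfs_root is a hypothetical parameter so the helper can be pointed
    at a test directory instead of the real sysfs tree.
    """
    path = Path(sysfs_root) / device / "queue" / name
    return path.read_text().strip()


# Example call (assumes a block device named "sda" exists):
# read_queue_attr("sda", "rotational")
```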

add_random (RW)
---------------
This file allows the disk entropy contribution to be turned off. The
default value of this file is '1' (on).

chunk_sectors (RO)
------------------
This has a different meaning depending on the type of the block device.
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
of the RAID volume stripe segment. For a zoned block device, either host-aware
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
of the device, with the possible exception of the last zone of the device which
may be smaller.
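
Because chunk_sectors is expressed in 512B sectors, a conversion to bytes is
a single multiplication. The following sketch shows that arithmetic; the
function name is hypothetical.

```python
SECTOR_SIZE = 512  # chunk_sectors is expressed in 512-byte sectors


def chunk_bytes(chunk_sectors):
    """Convert a chunk_sectors value (int or text read from sysfs) to bytes."""
    return int(chunk_sectors) * SECTOR_SIZE


# A value of 524288 sectors corresponds to 256 MiB:
# chunk_bytes("524288") == 256 * 1024 * 1024
```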

dax (RO)
--------
This file indicates whether the device supports Direct Access (DAX),
used by CPU-addressable storage to bypass the pagecache.  It shows '1'
if true, '0' if not.

discard_granularity (RO)
------------------------
This shows the size of internal allocation of the device in bytes, if
reported by the device. A value of '0' means the device does not support
the discard functionality.

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_hw_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A
discard_max_hw_bytes value of 0 means that the device does not support
discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued; setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.
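
The relationship between the two limits can be sketched as follows. This is
only an illustration under the assumption that a sensible software limit
never exceeds the hardware limit; the function name is hypothetical and the
sketch does not describe how the kernel validates writes to this file.

```python
def pick_discard_max_bytes(requested, discard_max_hw_bytes):
    """Choose a value to write to discard_max_bytes.

    Assumption: the software limit should stay within the hardware limit.
    A hardware limit of 0 means discard is not supported at all.
    """
    if discard_max_hw_bytes == 0:
        return 0
    return max(0, min(requested, discard_max_hw_bytes))
```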

discard_zeroes_data (RO)
------------------------
Obsolete. Always zero.

fua (RO)
--------
Whether or not the block driver supports the FUA flag for write requests.
FUA stands for Force Unit Access. If the FUA flag is set, write requests
must bypass the volatile cache of the storage device.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

io_poll (RW)
------------
When read, this file shows whether polling is enabled (1) or disabled
(0).  Writing '0' to this file will disable polling for this device.
Writing any non-zero value will enable this feature.

io_poll_delay (RW)
------------------
If polling is enabled, this controls what kind of polling will be
performed. It defaults to -1, which is classic polling. In this mode,
the CPU will repeatedly ask for completions without giving up any time.
If set to 0, a hybrid polling mode is used, where the kernel will attempt
to make an educated guess at when the IO will complete. Based on this
guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more efficient.
If set to a value larger than 0, the kernel will put the process issuing
IO to sleep for this many microseconds before entering classic
polling.
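
The three cases described for io_poll_delay can be summarised in a small
classifier. This is a sketch only; the function name and the mode labels
("classic", "hybrid", "fixed-sleep") are made up here for illustration.

```python
def classify_io_poll_delay(raw):
    """Map the text read from io_poll_delay to (mode, sleep_usec).

    The mode labels are illustrative names, not kernel terminology.
    """
    delay = int(raw)
    if delay == -1:
        return ("classic", None)      # poll continuously, never sleep
    if delay == 0:
        return ("hybrid", None)       # kernel estimates the sleep time
    if delay > 0:
        return ("fixed-sleep", delay)  # sleep this many microseconds first
    raise ValueError(f"unexpected io_poll_delay value: {delay}")
```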

io_timeout (RW)
---------------
io_timeout is the request timeout in milliseconds. If a request does not
complete in this time then the block driver timeout handler is invoked.
That timeout handler can decide to retry the request, to fail it or to start
a device recovery strategy.

iostats (RW)
------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_discard_segments (RO)
-------------------------
The maximum number of DMA scatter/gather entries in a discard request.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
Maximum number of elements in a DMA scatter/gather list with integrity
data that will be submitted by the block layer core to the associated
block driver.

max_active_zones (RO)
---------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
errno.

max_open_zones (RO)
-------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
errno.
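
Both max_active_zones and max_open_zones follow the same rule: a value of 0
means no limit, otherwise the count of zones in the listed states may not
exceed the value. A host-side check could be sketched as below; the function
name is hypothetical and the caller is assumed to track the current zone
count itself.

```python
def may_use_another_zone(current_count, limit):
    """Return True if one more zone can enter the limited states.

    current_count: zones currently in the relevant states (for example
    EXPLICIT OPEN plus IMPLICIT OPEN when checking max_open_zones).
    limit: value read from max_open_zones or max_active_zones; 0 = no limit.
    """
    return limit == 0 or current_count < limit
```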

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.

max_segments (RO)
-----------------
Maximum number of elements in a DMA scatter/gather list that is submitted
to the associated block driver.

max_segment_size (RO)
---------------------
Maximum size in bytes of a single element in a DMA scatter/gather list.

minimum_io_size (RO)
--------------------
This is the smallest preferred IO size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved in merging IO
requests in the block layer. By default (0) all merges are
enabled. When set to 1 only simple one-hit merges will be tried. When
set to 2 no merge algorithms will be tried (including one-hit or more
complex tree/hash lookups).
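
For scripts that report the current merge policy, the three nomerges values
can be mapped to short descriptions. The table and function below are an
illustrative sketch; the names and wording of the descriptions are not taken
from the kernel.

```python
# Descriptions paraphrase the text above; the dictionary name is hypothetical.
NOMERGES_MODES = {
    0: "all merges enabled",
    1: "only simple one-hit merges",
    2: "no merging attempted",
}


def describe_nomerges(raw):
    """Translate the text read from nomerges into a description."""
    return NOMERGES_MODES[int(raw)]
```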

nr_requests (RW)
----------------
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
sum).

To avoid priority inversion through request starvation, a request
queue maintains a separate request pool for each cgroup when
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
per-block-cgroup request pool.  In other words, if there are N block cgroups,
each request queue may have up to N request pools, each independently
regulated by nr_requests.

nr_zones (RO)
-------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), this indicates the total number of zones of the device.
This is always 0 for regular block devices.

optimal_io_size (RO)
--------------------
This is the optimal IO size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of the device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file is used to state whether the device is of rotational or
non-rotational type.

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
cpu "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.

For storage configurations that need to maximize distribution of completion
processing, setting this option to '2' forces the completion to run on the
requesting cpu (bypassing the "group" aggregation logic).

scheduler (RW)
--------------
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.
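
The bracket convention makes the scheduler file easy to parse. The sketch
below splits the text into the active scheduler and the list of available
ones; the function name is hypothetical and the scheduler names in the
example are only sample input.

```python
def parse_scheduler(text):
    """Return (active, available) from the contents of the scheduler file.

    The active scheduler is the name enclosed in [] brackets; it is also
    included, without brackets, in the list of available schedulers.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


# parse_scheduler("noop deadline [cfq]\n")
# returns ("cfq", ["noop", "deadline", "cfq"])
```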

write_cache (RW)
----------------
When read, this file will display whether the device has write back
caching enabled or not. It will return "write back" for the former
case, and "write through" for the latter. Writing to this file can
change the kernel's view of the device, but it doesn't alter the
device state. This means that it might not be safe to toggle the
setting from "write back" to "write through", since that will also
eliminate cache flushes issued by the kernel.
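
A script that inspects this file only needs to distinguish the two strings
named above. The following is a minimal sketch with a hypothetical function
name; it rejects any other text rather than guessing.

```python
def is_write_back(write_cache_text):
    """Return True for "write back", False for "write through"."""
    mode = write_cache_text.strip()
    if mode == "write back":
        return True
    if mode == "write through":
        return False
    raise ValueError(f"unexpected write_cache value: {mode!r}")
```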

write_same_max_bytes (RO)
-------------------------
This is the number of bytes the device can write in a single write-same
command.  A value of '0' means write-same is not supported by this
device.

wbt_lat_usec (RW)
-----------------
If the device is registered for writeback throttling, then this file shows
the target minimum read latency. If this latency is exceeded in a given
window of time (see wb_window_usec), then the writeback throttling will start
scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

throttle_sample_time (RW)
-------------------------
This is the time window over which blk-throttle samples data, in milliseconds.
blk-throttle makes decisions based on the samples. A shorter window gives
cgroups smoother throughput, but at higher CPU overhead. This exists only when
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

write_zeroes_max_bytes (RO)
---------------------------
For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
is not supported.

zoned (RO)
----------
This indicates if the device is a zoned block device and the zone model of the
device if it is indeed zoned. The possible values indicated by zoned are
"none" for regular block devices and "host-aware" or "host-managed" for zoned
block devices. The characteristics of host-aware and host-managed zoned block
devices are described in the ZBC (Zoned Block Commands) and ZAC
(Zoned Device ATA Command Set) standards. These standards also define the
"drive-managed" zone model. However, since drive-managed zoned block devices
do not support zone commands, they will be treated as regular block devices
and zoned will report "none".
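
Since zoned reports one of three strings, deciding whether zone-aware
handling is needed reduces to a string comparison. The sketch below assumes
only the three values listed above; the names are hypothetical.

```python
ZONE_MODELS = ("none", "host-aware", "host-managed")


def is_zoned(zoned_text):
    """Return True if the zoned file reports a zoned block device."""
    model = zoned_text.strip()
    if model not in ZONE_MODELS:
        raise ValueError(f"unexpected zoned value: {model!r}")
    return model != "none"
```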

Jens Axboe <jens.axboe@oracle.com>, February 2009