=================
Queue sysfs files
=================

This text file details the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions as a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.

Files denoted with a RO postfix are read-only and the RW postfix means
read-write.

add_random (RW)
---------------
This file allows turning off the disk entropy contribution. The default
value of this file is '1' (on).

chunk_sectors (RO)
------------------
This has a different meaning depending on the type of the block device.
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
of the RAID volume stripe segment. For a zoned block device, either host-aware
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
of the device, with the possible exception of the last zone of the device,
which may be smaller.

dax (RO)
--------
This file indicates whether the device supports Direct Access (DAX),
used by CPU-addressable storage to bypass the pagecache. It shows '1'
if true, '0' if not.
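
As an illustration of how these files can be consumed from user space, the
following is a minimal sketch. The helper names (``read_queue_attr``,
``chunk_size_bytes``, ``supports_dax``) are hypothetical and not part of any
kernel interface; the only things taken from the text above are the directory
layout, the 512B sector unit of chunk_sectors and the '1'/'0' values of dax.

```python
from pathlib import Path

SECTOR_BYTES = 512  # chunk_sectors is expressed in 512-byte sectors


def read_queue_attr(device: str, name: str, sysfs_root: str = "/sys/block") -> str:
    """Return the stripped contents of <sysfs_root>/<device>/queue/<name>.

    The sysfs_root parameter is an assumption added for testability.
    """
    return (Path(sysfs_root) / device / "queue" / name).read_text().strip()


def chunk_size_bytes(chunk_sectors: str) -> int:
    """Convert a chunk_sectors value (in 512B sectors) to bytes."""
    return int(chunk_sectors) * SECTOR_BYTES


def supports_dax(dax_value: str) -> bool:
    """dax reads '1' when Direct Access is supported, '0' otherwise."""
    return dax_value.strip() == "1"
```

On a real system, ``chunk_size_bytes(read_queue_attr("sda", "chunk_sectors"))``
would give the stripe segment or zone size in bytes, where "sda" stands in
for whichever device is of interest.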

discard_granularity (RO)
------------------------
This shows the size of internal allocation of the device in bytes, if
reported by the device. A value of '0' means the device does not support
the discard functionality.

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_hw_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A
discard_max_hw_bytes value of 0 means that the device does not support
discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued. Setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.

discard_zeroes_data (RO)
------------------------
Obsolete. Always zero.

fua (RO)
--------
Whether or not the block driver supports the FUA flag for write requests.
FUA stands for Force Unit Access.
If the FUA flag is set, write requests must bypass the volatile cache of
the storage device.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

io_poll (RW)
------------
When read, this file shows whether polling is enabled (1) or disabled
(0). Writing '0' to this file will disable polling for this device.
Writing any non-zero value will enable this feature.

io_poll_delay (RW)
------------------
If polling is enabled, this controls what kind of polling will be
performed. It defaults to -1, which is classic polling. In this mode,
the CPU will repeatedly ask for completions without giving up any time.
If set to 0, a hybrid polling mode is used, where the kernel will attempt
to make an educated guess at when the IO will complete. Based on this
guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more efficient.
If set to a value larger than 0, the kernel will put the process issuing
IO to sleep for this number of microseconds before entering classic
polling.

io_timeout (RW)
---------------
io_timeout is the request timeout in milliseconds. If a request does not
complete in this time then the block driver timeout handler is invoked.
That timeout handler can decide to retry the request, to fail it or to start
a device recovery strategy.

iostats (RW)
------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_discard_segments (RO)
-------------------------
The maximum number of DMA scatter/gather entries in a discard request.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
Maximum number of elements in a DMA scatter/gather list with integrity
data that will be submitted by the block layer core to the associated
block driver.

max_active_zones (RO)
---------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
errno.
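
The limit semantics described above (0 means unlimited, otherwise the count
of active zones must stay at or below the value) can be sketched as follows.
This is only an illustration of host-side bookkeeping; the function names and
the idea of the application tracking its own active-zone count are
assumptions, not something the kernel provides.

```python
from typing import Optional


def zone_limit(raw: str) -> Optional[int]:
    """Parse a max_active_zones value; 0 means no limit (returned as None)."""
    value = int(raw.strip())
    return None if value == 0 else value


def can_activate_another_zone(active_zones: int, limit: Optional[int]) -> bool:
    """Return True if one more zone may become active within the limit.

    active_zones is a hypothetical count, kept by the application, of zones
    in the EXPLICIT OPEN, IMPLICIT OPEN or CLOSED states.
    """
    if limit is None:
        return True
    return active_zones + 1 <= limit
```

An application that checks this before opening a new zone can avoid
triggering the EOVERFLOW error mentioned above, assuming its own count is
accurate.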

max_open_zones (RO)
-------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
errno.

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.

max_segments (RO)
-----------------
Maximum number of elements in a DMA scatter/gather list that is submitted
to the associated block driver.

max_segment_size (RO)
---------------------
Maximum size in bytes of a single element in a DMA scatter/gather list.

minimum_io_size (RO)
--------------------
This is the smallest preferred IO size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved with IO
merging requests in the block layer. By default (0) all merges are
enabled.
When set to 1, only simple one-hit merges will be tried. When
set to 2, no merge algorithms will be tried (including one-hit or more
complex tree/hash lookups).

nr_requests (RW)
----------------
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
sum).

To avoid priority inversion through request starvation, a request
queue maintains a separate request pool per cgroup when
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
per-block-cgroup request pool. In other words, if there are N block cgroups,
each request queue may have up to N request pools, each independently
regulated by nr_requests.

nr_zones (RO)
-------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), this indicates the total number of zones of the device.
This is always 0 for regular block devices.

optimal_io_size (RO)
--------------------
This is the optimal IO size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of the device, in bytes.
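
As a worked reading of the nr_requests description above, the upper bound on
allocated requests can be estimated as follows. This sketch takes the text at
face value (the limit applies separately to reads and writes, and separately
to each per-cgroup pool); the function name and the ``cgroup_pools`` parameter
are illustrative assumptions.

```python
def max_allocated_requests(nr_requests: int, cgroup_pools: int = 1) -> int:
    """Estimate the worst-case number of allocated requests for one queue.

    nr_requests applies to reads and to writes independently (factor 2),
    and to each per-block-cgroup request pool independently.
    """
    if nr_requests < 1 or cgroup_pools < 1:
        raise ValueError("nr_requests and cgroup_pools must be positive")
    return 2 * nr_requests * cgroup_pools
```

For example, with nr_requests set to 128 and three block cgroups, the bound
under these assumptions is 2 * 128 * 3 = 768 requests.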

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file is used to state whether the device is of rotational type or
non-rotational type.

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
CPU "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.

For storage configurations that need to maximize distribution of completion
processing, setting this option to '2' forces the completion to run on the
requesting CPU (bypassing the "group" aggregation logic).

scheduler (RW)
--------------
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.

write_cache (RW)
----------------
When read, this file will display whether the device has write back
caching enabled or not.
It will return "write back" for the former
case, and "write through" for the latter. Writing to this file can
change the kernel's view of the device, but it doesn't alter the
device state. This means that it might not be safe to toggle the
setting from "write back" to "write through", since that will also
eliminate cache flushes issued by the kernel.

write_same_max_bytes (RO)
-------------------------
This is the number of bytes the device can write in a single write-same
command. A value of '0' means write-same is not supported by this
device.

wbt_lat_usec (RW)
-----------------
If the device is registered for writeback throttling, then this file shows
the target minimum read latency. If this latency is exceeded in a given
window of time (see wb_window_usec), then the writeback throttling will start
scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

throttle_sample_time (RW)
-------------------------
This is the time window over which blk-throttle samples data, in
milliseconds. blk-throttle makes decisions based on the samples. A lower
time means cgroups have smoother throughput, but higher CPU overhead. This
exists only when CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
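
The special values accepted by wbt_lat_usec, described above, can be
summarized in a small sketch. The function name and the returned strings are
illustrative assumptions; only the meanings of '0' and '-1' come from the
text, and the microsecond unit is inferred from the file name.

```python
def describe_wbt_write(value: int) -> str:
    """Describe the effect of writing `value` to wbt_lat_usec."""
    if value == 0:
        return "disable writeback throttling"
    if value == -1:
        return "reset latency target to the default"
    if value > 0:
        return f"set latency target to {value} usec"
    # Negative values other than -1 are not described in the text above.
    raise ValueError(f"unsupported wbt_lat_usec value: {value}")
```

A tuning script could call this before writing, so that an accidental '0' is
reported as disabling the feature rather than as a very small latency target.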

write_zeroes_max_bytes (RO)
---------------------------
For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
is not supported.

zoned (RO)
----------
This indicates if the device is a zoned block device and the zone model of the
device if it is indeed zoned. The possible values indicated by zoned are
"none" for regular block devices and "host-aware" or "host-managed" for zoned
block devices. The characteristics of host-aware and host-managed zoned block
devices are described in the ZBC (Zoned Block Commands) and ZAC
(Zoned Device ATA Command Set) standards. These standards also define the
"drive-managed" zone model. However, since drive-managed zoned block devices
do not support zone commands, they will be treated as regular block devices
and zoned will report "none".

Jens Axboe <jens.axboe@oracle.com>, February 2009