=============
DM statistics
=============

Device Mapper supports the collection of I/O statistics on user-defined
regions of a DM device. If no regions are defined, no statistics are
collected, so there is no performance impact. Only bio-based DM
devices are currently supported.

Each user-defined region specifies a starting sector, length and step.
Individual statistics will be collected for each step-sized area within
the range specified.

The I/O statistics counters for each step-sized area of a region are
in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see:
Documentation/admin-guide/iostats.rst). In addition, two extra counters
(12 and 13) are provided: total time spent reading and writing. When the
histogram argument is used, a 14th parameter is reported that represents
the histogram of latencies. All these counters may be accessed by sending
the @stats_print message to the appropriate DM device via dmsetup.

The reported times are in milliseconds and the granularity depends on
the kernel ticks. When the option precise_timestamps is used, the
reported times are in nanoseconds.

Each region has a corresponding unique identifier, which we call a
region_id, that is assigned when the region is created. The region_id
must be supplied when querying statistics about the region, deleting the
region, etc.
Unique region_ids enable multiple userspace programs to
request and process statistics for the same DM device without stepping
on each other's data.

The creation of DM statistics will allocate memory via kmalloc or
fall back to using vmalloc space. At most, 1/4 of the overall system
memory may be allocated by DM statistics. The admin can see how much
memory is used by reading:

        /sys/module/dm_mod/parameters/stats_current_allocated_bytes

Messages
========

    @stats_create <range> <step> [<number_of_optional_arguments> <optional_arguments>...] [<program_id> [<aux_data>]]
        Create a new region and return the region_id.

        <range>
          "-"
                whole device
          "<start_sector>+<length>"
                a range of <length> 512-byte sectors
                starting with <start_sector>.

        <step>
          "<area_size>"
                the range is subdivided into areas each containing
                <area_size> sectors.
          "/<number_of_areas>"
                the range is subdivided into the specified
                number of areas.

        <number_of_optional_arguments>
          The number of optional arguments

        <optional_arguments>
          The following optional arguments are supported:

          precise_timestamps
                use precise timer with nanosecond resolution
                instead of the "jiffies" variable.
                When this argument is used, the resulting times are in
                nanoseconds instead of milliseconds. Precise timestamps
                are a little bit slower to obtain than jiffies-based
                timestamps.
          histogram:n1,n2,n3,n4,...
                collect histogram of latencies. The
                numbers n1, n2, etc. are times that represent the boundaries
                of the histogram. If precise_timestamps is not used, the
                times are in milliseconds, otherwise they are in
                nanoseconds. For each range, the kernel will report the
                number of requests that completed within this range. For
                example, if we use "histogram:10,20,30", the kernel will
                report four numbers a:b:c:d. a is the number of requests
                that took 0-10 ms to complete, b is the number of requests
                that took 10-20 ms to complete, c is the number of requests
                that took 20-30 ms to complete and d is the number of
                requests that took more than 30 ms to complete.

        <program_id>
          An optional parameter. A name that uniquely identifies
          the userspace owner of the range. This groups ranges together
          so that userspace programs can identify the ranges they
          created and ignore those created by others.
          The kernel returns this string back in the output of
          @stats_list message, but it doesn't use it for anything else.
          If we omit the number of optional arguments, program id must not
          be a number, otherwise it would be interpreted as the number of
          optional arguments.

        <aux_data>
          An optional parameter. A word that provides auxiliary data
          that is useful to the client program that created the range.
          The kernel returns this string back in the output of
          @stats_list message, but it doesn't use this value for anything.

    @stats_delete <region_id>
        Delete the region with the specified id.

        <region_id>
          region_id returned from @stats_create

    @stats_clear <region_id>
        Clear all the counters except the in-flight I/O counters.

        <region_id>
          region_id returned from @stats_create

    @stats_list [<program_id>]
        List all regions registered with @stats_create.

        <program_id>
          An optional parameter.
          If this parameter is specified, only matching regions
          are returned.
          If it is not specified, all regions are returned.

        Output format:
          <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
          precise_timestamps histogram:n1,n2,n3,...

        The strings "precise_timestamps" and "histogram" are printed only
        if they were specified when creating the region.

    @stats_print <region_id> [<starting_line> <number_of_lines>]
        Print counters for each step-sized area of a region.

        <region_id>
          region_id returned from @stats_create

        <starting_line>
          The index of the starting line in the output.
          If omitted, all lines are returned.

        <number_of_lines>
          The number of lines to include in the output.
          If omitted, all lines are returned.

        Output format for each step-sized area of a region:

          <start_sector>+<length>
                counters

          The first 11 counters have the same meaning as
          `/sys/block/*/stat` or `/proc/diskstats`.

          Please refer to Documentation/admin-guide/iostats.rst for details.

          1. the number of reads completed
          2. the number of reads merged
          3. the number of sectors read
          4. the number of milliseconds spent reading
          5. the number of writes completed
          6. the number of writes merged
          7. the number of sectors written
          8. the number of milliseconds spent writing
          9. the number of I/Os currently in progress
          10. the number of milliseconds spent doing I/Os
          11. the weighted number of milliseconds spent doing I/Os

          Additional counters:

          12. the total time spent reading in milliseconds
          13. the total time spent writing in milliseconds

    @stats_print_clear <region_id> [<starting_line> <number_of_lines>]
        Atomically print and then clear all the counters except the
        in-flight I/O counters. Useful when the client consuming the
        statistics does not want to lose any statistics (those updated
        between printing and clearing).

        <region_id>
          region_id returned from @stats_create

        <starting_line>
          The index of the starting line in the output.
          If omitted, all lines are printed and then cleared.

        <number_of_lines>
          The number of lines to process.
          If omitted, all lines are printed and then cleared.

    @stats_set_aux <region_id> <aux_data>
        Store auxiliary data aux_data for the specified region.

        <region_id>
          region_id returned from @stats_create

        <aux_data>
          The string that identifies data which is useful to the client
          program that created the range. The kernel returns this
          string back in the output of @stats_list message, but it
          doesn't use this value for anything.
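
As a rough illustration of the @stats_print output format described above,
the following shell sketch summarises each output line. The function name
summarise_area is hypothetical, and the sketch assumes that the fields are
separated by whitespace in the order listed above, with the histogram (when
it was requested) as the last field::

    # Hypothetical helper: read @stats_print lines on stdin and print the
    # area, the reads completed (counter 1) and the writes completed
    # (counter 5), plus the histogram if a 14th counter is present.
    summarise_area() {
        awk '{
            # Field 1 is "<start_sector>+<length>"; counters start at field 2.
            split($1, range, "[+]")
            printf "start=%s length=%s reads=%s writes=%s", range[1], range[2], $2, $6
            if (NF >= 15) printf " histogram=%s", $15
            printf "\n"
        }'
    }

It could be fed from dmsetup, for example::

    dmsetup message vol 0 @stats_print 0 | summarise_area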

Examples
========

Subdivide the DM device 'vol' into 100 pieces and start collecting
statistics on them::

  dmsetup message vol 0 @stats_create - /100

Set the auxiliary data string to "foo bar baz" (the escape for each
space must also be escaped, otherwise the shell will consume them)::

  dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz

List the statistics::

  dmsetup message vol 0 @stats_list

Print the statistics::

  dmsetup message vol 0 @stats_print 0

Delete the statistics::

  dmsetup message vol 0 @stats_delete 0
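
When precise_timestamps is used, the histogram boundaries are given in
nanoseconds rather than milliseconds. The sketch below shows one possible
way to build such a histogram argument from millisecond values. The helper
name histogram_arg_ns is hypothetical and is not part of dmsetup::

    # Hypothetical helper: turn millisecond boundaries into a
    # "histogram:n1,n2,..." argument expressed in nanoseconds.
    histogram_arg_ns() {
        arg="histogram:"
        sep=""
        for ms in "$@"; do
            # 1 ms = 1000000 ns
            arg="${arg}${sep}$((ms * 1000000))"
            sep=","
        done
        printf '%s\n' "$arg"
    }

Assuming the same device 'vol', the result could then be passed as one of
two optional arguments to @stats_create::

    dmsetup message vol 0 @stats_create - /100 2 precise_timestamps "$(histogram_arg_ns 10 20 30)"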