=================================================
Linux API for read access to z/VM Monitor Records
=================================================

Date : 2004-Nov-26

Author: Gerald Schaefer (geraldsc@de.ibm.com)


Description
===========
This item delivers a new Linux API in the form of a misc char device that is
usable from user space and allows read access to the z/VM Monitor Records
collected by the `*MONITOR` System Service of z/VM.


User Requirements
=================
The z/VM guest on which you want to access this API needs to be configured in
order to allow IUCV connections to the `*MONITOR` service, i.e. it needs the
IUCV `*MONITOR` statement in its user entry. If the monitor DCSS to be used is
restricted (likely), you also need the NAMESAVE <DCSS NAME> statement.
This item will use the IUCV device driver to access the z/VM services, so you
need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1.

There are two options for being able to load the monitor DCSS (examples assume
that the monitor DCSS begins at 144 MB and ends at 152 MB). You can query the
location of the monitor DCSS with the Class E privileged CP command Q NSS MAP
(the values BEGPAG and ENDPAG are given in units of 4K pages).
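
Because BEGPAG and ENDPAG count 4K pages, turning them into byte addresses is
a single multiplication. The following Python sketch illustrates this for the
144 MB to 152 MB example. It assumes that the values are printed as
hexadecimal page numbers and that ENDPAG names the last page of the segment;
both are assumptions to verify against your own Q NSS MAP output, and the
function name is made up for this illustration.

```python
PAGE_SIZE = 4096  # BEGPAG and ENDPAG are given in units of 4K pages
MB = 1024 * 1024


def dcss_byte_range(begpag: str, endpag: str) -> tuple:
    """Return (first byte address, first byte address past the segment).

    ASSUMPTION: both arguments are hexadecimal page numbers and ENDPAG is
    the last page that belongs to the segment.
    """
    first = int(begpag, 16) * PAGE_SIZE
    past_end = (int(endpag, 16) + 1) * PAGE_SIZE
    return first, past_end


# Hypothetical values matching a DCSS from 144 MB to 152 MB.
start, end = dcss_byte_range("9000", "97FF")
print(start // MB, end // MB)
```

The resulting range is what the "memory hole" or the "mem=" value described
below has to accommodate.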

See also "CP Command and Utility Reference" (SC24-6081-00) for more information
on the DEF STOR and Q NSS MAP commands, as well as "Saved Segments Planning
and Administration" (SC24-6116-00) for more information on DCSSes.

1st option:
-----------
You can use the CP command DEF STOR CONFIG to define a "memory hole" in your
guest virtual storage around the address range of the DCSS.

Example::

    DEF STOR CONFIG 0.140M 200M.200M

This defines two blocks of storage: the first is 140MB in size and begins at
address 0MB, the second is 200MB in size and begins at address 200MB,
resulting in a total storage of 340MB. Note that the first block should
always start at 0 and be at least 64MB in size.

2nd option:
-----------
Your guest virtual storage has to end below the starting address of the DCSS
and you have to specify the "mem=" kernel parameter in your parmfile with a
value greater than the ending address of the DCSS.

Example::

    DEF STOR 140M

This defines a storage size of 140MB for your guest. The parameter "mem=160M"
is then added to the parmfile.


User Interface
==============
The char device is implemented as a kernel module named "monreader",
which can be loaded via the modprobe command, or it can be compiled into the
kernel instead. There is one optional module (or kernel) parameter, "mondcss",
to specify the name of the monitor DCSS. If the module is compiled into the
kernel, the kernel parameter "monreader.mondcss=<DCSS NAME>" can be specified
in the parmfile.

The default name for the DCSS is "MONDCSS" if none is specified. If other
users are already connected to the `*MONITOR` service (e.g. Performance
Toolkit), the monitor DCSS is already defined and you have to use the same
DCSS. The CP command Q MONITOR (Class E privileged) shows the name of the
monitor DCSS, if already defined, and the users connected to the `*MONITOR`
service.
Refer to the "z/VM Performance" book (SC24-6109-00) on how to create a monitor
DCSS if your z/VM doesn't have one already. You need Class E privileges to
define and save a DCSS.

Example:
--------

::

    modprobe monreader mondcss=MYDCSS

This loads the module and sets the DCSS name to "MYDCSS".

NOTE:
-----
This API provides no interface to control the `*MONITOR` service, e.g. to
specify which data should be collected. This can be done with the CP command
MONITOR (Class E privileged), see "CP Command and Utility Reference".

Device nodes with udev:
-----------------------
After loading the module, a char device will be created along with the device
node /<udev directory>/monreader.

Device nodes without udev:
--------------------------
If your distribution does not support udev, a device node will not be created
automatically and you have to create it manually after loading the module.
To do so, you need to know the major and minor numbers of the device. These
numbers can be found in /sys/class/misc/monreader/dev.

Typing cat /sys/class/misc/monreader/dev will give an output of the form
<major>:<minor>. The device node can be created with the mknod command: enter
mknod <name> c <major> <minor>, where <name> is the name of the device node
to be created.

Example:
--------

::

    # modprobe monreader
    # cat /sys/class/misc/monreader/dev
    10:63
    # mknod /dev/monreader c 10 63

This loads the module with the default monitor DCSS (MONDCSS) and creates a
device node.

File operations:
----------------
The following file operations are supported: open, release, read, poll.
There are two alternative methods for reading: either non-blocking read in
conjunction with polling, or blocking read without polling. IOCTLs are not
supported.
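
As an illustration of the first method, the following Python sketch waits for
a file descriptor to become readable before a non-blocking read is attempted.
It is not taken from the driver sources; the helper names are hypothetical
and the device path /dev/monreader is only an assumption that matches the
mknod example above.

```python
import os
import select


def open_nonblocking(path: str = "/dev/monreader") -> int:
    """Open the device for non-blocking reads (path is an assumption)."""
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


def wait_readable(fd: int, timeout_ms: int) -> bool:
    """Return True once fd has data to read, False if timeout_ms elapses."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(event & select.POLLIN for _, event in events)


# Possible usage on a guest with the device node in place:
#   fd = open_nonblocking()
#   if wait_readable(fd, 5000):
#       chunk = os.read(fd, 4096)
```

With the second method, the device is opened without O_NONBLOCK and each read
simply blocks until data is available, so no polling helper is needed.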

Read:
-----
Reading from the device provides a 12 Byte monitor control element (MCE),
followed by a set of one or more contiguous monitor records (similar to the
output of the CMS utility MONWRITE without the 4K control blocks). The MCE
contains information on the type of the following record set (sample/event
data), the monitor domains contained within it and the start and end address
of the record set in the monitor DCSS. The start and end address can be used
to determine the size of the record set. The end address is the address of the
last byte of data, so the size is end address minus start address plus one.
The start address is needed to handle "end-of-frame" records correctly
(domain 1, record 13), i.e. it can be used to determine the record start
offset relative to a 4K page (frame) boundary.

See "Appendix A: `*MONITOR`" in the "z/VM Performance" document for a description
of the monitor control element layout. The layout of the monitor records can
be found here (z/VM 5.1): https://www.vm.ibm.com/pubs/mon510/index.html

The layout of the data stream provided by the monreader device is as follows::

    ...
    <0 byte read>
    <first MCE>              \
    <first set of records>    |
    ...                       |- data set
    <last MCE>                |
    <last set of records>    /
    <0 byte read>
    ...
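
The size and frame offset calculations described above can be sketched in
Python as follows. The field positions are an assumption made purely for
illustration (four leading bytes of type and domain information, followed by
the start and end address as big-endian 32-bit values); the authoritative MCE
layout is the one in "z/VM Performance", Appendix A. The function name is
hypothetical.

```python
import struct

MCE_SIZE = 12      # the MCE is 12 bytes long
FRAME_SIZE = 4096  # 4K page (frame)


def parse_mce(mce: bytes) -> dict:
    """Extract start, end, record set size and frame offset from an MCE."""
    if len(mce) != MCE_SIZE:
        raise ValueError("an MCE is exactly 12 bytes long")
    # ASSUMPTION: bytes 0-3 hold type/domain information, bytes 4-7 the
    # start address and bytes 8-11 the end address, all big-endian.
    _info, start, end = struct.unpack(">4sII", mce)
    return {
        "start": start,
        "end": end,
        # The end address is the address of the last byte of data.
        "size": end - start + 1,
        # Offset of the first record relative to a 4K frame boundary.
        "frame_offset": start % FRAME_SIZE,
    }
```

A reader would typically read the 12-byte MCE first, compute ``size`` from it
and then expect that many bytes of monitor records to follow.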

There may be more than one combination of MCE and corresponding record set
within one data set, and the end of each data set is indicated by a successful
read with a return value of 0 (0 byte read).
Any received data must be considered invalid until a complete set has been
read successfully, including the closing 0 byte read. Therefore you should
always read the complete set into a buffer before processing the data.

A data set can be as large as the monitor DCSS, so size the buffer
accordingly or use dynamic memory allocation.
The size of the monitor DCSS will be printed into syslog after loading the
module. You can also use the (Class E privileged) CP command Q NSS MAP to
list all available segments and information about them.

As with most char devices, error conditions are indicated by returning a
negative value for the number of bytes read. In this case, the errno variable
indicates the error condition:

EIO:
     reply failed. The data read is invalid and the application
     should discard the data read since the last successful read with 0 size.
EFAULT:
     copy_to_user failed. The data read is invalid and the application should
     discard the data read since the last successful read with 0 size.
EAGAIN:
     occurs on a non-blocking read if there is no data available at the
     moment. There is no data missing or corrupted; just try again or, better,
     use polling for non-blocking reads.
EOVERFLOW:
     message limit reached. The data read since the last successful
     read with 0 size is valid but subsequent records may be missing.

In the last case (EOVERFLOW) there may be missing data; in the first two cases
(EIO, EFAULT) there will be missing data. It is up to the application whether
it continues reading subsequent data or exits.

Open:
-----
Only one user is allowed to open the char device. If it is already in use, the
open function will fail (return a negative value) and set errno to EBUSY.
The open function may also fail if an IUCV connection to the `*MONITOR` service
cannot be established. In this case errno will be set to EIO and an error
message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER
codes are described in the "z/VM Performance" book, Appendix A.

NOTE:
-----
As soon as the device is opened, incoming messages will be accepted and they
will count toward the message limit, i.e. opening the device without reading
from it will eventually provoke the "message limit reached" error (EOVERFLOW
error code).
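
The read rules from the "Read" section (buffer a complete data set, stop at
the 0 byte read, react to the errno values) can be sketched in Python as
follows. This is only one possible policy, not the behaviour of any existing
tool: the function name and the ``read_chunk`` callable are hypothetical, the
sketch assumes blocking reads, and returning the partial data on EOVERFLOW is
a design choice made for this example.

```python
import errno


def read_data_set(read_chunk):
    """Collect one data set, i.e. all data up to the closing 0 byte read.

    read_chunk is any callable that returns the next bytes from the device
    or raises OSError. Returns (data, overflowed).
    """
    buf = bytearray()
    while True:
        try:
            chunk = read_chunk()
        except OSError as err:
            if err.errno == errno.EOVERFLOW:
                # Data read so far is valid, later records may be missing.
                return bytes(buf), True
            # EIO, EFAULT and anything else: the partial data in buf is
            # invalid, so it is dropped and the error is passed on.
            raise
        if not chunk:
            # 0 byte read: the data set is complete and may be processed.
            return bytes(buf), False
        buf += chunk


# Possible usage with a blocking open (device path is an assumption):
#   import os
#   fd = os.open("/dev/monreader", os.O_RDONLY)
#   data, overflowed = read_data_set(lambda: os.read(fd, 4096))
```

Reading promptly in a loop like this after opening the device also helps to
avoid reaching the message limit.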