=======================================
Oracle Data Analytics Accelerator (DAX)
=======================================

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats. A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:

    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.


High Level Overview
===================

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses. The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. A status code
returned indicates if the request was submitted successfully or if
there was an error. One of the addresses given in each CCB is a
pointer to a "completion area", which is a 128 byte memory block that
is written by the coprocessor to provide execution status. No
interrupt is generated upon completion; the completion area must be
polled by software to find out when a transaction has finished, but
the M7 and later processors provide a mechanism to pause the virtual
processor until the completion status has been updated by the
coprocessor. This is done using the monitored load and mwait
instructions, which are described in more detail later. The DAX
coprocessor was designed so that after a request is submitted, the
kernel is no longer involved in the processing of it. The polling is
done at the user level, which results in almost zero latency between
completion of a request and resumption of execution of the requesting
thread.
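
The polling idea described above can be sketched in portable C. This
is only an illustration under stated assumptions: a plain volatile
load stands in for the monitored load and mwait instructions, and the
names ``poll_completion``, ``max_spins`` and ``COMPLETION_AREA_SIZE``
are hypothetical and not part of the driver interface. It also assumes
that the first byte of the completion area holds the status and that 0
means the request is still in progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPLETION_AREA_SIZE 128  /* bytes per completion area, per the text */

/*
 * Hypothetical helper: read the status byte until it becomes non-zero
 * or until max_spins reads have been made. Returns the last status
 * value read; 0 means the request was still in progress when the
 * helper gave up.
 */
static uint8_t poll_completion(const volatile uint8_t *area, size_t max_spins)
{
    uint8_t status = 0;
    size_t i;

    for (i = 0; i < max_spins; i++) {
        status = area[0];   /* plain load; real code would use a monitored load */
        if (status != 0)
            break;
        /* real code would execute mwait here instead of busy-spinning */
    }
    return status;
}
```

The bounded spin count is a design choice for the sketch only, so that
a caller can fall back to other handling if a request never completes.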


Addressing Memory
=================

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical. The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs. When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. So a CCB may contain
either the virtual or real addresses of the buffers or a combination
of them. An "address type" field is available for each address that
may be given in the CCB. In all cases, the Hypervisor will translate
all the addresses to physical before dispatching to hardware. Address
translations are performed using the context of the process initiating
the request.


The Driver API
==============

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use. This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests. When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO.
For all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

CCB_DEQUEUE
-----------

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources. No
further status information is returned, so the user should not
subsequently call read().

CCB_KILL
--------

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

CCB_INFO
--------

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.

Submission of an array of CCBs for execution
---------------------------------------------

A write() whose length is a multiple of the CCB size is treated as a
submit operation.
The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor. If the accepted length is
equal to the requested length, then the submission was completely
successful and there is no further status needed; hence, the user
should not subsequently call read(). Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information. The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.

MMAP
----

The mmap() function provides access to the completion area allocated
in the driver. Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.


Completion of a Request
=======================

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY).
Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28). This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates. This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.


Application Life Cycle of a DAX Submission
==========================================

 - open dax device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for completion area
 - close the dax device


Memory Constraints
==================

The DAX hardware operates only on physical addresses.
Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.

Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even a huge page. A
major caveat is that Linux on SPARC presents 8MB as one of the huge
page sizes. SPARC does not actually provide an 8MB hardware page size,
and this size is synthesized by pasting together two 4MB pages.
The
reasons for this are historical, and it creates an issue because only
half of this 8MB page can actually be used for any given buffer in a
DAX request, and it must be either the first half or the second half;
it cannot be a 4MB chunk in the middle, since that crosses a
(hardware) page boundary. Note that this entire issue may be hidden by
higher level libraries.


CCB Structure
-------------
A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs::

    struct ccb {
        u64   control;
        u64   completion;
        u64   input0;
        u64   access;
        u64   input1;
        u64   op_data;
        u64   output;
        u64   table;
    };

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., Linux kernel).
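
As a small illustration of the layout above, the following user space
sketch restates the structure with fixed-width types and checks that it
occupies exactly 64 bytes. The helper ``ccb_count`` and the macro
``CCB_SIZE`` are hypothetical names introduced here; they only mirror
the rule that a submission length must be a whole multiple of the CCB
size, and are not part of the driver or of libdax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as the structure above, using user space types. */
struct ccb {
    uint64_t control;
    uint64_t completion;
    uint64_t input0;
    uint64_t access;
    uint64_t input1;
    uint64_t op_data;
    uint64_t output;
    uint64_t table;
};

#define CCB_SIZE 64  /* 8 words of 8 bytes each */

/* Fail the build if the compiler lays the structure out differently. */
_Static_assert(sizeof(struct ccb) == CCB_SIZE, "CCB must be 8 x 64-bit words");

/*
 * Hypothetical helper: number of CCBs in a buffer of len bytes, or -1
 * if len is zero or not a whole multiple of the CCB size.
 */
static long ccb_count(size_t len)
{
    if (len == 0 || len % sizeof(struct ccb) != 0)
        return -1;
    return (long)(len / sizeof(struct ccb));
}
```

For example, a 128 byte buffer would hold two CCBs, while a 100 byte
buffer would be rejected by this check.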

The first word (control) is examined by the driver for the following:
 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns


Example Code
============

The DAX is accessible to both user and kernel code. The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2.
The simplest
procedure is to attempt to open both, as only one will succeed::

    fd = open("/dev/oradax1", O_RDWR);
    if (fd < 0)
        fd = open("/dev/oradax2", O_RDWR);
    if (fd < 0)
        /* No DAX found */

Next, the completion area must be mapped::

    completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries. In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.

This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, which produces an output bitmap which
is the input bitmap inverted.
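
To make the expected result concrete, the following is a software
model of the 1-bit scan described above. It is not how the coprocessor
is programmed; it is a plain C reference that could be used to check
DAX output. The function name ``scan_bits_model`` is hypothetical, and
the bit ordering (most significant bit of each byte first) is an
assumption. For the inversion case the ordering does not affect the
result, as long as input and output use the same one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Software model of a scan over 1-bit elements: output bit i is set
 * when input bit i equals match. ASSUMPTION: bit i lives in byte i / 8
 * at position 7 - (i % 8); the hardware ordering may differ.
 */
static void scan_bits_model(const uint8_t *in, uint8_t *out,
                            size_t nbits, unsigned match)
{
    size_t i;

    memset(out, 0, (nbits + 7) / 8);    /* start with an all-zero bitmap */
    for (i = 0; i < nbits; i++) {
        unsigned shift = 7 - (unsigned)(i % 8);
        unsigned bit = (in[i / 8] >> shift) & 1u;

        if (bit == (match & 1u))
            out[i / 8] |= (uint8_t)(1u << shift);
    }
}
```

With a match value of 0, an input of 0xA5 0x0F yields 0x5A 0xF0, which
is the input bitmap inverted.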

For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail::

    ccb->control =       /* Table 36.1, CCB Header Format */
          (2L << 48)     /* command = Scan Value */
        | (3L << 40)     /* output address type = primary virtual */
        | (3L << 34)     /* primary input address type = primary virtual */
                         /* Section 36.2.1, Query CCB Command Formats */
        | (1 << 28)      /* 36.2.1.1.1 primary input format = fixed width bit packed */
        | (0 << 23)      /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
        | (8 << 10)      /* 36.2.1.1.6 output format = bit vector */
        | (0 <<  5)      /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
        | (31 << 0);     /* 36.2.1.3 Disable second scan criteria */

    ccb->completion = 0;    /* Completion area address, to be filled in by driver */

    ccb->input0 = (unsigned long) input; /* primary input address */

    ccb->access =       /* Section 36.2.1.2, Data Access Control */
          (2 << 24)     /* Primary input length format = bits */
        | (nbits - 1);  /* number of bits in primary input stream, minus 1 */

    ccb->input1 = 0;       /* secondary input address, unused */

    ccb->op_data = 0;      /* scan criteria (value to be matched) */

    ccb->output = (unsigned long) output;    /* output address */

    ccb->table = 0;        /* table address, unused */

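Before a CCB such as the one above is submitted, the buffer
constraints stated earlier (each buffer within one page, output buffer
64-byte aligned with a size that is a multiple of 64 bytes) can be
checked in software. The sketch below is an illustration only: the
function names are hypothetical, and it assumes the caller already
knows the size of the page backing the buffer and that this size is a
power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if [addr, addr + len) lies within a
 * single page of page_size bytes, 0 otherwise. ASSUMPTION: page_size
 * is a power of two and addr + len does not wrap around.
 */
static int fits_in_one_page(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t mask = ~(uintptr_t)(page_size - 1);

    if (len == 0 || len > page_size)
        return 0;
    /* first and last byte must fall in the same page */
    return (addr & mask) == ((addr + len - 1) & mask);
}

/*
 * Hypothetical helper: applies the additional output buffer rules,
 * 64-byte alignment and a length that is a multiple of 64 bytes.
 */
static int output_buffer_ok(uintptr_t addr, size_t len, size_t page_size)
{
    if (addr % 64 != 0 || len % 64 != 0)
        return 0;
    return fits_in_one_page(addr, len, page_size);
}
```

A caller might reject or split a request when either check fails,
rather than relying on the transaction being truncated at the page
boundary.
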
The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status::

    if (pwrite(fd, ccb, 64, 0) != 64) {
        struct ccb_exec_result status;
        read(fd, &status, sizeof(status));
        /* bail out */
    }

After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document::

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
                             : "=r" (status)
                             : "r"  (completion_area));

        if (status)     /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2::

    if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation::

    struct dax_command cmd;
    cmd.command = CCB_DEQUEUE;
    if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
        /* bail out */
    }

Finally, normal program cleanup should be done, i.e., unmapping
completion area, closing the dax device, freeing memory etc.

Kernel example
--------------

The only difference in using the DAX in kernel code is the treatment
of the completion area.
Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB::

    ccb->control |=      /* Table 36.1, CCB Header Format */
            (3L << 32);  /* completion area address type = primary virtual */

    ccb->completion = (unsigned long) completion_area;   /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1.

::

    #include <asm/hypervisor.h>

    hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
                             HV_CCB_QUERY_CMD |
                             HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
                             HV_CCB_VA_PRIVILEGED,
                             0, &bytes_accepted, &status_data);

    if (hv_rv != HV_EOK) {
        /* hv_rv is an error code, status_data contains */
        /* potential additional status, see 36.3.1.1 */
    }

After the submission, the completion area polling code is identical to
that in user land::

    while (1) {
        /* Monitored Load */
        __asm__ __volatile__("lduba [%1] 0x84, %0\n"
                             : "=r" (status)
                             : "r"  (completion_area));

        if (status)     /* 0 indicates command in progress */
            break;

        /* MWAIT */
        __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
    }

    if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
        /* completion_area[0] contains the completion status */
        /* completion_area[1] contains an error code, see 36.2.2 */
    }

The output bitmap is ready for consumption immediately after the
completion status indicates success.

Excerpt from UltraSPARC Virtual Machine Specification
=====================================================

 .. include:: dax-hv-api.txt
    :literal: