=======================================
Oracle Data Analytics Accelerator (DAX)
=======================================

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats.  A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:

    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.


High Level Overview
===================

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses.  The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. The returned
status code indicates whether the request was submitted successfully
or an error occurred.  One of the addresses given in each CCB is a
pointer to a "completion area", which is a 128 byte memory block that
is written by the coprocessor to provide execution status. No
interrupt is generated upon completion; the completion area must be
polled by software to find out when a transaction has finished.
However, the M7 and later processors provide a mechanism to pause the
virtual processor until the completion status has been updated by the
coprocessor. This is done using the monitored load and mwait
instructions, which are described in more detail later.  The DAX
coprocessor was designed so that after a request is submitted, the
kernel is no longer involved in the processing of it.  The polling is
done at the user level, which results in almost zero latency between
completion of a request and resumption of execution of the requesting
thread.


Addressing Memory
=================

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical.  The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs.  When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. Instead, a CCB may
contain either virtual or real addresses of the buffers, or a
combination of the two. An "address type" field is available for each
address that may be given in the CCB. In all cases, the Hypervisor
will translate all the addresses to physical before dispatching to
hardware. Address translations are performed using the context of the
process initiating the request.


The Driver API
==============

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use.  This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests.  When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

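The success rule for immediate commands can be captured in a small
wrapper. The following is only a sketch: the helper name
dax_write_exact() is hypothetical and not part of the driver or of
libdax, and mapping a short write to EIO is an assumption made here
for illustration. The command buffer itself would be a struct
dax_command as declared in the uapi header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue one write() and treat anything other
 * than "all bytes accepted" as a failure.
 * Returns 0 on success, -1 on failure with errno set.
 */
int dax_write_exact(int fd, const void *buf, size_t len)
{
	ssize_t rv;

	assert(buf != NULL);

	rv = write(fd, buf, len);
	if (rv < 0)
		return -1;		/* errno was set by write() */
	if ((size_t)rv != len) {
		errno = EIO;		/* assumption: short write counts as an error */
		return -1;
	}
	return 0;
}
```

A caller would pass the address and size of its command structure and
only proceed to read() (where the command requires it) when the helper
returns 0.
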
CCB_DEQUEUE
-----------

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources.  No
further status information is returned, so the user should not
subsequently call read().

CCB_KILL
--------

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

CCB_INFO
--------

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.

Submission of an array of CCBs for execution
---------------------------------------------

A write() whose length is a multiple of the CCB size is treated as a
submit operation. The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor. If the accepted length is
equal to the requested length, then the submission was completely
successful and there is no further status needed; hence, the user
should not subsequently call read(). Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information.  The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.

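The three possible outcomes of a submit can be distinguished from the
write() return value alone. The sketch below is illustrative only: the
names classify_submit(), enum submit_outcome and CCB_SIZE are
hypothetical, and it assumes that lengths are counted in bytes with 64
bytes per CCB (8 words of 64 bits).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define CCB_SIZE 64	/* assumption: one CCB is 8 64-bit words */

enum submit_outcome {
	SUBMIT_ERROR,		/* write() returned -1, consult errno */
	SUBMIT_PARTIAL,		/* some CCBs rejected, read() the status */
	SUBMIT_COMPLETE		/* everything accepted, do not call read() */
};

/*
 * Classify the return value of a submit write()/pwrite().
 * *accepted_ccbs receives the number of CCBs the coprocessor took.
 */
enum submit_outcome classify_submit(ssize_t rv, size_t requested,
				    size_t *accepted_ccbs)
{
	assert(accepted_ccbs != NULL);

	if (rv < 0) {
		*accepted_ccbs = 0;
		return SUBMIT_ERROR;
	}
	*accepted_ccbs = (size_t)rv / CCB_SIZE;
	return (size_t)rv == requested ? SUBMIT_COMPLETE : SUBMIT_PARTIAL;
}
```

Only the SUBMIT_PARTIAL case calls for a follow-up read(); in the
other two cases the outcome is already known.
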
MMAP
----

The mmap() function provides access to the completion area allocated
in the driver.  Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.


Completion of a Request
=======================

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY).  Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28). This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates. This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.


Application Life Cycle of a DAX Submission
==========================================

 - open dax device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for completion area
 - close the dax device


Memory Constraints
==================

The DAX hardware operates only on physical addresses. Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.

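An application can check the page-boundary rule before building a CCB.
The sketch below is not part of the driver or of libdax: the function
name dax_fits_in_page() is hypothetical, and the caller is assumed to
know the size of the virtual page that backs the buffer (for example
8192 for a base page).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the buffer [addr, addr + len) lies entirely inside
 * one page of page_size bytes. page_size must be a power of two.
 */
bool dax_fits_in_page(uintptr_t addr, size_t len, size_t page_size)
{
	uintptr_t mask;
	uintptr_t last;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (len == 0 || len > page_size)
		return false;

	last = addr + (len - 1);	/* address of the final byte */
	if (last < addr)		/* wrapped around the address space */
		return false;

	mask = ~((uintptr_t)page_size - 1);
	return (addr & mask) == (last & mask);
}
```

A buffer that fails this check would have its transaction truncated at
the page boundary, so it is better to split or relocate it up front.
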
Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even a huge page.  A
major caveat is that Linux on Sparc presents 8 MB as one of the huge
page sizes. Sparc does not actually provide an 8 MB hardware page size,
and this size is synthesized by pasting together two 4 MB pages. The
reasons for this are historical, and it creates an issue because only
half of this 8 MB page can actually be used for any given buffer in a
DAX request, and it must be either the first half or the second half;
it cannot be a 4 MB chunk in the middle, since that crosses a
(hardware) page boundary. Note that this entire issue may be hidden by
higher level libraries.


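For buffers placed in an 8 MB huge page, the usable length is bounded
by the distance to the next 4 MB hardware page boundary. The following
sketch computes that bound; the names HW_PAGE_4M,
dax_bytes_to_hw_boundary() and dax_clamp_len() are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_PAGE_4M ((size_t)4 << 20)	/* 4 MB hardware page */

static_assert((HW_PAGE_4M & (HW_PAGE_4M - 1)) == 0,
	      "hardware page size must be a power of two");

/* Bytes available from addr up to the next 4 MB hardware boundary. */
size_t dax_bytes_to_hw_boundary(uintptr_t addr)
{
	return HW_PAGE_4M - (size_t)(addr & (HW_PAGE_4M - 1));
}

/* Largest length, not exceeding len, that stays in one hardware page. */
size_t dax_clamp_len(uintptr_t addr, size_t len)
{
	size_t room = dax_bytes_to_hw_boundary(addr);

	return len < room ? len : room;
}
```

With an 8 MB aligned base, a buffer starting 3 MB into the huge page
can use at most 1 MB, while one starting exactly at the 4 MB midpoint
can use the full second half.
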
CCB Structure
-------------

A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs::

   struct ccb {
       u64   control;
       u64   completion;
       u64   input0;
       u64   access;
       u64   input1;
       u64   op_data;
       u64   output;
       u64   table;
   };

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., the Linux kernel).

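In user space the kernel type u64 is not available, so an application
that does not use libdax needs its own declaration. The sketch below
mirrors the layout shown above using uint64_t, which is an assumption
made for this example, and adds compile-time checks that the structure
is exactly 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the CCB layout shown above (illustrative). */
struct ccb {
	uint64_t control;	/* opcode, address types, formats */
	uint64_t completion;	/* completion area address */
	uint64_t input0;	/* primary input address */
	uint64_t access;	/* data access control */
	uint64_t input1;	/* secondary input address */
	uint64_t op_data;	/* operation specific data */
	uint64_t output;	/* output buffer address */
	uint64_t table;		/* table address */
};

/* A CCB is 8 words of 64 bits: catch accidental padding or reordering. */
static_assert(sizeof(struct ccb) == 64, "CCB must be 64 bytes");
static_assert(offsetof(struct ccb, completion) == 8, "completion is word 1");
static_assert(offsetof(struct ccb, table) == 56, "table is word 7");
```

The size check matters because the driver decides what a write() means
from its length.
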
The first word (control) is examined by the driver for the following:

 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns


Example Code
============

The DAX is accessible to both user and kernel code.  The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2. The simplest
procedure is to attempt to open both, as only one will succeed::

	fd = open("/dev/oradax1", O_RDWR);
	if (fd < 0)
		fd = open("/dev/oradax2", O_RDWR);
	if (fd < 0) {
		/* No DAX found */
	}

Next, the completion area must be mapped::

	completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries.  In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.

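One way to satisfy the alignment and size rules for a bitmap output is
sketched below. The names DAX_LINE, dax_bitmap_bytes() and
dax_alloc_output() are hypothetical, and the use of C11 aligned_alloc()
is a choice made for this example. The sketch only handles the 64-byte
rules; whether the buffer also fits in a single page still has to be
checked separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DAX_LINE 64	/* output granularity: one cache line */

static_assert((DAX_LINE & (DAX_LINE - 1)) == 0, "must be a power of two");

/* Bytes needed for an nbits-wide bitmap, rounded up to whole lines. */
size_t dax_bitmap_bytes(size_t nbits)
{
	size_t bytes = (nbits + 7) / 8;

	return (bytes + DAX_LINE - 1) / DAX_LINE * DAX_LINE;
}

/* Allocate a zeroed, 64-byte aligned output buffer; NULL on failure. */
void *dax_alloc_output(size_t nbits)
{
	size_t len = dax_bitmap_bytes(nbits);
	void *p;

	if (len == 0)
		return NULL;
	p = aligned_alloc(DAX_LINE, len);	/* len is a multiple of 64 */
	if (p != NULL)
		memset(p, 0, len);
	return p;
}
```

The buffer is released with free() once the results have been
consumed.
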
This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, so the output bitmap is the input
bitmap inverted.

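Because the expected result is simply the inverted input, the output
of this particular request is easy to verify on the host. The helper
below is a sketch with a hypothetical name, and it assumes that the
input length is a whole number of bytes so that no partially used
trailing byte has to be considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if output is the bitwise inverse of input over nbytes
 * bytes, which is what a 1-bit scan for the value 0 should produce.
 */
bool scan_zero_output_matches(const uint8_t *input, const uint8_t *output,
			      size_t nbytes)
{
	size_t i;

	assert(input != NULL && output != NULL);

	for (i = 0; i < nbytes; i++) {
		if (output[i] != (uint8_t)~input[i])
			return false;
	}
	return true;
}
```

Such a check is useful in a test program, since it does not depend on
the coprocessor to validate its own result.
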
For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail::

	ccb->control =           /* Table 36.1, CCB Header Format */
		  (2L << 48)     /* command = Scan Value */
		| (3L << 40)     /* output address type = primary virtual */
		| (3L << 34)     /* primary input address type = primary virtual */
		                 /* Section 36.2.1, Query CCB Command Formats */
		| (1 << 28)      /* 36.2.1.1.1 primary input format = fixed width bit packed */
		| (0 << 23)      /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
		| (8 << 10)      /* 36.2.1.1.6 output format = bit vector */
		| (0 <<  5)      /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
		| (31 << 0);     /* 36.2.1.3 Disable second scan criteria */

	ccb->completion = 0;     /* Completion area address, to be filled in by driver */

	ccb->input0 = (unsigned long) input;    /* primary input address */

	ccb->access =            /* Section 36.2.1.2, Data Access Control */
		  (2 << 24)      /* Primary input length format = bits */
		| (nbits - 1);   /* number of bits in primary input stream, minus 1 */

	ccb->input1 = 0;         /* secondary input address, unused */

	ccb->op_data = 0;        /* scan criteria (value to be matched) */

	ccb->output = (unsigned long) output;   /* output address */

	ccb->table = 0;          /* table address, unused */

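The literal shifts in the control word are easier to review when each
field is named. The sketch below rebuilds the same value; the macro
and function names are hypothetical and do not come from oradax.h or
libdax, while the shift positions and field values are copied from the
example above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the bit positions used in the example above. */
#define CTL_CMD_SHIFT		48	/* command */
#define CTL_OUT_AT_SHIFT	40	/* output address type */
#define CTL_IN0_AT_SHIFT	34	/* primary input address type */
#define CTL_IN0_FMT_SHIFT	28	/* primary input format */
#define CTL_IN0_ELEM_SHIFT	23	/* primary input element size */
#define CTL_OUT_FMT_SHIFT	10	/* output format */
#define CTL_CRIT1_SHIFT		5	/* first scan criteria size */
#define CTL_CRIT2_SHIFT		0	/* second scan criteria size */

static_assert(sizeof(uint64_t) == 8, "control word is 64 bits wide");

/* Control word for a 1-bit Scan Value request producing a bit vector. */
uint64_t scan_control_word(void)
{
	return ((uint64_t)2  << CTL_CMD_SHIFT)		/* Scan Value */
	     | ((uint64_t)3  << CTL_OUT_AT_SHIFT)	/* primary virtual */
	     | ((uint64_t)3  << CTL_IN0_AT_SHIFT)	/* primary virtual */
	     | ((uint64_t)1  << CTL_IN0_FMT_SHIFT)	/* fixed width bit packed */
	     | ((uint64_t)0  << CTL_IN0_ELEM_SHIFT)	/* 1 bit elements */
	     | ((uint64_t)8  << CTL_OUT_FMT_SHIFT)	/* bit vector */
	     | ((uint64_t)0  << CTL_CRIT1_SHIFT)	/* 1 byte criteria */
	     | ((uint64_t)31 << CTL_CRIT2_SHIFT);	/* second criteria off */
}
```

Casting every field to uint64_t before shifting avoids relying on the
width of long, which differs between 32-bit and 64-bit builds.
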
The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status::

	if (pwrite(fd, ccb, 64, 0) != 64) {
		struct ccb_exec_result status;
		read(fd, &status, sizeof(status));
		/* bail out */
	}

After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document::

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2::

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation::

	struct dax_command cmd;
	cmd.command = CCB_DEQUEUE;
	if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
		/* bail out */
	}

Finally, normal program cleanup should be done, i.e., unmapping the
completion area, closing the dax device, freeing memory, etc.

Kernel example
--------------

The only difference in using the DAX in kernel code is the treatment
of the completion area. Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB::

	ccb->control |=      /* Table 36.1, CCB Header Format */
		(3L << 32);  /* completion area address type = primary virtual */

	ccb->completion = (unsigned long) completion_area;   /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1.

::

	#include <asm/hypervisor.h>

	hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
				 HV_CCB_QUERY_CMD |
				 HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
				 HV_CCB_VA_PRIVILEGED,
				 0, &bytes_accepted, &status_data);

	if (hv_rv != HV_EOK) {
		/* hv_rv is an error code, status_data contains */
		/* potential additional status, see 36.3.1.1 */
	}

After the submission, the completion area polling code is identical to
that in user land::

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}

The output bitmap is ready for consumption immediately after the
completion status indicates success.

Excerpt from UltraSPARC Virtual Machine Specification
=====================================================

 .. include:: dax-hv-api.txt
    :literal: