.. SPDX-License-Identifier: GPL-2.0

=====================================
Asynchronous Transfers/Transforms API
=====================================

.. Contents

  1. INTRODUCTION

  2. GENEALOGY

  3. USAGE
  3.1 General format of the API
  3.2 Supported operations
  3.3 Descriptor management
  3.4 When does the operation execute?
  3.5 When does the operation complete?
  3.6 Constraints
  3.7 Example

  4. DMAENGINE DRIVER DEVELOPER NOTES
  4.1 Conformance points
  4.2 "My application needs exclusive control of hardware channels"

  5. SOURCE

1. Introduction
===============

The async_tx API provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies.  It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations.  Code
that is written to the API can optimize for asynchronous operation and
the API will fit the chain of operations to the available offload
resources.

2. Genealogy
============

The API was initially designed to offload the memory copy and
xor-parity-calculations of the md-raid5 driver using the offload engines
present in the Intel(R) XScale series of I/O processors.  It also built
on the 'dmaengine' layer developed for offloading memory copies in the
network stack using Intel(R) I/OAT engines.  The following design
features surfaced as a result:

1. implicit synchronous path: users of the API do not need to know if
   the platform they are running on has offload capabilities.  The
   operation will be offloaded when an engine is available and carried out
   in software otherwise.
2. cross channel dependency chains: the API allows a chain of dependent
   operations to be submitted, like xor->copy->xor in the raid5 case.  The
   API automatically handles cases where the transition from one operation
   to another implies a hardware channel switch.
3. dmaengine extensions to support multiple clients and operation types
   beyond 'memcpy'.

3. Usage
========

3.1 General format of the API
-----------------------------

::

  struct dma_async_tx_descriptor *
  async_<operation>(<op specific parameters>, struct async_submit_ctl *submit)

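For example, the memcpy and xor helpers declared in include/linux/async_tx.h
follow this pattern (shown here as a guide; the header remains the
authoritative reference for the exact prototypes)::

  struct dma_async_tx_descriptor *
  async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
               unsigned int src_offset, size_t len,
               struct async_submit_ctl *submit);

  struct dma_async_tx_descriptor *
  async_xor(struct page *dest, struct page **src_list, unsigned int offset,
            int src_cnt, size_t len, struct async_submit_ctl *submit);
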
3.2 Supported operations
------------------------

========  ====================================================================
memcpy    memory copy between a source and a destination buffer
memset    fill a destination buffer with a byte value
xor       xor a series of source buffers and write the result to a
          destination buffer
xor_val   xor a series of source buffers and check whether the result is
          zero.  The implementation attempts to avoid writes to memory
pq        generate the p+q (raid6 syndrome) from a series of source buffers
pq_val    validate that a p and/or q buffer is in sync with a given series
          of sources
datap     (raid6_datap_recov) recover a raid6 data block and the p block
          from the given sources
2data     (raid6_2data_recov) recover 2 raid6 data blocks from the given
          sources
========  ====================================================================

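As a hedged illustration of the check-style operations, a caller might
verify that a set of source pages xor to zero roughly as follows (dest,
srcs, src_cnt and len are assumed to be set up already; see
include/linux/async_tx.h for the exact async_xor_val() prototype and the
sum_check_flags semantics)::

    addr_conv_t addr_conv[src_cnt];
    enum sum_check_flags result = 0;
    struct async_submit_ctl submit;
    struct dma_async_tx_descriptor *tx;

    init_async_submit(&submit, ASYNC_TX_ACK, NULL, NULL, NULL, addr_conv);
    tx = async_xor_val(dest, srcs, 0, src_cnt, len, &result, &submit);

    /* spin until the check completes; see section 3.5 */
    dma_wait_for_async_tx(tx);

    /* SUM_CHECK_P_RESULT is expected to be set when the xor was non-zero */
    if (result & SUM_CHECK_P_RESULT)
            pr_debug("sources did not xor to zero\n");
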
3.3 Descriptor management
-------------------------

The return value is non-NULL and points to a 'descriptor' when the operation
has been queued to execute asynchronously.  Descriptors are recycled
resources, under control of the offload engine driver, to be reused as
operations complete.  When an application needs to submit a chain of
operations it must guarantee that the descriptor is not automatically recycled
before the dependency is submitted.  This requires that all descriptors be
acknowledged by the application before the offload engine driver is allowed to
recycle (or free) the descriptor.  A descriptor can be acked by one of the
following methods (a short sketch follows the list):

1. setting the ASYNC_TX_ACK flag if no child operations are to be submitted
2. submitting an unacknowledged descriptor as a dependency to another
   async_tx call will implicitly set the acknowledged state.
3. calling async_tx_ack() on the descriptor.
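
Expressed as a rough sketch (the buffers, lengths and the addr_conv
scribble region are assumed to be set up already), the three methods look
like::

    /* 1: ack at submission time, no dependent operations will follow */
    init_async_submit(&submit, ASYNC_TX_ACK, NULL, NULL, NULL, addr_conv);
    tx = async_memcpy(dest, src, 0, 0, len, &submit);

    /* 2: passing 'tx' as the dependency of a later call acks it implicitly */
    init_async_submit(&submit, 0, tx, NULL, NULL, addr_conv);
    tx2 = async_memcpy(dest2, dest, 0, 0, len, &submit);

    /* 3: ack an outstanding descriptor explicitly */
    async_tx_ack(tx2);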

3.4 When does the operation execute?
------------------------------------

Operations are not necessarily issued immediately upon return from the
async_<operation> call.  Offload engine drivers batch operations to
improve performance by reducing the number of MMIO cycles needed to
manage the channel.  Once a driver-specific threshold is met the driver
automatically issues pending operations.  An application can force this
event by calling async_tx_issue_pending_all().  This operates on all
channels since the application has no knowledge of the channel-to-operation
mapping.

3.5 When does the operation complete?
-------------------------------------

There are two methods for an application to learn about the completion
of an operation.

1. Call dma_wait_for_async_tx().  This call causes the CPU to spin while
   it polls for the completion of the operation.  It handles dependency
   chains and issuing pending operations.
2. Specify a completion callback.  The callback routine runs in tasklet
   context if the offload engine driver supports interrupts, or it is
   called in application context if the operation is carried out
   synchronously in software.  The callback can be set in the call to
   async_<operation>, or when the application needs to submit a chain of
   unknown length it can use the async_trigger_callback() routine to set a
   completion interrupt/callback at the end of the chain (see the sketch
   below).
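
A minimal sketch of the second method, reusing the callback() helper from
the example in section 3.7 (tx is assumed to be the last descriptor of an
already-submitted chain)::

    struct completion cmp;

    init_completion(&cmp);
    init_async_submit(&submit, ASYNC_TX_ACK, tx, callback, &cmp, NULL);
    tx = async_trigger_callback(&submit);

    async_tx_issue_pending_all();
    wait_for_completion(&cmp);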

3.6 Constraints
---------------

1. Calls to async_<operation> are not permitted in IRQ context.  Other
   contexts are permitted provided constraint #2 is not violated.
2. Completion callback routines cannot submit new operations.  This
   results in recursion in the synchronous case and spin_locks being
   acquired twice in the asynchronous case.

3.7 Example
-----------

Perform a xor->copy->xor operation where each operation depends on the
result from the previous operation::

    void callback(void *param)
    {
            struct completion *cmp = param;

            complete(cmp);
    }

    void run_xor_copy_xor(struct page **xor_srcs,
                          int xor_src_cnt,
                          struct page *xor_dest,
                          size_t xor_len,
                          struct page *copy_src,
                          struct page *copy_dest,
                          size_t copy_len)
    {
            struct dma_async_tx_descriptor *tx;
            addr_conv_t addr_conv[xor_src_cnt];
            struct async_submit_ctl submit;
            struct completion cmp;

            /* first xor: no dependency, no callback */
            init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL,
                              addr_conv);
            tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

            /* the copy depends on the xor above */
            submit.depend_tx = tx;
            tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit);

            /* the final xor depends on the copy and signals 'cmp' when done */
            init_completion(&cmp);
            init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx,
                              callback, &cmp, addr_conv);
            tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

            async_tx_issue_pending_all();

            wait_for_completion(&cmp);
    }

See include/linux/async_tx.h for more information on the flags.  See the
ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more
implementation examples.

4. Driver Development Notes
===========================

4.1 Conformance points
----------------------

There are a few conformance points required in dmaengine drivers to
accommodate assumptions made by applications using the async_tx API:

1. Completion callbacks are expected to happen in tasklet context
2. dma_async_tx_descriptor fields are never manipulated in IRQ context
3. Use async_tx_run_dependencies() in the descriptor cleanup path to
   handle submission of dependent operations (a sketch follows)
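
A hedged sketch of how point 3 typically fits into a driver's
tasklet-driven descriptor cleanup routine; everything prefixed with my_ is
hypothetical, async_tx_run_dependencies() is the helper named in point 3
above, and async_tx_test_ack() is the descriptor-ack test from
include/linux/dmaengine.h::

    /* called from the channel tasklet once the hardware retires 'desc' */
    static void my_clean_descriptor(struct my_desc *desc)
    {
            struct dma_async_tx_descriptor *tx = &desc->txd;

            /* conformance point 1: callbacks run in tasklet context */
            if (tx->callback)
                    tx->callback(tx->callback_param);

            /* conformance point 3: submit operations that waited on 'tx' */
            async_tx_run_dependencies(tx);

            /* recycle the descriptor only once the client has acked it */
            if (async_tx_test_ack(tx))
                    my_free_desc(desc);
    }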

4.2 "My application needs exclusive control of hardware channels"
------------------------------------------------------------------

Primarily this requirement arises from cases where a DMA engine driver
is being used to support device-to-memory operations.  A channel that is
performing these operations cannot, for many platform specific reasons,
be shared.  For these cases the dma_request_channel() interface is
provided.

The interface is::

  struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
                                       dma_filter_fn filter_fn,
                                       void *filter_param);

Where dma_filter_fn is defined as::

  typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);

When the optional 'filter_fn' parameter is NULL, dma_request_channel()
simply returns the first channel that satisfies the capability mask.
Otherwise, when the mask parameter is insufficient for specifying the
necessary channel, the filter_fn routine can be used to select from among
the available channels in the system.  The filter_fn routine is called
once for each free channel in the system.  Upon seeing a suitable channel,
filter_fn returns true, which flags that channel to be the return value
from dma_request_channel().  A channel allocated via this interface is
exclusive to the caller, until dma_release_channel() is called.
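
As a hedged sketch (my_filter() and my_dev are purely illustrative; a real
filter would match on driver-specific criteria), an exclusive
memcpy-capable channel could be requested like this::

    static bool my_filter(struct dma_chan *chan, void *filter_param)
    {
            /* e.g. accept only channels belonging to a particular device */
            return chan->device->dev == filter_param;
    }

    ...

    dma_cap_mask_t mask;
    struct dma_chan *chan;

    dma_cap_zero(mask);
    dma_cap_set(DMA_MEMCPY, mask);
    chan = dma_request_channel(mask, my_filter, my_dev);
    if (chan) {
            /* the channel is exclusively ours until released */
            ...
            dma_release_channel(chan);
    }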

The DMA_PRIVATE capability flag is used to tag dma devices that should
not be used by the general-purpose allocator.  It can be set at
initialization time if it is known that a channel will always be
private.  Alternatively, it is set when dma_request_channel() finds an
unused "public" channel.

A couple of caveats to note when implementing a driver and consumer:

1. Once a channel has been privately allocated it will no longer be
   considered by the general-purpose allocator even after a call to
   dma_release_channel().
2. Since capabilities are specified at the device level a dma_device
   with multiple channels will either have all channels public, or all
   channels private.

5. Source
=========

include/linux/dmaengine.h:
    core header file for DMA drivers and API users
drivers/dma/dmaengine.c:
    offload engine channel management routines
drivers/dma/:
    location for offload engine drivers
include/linux/async_tx.h:
    core header file for the async_tx API
crypto/async_tx/async_tx.c:
    async_tx interface to dmaengine and common code
crypto/async_tx/async_memcpy.c:
    copy offload
crypto/async_tx/async_xor.c:
    xor and xor zero sum offload