.. SPDX-License-Identifier: GPL-2.0

=====================================
Asynchronous Transfers/Transforms API
=====================================

.. Contents

  1. INTRODUCTION

  2. GENEALOGY

  3. USAGE
  3.1 General format of the API
  3.2 Supported operations
  3.3 Descriptor management
  3.4 When does the operation execute?
  3.5 When does the operation complete?
  3.6 Constraints
  3.7 Example

  4. DMAENGINE DRIVER DEVELOPER NOTES
  4.1 Conformance points
  4.2 "My application needs exclusive control of hardware channels"

  5. SOURCE

1. Introduction
===============

The async_tx API provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies. It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations. Code
that is written to the API can optimize for asynchronous operation and
the API will fit the chain of operations to the available offload
resources.

2. Genealogy
============

The API was initially designed to offload the memory copy and
xor-parity-calculations of the md-raid5 driver using the offload engines
present in the Intel(R) Xscale series of I/O processors. It also built
on the 'dmaengine' layer developed for offloading memory copies in the
network stack using Intel(R) I/OAT engines. The following design
features surfaced as a result:

1. implicit synchronous path: users of the API do not need to know if
   the platform they are running on has offload capabilities. The
   operation will be offloaded when an engine is available and carried
   out in software otherwise.
2. cross channel dependency chains: the API allows a chain of dependent
   operations to be submitted, like xor->copy->xor in the raid5 case. The
   API automatically handles cases where the transition from one operation
   to another implies a hardware channel switch.
3. dmaengine extensions to support multiple clients and operation types
   beyond 'memcpy'.

3. Usage
========

3.1 General format of the API
-----------------------------

::

  struct dma_async_tx_descriptor *
  async_<operation>(<op specific parameters>, struct async_submit_ctl *submit)

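For example, a plain memory copy maps onto this template roughly as
sketched below. This is a minimal illustration, not a definitive recipe:
the helper name do_one_copy is hypothetical and the caller is assumed to
supply the source and destination pages. The routines it uses
(init_async_submit, async_memcpy, async_tx_issue_pending_all and
dma_wait_for_async_tx) are described in the sections that follow::

  #include <linux/async_tx.h>

  /* hypothetical helper: copy 'len' bytes from one page to another */
  static void do_one_copy(struct page *dest, struct page *src, size_t len)
  {
          struct async_submit_ctl submit;
          struct dma_async_tx_descriptor *tx;

          /* no dependency, no callback; set ASYNC_TX_ACK since nothing
           * will be chained to this descriptor (see section 3.3), and no
           * addr_conv scratch area is needed for a single copy
           */
          init_async_submit(&submit, ASYNC_TX_ACK, NULL, NULL, NULL, NULL);
          tx = async_memcpy(dest, src, 0, 0, len, &submit);

          /* hand any batched descriptors to the hardware (section 3.4) */
          async_tx_issue_pending_all();

          /* poll for completion of the copy (section 3.5) */
          dma_wait_for_async_tx(tx);
  }
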
3.2 Supported operations
------------------------

======== ======================================================================
memcpy   memory copy between a source and a destination buffer
memset   fill a destination buffer with a byte value
xor      xor a series of source buffers and write the result to a
         destination buffer
xor_val  xor a series of source buffers and set a flag if the
         result is zero. The implementation attempts to prevent
         writes to memory
pq       generate the p+q (raid6 syndrome) from a series of source buffers
pq_val   validate that p and/or q buffers are in sync with a given series of
         sources
datap    (raid6_datap_recov) recover a raid6 data block and the p block
         from the given sources
2data    (raid6_2data_recov) recover 2 raid6 data blocks from the given
         sources
======== ======================================================================

3.3 Descriptor management
-------------------------

The return value is non-NULL and points to a 'descriptor' when the operation
has been queued to execute asynchronously. Descriptors are recycled
resources, under control of the offload engine driver, to be reused as
operations complete. When an application needs to submit a chain of
operations it must guarantee that the descriptor is not automatically recycled
before the dependency is submitted. This requires that all descriptors be
acknowledged by the application before the offload engine driver is allowed to
recycle (or free) the descriptor. A descriptor can be acknowledged by one of
the following methods (a sketch of these rules in practice follows the list):

1. setting the ASYNC_TX_ACK flag if no child operations are to be submitted.
2. submitting an unacknowledged descriptor as a dependency to another
   async_tx call, which implicitly sets the acknowledged state.
3. calling async_tx_ack() on the descriptor.

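For illustration, the sketch below chains a xor to a copy and notes where
each acknowledgement rule applies. It is a hedged example only: the helper
name xor_then_copy and its parameters are hypothetical, and error handling
is omitted. The flags and calls are the same ones used by the example in
section 3.7::

  #include <linux/async_tx.h>

  /* hypothetical helper: xor a set of sources into xor_dest, then copy
   * the result to copy_dest, applying the acknowledgement rules above
   */
  static void xor_then_copy(struct page *xor_dest, struct page **xor_srcs,
                            int src_cnt, struct page *copy_dest, size_t len,
                            addr_conv_t *addr_conv)
  {
          struct async_submit_ctl submit;
          struct dma_async_tx_descriptor *tx;

          /* rule 1 does not apply here: leave the xor descriptor
           * unacknowledged because a dependent operation follows
           */
          init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL,
                            addr_conv);
          tx = async_xor(xor_dest, xor_srcs, 0, src_cnt, len, &submit);

          /* rule 2: submitting tx as a dependency implicitly acks it */
          init_async_submit(&submit, 0, tx, NULL, NULL, addr_conv);
          tx = async_memcpy(copy_dest, xor_dest, 0, 0, len, &submit);

          /* rule 3: nothing depends on the copy, so ack it explicitly
           * (passing ASYNC_TX_ACK above would have had the same effect)
           */
          async_tx_ack(tx);

          async_tx_issue_pending_all();
  }
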
3.4 When does the operation execute?
------------------------------------

Operations do not immediately issue after return from the
async_<operation> call. Offload engine drivers batch operations to
improve performance by reducing the number of MMIO cycles needed to
manage the channel. Once a driver-specific threshold is met the driver
automatically issues pending operations. An application can force this
event by calling async_tx_issue_pending_all(). This operates on all
channels since the application has no knowledge of the channel-to-operation
mapping.

3.5 When does the operation complete?
-------------------------------------

There are two methods for an application to learn about the completion
of an operation.

1. Call dma_wait_for_async_tx(). This call causes the CPU to spin while
   it polls for the completion of the operation. It handles dependency
   chains and issuing pending operations.
2. Specify a completion callback. The callback routine runs in tasklet
   context if the offload engine driver supports interrupts, or it is
   called in application context if the operation is carried out
   synchronously in software. The callback can be set in the call to
   async_<operation>, or when the application needs to submit a chain of
   unknown length it can use the async_trigger_callback() routine to set a
   completion interrupt/callback at the end of the chain (a sketch follows
   this list).

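A rough sketch of the second method, for a chain whose length is not
known up front, is shown below. It assumes 'tx' is the last descriptor
returned while building the chain and reuses the callback()/struct
completion pattern from the example in section 3.7; the helper name
wait_for_chain is hypothetical::

  #include <linux/async_tx.h>
  #include <linux/completion.h>

  /* hypothetical helper: arrange a completion callback at the end of a
   * chain whose last descriptor is 'tx', then wait for it
   */
  static void wait_for_chain(struct dma_async_tx_descriptor *tx)
  {
          struct async_submit_ctl submit;
          struct completion cmp;

          init_completion(&cmp);

          /* append an interrupt/callback descriptor that depends on tx */
          init_async_submit(&submit, ASYNC_TX_ACK, tx, callback, &cmp, NULL);
          async_trigger_callback(&submit);

          async_tx_issue_pending_all();
          wait_for_completion(&cmp);
  }
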
3.6 Constraints
---------------

1. Calls to async_<operation> are not permitted in IRQ context. Other
   contexts are permitted provided constraint #2 is not violated.
2. Completion callback routines cannot submit new operations. This
   results in recursion in the synchronous case and spin_locks being
   acquired twice in the asynchronous case.

3.7 Example
-----------

Perform a xor->copy->xor operation where each operation depends on the
result from the previous operation::

  void callback(void *param)
  {
          struct completion *cmp = param;

          complete(cmp);
  }

  void run_xor_copy_xor(struct page **xor_srcs,
                        int xor_src_cnt,
                        struct page *xor_dest,
                        size_t xor_len,
                        struct page *copy_src,
                        struct page *copy_dest,
                        size_t copy_len)
  {
          struct dma_async_tx_descriptor *tx;
          struct async_submit_ctl submit;
          addr_conv_t addr_conv[xor_src_cnt];
          struct completion cmp;

          /* first xor: no dependency; leave the descriptor unacked so
           * the copy can be chained to it
           */
          init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL,
                            addr_conv);
          tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

          /* chain the copy to the first xor */
          submit.depend_tx = tx;
          tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit);

          /* final xor: depends on the copy, acks the chain and signals
           * completion via callback()
           */
          init_completion(&cmp);
          init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx,
                            callback, &cmp, addr_conv);
          tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

          async_tx_issue_pending_all();

          wait_for_completion(&cmp);
  }

See include/linux/async_tx.h for more information on the flags. See the
ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more
implementation examples.

4. Driver Development Notes
===========================

4.1 Conformance points
----------------------

There are a few conformance points required in dmaengine drivers to
accommodate assumptions made by applications using the async_tx API:

1. Completion callbacks are expected to happen in tasklet context.
2. dma_async_tx_descriptor fields are never manipulated in IRQ context.
3. Use async_tx_run_dependencies() in the descriptor cleanup path to
   handle submission of dependent operations.

4.2 "My application needs exclusive control of hardware channels"
------------------------------------------------------------------

Primarily this requirement arises from cases where a DMA engine driver
is being used to support device-to-memory operations. A channel that is
performing these operations cannot, for many platform specific reasons,
be shared. For these cases the dma_request_channel() interface is
provided.

The interface is::

  struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
                                       dma_filter_fn filter_fn,
                                       void *filter_param);

Where dma_filter_fn is defined as::

  typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);

When the optional 'filter_fn' parameter is set to NULL,
dma_request_channel simply returns the first channel that satisfies the
capability mask. Otherwise, when the mask parameter is insufficient for
specifying the necessary channel, the filter_fn routine can be used to
select from the available channels in the system. The filter_fn routine
is called once for each free channel in the system. Upon seeing a
suitable channel, filter_fn returns 'true', which flags that channel to
be the return value from dma_request_channel. A channel allocated via
this interface is exclusive to the caller until dma_release_channel()
is called.

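As a rough sketch of this interface, the fragment below requests an
exclusive DMA_MEMCPY-capable channel using a hypothetical filter that
only accepts channels belonging to a given device; the helper names
my_filter and grab_private_chan are illustrative only::

  #include <linux/dmaengine.h>

  /* hypothetical filter: accept only channels on the device passed in
   * as filter_param
   */
  static bool my_filter(struct dma_chan *chan, void *filter_param)
  {
          return chan->device->dev == filter_param;
  }

  static struct dma_chan *grab_private_chan(struct device *dev)
  {
          dma_cap_mask_t mask;
          struct dma_chan *chan;

          dma_cap_zero(mask);
          dma_cap_set(DMA_MEMCPY, mask);

          /* exclusive to the caller until dma_release_channel(chan) */
          chan = dma_request_channel(mask, my_filter, dev);

          return chan;    /* NULL if no suitable channel was found */
  }

The channel, when one is found, must eventually be returned to the system
with dma_release_channel().
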
The DMA_PRIVATE capability flag is used to tag dma devices that should
not be used by the general-purpose allocator. It can be set at
initialization time if it is known that a channel will always be
private. Alternatively, it is set when dma_request_channel() finds an
unused "public" channel.

A couple of caveats to note when implementing a driver and consumer:

1. Once a channel has been privately allocated it will no longer be
   considered by the general-purpose allocator even after a call to
   dma_release_channel().
2. Since capabilities are specified at the device level a dma_device
   with multiple channels will either have all channels public, or all
   channels private.

5. Source
=========

include/linux/dmaengine.h:
    core header file for DMA drivers and api users
drivers/dma/dmaengine.c:
    offload engine channel management routines
drivers/dma/:
    location for offload engine drivers
include/linux/async_tx.h:
    core header file for the async_tx api
crypto/async_tx/async_tx.c:
    async_tx interface to dmaengine and common code
crypto/async_tx/async_memcpy.c:
    copy offload
crypto/async_tx/async_xor.c:
    xor and xor zero sum offload