=========================
ALSA Compress-Offload API
=========================

Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>

Vinod Koul <vinod.koul@linux.intel.com>


Overview
========
Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
returned values in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) have been
integrated in system-on-chip designs and in audio codecs. Processing
compressed data on such DSPs results in a dramatic reduction of power
consumption compared to host-based processing. Support for such
hardware has not been very good in Linux, mostly because of the lack
of a generic API available in the mainline kernel.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by two years of experience with the
Intel Moorestown SOC, with many corrections required to upstream the
API in the mainline kernel instead of the staging tree and make it
usable by others.


Requirements
============
The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that as audio compression technology advances, new formats
  will be added.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support AAC multichannel but HE-AAC
  stereo. Likewise, WMA10 level M3 may require too much memory and too
  many CPU cycles. The new API needs to provide a generic way of listing
  these formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have their own enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.


Design
======
The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.
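
As a rough illustration of a ring buffer accounted purely in bytes, the
sketch below models the fragment size, the fragment count, and cumulative
byte counters. The structure and function names (``ring_model``,
``ring_write`` and so on) are hypothetical and are not the kernel's own
data structures; the sketch only shows the bookkeeping, not any data copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a byte-based ring buffer; not the kernel's structures. */
struct ring_model {
    size_t fragment_size;   /* bytes per fragment */
    size_t fragments;       /* number of fragments */
    size_t app_total;       /* cumulative bytes written by the application */
    size_t dsp_total;       /* cumulative bytes consumed by the DSP */
};

/* Total capacity of the ring buffer in bytes. */
static size_t ring_size(const struct ring_model *r)
{
    return r->fragment_size * r->fragments;
}

/* Bytes that can be written without overwriting unconsumed data. */
static size_t ring_avail(const struct ring_model *r)
{
    assert(r->app_total >= r->dsp_total);
    return ring_size(r) - (r->app_total - r->dsp_total);
}

/* Accept up to len bytes; returns the number of bytes actually accepted. */
static size_t ring_write(struct ring_model *r, size_t len)
{
    size_t avail = ring_avail(r);
    size_t n = len < avail ? len : avail;

    r->app_total += n;
    return n;
}

/* The DSP consumes one whole fragment if one is queued; returns bytes consumed. */
static size_t ring_consume_fragment(struct ring_model *r)
{
    if (r->app_total - r->dsp_total < r->fragment_size)
        return 0;
    r->dsp_total += r->fragment_size;
    return r->fragment_size;
}
```

Nothing in this model refers to frames or time: a write may be short, and
space only becomes available again once the DSP side has consumed bytes.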

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.

The Compressed Data API does not make any assumptions on how the data
is transmitted to the audio DSP. DMA transfers from main memory to an
embedded audio cluster or to an SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

get_caps
  This routine returns the list of audio formats supported. Querying the
  codecs on a capture stream will return encoders; decoders will be
  listed for playback streams.

get_codec_caps
  For each codec, this routine returns a list of
  capabilities. The intent is to make sure all the capabilities
  correspond to valid settings, and to minimize the risks of
  configuration failures. For example, for a complex codec such as AAC,
  the number of channels supported may depend on a specific profile. If
  the capabilities were exposed with a single descriptor, a specific
  combination of profiles/channels/formats might turn out not to be
  supported. Likewise, embedded DSPs have limited memory and CPU cycles,
  so it is likely that some implementations make the list of capabilities
  dynamic and dependent on existing workloads. In addition to codec
  settings, this routine returns the minimum buffer size handled by the
  implementation. This information can be a function of the DMA buffer
  sizes, the number of bytes required to synchronize, etc, and can be
  used by userspace to define how much needs to be written in the ring
  buffer before playback can start.

set_params
  This routine sets the configuration chosen for a specific codec. The
  most important field in the parameters is the codec type; in most
  cases decoders will ignore other fields, while encoders will strictly
  comply with the settings.

get_params
  This routine returns the actual settings used by the DSP. Changes to
  the settings should remain the exception.

get_timestamp
  The timestamp becomes a multiple-field structure. It lists the number
  of bytes transferred, the number of samples processed and the number
  of samples rendered/grabbed. All these values can be used to determine
  the average bitrate, to figure out whether the ring buffer needs to be
  refilled, or to estimate the delay due to decoding/encoding/IO on the
  DSP.
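
To make the use of the timestamp fields more concrete, here is a minimal
sketch of the two derived quantities mentioned for get_timestamp. The
``tstamp_model`` structure and its field names are hypothetical stand-ins
for the three counters described above, and the ``sampling_rate`` field is
an assumption added so that sample counts can be converted to time. The
bitrate is only an estimate, since bytes already transferred may not all
have been decoded yet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timestamp model; not the kernel's structure. */
struct tstamp_model {
    uint64_t bytes_transferred;  /* bytes moved from the ring buffer to the DSP */
    uint64_t samples_processed;  /* samples decoded (or encoded) by the DSP */
    uint64_t samples_rendered;   /* samples actually rendered (or grabbed) */
    uint32_t sampling_rate;      /* Hz; assumed to be known to the caller */
};

/* Estimated average bitrate in bits per second. */
static uint64_t avg_bitrate_bps(const struct tstamp_model *t)
{
    assert(t->sampling_rate > 0);
    if (t->samples_processed == 0)
        return 0;
    return t->bytes_transferred * 8 * t->sampling_rate / t->samples_processed;
}

/* Samples processed but not yet rendered, expressed in milliseconds. */
static uint64_t dsp_delay_ms(const struct tstamp_model *t)
{
    assert(t->sampling_rate > 0);
    assert(t->samples_processed >= t->samples_rendered);
    return (t->samples_processed - t->samples_rendered) * 1000 / t->sampling_rate;
}
```

For instance, 160000 bytes transferred for 480000 samples processed at
48 kHz corresponds to an average of 128000 bits per second.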

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:

- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)
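
The idea of listing profiles as bitmasks, combined with the AAC example
given for get_codec_caps, can be sketched as follows. This is only an
illustration under assumed names: ``codec_desc_model``, the ``PROFILE_*``
bits and ``config_supported()`` are hypothetical and do not reproduce the
descriptors actually used by the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical profile bits; the values are illustrative only. */
#define PROFILE_AAC_LC  (1u << 0)
#define PROFILE_HE_AAC  (1u << 1)

/* Hypothetical capability descriptor: one channel limit per set of profiles. */
struct codec_desc_model {
    uint32_t max_channels;  /* highest channel count for these profiles */
    uint32_t profiles;      /* bitmask of profiles covered by this descriptor */
};

/* True if some descriptor covers the requested profile/channel combination. */
static bool config_supported(const struct codec_desc_model *descs, int n,
                             uint32_t profile, uint32_t channels)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if ((descs[i].profiles & profile) && channels <= descs[i].max_channels)
            return true;
    }
    return false;
}
```

With one descriptor for AAC-LC up to six channels and another for HE-AAC
limited to stereo, a request for six-channel HE-AAC is rejected, which a
single merged descriptor could not express.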

State Machine
=============

The compressed audio stream state machine is described below ::

                                        +----------+
                                        |          |
                                        |   OPEN   |
                                        |          |
                                        +----------+
                                             |
                                             |
                                             | compr_set_params()
                                             |
                                             v
         compr_free()                  +----------+
  +------------------------------------|          |
  |                                    |   SETUP  |
  |          +-------------------------|          |<-------------------------+
  |          |       compr_write()     +----------+                          |
  |          |                              ^                                |
  |          |                              | compr_drain_notify()           |
  |          |                              |        or                      |
  |          |                              |     compr_stop()               |
  |          |                              |                                |
  |          |                         +----------+                          |
  |          |                         |          |                          |
  |          |                         |   DRAIN  |                          |
  |          |                         |          |                          |
  |          |                         +----------+                          |
  |          |                              ^                                |
  |          |                              |                                |
  |          |                              | compr_drain()                  |
  |          |                              |                                |
  |          v                              |                                |
  |    +----------+                    +----------+                          |
  |    |          |    compr_start()   |          |        compr_stop()      |
  |    | PREPARE  |------------------->|  RUNNING |--------------------------+
  |    |          |                    |          |                          |
  |    +----------+                    +----------+                          |
  |          |                            |    ^                             |
  |          |compr_free()                |    |                             |
  |          |              compr_pause() |    | compr_resume()              |
  |          |                            |    |                             |
  |          v                            v    |                             |
  |    +----------+                   +----------+                           |
  |    |          |                   |          |         compr_stop()      |
  +--->|   FREE   |                   |  PAUSE   |---------------------------+
       |          |                   |          |
       +----------+                   +----------+
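
The arrows of the diagram can also be written down as a transition
function. The sketch below models only the transitions drawn above; the
enum and function names are hypothetical, and an actual implementation may
accept transitions that the diagram does not show.

```c
#include <assert.h>

/* States from the diagram, plus a marker for rejected transitions. */
enum cstate {
    ST_OPEN, ST_SETUP, ST_PREPARE, ST_RUNNING,
    ST_PAUSE, ST_DRAIN, ST_FREE, ST_INVALID
};

/* One event per labelled arrow in the diagram. */
enum cevent {
    EV_SET_PARAMS, EV_WRITE, EV_START, EV_PAUSE, EV_RESUME,
    EV_DRAIN, EV_DRAIN_NOTIFY, EV_STOP, EV_FREE
};

/* Returns the state reached from s on event e, or ST_INVALID. */
static enum cstate next_state(enum cstate s, enum cevent e)
{
    assert(s != ST_INVALID);  /* a rejected transition must not be continued */

    switch (s) {
    case ST_OPEN:
        return e == EV_SET_PARAMS ? ST_SETUP : ST_INVALID;
    case ST_SETUP:
        if (e == EV_WRITE) return ST_PREPARE;
        if (e == EV_FREE)  return ST_FREE;
        return ST_INVALID;
    case ST_PREPARE:
        if (e == EV_START) return ST_RUNNING;
        if (e == EV_FREE)  return ST_FREE;
        return ST_INVALID;
    case ST_RUNNING:
        if (e == EV_PAUSE) return ST_PAUSE;
        if (e == EV_DRAIN) return ST_DRAIN;
        if (e == EV_STOP)  return ST_SETUP;
        return ST_INVALID;
    case ST_PAUSE:
        if (e == EV_RESUME) return ST_RUNNING;
        if (e == EV_STOP)   return ST_SETUP;
        return ST_INVALID;
    case ST_DRAIN:
        if (e == EV_DRAIN_NOTIFY || e == EV_STOP) return ST_SETUP;
        return ST_INVALID;
    default:
        return ST_INVALID;  /* FREE has no outgoing arrow in the diagram */
    }
}
```

A caller would start in ``ST_OPEN`` and feed events in order; for example
set_params, write and start lead to ``ST_RUNNING``, while start directly
after open is rejected.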


Gapless Playback
================
When playing through an album, the decoders have the ability to skip the encoder
delay and padding and directly move from one track content to another. The end
user can perceive this as gapless playback as we don't have silence while
switching from one track to another.

Also, there might be low-intensity noises due to encoding. Perfect gapless is
difficult to reach with all types of compressed data, but works fine with most
music content. The decoder needs to know the encoder delay and encoder padding,
so we need to pass this to the DSP. This metadata is extracted from ID3/MP4
headers and is not present by default in the bitstream, hence the need for a new
interface to pass this information to the DSP. Also, the DSP and userspace need
to switch from one track to another and start using data for the second track.

The main additions are:

set_metadata
  This routine sets the encoder delay and encoder padding. This can be used by
  the decoder to strip the silence. This needs to be set before the data in the
  track is written.

set_next_track
  This routine tells the DSP that metadata and write operations sent after this
  would correspond to the subsequent track.

partial drain
  This is called when end of file is reached. Userspace can inform the DSP that
  EOF is reached and now the DSP can start skipping padding delay. Also, the
  next write data would belong to the next track.
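
As a small worked example of what the decoder can do with the values
passed through set_metadata, the sketch below computes how many decoded
samples remain once the encoder delay and padding are stripped. The
structure and function names are hypothetical, and the example assumes
both values are expressed in samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two gapless values, both in samples. */
struct gapless_meta_model {
    uint32_t encoder_delay;    /* samples inserted at the start of the track */
    uint32_t encoder_padding;  /* samples appended at the end of the track */
};

/* Samples left to render after stripping delay and padding. */
static uint64_t playable_samples(uint64_t decoded,
                                 const struct gapless_meta_model *m)
{
    assert(m != NULL);
    uint64_t skip = (uint64_t)m->encoder_delay + m->encoder_padding;

    return decoded > skip ? decoded - skip : 0;
}
```

With a delay of 576 samples and a padding of 1728 samples, a track that
decodes to 10000 samples would render 7696 of them.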

Sequence flow for gapless would be:

- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Then call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to the second track

(note: order for partial_drain and write for next track can be reversed as well)
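
The ordering of the streaming steps in this sequence can be sketched with
stand-in calls that only record what was invoked. Every name below is
hypothetical; ``record()`` replaces whatever call a real user-space library
would issue, and the open, caps and params steps are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Steps of the gapless flow; names mirror the sequence listed above. */
enum step {
    STEP_SET_METADATA, STEP_WRITE, STEP_START,
    STEP_NEXT_TRACK, STEP_PARTIAL_DRAIN
};

#define MAX_STEPS 16

struct call_log {
    enum step steps[MAX_STEPS];
    size_t count;
};

/* Stand-in for a real call: only records that the step happened. */
static void record(struct call_log *log, enum step s)
{
    assert(log->count < MAX_STEPS);
    log->steps[log->count++] = s;
}

/* Stream two tracks back to back, in the order given in the sequence. */
static void play_two_tracks_gapless(struct call_log *log)
{
    record(log, STEP_SET_METADATA);   /* delay/padding of the first track */
    record(log, STEP_WRITE);          /* fill data of the first track */
    record(log, STEP_START);
    /* ... all data of the first track has been written ... */
    record(log, STEP_NEXT_TRACK);     /* what follows belongs to the next track */
    record(log, STEP_SET_METADATA);   /* delay/padding of the next track */
    record(log, STEP_PARTIAL_DRAIN);  /* let the DSP finish the first track */
    record(log, STEP_WRITE);          /* fill data of the next track */
}
```

As the note says, the last two steps could be swapped; the point of the
sketch is that set_next_track comes before the metadata of the next track.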

Gapless Playback State Machine
==============================

For gapless playback, we move from the running state to partial drain and
back, along with setting of metadata and signalling for the next track ::


                                        +----------+
                compr_drain_notify()    |          |
              +------------------------>|  RUNNING |
              |                         |          |
              |                         +----------+
              |                              |
              |                              |
              |                              | compr_next_track()
              |                              |
              |                              V
              |                         +----------+
              |                         |          |
              |                         |NEXT_TRACK|
              |                         |          |
              |                         +----------+
              |                              |
              |                              |
              |                              | compr_partial_drain()
              |                              |
              |                              V
              |                         +----------+
              |                         |          |
              +------------------------ | PARTIAL_ |
                                        |  DRAIN   |
                                        +----------+

Not supported
=============
- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, no matter if the input was PCM or compressed.

- Multichannel IEC encoding. Unclear if this is required.

- Encoding/decoding acceleration is not supported as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with underrun/overrun;
  this may be dealt with in a user-space library.


Credits
=======
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.