=========================
ALSA Compress-Offload API
=========================

Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>

Vinod Koul <vinod.koul@linux.intel.com>


Overview
========
Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
returned values in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) have been
integrated in system-on-chip designs as well as in audio
codecs. Processing compressed data on such DSPs results in a dramatic
reduction of power consumption compared to host-based
processing. Support for such hardware has not been very good in Linux,
mostly because of a lack of a generic API available in the mainline
kernel.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by the 2-year experience with the
Intel Moorestown SOC, with many corrections required to upstream the
API in the mainline kernel instead of the staging tree and make it
usable by others.


Requirements
============
The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that as audio compression technology advances, new formats
  will be added.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support AAC multichannel but HE-AAC
  stereo. Likewise WMA10 level M3 may require too much memory and CPU
  cycles. The new API needs to provide a generic way of listing these
  formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have existing enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.


Design
======
The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.
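
As a rough illustration of the buffer semantics described above, the
following Python sketch models a ring buffer sized in bytes and split
into fragments. It is a toy model under stated assumptions, not kernel
or driver code: the ``ByteRingBuffer`` class and its method names are
hypothetical, and only byte counts are tracked, not the payload itself.

```python
class ByteRingBuffer:
    """Toy model of a byte-sized ring buffer split into fragments.

    Hypothetical illustration only: none of these names belong to ALSA.
    """

    def __init__(self, fragment_size: int, fragments: int) -> None:
        self.fragment_size = fragment_size
        self.capacity = fragment_size * fragments
        self.written = 0   # total bytes committed by the host
        self.consumed = 0  # total bytes taken by the DSP

    def available(self) -> int:
        """Free space, in bytes, that the host may still write."""
        return self.capacity - (self.written - self.consumed)

    def write(self, data: bytes) -> int:
        """Commit as much of data as fits; there is no way to rewind."""
        count = min(len(data), self.available())
        self.written += count
        return count

    def consume_fragment(self) -> int:
        """Model the DSP taking up to one fragment from the buffer."""
        count = min(self.fragment_size, self.written - self.consumed)
        self.consumed += count
        return count

    def drop(self) -> None:
        """Discard all queued bytes, the only way to invalidate data."""
        self.consumed = self.written
```

Note that the model exposes no frame counts anywhere: with compressed
data, a fragment of a given size in bytes has no fixed duration.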

The Compressed Data API does not make any assumptions on how the data
is transmitted to the audio DSP. DMA transfers from main memory to an
embedded audio cluster or to a SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

get_caps
  This routine returns the list of audio formats supported. Querying the
  codecs on a capture stream will return encoders; decoders will be
  listed for playback streams.

get_codec_caps
  For each codec, this routine returns a list of
  capabilities. The intent is to make sure all the capabilities
  correspond to valid settings, and to minimize the risks of
  configuration failures. For example, for a complex codec such as AAC,
  the number of channels supported may depend on a specific profile. If
  the capabilities were exposed with a single descriptor, it may happen
  that a specific combination of profiles/channels/formats may not be
  supported. Likewise, embedded DSPs have limited memory and CPU cycles;
  it is likely that some implementations make the list of capabilities
  dynamic and dependent on existing workloads. In addition to codec
  settings, this routine returns the minimum buffer size handled by the
  implementation. This information can be a function of the DMA buffer
  sizes, the number of bytes required to synchronize, etc., and can be
  used by userspace to define how much needs to be written in the ring
  buffer before playback can start.

set_params
  This routine sets the configuration chosen for a specific codec. The
  most important field in the parameters is the codec type; in most
  cases decoders will ignore other fields, while encoders will strictly
  comply with the settings.

get_params
  This routine returns the actual settings used by the DSP. Changes to
  the settings should remain the exception.

get_timestamp
  The timestamp becomes a multi-field structure. It lists the number
  of bytes transferred, the number of samples processed and the number
  of samples rendered/grabbed. All these values can be used to determine
  the average bitrate, figure out if the ring buffer needs to be
  refilled or the delay due to decoding/encoding/io on the DSP.

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
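
To illustrate how the three counters reported by get_timestamp might
be combined, here is a small Python sketch. The ``Timestamp`` class,
its field names and both helper functions are hypothetical and do not
mirror the kernel structure; the bitrate is only an approximation,
since it assumes that all bytes transferred have already been decoded.

```python
from dataclasses import dataclass


@dataclass
class Timestamp:
    """Hypothetical mirror of the three counters described in the text."""
    bytes_transferred: int  # bytes handed over to the DSP
    samples_processed: int  # samples decoded (or encoded) so far
    samples_rendered: int   # samples actually played (or grabbed)


def average_bitrate(ts: Timestamp, sample_rate: int) -> float:
    """Approximate average bitrate in bits per second."""
    if ts.samples_processed == 0:
        return 0.0
    seconds = ts.samples_processed / sample_rate
    return ts.bytes_transferred * 8 / seconds


def dsp_delay_seconds(ts: Timestamp, sample_rate: int) -> float:
    """Audio already processed but not yet rendered, in seconds."""
    return (ts.samples_processed - ts.samples_rendered) / sample_rate
```

For instance, 160000 bytes transferred for 480000 samples processed at
48 kHz corresponds to roughly 128 kbit/s; if 470400 of those samples
have been rendered, about 0.2 s of audio is still in flight on the DSP.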
Modifications to the OpenMAX AL definitions include:

- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)

State Machine
=============

The compressed audio stream state machine is described below ::

                                       +----------+
                                       |          |
                                       |   OPEN   |
                                       |          |
                                       +----------+
                                            |
                                            |
                                            | compr_set_params()
                                            |
                                            v
           compr_free()                +----------+
  +------------------------------------|          |
  |                                    |   SETUP  |
  |          +-------------------------|          |<-------------------------+
  |          |       compr_write()     +----------+                          |
  |          |                              ^                                |
  |          |                              | compr_drain_notify()           |
  |          |                              |        or                      |
  |          |                              |     compr_stop()               |
  |          |                              |                                |
  |          |                         +----------+                          |
  |          |                         |          |                          |
  |          |                         |   DRAIN  |                          |
  |          |                         |          |                          |
  |          |                         +----------+                          |
  |          |                              ^                                |
  |          |                              |                                |
  |          |                              | compr_drain()                  |
  |          |                              |                                |
  |          v                              |                                |
  |     +----------+                   +----------+                          |
  |     |          |   compr_start()   |          |       compr_stop()       |
  |     | PREPARE  |------------------>|  RUNNING |--------------------------+
  |     |          |                   |          |                          |
  |     +----------+                   +----------+                          |
  |          |                            |    ^                             |
  |          |compr_free()                |    |                             |
  |          |              compr_pause() |    | compr_resume()              |
  |          |                            |    |                             |
  |          v                            v    |                             |
  |     +----------+                   +----------+                          |
  |     |          |                   |          |       compr_stop()       |
  +---->|   FREE   |                   |  PAUSE   |--------------------------+
        |          |                   |          |
        +----------+                   +----------+


Gapless Playback
================
When playing through an album, the decoders have the ability to skip the
encoder delay and padding and directly move from one track content to
another. The end user can perceive this as gapless playback as there is no
silence while switching from one track to another.

Also, there might be low-intensity noises due to encoding. Perfect gapless is
difficult to reach with all types of compressed data, but works fine with most
music content. The decoder needs to know the encoder delay and encoder padding,
so these values need to be passed to the DSP. This metadata is extracted from
ID3/MP4 headers and is not present by default in the bitstream, hence the need
for a new interface to pass this information to the DSP. In addition, the DSP
and userspace need to switch from one track to another and start using data
for the second track.
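
To make the role of the encoder delay and padding concrete, here is a
minimal Python sketch of what a decoder conceptually does with these
two values. The function name ``trim_gapless`` is hypothetical, and a
real DSP operates on a stream rather than on a complete list of decoded
samples; this is only a sketch of the idea, not of any implementation.

```python
def trim_gapless(decoded: list, encoder_delay: int, encoder_padding: int) -> list:
    """Strip encoder_delay samples from the start and encoder_padding
    samples from the end of one decoded track."""
    end = len(decoded) - encoder_padding
    # Guard against metadata larger than the track: return nothing
    # rather than a slice with crossed bounds.
    return decoded[encoder_delay:max(end, encoder_delay)]
```

Concatenating the trimmed output of two consecutive tracks then yields
audio without the silence that the encoder inserted around each track.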

The main additions are:

set_metadata
  This routine sets the encoder delay and encoder padding. This can be used by
  the decoder to strip the silence. This needs to be set before the data in
  the track is written.

set_next_track
  This routine tells the DSP that metadata and write operations sent after
  this correspond to the subsequent track.

partial drain
  This is called when the end of file is reached. Userspace can inform the DSP
  that EOF is reached, and the DSP can then start skipping the padding delay.
  The next data written belongs to the next track.

Sequence flow for gapless would be:

- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to second track

(note: order for partial_drain and write for next track can be reversed as well)

Gapless Playback SM
===================

For Gapless, we move from running state to partial drain and back, along
with setting of meta_data and signalling for next track ::


                              +----------+
      compr_drain_notify()    |          |
  +------------------------>|  RUNNING |
  |                         |          |
  |                         +----------+
  |                              |
  |                              |
  |                              | compr_next_track()
  |                              |
  |                              V
  |                         +----------+
  |                         |          |
  |                         |NEXT_TRACK|
  |                         |          |
  |                         +----------+
  |                              |
  |                              |
  |                              | compr_partial_drain()
  |                              |
  |                              V
  |                         +----------+
  |                         |          |
  +-------------------------| PARTIAL_ |
                            |  DRAIN   |
                            +----------+

Not supported
=============
- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, no matter if the input was PCM or compressed.

- Multichannel IEC encoding. Unclear if this is required.

- Encoding/decoding acceleration is not supported as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with underrun/overrun;
  these may be dealt with in a user-space library.


Credits
=======
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.