sox/git/sox.1

34 SoX \- Sound eXchange, the Swiss Army knife of audio manipulation
50 SoX reads and writes audio files in most popular formats and can
52 sources, synthesise audio, and, on many systems, act as a general
53 purpose audio player or a multi-track audio recorder. It also has
57 To simplify playing and recording audio, if SoX is invoked as
63 command provides a convenient way to just query audio file header information.
70 SoX is a command-line audio processing tool, particularly suited to making
72 If you need an interactive, graphical audio editor, use
98 translates an audio file in Sun AU format to a Microsoft WAV file, whilst
108 converts `raw' (a.k.a. `headerless') audio to a self-describing file format,
112 adjusts audio speed,
116 concatenates two audio files, and
120 mixes together two audio files.
124 plays a collection of audio files whilst applying a bass boosting effect,
132 records half an hour of stereo audio, and
145 records a stream of audio such as LP/cassette and splits into multiple
146 audio files at points with 2 seconds of silence.  Also, it does not start
147 recording until it detects audio is playing and stops after it sees
157 SoX can work with `self-describing' and `raw' audio files.
159 completely describes the signal and encoding attributes of the audio
161 information, so the audio characteristics of these must be described
165 audio data such that it can be processed with SoX:
170 though these days, 16 and even 32\ kHz are becoming more common. Audio
171 Compact Discs use 44100\ Hz (44\*d1\ kHz). Digital Audio Tape and many
172 computer systems use 48\ kHz. Professional audio systems often use 96
178 audio. 24-bit is used in the professional audio arena. Other sizes are
182 The way in which each audio sample is represented (or `encoded').  Some
184 Some compress the audio data so that the stored audio data takes up less
191 The number of audio channels contained in the file.  One (`mono') and
192 two (`stereo') are widely used.  `Surround sound' audio typically
196 encoded audio signal over a unit of time.  It can depend on all of the
203 embedded in the file that can be used to describe the audio in some way,
206 One important use of audio file comments is to convey `Replay Gain'
218 command can be used to display information from audio file headers.
221 format characteristics of an audio file.  Depending on the circumstances,
246 .SS Playing & Recording Audio
277 Some systems provide more than one type of (SoX-compatible) audio
279 Systems can also have more than one audio device (a.k.a. `sound card').
280 If more than one audio driver has been
292 environment variable can be used to override the default audio device,
309 audio output device, SoX will automatically invoke the \fBrate\fR effect
338 Many file formats that compress audio discard some of the audio signal
340 back again will not produce an exact copy of the original audio.  This
342 low signal bandwidth is more important than high audio fidelity, and for
347 Formats that discard audio signal information are called `lossy'.
349 measure of how closely the original audio signal can be reproduced when
352 Audio file conversion with SoX is lossless when it can be, i.e. when not
360 SoX converts all audio files to an internal uncompressed
361 format before performing any audio processing. This means that
363 losses in audio fidelity.  E.g. with
370 audio\*mwith a possible reduction in fidelity above that which
372 Hence, if what is ultimately desired is lossily compressed audio, it is
373 highly recommended to perform all audio processing using lossless file
381 Dithering is a technique used to maximise the dynamic range of audio
419 Clipping is distortion that occurs when an audio signal level (or
429 effects to increase the audio volume. Clipping could also occur with many
431 simply playing the audio.
433 Playing an audio file often involves resampling, and processing by
435 amplification, all of which can produce distortion if the audio signal
438 For these reasons, it is usual to make sure that an audio
445 audio is clipped as delivered.
451 effects can assist in determining the signal level in an audio file. The
489 channels.  The audio from each input will be concatenated in the order
494 It is similar to `concatenate' in that the audio from each input file is
498 audio to an output device, but is not generally useful when the output is a
505 channels in the output file will not contain audio from every input
506 file.  A mixed audio file cannot be un-mixed without reference to the
512 same.  A merged audio file comprises all of the channels from all of the
529 volume adjustment effect) after the audio has been combined. However, it
553 where n is the number of input files.  If this results in audio that is
559 If mixed audio seems loud enough at some points but
603 it has read all available audio data from the input files.
627 filenames. This is generally not difficult since most audio
637 SoX will read audio data from `standard input' (stdin),
640 SoX will send audio data to `standard output' (stdout).
658 For headerless (raw) audio,
696 the default audio device (if one has been built into SoX) is to be used.
709 Using a null file to input audio is equivalent to
710 using a normal audio file that contains an infinite amount
715 Using a null file to output audio amounts to discarding the audio
717 audio instead of affecting it (such as \fBnoiseprof\fR or \fBstat\fR).
723 .SS Supported File & Audio Device Types
726 for a list and description of the supported file formats and audio device
780 Set the size in bytes of the buffers used for processing audio (default 8192).
818 effect for how to determine the actual bit depth of the audio within a
871 option can be given to enable its use in helping to detect audio file types.
876 will process audio channels for most multi-channel
905 effect to guard against clipping and to normalise the audio. E.g.
913 Optionally, the audio can be normalised to a given level (usually)
928 invoked whilst playing audio.  This option is typically set via the
942 then exit without actually processing any audio.  E.g.
984 and is calibrated for digital audio as follows (right channel shown):
1007 SoX to play or record audio.
1044 SoX is processing your audio.
1064 Override an (incorrect) audio length given in an audio file's header. If
1065 this option is given then SoX will keep reading audio until it reaches
1075 the audio signal will be inverted.
1096 SoX of the number of bits per sample in a `raw' (`headerless') audio
1112 converts raw CD digital audio (16-bit, signed-integer) to a
1116 The number of audio channels in the audio file. This can be any number
1120 SoX of the number of channels in a `raw' (`headerless') audio file.
1141 of channels in the audio signal to the number given.  For
1152 The audio encoding type.  Sometimes needed with file-types that
1185 ADPCM is a form of audio compression that has a good
1186 compromise between audio quality and encoding/decoding speed.
1195 wireless telephone calls.  It utilises several audio
1197 SoX has support for GSM's original 13kbps `Full Rate' audio format.
1198 It is usually CPU-intensive to work with GSM audio.
1207 SoX of the encoding of a `raw' (`headerless') audio
1222 convert raw CD digital audio (16-bit, signed-integer) to
1242 SoX of the sample rate of a `raw' (`headerless') audio file (see the
1251 For example, if audio was recorded with a sample-rate of say 48k from
1265 rate of the audio signal to the given value.  For example, the
1277 Gives the type of the audio file.  For both input and output files,
1279 audio file (e.g. raw, mp3) where the actual/desired type cannot be
1304 These options specify whether the byte-order of the audio data is,
1312 identifier, or for an output file that is actually an audio device.
1319    sox \-B audio.s16 trimmed.s16 trim 2
1323    sox \-B audio.s16 \-B trimmed.s16 trim 2
1378 In addition to converting, playing and recording audio files, SoX can
1379 be used to invoke a number of audio `effects'.  Multiple effects may
1382 Note that applying multiple effects in real-time (i.e. when playing audio)
1391 A single effects chain is made up of one or more effects.  Audio from
1395 SoX supports running multiple effects chains over the input audio.
1396 In this case, when one chain indicates it is done processing audio,
1397 the audio data is then sent through the next effects chain.  This
1459 A position within the audio stream; the syntax is
1463 (\fB=\fR) or end (\fB\-\fR) of audio, or to the previous \fIposition\fR if
1464 the effect accepts multiple position arguments (\fB+\fR).  The audio length
1466 \fB\-0\fR for end-of-audio, though, even if the length is unknown.  Which of
1470 Examples: \fB=2:00\fR (two minutes into the audio stream), \fB\-100s\fR (one
1471 hundred samples before the end of audio), \fB+0:12+10s\fR (twelve seconds
1473 than half a second before the end of audio).
1497 Most effects that expect an audio position or duration in a parameter,
1528 audio's frequency to phase relationship without changing its frequency
1552 defaults to a mode oriented to pitched audio,
1555 for un-pitched audio (e.g. percussion).
1590 Boost or cut the bass (lower) or treble (upper) frequencies of the audio
1670 the audio signal to the given number
1696 Add a chorus effect to the audio.  This can make a single vocal sound
1739 Compand (compress or expand) the dynamic range of the audio.
1789 A typical value (for audio which is initially quiet) is
1846 Comparable with compression, this effect modifies an audio signal to
1861 Apply a DC shift to the audio.  This can be useful to remove a DC
1863 from the audio.  The effect of a DC offset is reduced headroom and
1872 of \(+-2 that indicates the amount to shift the audio (which is in the
1906 audio without the correct de-emphasis filter results in audio that sounds harsh
1911 effect, it is possible to apply the necessary de-emphasis to audio that
1913 de-emphasised audio to a new CD (which will then play correctly on any
1914 CD player), or simply play the correctly de-emphasised audio files on the
1928 audio sample rate to be either 44.1kHz or 48kHz.  Maximum deviation
1936 Delay one or more audio channels such that they start at the given
1959 Apply dithering to the audio.
2010 affects the audio.
2026 Makes audio easier to listen to on headphones.
2027 Adds `cues' to 44\*d1kHz stereo (i.e. audio CD format) audio so that
2034 Add echoing to the audio.
2069 Add a sequence of echoes to the audio.
2119 Apply a fade effect to the beginning, end, or both of the audio.
2131 For fade-outs, the audio will be truncated at
2141 If the audio length can be determined from the input file header and any
2146 audio stream.
2184 Apply a flanging effect to the audio.
2219 Apply amplification or attenuation to the audio signal, or, in some
2228 requires temporary file space to store the audio to be processed, so may
2229 be unsuitable for use with `streamed' audio.
2241 option, the levels of the audio channels of a multi-channel file are `equalised', i.e.
2246 the audio is not `normalised').
2285 option normalises the audio to 0dB FSD; it is often used in conjunction with a negative
2287 to the effect that the audio is normalised to a given level below 0dB.
2305 few dBs more than occasionally (in a piece of audio) is not recommended
2385 Apply a LADSPA [5] (Linux Audio Developer's Simple Plugin API) plugin.
2420 level may be given if the original audio has been equalised for some
2440 audio is first divided into bands using Linkwitz-Riley cross-over filters
2460 The audio file is played with a simulated FM radio sound (or broadcast
2469 Calculate a profile of the audio for use in noise reduction.  See the
2473 Reduce noise in the audio signal by profiling and filtering.  This
2476 effect on a section of audio that ideally would contain silence but in
2499 components of the audio signal.  Before replacing an original recording
2502 values to find the optimal one for your audio; use headphones to check
2504 sections of the audio.
2513 Normalise the audio.
2533 Pad the audio with silence, at the beginning, the end, or any
2534 specified points through the audio.
2538 the position in the input audio stream at which to insert it.
2544 if omitted correspond to the beginning and the end of the audio respectively.
2547 adds 1\*d5 seconds of silence padding at each end of the audio, whilst
2549 inserts 4000 samples of silence 3 minutes into the audio.
2550 If silence is wanted only at the end of the audio, specify either the end
2556 the audio on a channel-by-channel basis.
2559 Add a phasing effect to the audio.
2589 Change the audio pitch (but not tempo).
2603 Change the audio sampling rate (i.e. resample the audio) to any given
2635 audio playback
2650 is the percentage of the audio frequency band that is preserved and
2654 audio.  If no quality option is given, the quality level used is `high'
2655 (but see `Playing & Recording Audio' above regarding playback).
2806 Select and mix input audio channels into output audio channels.  Each output
2810 Note that this effect operates on the audio
2860 As `p', but invert the audio
2919 effect is to split an audio file into a set of files, each containing
2921 processing on individual audio channels).  Where more than a few
2936 containing six audio channels were given, the script would produce six
2945 Repeat the entire audio \fIcount\fR times, or once if \fIcount\fR is not given.
2947 Requires temporary file space to store the audio to be repeated.
2948 Note that repeating once yields two copies: the original audio and the
2949 repeated audio.
2956 Add reverberation to the audio using the `freeverb' algorithm.  A
2964 increases both the volume and the length of the audio, so to prevent clipping
2979 Reverse the audio completely.
2980 Requires temporary file space to store the audio to be reversed.
2991 Removes silence from the beginning, middle, or end of the audio.
2994 The \fIabove-periods\fR value is used to indicate if audio should be
2995 trimmed at the beginning of the audio. A value of zero indicates no
2997 non-zero \fIabove-periods\fR, it trims audio up until it finds
2998 non-silence. Normally, when trimming silence from the beginning of audio
3000 values to trim all audio up to a specific count of non-silence
3001 periods. For example, if you had an audio file with two songs that
3009 trimming audio. By increasing the duration, bursts of noise can be
3013 silence.  For digital audio, a value of 0 may be fine but for audio
3017 When optionally trimming silence from the end of the audio, you specify
3019 to remove all audio after silence is detected.
3024 silence in the middle of the audio.
3027 that must exist before audio is not copied any more.  By specifying
3028 a higher duration, silence that is wanted can be left in the audio.
3034 end of your audio file to trim off silence reliably.  A workaround is
3036 By first reversing the audio, you can use the \fIabove-periods\fR
3037 to reliably trim all audio from what looks like the front of the file.
3045 silence in the middle of the audio.
3049 indicates that \fIbelow-periods\fR \fIduration\fR length of audio
3118 Create a spectrogram of the audio; the audio is passed unmodified
3124 and shows time in the X-axis, frequency in the Y-axis, and audio
3127 If the audio signal contains multiple channels then these are shown
3129 for stereo audio).
3137 of the audio is required; e.g. with
3142 channel, and of thirty seconds of audio starting from twenty seconds
3188 or known audio duration to the X-axis size, or 100 otherwise.  If
3305 In order to process a smaller section of audio without affecting other
3311 This option sets the X-axis resolution such that audio with the given
3318 creates a spectrogram showing the first minute of the audio, whilst
3322 effect is applied to the entire audio signal.
3328 Start the spectrogram at the given point in the audio stream.  For
3333 creates a spectrogram showing all but the first minute of the audio
3334 (the output file, however, receives the entire audio stream).
3343 Adjust the audio speed (pitch and tempo together).  \fIfactor\fR
3364 Splice together audio sections.  This effect provides two things over
3365 simple audio concatenation: a (usually short) cross-fade is applied at
3380 Type	Audio	Fade level	Transitions
3389 effect to select the audio sections to be joined together.  As when
3393 (default 0\*d005 seconds) of audio after the ideal joining point.  The
3394 beginning of the audio section to splice on should be trimmed with the
3401 audio sections as input files and the
3404 length of the first audio section (including the excess).
3451 concatenating the audio, i.e. by appending \fBsplice 1\fR to the
3452 command. (Clicks at the beginning and end of the audio can be removed by
3461 # Audio Copy and Paste Over
3474 two splices are used to `copy and paste' audio.
3491 is given).  For example, if f1.wav and f2.wav are audio files
3501 Display time and frequency domain statistical information about the audio.
3502 Audio is passed unmodified through the SoX processing chain.
3507 is the duration of the audio in samples,
3509 is the number of audio channels,
3511 is the audio sample rate, and
3514 sample in the audio,
3523 The maximum sample value in the audio; usually this will be a positive number.
3526 The minimum sample value in the audio; usually this will be a negative number.
3530 The average of the absolute value of each sample in the audio.
3533 The average of each sample in the audio.  If this figure is non-zero, then it indicates the
3540 as the audio's average power.
3550 effect which would make the audio as loud as possible without clipping.
3558 Note that the delta measurements are not applicable for multi-channel audio.
3583 audio file.
3589 audio in SoX's internal buffer.
3598 Display time domain statistical information about the audio channels;
3599 audio is passed unmodified through the SoX processing chain.
3600 Statistics are calculated and displayed for each audio channel and,
3679 For multi-channel audio, an overall figure for each of the above
3699 is the duration in seconds of the audio, and
3725 Change the audio duration (but not its pitch).
3757 This effect can be used to generate fixed or swept frequency audio tones
3764 Audio for each channel in a multi-channel audio file can be synthesised
3767 Though this effect is used to generate audio, an input file must still
3769 synthesised audio length, the number of channels, and the sampling rate;
3770 however, since the input file's audio is not normally needed, a `null
3776 audio file containing a sine-wave swept from 300 to 3300\ Hz:
3818 This effect generates audio at maximum volume (0dBFS), which means that there
3819 is a high chance of clipping when using the audio subsequently, so
3837 \fIlen\fR is the length of audio to synthesise (any time specification);
3898 Change the audio playback speed but not its pitch. This effect uses the
3899 WSOLA algorithm. The audio is chopped up into segments which are then
3949 parameter gives the audio length in milliseconds over which
3980 Apply a tremolo (low frequency amplitude modulation) effect to the audio.
3988 Cuts portions out of the audio.  Any number of \fIposition\fRs may be
3989 given; audio is not sent to the output until the first \fIposition\fR
3991 audio at each \fIposition\fR.  Using a value of 0 for the first \fIposition\fR
3992 parameter allows copying from the beginning of the audio.
4006 will both play from 12 minutes 34 seconds into the audio up to 15 minutes into
4007 the audio (i.e. 2 minutes and 26 seconds long), then resume playing two
4008 minutes before the end of audio.
4027 from the front of the audio, so in order to trim from the back, the
4047 is suitable for use with streamed audio.
4056 other characteristics of the input audio.
4061 The amount of audio (in seconds) to search for quieter/shorter bursts
4062 of audio to include prior to the detected trigger point.
4064 Allowed gap (in seconds) between quieter/shorter bursts of audio to
4067 The amount of audio (in seconds) to preserve before the trigger point
4078 order to detect the start of the wanted audio.  This option sets the
4116 Apply an amplification or an attenuation to the audio signal.
4144 inverts the audio signal in addition to adjusting its volume.
4155 for a detailed discussion on electrical (and hence audio signal)
4173 mode, this effect will display the percentage of the audio that needed to be
4204 .IR "Cookbook formulae for audio EQ biquad filter coefficients" ,
4205 https://webaudio.github.io/Audio-EQ-Cookbook/audio-eq-cookbook.html
4224 .IR "Linux Audio Developer's Simple Plugin API" ,