sox/git/sox.1

10 '\" Replacement em-dash for nroff (default is too short).
34 SoX \- Sound eXchange, the Swiss Army knife of audio manipulation
37 \fBsox\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
38 	[[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR] \fIoutfile\fR
39 	[\fIeffect\fR [\fIeffect-options\fR]] ...
41 \fBplay\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIinfile1\fR
42 	[[\fIformat-options\fR] \fIinfile2\fR] ... [\fIformat-options\fR]
43 	[\fIeffect\fR [\fIeffect-options\fR]] ...
45 \fBrec\fR [\fIglobal-options\fR] [\fIformat-options\fR] \fIoutfile\fR
46 	[\fIeffect\fR [\fIeffect-options\fR]] ...
53 purpose audio player or a multi-track audio recorder. It also has
58 \fBplay\fR, the output file is automatically set to be the default sound
59 device, and if invoked as \fBrec\fR, the default sound device is used as an
70 SoX is a command-line audio processing tool, particularly suited to making
100    sox recital.au \-b 16 recital.wav channels 1 rate 16k fade 3 norm
103 (down-mix to one channel, sample rate change, fade-in, normalise),
104 and stores the result at a bit-depth of 16.
106    sox \-r 16k \-e signed \-b 8 \-c 1 voice-memo.raw voice-memo.wav
108 converts `raw' (a.k.a. `headerless') audio to a self-describing file format,
118    sox \-m music.mp3 voice.wav mixed.flac
126    play \-n \-c1 synth sin %\-12 sin %\-9 sin %\-5 sin %\-2 fade h 0.1 1 0.1
128 plays a synthesised `A minor seventh' chord with a pipe-organ sound,
130    rec \-c 2 radio.aiff trim 0 30:00
134    play \-q take1.aiff & rec \-M take1.aiff take1\-dub.aiff
137 records a new track in a multi-track recording.  Finally,
140    rec \-r 44100 \-b 16 \-e signed-integer \-p \\
142 	sox \-p song.ogg silence 1 0.50 0.1% 1 2.0 0.1% : \\
157 SoX can work with `self-describing' and `raw' audio files.
158 `self-describing' formats (e.g. WAV, FLAC, MP3) have a header that
167 sample rate
168 The sample rate in samples per second (`Hertz' or `Hz').
169 Digital telephony traditionally uses a sample rate of 8000\ Hz (8\ kHz),
175 sample size
176 The number of bits used to store each sample.  Today, 16-bit is
177 commonly used. 8-bit was popular in the early days of computer
178 audio. 24-bit is used in the professional audio arena. Other sizes are
182 The way in which each audio sample is represented (or `encoded').  Some
183 encodings have variants with different byte-orderings or bit-orderings.
186 parameters and the number of samples would imply.  Commonly-used
187 encoding types include floating-point, \(*m-law, ADPCM, signed-integer
195 The term `bit-rate' is a measure of the amount of storage occupied by an
197 above and is typically denoted as a number of kilo-bits per second
198 (kbps).  An A-law telephony signal has a bit-rate of 64 kbps. MP3-encoded
199 stereo music typically has a bit-rate of 128\-196 kbps. FLAC-encoded
200 stereo music typically has a bit-rate of 550\-760 kbps.
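The uncompressed figures quoted above follow directly from the sample rate, sample size and channel count. The following is a minimal Python sketch of that arithmetic, not part of SoX; the function name is hypothetical, and 1 kbit is assumed to mean 1000 bits.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit-rate of an uncompressed stream in kbps (assuming 1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# A-law telephony: 8 kHz sample rate, 8 bits per sample, one channel.
print(pcm_bit_rate_kbps(8000, 8, 1))
# CD audio: 44.1 kHz sample rate, 16 bits per sample, two channels.
print(pcm_bit_rate_kbps(44100, 16, 2))
```

The first call reproduces the 64 kbps telephony figure. Compressed formats such as MP3 and FLAC cannot be computed this way, because their bit-rate depends on the encoder and the material.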
202 Most self-describing formats also allow textual `comments' to be
209 generating it.  Note that by default, SoX copies input file comments
227 Command-line format options.
236 Command-line format options.
244 if the file type cannot be determined. Command-line format options may
254    play existing-file.wav
258    rec new-file.wav
262    sox existing-file.wav \-d
266    sox \-d new-file.wav
277 Some systems provide more than one type of (SoX-compatible) audio
281 built-in to SoX, and the default selected by SoX when recording or playing
284 environment variable can be used to override the default.  For example
292 environment variable can be used to override the default audio device,
297    sox ... \-t oss
303    sox ... \-t alsa
308 When playing a file with a sample rate that is not supported by the
310 to perform the necessary sample rate conversion.  For
312 default \fBrate\fR quality level is set to `low'. This
316    play ... rate \-m
319 .B \-\-play\-rate\-arg
328 To help with setting a suitable recording level, SoX includes a peak-level
331    rec \-n
333 The recording level should be adjusted (using the system-provided mixer
336 See also \fB\-S\fR below.
341 is the case for many formats used in telephony (e.g. A-law, GSM) where
355 is not less than in the source format.  E.g.  converting from an 8-bit
356 PCM format to a 16-bit PCM format is lossless but converting from an
357 8-bit PCM format to (8-bit) A-law isn't.
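To illustrate why widening PCM is lossless, the sketch below converts every possible 8-bit value to 16 bits and back. It is a Python illustration only: the function names are hypothetical, and the scaling convention (unsigned 8-bit input, signed 16-bit output, shift by 8 bits) is an assumption, not a description of SoX's internal code.

```python
def u8_to_s16(sample):
    """Widen an unsigned 8-bit sample (0..255) to signed 16-bit."""
    return (sample - 128) * 256


def s16_to_u8(sample):
    """Narrow a signed 16-bit sample back to unsigned 8-bit."""
    return sample // 256 + 128


# Every 8-bit value survives the round trip unchanged.
round_trip_ok = all(s16_to_u8(u8_to_s16(v)) == v for v in range(256))
print(round_trip_ok)
```

The reverse direction has no such guarantee: 65536 possible 16-bit values cannot all map to distinct 8-bit codes.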
369 effect, and finally creates the output MP3 file by re-compressing the
382 stored at a particular bit-depth. Any distortion introduced by
388 Specifically, by default, SoX automatically adds TPDF dither
389 when the output bit-depth is less than 24 and any
392 bit-depth reduction has been specified explicitly using a command-line
395 the output file format supports only bit-depths lower than that of the
398 an effect has increased effective bit-depth within the internal
405 bit-depth is 16, then SoX's internal representation will utilise 18
411 .B \-V
413 .B \-D
415 dithering manually (e.g. to select a noise-shaping curve), see the
457    sox dull.wav bright.wav gain \-6 treble +6
465 .B \-G
474 following methods: `concatenate', `sequence', `mix', `mix-power',
476 The default method is `sequence' for
488 by default) then the input files must also have the same number of
501 If either the `mix' or `mix-power' combining method is selected then two or
506 file.  A mixed audio file cannot be un-mixed without reference to the
513 input files. Un-merging is possible using multiple
521 The `multiply' combining method multiplies the sample values of
522 corresponding channels (treated as numbers in the interval \-1 to +1).
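As a rough illustration of the `multiply' method, the Python sketch below multiplies corresponding samples of two single-channel signals that are already expressed as numbers in the interval \-1 to +1. The function name and the list-of-floats representation are assumptions made for the example, not SoX's implementation.

```python
def multiply_combine(a, b):
    """Multiply corresponding samples of two equal-rate, single-channel signals."""
    return [x * y for x, y in zip(a, b)]


carrier = [0.5, -1.0, 1.0, 0.0]
envelope = [0.5, 0.5, 0.25, 1.0]
print(multiply_combine(carrier, envelope))
```

Because both inputs lie within \-1 to +1, every product does too, so this operation cannot clip by itself.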
535 .B \-v
541 The \fB\-V\fR option (below) can be used to show the input file volume
552 volume (amplitude) of each input signal by a factor of \(S1/\s-2n\s+2,
565 With the `mix-power' combine method, the
568 \(S1/\s-2\(srn\s+2 instead of \(S1/\s-2n\s+2.
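The two balancing factors can be compared with a short Python sketch. It is illustrative only: the function name and the list-of-floats signal representation are assumptions, and no manual volume adjustment is modelled.

```python
import math


def mix(inputs, power=False):
    """Sum equal-length input signals, scaling by 1/n ('mix') or 1/sqrt(n) ('mix-power')."""
    n = len(inputs)
    scale = 1 / math.sqrt(n) if power else 1 / n
    return [scale * sum(frame) for frame in zip(*inputs)]


a = [1.0, 0.5]
b = [1.0, -0.5]
print(mix([a, b]))              # 1/n scaling: output never exceeds the loudest input
print(mix([a, b], power=True))  # 1/sqrt(n) scaling: the first sample exceeds 1.0
```

With 1/n scaling the sum of n in-range inputs always stays in range, whereas with 1/sqrt(n) two coincident full-scale samples exceed full scale, so clipping is possible.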
573 SoX's default behaviour is to take one or more input files and
576 This behaviour can be changed by specifying the pseudo-effect `newfile'
607 keyboard interrupt key which is normally Ctrl-C).  This is a natural requirement
609 that when using SoX to play multiple files, Ctrl-C behaves slightly
614 has a time period or sample count to determine the stopping
625 effect-name will not work since SoX will treat it as an effect
626 specification.  The only work-around to this is to avoid such
628 filenames have a filename `extension', whilst effect-names do not.
633 \fB\-\fR
635 filename `\-' which,
642 when using it for an input file, the file-type (see
643 .B \-t
650 .B \-
656    sox \-M "|genw \-\-imd \-" "|genw \-\-thd \-" out.wav
659 .B \-t
663 \fB\(dq\fIwildcard-filename\fB\(dq\fR
664 Specifies that filename `globbing' (wild-card matching) should be performed
669    play \-\-rate 6k *.vox
673    play \-\-rate 6k file1.vox file2.vox file3.vox
675 which will treat only the first vox file as having a sample rate of 6k.
678    play \-\-rate 6k "*.vox"
680 the given sample rate option will be applied to all three vox files.
682 \fB\-p\fR, \fB\-\-sox\-pipe\fR
687    play "|sox \-n \-p synth 2" "|sox \-n \-p synth 2 tremolo 10" stat
691 .B \-p
692 is in fact an alias for `\fB\-t sox \-\fR'.
694 \fB\-d\fR, \fB\-\-default\-device\fR
696 the default audio device (if one has been built into SoX) is to be used.
703 \fB\-n\fR, \fB\-\-null\fR
706 SoX-specific mechanism and is not related to any operating-system
720 is by default 48\ kHz, but, as with a normal
721 file, this can be overridden if desired using command-line format
735 environment variable can be used to provide alternative default values for
739    SOX_OPTS="\-\-buffer 20000 \-\-play\-rate\-arg \-hs \-\-temp /mnt/temp"
745 .B \-\-no\-clobber
746 as default might be handled better using a shell alias
751 the benefit of SOX_OPTS carrying some system-wide default options.  An
752 alternative approach is to explicitly invoke SoX with default
755    SOX_OPTS="\-V \-\-no\-clobber"
757    sox \-V2 \-\-clobber $input $output ...
764    export SOX_OPTS="\-V \-\-no\-clobber"
768    setenv SOX_OPTS "\-V \-\-no\-clobber"
770 MS-DOS/MS-Windows:
772    set SOX_OPTS=\-V \-\-no\-clobber
774 MS-Windows GUI: via Control Panel : System : Advanced : Environment
779 \fB\-\-buffer\fR \fBBYTES\fR, \fB\-\-input\-buffer\fR \fBBYTES\fR
780 Set the size in bytes of the buffers used for processing audio (default 8192).
781 .B \-\-buffer
783 .B \-\-input\-buffer
785 .B \-\-buffer
789 .B \-\-buffer
793 \fB\-\-clobber\fR
795 given for the output file.  This is the default behaviour.
797 \fB\-\-combine concatenate\fR\^|\^\fBmerge\fR\^|\^\fBmix\fR\^|\^\fBmix\-power\fR\^|\^\fBmultiply\fR…
800 .B \-m
802 .B \-M
804 .B \-T
810 \fB\-D\fR, \fB\-\-no\-dither\fR
821 \fB\-\-effects\-file \fIFILENAME\fR
831 \fB\-G\fR, \fB\-\-guard\fR
836    sox \-G infile \-b 16 outfile rate 44100 dither \-s
840    sox infile \-b 16 outfile gain \-h rate 44100 gain \-rh dither \-s
849 \fB\-h\fR, \fB\-\-help\fR
852 \fB\-\-help\-effect \fINAME\fR
856 \fB\-\-help\-format \fINAME\fR
860 \fB\-\-i\fR, \fB\-\-info\fR
866 \fB\-m\fR\^|\^\fB\-M\fR
867 Equivalent to \fB\-\-combine mix\fR and \fB\-\-combine merge\fR, respectively.
869 .B \-\-magic
873 \fB\-\-multi\-threaded\fR | \fB\-\-single\-threaded\fR
874 By default, SoX is `single threaded'.
875 If the \fB\-\-multi\-threaded\fR option is given, however, then SoX
876 will process audio channels for most multi-channel
877 effects in parallel on hyper-threading/multi-core architectures. This
879 this option in conjunction with a larger buffer size than is the default
880 to gain any benefit from multi-threaded processing
881 (e.g. 131072; see \fB\-\-buffer\fR above).
883 \fB\-\-no\-clobber\fR
902 \fB\-\-norm\fR[\fB=\fIdB-level\fR]
907    sox \-\-norm infile \-b 16 outfile rate 44100 dither \-s
911    sox infile \-b 16 outfile gain \-h rate 44100 gain \-nh dither \-s
916    sox \-\-norm=\-3 infile outfile
926 \fB\-\-play\-rate\-arg ARG\fR
932 \fB\-\-plot gnuplot\fR\^|\^\fBoctave\fR\^|\^\fBoff\fR
935 (the default if
936 .B \-\-plot
939 and configuration of many of the transfer-function based effects.
944    sox \-\-plot octave input-file \-n highpass 1320 > highpass.plt
948 \fB\-q\fR, \fB\-\-no\-show\-progress\fR
950 This is the opposite of the \fB\-S\fR option.
952 \fB\-R\fR
954 applicable, SoX will embed a fixed time-stamp in the output file (e.g.
960 \fB\-\-replay\-gain track\fR\^|\^\fBalbum\fR\^|\^\fBoff\fR
961 Select whether or not to apply replay-gain adjustment to input files.
962 The default is
978 \fB\-S\fR, \fB\-\-show\-progress\fR
982 output file.  Also shown is a peak-level meter, and an indication if
983 clipping has occurred.  The peak-level meter shows up to two channels
991 \-25	\-	\-11	====
992 \-23	T{
994 T}	\-9	====\-
995 \-21	=\-	\-7	=====
996 \-19	==	\-5	=====\-
997 \-17	==\-	\-3	======
998 \-15	===	\-1	=====!
999 \-13	===\-
1003 A three-second peak-held value of headroom in dB will be shown to the right
1006 This option is enabled by default when using
1009 \fB\-T\fR
1010 Equivalent to \fB\-\-combine multiply\fR.
1012 \fB\-\-temp\fI DIRECTORY\fR
1015 This can be useful if there are permission or free-space problems with the
1016 default location. In this case, using `\fB\-\-temp .\fR' (to use the
1019 \fB\-\-version\fR
1021 .IP \fB\-V\fR[\fIlevel\fR]
1050 By default, the verbosity level is set to 2 (shows errors and
1051 warnings). Each occurrence of the \fB\-V\fR option increases the
1056 .B \-V0
1063 \fB\-\-ignore\-length\fR
1068 \fB\-v\fR, \fB\-\-volume\fR \fIFACTOR\fR
1089 \fB\-b\fR \fIBITS\fR, \fB\-\-bits\fR \fIBITS\fR
1090 The number of bits (a.k.a. bit-depth or sometimes word-length) in each
1091 encoded sample.  Not applicable to complex encodings such as MP3 or GSM.
1093 A/\(*m-law, ADPCM.
1096 SoX of the number of bits per sample in a `raw' (`headerless') audio
1099    sox \-r 16k \-e signed \-b 8 input.raw output.wav
1101 converts a particular `raw' file to a self-describing `WAV' file.
1105 to set the output encoding size.  By default (i.e. if this option is
1110    sox input.cdda \-b 24 output.wav
1112 converts raw CD digital audio (16-bit, signed-integer) to a
1113 24-bit (signed-integer) `WAV' file.
1115 \fB\-c\fR \fICHANNELS\fR, \fB\-\-channels\fR \fICHANNELS\fR
1126    sox \-r 48k \-e float \-b 32 \-c 2 input.raw output.wav
1128 converts a particular `raw' file to a self-describing `WAV' file.
1130    play \-c 1 music.wav
1145    sox input.wav \-c 1 output.wav bass \-b 24
1146    sox input.wav      output.wav bass \-b 24 channels 1
1151 \fB\-e \fIENCODING\fR, \fB\-\-encoding\fR \fIENCODING\fR
1152 The audio encoding type.  Sometimes needed with file-types that
1157 .IP \fBsigned-integer\fR
1159 with a 16\- or 24\-bit encoding size.
1161 .IP \fBunsigned-integer\fR
1163 with an 8-bit encoding size.  A value of 0 represents maximum signal
1165 .IP \fBfloating-point\fR
1166 PCM data stored as IEEE 754 single precision (32-bit) or double
1167 precision (64-bit) floating-point (`real') numbers.
1169 .IP \fBa-law\fR
1171 sample.  It has a precision equivalent to roughly 13-bit PCM and is
1172 sometimes encoded with reversed bit-ordering (see the
1173 .B \-X
1175 .IP \fBu-law,\ mu-law\fR
1177 sample.  A.k.a. \(*m-law.  It has a precision equivalent to roughly
1178 14-bit PCM and is
1179 sometimes encoded with reversed bit-ordering (see the
1180 .B \-X
1182 .IP \fBoki-adpcm\fR
1183 OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM;
1184 it has a precision equivalent to roughly 12-bit PCM.
1187 .IP \fBima-adpcm\fR
1188 IMA (a.k.a. DVI) 4-bit ADPCM;
1189 it has a precision equivalent to roughly 13-bit PCM.
1190 .IP \fBms-adpcm\fR
1191 Microsoft 4-bit ADPCM; it has a precision equivalent to roughly 14-bit
1193 .IP \fBgsm-full-rate\fR
1196 formats with different bit-rates and associated speech quality.
1198 It is usually CPU-intensive to work with GSM audio.
1203 e.g. `unsigned-integer' can be given as `un', but not `u' (ambiguous
1204 with `u-law').
1209 .B \-b
1211 .B \-c
1218    sox input.cdda \-e float output1.wav
1220    sox input.cdda \-b 64 \-e float output2.wav
1222 convert raw CD digital audio (16-bit, signed-integer) to
1223 floating-point `WAV' files (single & double precision respectively).
1225 By default (i.e. if this option is not given), the output encoding
1229 \fB\-\-no\-glob\fR
1230 Specifies that filename `globbing' (wild-card matching) should not be
1232 directory contains the two files `five-seconds.wav' and `five*.wav', then
1234    play \-\-no\-glob "five*.wav"
1238 \fB\-r\fR, \fB\-\-rate\fR \fIRATE\fR[\fBk\fR]
1239 Gives the sample rate in Hz (or kHz if appended with `k') of the file.
1242 SoX of the sample rate of a `raw' (`headerless') audio file (see the
1244 .B \-b
1246 .B \-c
1251 For example, if audio was recorded with a sample-rate of say 48k from
1254    sox \-r 48720 input.wav output.wav
1264 effect should be invoked in order to change (if necessary) the sample
1269    sox input.wav \-r 48k output.wav bass \-b 24
1270    sox input.wav        output.wav bass \-b 24 rate 48k
1276 \fB\-t\fR, \fB\-\-type\fR \fIFILE-TYPE\fR
1282    another-command | sox \-t mp3 \- output.wav
1284    sox input.wav \-t raw output.bin
1295 \fB\-L\fR, \fB\-\-endian little\fR
1297 \fB\-B\fR, \fB\-\-endian big\fR
1299 \fB\-x\fR, \fB\-\-endian swap\fR
1304 These options specify whether the byte-order of the audio data is,
1307 encoded as floating-point, or as signed or unsigned integers of 16 or
1310 self-describing files.  A given endian-setting option may be ignored
1317 file; so, for example, when the following is run on a little-endian system:
1319    sox \-B audio.s16 trimmed.s16 trim 2
1321 trimmed.s16 will be created as little-endian;
1323    sox \-B audio.s16 \-B trimmed.s16 trim 2
1325 must be used to preserve big-endianness in the output file.
1328 .B \-V
1331 \fB\-N\fR, \fB\-\-reverse\-nibbles\fR
1333 sometimes useful with ADPCM-based formats.
1337 .B \-x
1340 \fB\-X\fR, \fB\-\-reverse\-bits\fR
1346 .B \-x
1352 \fB\-\-add\-comment \fITEXT\fR
1355 \fB\-\-comment \fITEXT\fR
1359 SoX will provide a default comment if this option (or
1363 .B "\-\-comment \(dq\(dq" .
1365 \fB\-\-comment\-file \fIFILENAME\fR
1369 \fB\-C\fR, \fB\-\-compression\fR \fIFACTOR\fR
1371 this option is not given then a default compression factor will apply.
1382 Note that applying multiple effects in real-time (i.e. when playing audio)
1388 the global SoX option \fB\-M\fR can be used to isolate then recombine
1389 tracks from a multi-track recording.
1409 .B \-\-buffer
1410 option and it should be kept small, relative to the sample rate, if
1416 There are a few pseudo-effects that aid using multiple effects chains.
1422 which will move back to the first effects chain.  Pseudo-effects
1442 Where applicable, default values for optional parameters are shown in parentheses ( ).
1460 [\fB=\fR\^|\^\fB+\fR\^|\^\fB\-\fR]\fItimespec\fR, where \fItimespec\fR is a
1463 (\fB=\fR) or end (\fB\-\fR) of audio, or to the previous \fIposition\fR if
1465 must be known for end-relative locations to work; some effects do accept
1466 \fB\-0\fR for end-of-audio, though, even if the length is unknown.  Which of
1467 \fB=\fR, \fB+\fR, \fB\-\fR is the default depends on the effect and is shown
1470 Examples: \fB=2:00\fR (two minutes into the audio stream), \fB\-100s\fR (one
1472 and ten samples after the previous position), \fB\-0.5+1s\fR (one sample less
1476 Used to specify the band-width of a filter.  A number of different
1489 q	Q-factor	See [2]
1493 For each effect that uses this parameter, the default method (i.e. if no
1509 Specifies the number of samples directly, as in `8000s'.  For large sample
1512 Time specifications can also be chained with \fB+\fR or \fB\-\fR into a new
1514 left, respectively: `3:00\-200s' means two hundred samples less than three
1518 .B sox \-h
1525 Apply a two-pole all-pass filter with central frequency (in Hz)
1526 \fIfrequency\fR, and filter-width \fIwidth\fR.
1527 An all-pass filter changes the
1528 audio's frequency to phase relationship without changing its frequency
1531 This effect supports the \fB\-\-plot\fR global option.
1533 \fBband\fR [\fB\-n\fR] \fIcenter\fR[\fBk\fR]\fR [\fIwidth\fR[\fBh\fR\^|\^\fBk\fR\^|\^\fBo\fR\^|\^\f…
1534 Apply a band-pass filter.
1548 \-
1554 The \fB\-n\fR (for noise) option uses the alternate mode
1555 for un-pitched audio (e.g. percussion).
1557 \fB\-n\fR introduces a power-gain of about 11dB in the filter, so beware
1565 This effect supports the \fB\-\-plot\fR global option.
1569 \fBbandpass\fR\^|\^\fBbandreject\fR [\fB\-c\fR] \fIfrequency\fR[\fBk\fR]\fI width\fR[\fBh\fR\^|\^\f…
1570 Apply a two-pole Butterworth band-pass or band-reject filter with
1571 central frequency \fIfrequency\fR, and (3dB-point) band-width
1573 .B \-c
1577 default: constant 0dB peak gain.
1581 These effects support the \fB\-\-plot\fR global option.
1586 Apply a band-reject filter.
1591 using a two-pole shelving filter with a response similar to that
1592 of a standard hi-fi's tone-controls.  This is also
1597 useful range is about \-20 (for a large cut) to +20 (for a large
1603 If desired, the filter can be fine-tuned using the following
1608 cut.  The default value is 100\ Hz (for \fBbass\fR) or 3\ kHz (for
1615 `slope' (the default, or if appended with `\fBs\fR') may be used.
1618 default value is 0\*d5.
1622 These effects support the \fB\-\-plot\fR global option.
1626 \fBbend\fR [\fB\-f \fIframe-rate\fR(25)] [\fB\-o \fIover-sample\fR(16)] { \fIstart-position(+)\fB,\…
1628 Each given triple: \fIstart-position\fB,\fIcents\fB,\fIend-position\fR
1634 The pitch-bending algorithm utilises the Discrete Fourier Transform (DFT)
1635 at a particular frame rate and over-sampling rate.
1637 .B \-f
1639 .B \-o
1647    play \-n synth 2.5 sin 667 gain 1 \\
1648 	bend .35,180,.25  .15,740,.53  0,\-520,.3
1654 .B gain\ \-5
1666 This effect supports the \fB\-\-plot\fR global option.
1677 effect is invoked automatically if SoX's \fB\-c\fR option specifies a
1680 .B \-c
1685    sox input.wav \-c 1 output.wav bass \-b 24
1686    sox input.wav      output.wav bass \-b 24 channels 1
1695 \fBchorus \fIgain-in gain-out\fR <\fIdelay decay speed depth \fB\-s\fR\^|\^\fB\-t\fR>
1708 Each four-tuple parameter
1710 and the decay (relative to gain-in) with a modulation
1712 The modulation is either sinusoidal (\fB\-s\fR) or triangular
1713 (\fB\-t\fR).  Gain-out is the volume of the output.
1719    play guitar1.wav chorus 0.7 0.9 55 0.4 0.25 2 \-t
1724    play guitar1.wav chorus 0.6 0.9 50 0.4 0.25 2 \-t \\
1725 	 60 0.32 0.4 1.3 \-s
1730    play guitar1.wav chorus 0.5 0.9 50 0.4 0.25 2 \-t \\
1731 	 60 0.32 0.4 2.3 \-t 40 0.3 0.3 1.3 \-s
1735 [\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
1737 [\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]
1764 .I out-dB1
1766 .IR in-dB1 ;
1768 .I in-dB1
1771 \fB0,\fIout-dBn\fR).
1773 .I soft-knee-dB
1790 .B \-90
1812    sox asz.wav asz-car.wav compand 0.3,1 6:\-70,\-60,\-20 \-5 \-90 0.2
1814 The transfer function (`6:\-70,...') says that very soft sounds (below
1815 \-70dB) will remain unchanged.  This will stop the compander from
1817 However, sounds in the range \-60dB to 0dB (maximum
1819 original music will be compressed 3-to-1 into a 20dB range, which is
1821 road noise.  The `6:' selects 6dB soft-knee companding.
1822 The \-5 (dB) output gain is needed to avoid clipping (the number is
1824 The \-90 (dB) for the initial volume will work fine for a clip that starts
1828 In the next example, compand is being used as a noise-gate for when the
1831    play infile compand .1,.2 \-inf,\-50.1,\-inf,\-50,\-50 0 \-90 .1
1833 Here is another noise-gate, this time for when the
1837    play infile compand .1,.1 \-45.1,\-45,\-inf,0,\-inf 45 \-90 .1
1839 This effect supports the \fB\-\-plot\fR global option (for the transfer function).
1843 for a multiple-band companding effect.
1845 \fBcontrast \fR[\fIenhancement-amount\fR(75)]
1848 .I enhancement-amount
1849 controls the amount of the enhancement and is a number in the range 0\-100.
1851 .I enhancement-amount
1872 of \(+-2 that indicates the amount to shift the audio (which is in the
1873 range of \(+-1).
1892    sox \-n dc.wav synth 5 sin %0 50
1897 Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation shelving
1900 Pre-emphasis was applied in the mastering of some CDs issued in the early
1902 sought-after issues of albums by The Beatles, Pink Floyd and others.
1903 Pre-emphasis should be removed at playback time by a de-emphasis
1905 this filter, and very few PC CD drives have it; playing pre-emphasised
1906 audio without the correct de-emphasis filter results in audio that sounds harsh
1911 effect, it is possible to apply the necessary de-emphasis to audio that
1912 has been extracted from a pre-emphasised CD, and then either burn the
1913 de-emphasised audio to a new CD (which will then play correctly on any
1914 CD player), or simply play the correctly de-emphasised audio files on the
1917    sox track1.wav track1\-deemph.wav deemph
1919 and then burn track1-deemph.wav to CD, or
1921    play track1\-deemph.wav
1927 The de-emphasis filter is implemented as a biquad and requires the input
1928 audio sample rate to be either 44.1kHz or 48kHz.  Maximum deviation
1931 This effect supports the \fB\-\-plot\fR global option.
1943 present un-delayed.
1947    play \-n synth \-j 3 sin %3 sin %\-2 sin %\-5 sin %\-9 \\
1948 	sin %\-14 sin %\-21 fade h .01 2 1.5 delay \\
1949 	1.3 1 .76 .54 .27 remix \- fade h 0 2.7 2.5 norm \-1
1954    play \-n synth pl G2 pl B2 pl D3 pl G3 pl D4 pl G4 \\
1955 	delay 0 .05 .1 .15 .2 .25 remix \- fade 0 4 .1 norm \-1
1958 \fBdither\fR [\fB\-S\fR\^|\^\fB\-s\fR\^|\^\fB\-f \fIfilter\fR] [\fB\-a\fR] [\fB\-p \fIprecision\fR]
1962 sample size is less than 24 bits.  With no options, this effect will
1963 add triangular (TPDF) white noise.  Noise-shaping (only for certain
1964 sample rates) can be selected with
1967 .B \-f
1968 option, it is possible to select a particular noise-shaping filter from
1969 the following list: lipshitz, f-weighted, modified-e-weighted,
1970 improved-e-weighted, gesemann, shibata, low-shibata, high-shibata.  Note
1971 that most filter types are available only with 44100Hz sample rate.  The
1977 noise-shaping curves.
1980 .B \-S
1983 plain TPDF is probably better, and above \(~~ 37k, noise-shaping
1987 .B \-a
1988 option enables a mode where dithering (and noise-shaping if applicable)
1992 dithering is not fool-proof, so the fades should be carefully checked
1993 for any noise modulation; if this occurs, then either re-dither the whole
2000 .B \-p
2004 .B \-R
2005 option is not given, then the pseudo-random number generator used to
2022 For a general resampling effect with anti-aliasing, see \fBrate\fR.  See
2033 \fBecho \fIgain-in gain-out\fR <\fIdelay decay\fR>
2046 and the decay (relative to gain-in) of that echo.
2047 Gain-out is the volume of the output.
2068 \fBechos \fIgain-in gain-out\fR <\fIdelay decay\fR>
2073 and the decay (relative to gain-in) of that echo.
2074 Gain-out is the volume of the output.
2082 The sample will be bounced twice in symmetric echos:
2086 The sample will be bounced twice in asymmetric echos:
2090 The sample will sound as if played in a garage:
2096 Apply a two-pole peaking equalisation (EQ) filter.
2097 With this filter, the signal-level at and around a selected frequency
2098 can be increased or decreased, whilst (unlike band-pass and band-reject
2102 \fIwidth\fR, the band-width,
2114 This effect supports the \fB\-\-plot\fR global option.
2118 \fBfade\fR [\fItype\fR] \fIfade-in-length\fR [\fIstop-position(=)\fR [\fIfade-out-length\fR]]
2125 and \fBp\fR for inverted parabola.  The default is logarithmic.
2127 A fade-in starts from the first sample and ramps the signal level from 0
2128 to full volume over the time given as \fIfade-in-length\fR.  Specify 0 if
2129 no fade-in is wanted.
2131 For fade-outs, the audio will be truncated at
2132 .I stop-position
2134 interval of \fIfade-out-length\fR before the \fIstop-position\fR.  If
2135 .I fade-out-length
2137 \fIfade-in-length\fR.
2138 No fade-out is performed if
2139 .I stop-position
2142 previous effects, then \fB\-0\fR (or, for historical reasons, \fB0\fR) may
2144 .I stop-position
2145 to indicate the usual case of a fade-out that ends at the end of the input
2148 Any time specification may be used for \fIfade-in-length\fR and
2149 \fIfade-out-length\fR.
2155 \fBfir\fR [\fIcoefs-file\fR\^|\^\fIcoefs\fR]
2159 containing the filter coefficients (white-space separated; may contain
2160 `#' comments).  If the given filename is `\-', or if no argument is
2165    sox infile outfile fir 0.0195 \-0.082 0.234 0.891 \-0.145 0.043
2174      1.2311233052619888e\-01
2175     \-4.4777096106211783e\-01
2176      5.1031563346705155e\-01
2177     \-6.6502926320995331e\-02
2181 This effect supports the \fB\-\-plot\fR global option.
2183 \fBflanger\fR [\fIdelay depth regen width speed shape phase interp\fR]
2193 \ 	Range	Default	Description
2194 delay	0 \- 30	0	Base delay in milliseconds.
2195 depth	0 \- 10	2	Added swept delay in milliseconds.
2196 regen	\-95 \- 95	0	T{
2200 width	0 \- 100	71	T{
2204 speed	0\*d1 \- 10	0\*d5	Sweeps per second (Hz).
2206 phase	0 \- 100	25	T{
2208 Swept wave percentage phase-shift for multi-channel (e.g. stereo) flange;
2209 0 = 100 = same phase on each channel.
2213 Digital delay-line interpolation: \fBlinear\fR\^|\^\fBquadratic\fR.
2218 \fBgain \fR[\fB\-e\fR\^|\^\fB\-B\fR\^|\^\fB\-b\fR\^|\^\fB\-r\fR] [\fB\-n\fR] [\fB\-l\fR\^|\^\fB\-h\…
2227 .B \-n
2232 .I gain-dB
2236 .I gain-dB
2240 .B \-e
2241 option, the levels of the audio channels of a multi-channel file are `equalised', i.e.
2249 .B \-B
2255 .B \-B
2260 .B \-B
2263 .B \-b
2265 .B \-B
2270 .B \-B
2272 .B \-b
2276 .B \-r
2280 .B \-h
2284 .B \-n
2286 .I gain-dB
2290    sox infile outfile gain \-n
2294    sox infile outfile gain \-n \-3
2296 normalises to \-3dB.
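Normalising to a dB level means scaling the audio so that its peak sits at that level relative to full scale (0dB = amplitude 1.0, and a level of L dB corresponds to an amplitude of 10^(L/20)). The Python sketch below shows the arithmetic only; the function name and the list-of-floats representation are assumptions, not SoX's implementation.

```python
def normalise(samples, db_level=0.0):
    """Scale samples so that the peak magnitude equals db_level dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence cannot be normalised; return it unchanged
    target = 10 ** (db_level / 20)  # dB to linear amplitude
    return [s * target / peak for s in samples]


quiet = [0.1, -0.25, 0.2]
print(normalise(quiet))      # peak raised to full scale
print(normalise(quiet, -3))  # peak raised to 3dB below full scale
```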
2299 .B \-l
2302    sox infile outfile gain \-l 6
2312 .B \-h
2313 option is used to apply gain to provide head-room for subsequent
2316    sox infile outfile gain \-h bass +6
2322 \fBgain \-h\fR rather than an explicit attenuation, is that if the
2324 \fBgain \-r\fR, for example:
2326    sox infile outfile gain \-h bass +6 rate 44100 gain \-r
2332 Output formatting (dithering and bit-depth reduction) also requires
2335    sox infile outfile gain \-h bass +6 rate 44100 gain \-rh dither
2343 .B \-G
2344 can be given to automatically invoke \fBgain \-h\fR and \fBgain \-r\fR.
2352 \fBhighpass\fR\^|\^\fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fB…
2353 Apply a high-pass or low-pass filter with 3dB point \fIfrequency\fR.
2354 The filter can be either single-pole (with
2356 or double-pole (the default, or with
2359 applies only to double-pole filters;
2360 the default is Q = 0\*d707 and gives a Butterworth response.  The filters
2362 double-pole filters are described in detail in [1].
2364 These effects support the \fB\-\-plot\fR global option.
2366 See also \fBsinc\fR for filters with a steeper roll-off.
2368 \fBhilbert\fR [\fB\-n \fItaps\fR]
2369 Apply an odd-tap Hilbert transform filter, phase-shifting the signal
2376 An odd-tap Hilbert transform filter has a bandpass characteristic,
2379 \fB\-n\fR.  By default, the number of taps is chosen for a cutoff
2382 This effect supports the \fB\-\-plot\fR global option.
2384 \fBladspa\fR [\fB\-l\fR\^|\^\fB\-r\fR] \fImodule\fR [\fIplugin\fR] [\fIargument\fR ...]
2386 Despite the name, LADSPA is not Linux-specific, and a wide range of
2392 default values if possible.
2397 .B \-r
2398 (replicate) option allows cloning a mono plugin to handle multi-channel
2403 .B \-l
2422 A default gain of \-10dB is used if a
2430 \fBlowpass\fR [\fB\-1\fR|\fB\-2\fR] \fIfrequency\fR[\fBk\fR]\fR [\fIwidth\fR[\fBq\fR\^|\^\fBo\fR\^|…
2431 Apply a low-pass filter.
2435 [\fIsoft-knee-dB\fB:\fR]\fIin-dB1\fR[\fB,\fIout-dB1\fR]{\fB,\fIin-dB2\fB,\fIout-dB2\fR}
2437 [\fIgain\fR [\fIinitial-volume-dB\fR [\fIdelay\fR]]]\(dq {\fIcrossover-freq\fR[\fBk\fR] \(dqattack1…
2439 The multi-band compander is similar to the single-band compander but the
2440 audio is first divided into bands using Linkwitz-Riley cross-over filters
2444 frequency for that band is given by \fIcrossover-freq\fR; these can be
2447 For example, the following (one long) command shows how multi-band
2451    play track1.wav gain \-3 sinc \-n 29 \-b 100 8000 mcompand \\
2452 	\(dq0.005,0.1 \-47,\-40,\-34,\-34,\-17,\-33\(dq 100 \\
2453 	\(dq0.003,0.05 \-47,\-40,\-34,\-34,\-17,\-33\(dq 400 \\
2454 	\(dq0.000625,0.0125 \-47,\-40,\-34,\-34,\-15,\-33\(dq 1600 \\
2455 	\(dq0.0001,0.025 \-47,\-40,\-34,\-34,\-31,\-31,\-0,\-30\(dq 6400 \\
2456 	\(dq0,0.025 \-38,\-31,\-28,\-28,\-0,\-25\(dq \\
2457 	gain 15 highpass 22 highpass 22 sinc \-n 255 \-b 16 \-17500 \\
2458 	gain 9 lowpass \-1 17801
2462 Note that the pipeline is set up with US-style 75\(*ms pre-emphasis.
2466 for a single-band companding effect.
2468 \fBnoiseprof\fR [\fIprofile-file\fR]
2472 \fBnoisered\fR [\fIprofile-file\fR [\fIamount\fR]]
2479 profile to \fIprofile-file\fR, or to stdout if no \fIprofile-file\fR or
2480 if `\-' is given.  E.g.
2482    sox speech.wav \-n trim 0 1.5 noiseprof speech.noise-profile
2490 .IR profile-file ,
2491 or from stdin if no \fIprofile-file\fR or if `\-' is given.  E.g.
2493    sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
2497 number between 0 and 1 with a default of 0\*d5.  Higher numbers will
2500 with a noise-reduced version, experiment with different
2509    sox noisy.wav \-n trim 0 1 noiseprof | play noisy.wav noisered
2512 \fBnorm\fR [\fIdB-level\fR]
2515 is just an alias for \fBgain \-n\fR; see the
2520 Out Of Phase Stereo effect.
2521 Mixes stereo to twin-mono where each mono channel contains the
2530 in the over-driven output.
2551 position or specify a zero-length pad at the start.
2556 the audio on a channel-by-channel basis.
2558 \fBphaser \fIgain-in gain-out delay decay speed\fR [\fB\-s\fR\^|\^\fB\-t\fR]
2563 and the decay (relative to gain-in) with a modulation
2565 The modulation is either sinusoidal (\fB\-s\fR) \*mpreferable for multiple
2567 (\fB\-t\fR) \*mgives single instruments a sharper phasing effect.
2569 feedback, and usually no less than 0\*d1.  Gain-out is the volume of the output.
2573    play snare.flac phaser 0.8 0.74 3 0.4 0.5 \-t
2577    play snare.flac phaser 0.9 0.85 4 0.23 1.3 \-s
2581    play snare.flac phaser 0.89 0.85 1 0.24 2 \-t
2585    play snare.flac phaser 0.6 0.66 3 0.6 2 \-t
2588 \fBpitch \fR[\fB\-q\fR] \fIshift\fR [\fIsegment\fR [\fIsearch\fR [\fIoverlap\fR]]]
2602 \fBrate\fR [\fB\-q\fR\^|\^\fB\-l\fR\^|\^\fB\-m\fR\^|\^\fB\-h\fR\^|\^\fB\-v\fR] [override-options] \…
2605 (even non-integer if this is supported by the output file format)
2614 Band-width
2619 \-q	T{
2629 \-l	low	80%	100	T{
2633 \-m	medium	95%	100	T{
2637 \-h	high	95%	125	T{
2639 16-bit mastering (use with dither)
2641 \-v	T{
2644 T}	95%	175	24-bit mastering
2649 .I Band-width
2658 band-limited interpolation.  By default, all algorithms have
2659 a `linear' phase response; for `medium', `high' and
2660 `very high', the phase response is configurable (see below).
2664 effect is invoked automatically if SoX's \fB\-r\fR option specifies a
2667 .B \-r
2672    sox input.wav \-r 48k output.wav bass \-b 24
2673    sox input.wav        output.wav bass \-b 24 rate 48k
2689 Occasionally, however, it may be desirable to fine-tune the resampler's
2697 \-M/\-I/\-L	Phase response = minimum/intermediate/linear
2698 \-s	Steep filter (band-width = 99%)
2699 \-a	Allow aliasing/imaging above the pass-band
2700 \-b\ 74\-99\*d7	Any band-width %
2701 \-p\ 0\-100	T{
2703 Any phase response (0 = minimum, 25 = intermediate, 50 = linear, 100 = maximum)
2715 (`pre-echo') than if they occur after it (`post-echo').  Note that
2720 A phase response setting may be used to control the distribution of any
2722 `pre' and `post': with minimum phase, there is no pre-echo but the
2723 longest post-echo; with linear phase, pre and post echo are in equal
2725 phase setting attempts to find the best compromise by selecting a small
2726 length (and level) of pre-echo and a medium-length post-echo.
2728 Minimum, intermediate, or linear phase response is selected using the
2732 .B \-L
2733 option; a custom phase response can be created with the
2734 .B \-p
2735 option.  Note that phase responses between `linear' and `maximum'
2738 A resampler's band-width setting determines how much of the frequency
2739 content of the original signal (w.r.t. the original sample rate when
2740 up-sampling, or the new sample rate when down-sampling) is preserved
2741 during conversion.  The term `pass-band' is used to refer to all frequencies
2742 up to the band-width point (e.g. for 44\*d1kHz sampling rate, and a
2743 resampling band-width of 95%, the pass-band represents frequencies from
2744 0Hz (D.C.) to circa 21kHz).  Increasing the resampler's band-width
2749 .B \-s
2750 `steep filter' option changes resampling band-width from the default 95%
2752 .B \-b
2753 option allows the band-width to be set to any value in the range
2754 74\-99\*d7 %, but note that band-width values greater than 99% are not
2758 .B \-a
2759 option is given, then aliasing/imaging above the pass-band is allowed.  For
2761 resampling band-width of 95%, this means that frequency content above
2762 21kHz can be distorted; however, since this is above the pass-band (i.e.
2767 the minimum band-width allowable with
2768 .B \-b
2773    sox input.wav \-b 16 output.wav rate \-s \-a 44100 dither \-s
2775 default (high) quality resampling; overrides: steep filter, allow
2776 aliasing; to 44\*d1kHz sample rate; noise-shaped dither to 16-bit WAV
2779    sox input.wav \-b 24 output.aiff rate \-v \-I \-b 90 48k
2781 very high quality resampling; overrides: intermediate phase, band-width 90%;
2782 to 48k sample rate; store output to 24-bit AIFF file.
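As a worked illustration of the band-width figures quoted above, the upper edge of the pass-band can be estimated as the band-width percentage applied to half the relevant sample rate. The helper name passband_hz below is hypothetical and not part of SoX; this is only a sketch of the arithmetic, assuming the percentage is taken relative to the Nyquist frequency, which agrees with the `circa 21kHz' figure given for 44.1kHz at 95%.

```shell
#!/bin/sh
# passband_hz RATE PERCENT   (hypothetical helper, not a SoX command)
# Estimate the upper edge of the pass-band in Hz as PERCENT of RATE / 2.
# RATE is the original sample rate when up-sampling, or the new sample
# rate when down-sampling.
passband_hz() {
    awk -v rate="$1" -v pct="$2" \
        'BEGIN { printf "%.1f\n", rate / 2 * pct / 100 }'
}

passband_hz 44100 95   # default band-width: prints 20947.5
passband_hz 44100 99   # steep filter:       prints 21829.5
```

The second call shows how little extra pass-band the steep filter option buys at this sample rate: under 1kHz.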
2798 \fBremix\fR [\fB\-a\fR\^|\^\fB\-m\fR\^|\^\fB\-p\fR] <\fIout-spec\fR>
2799 \fIout-spec\fR	= \fIin-spec\fR{\fB,\fIin-spec\fR} | \fB0\fR
2801 \fIin-spec\fR	= [\fIin-chan\fR]\^[\fB\-\fR[\fIin-chan2\fR]]\^[\fIvol-spec\fR]
2803 \fIvol-spec\fR	= \fBp\fR\^|\^\fBi\fR\^|\^\fBv\^\fR[\fIvolume\fR]
2807 channel is specified, in turn, by a given \fIout-spec\fR: a list of
2813 .B \-m
2816 are mix-combined before entering the effects chain).
2819 .I out-spec
2820 contains comma-separated input channel-numbers and hyphen-delimited
2821 channel-number ranges; alternatively,
2831    sox input.wav output.wav remix 1\-3,7 3
2834 is a mix-down of input channels 1, 2, 3, and 7, and the right channel is
2838 right of the hyphen are optional and default to 1 and to the number of input
2841    sox input.wav output.wav remix \-
2843 performs a mix-down of all input channels to mono.
2845 By default, where an output channel is mixed from multiple (n) input
2846 channels, each input channel will be scaled by a factor of \(S1/\s-2n\s+2.
2848 of input channels with a \fIvol-spec\fR (volume specification).
2864 1 = no change, 0\*d5 \(~= 6dB attenuation, 2 \(~= 6dB gain, \-1 = invert
2870 .I out-spec
2872 .I vol-spec
2873 then, by default, \(S1/\s-2n\s+2 scaling is not applied to any other channels in the
2874 same out-spec (though may be in other out-specs).
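The volume factors and automatic scaling described here can be related to decibels with a little arithmetic. The sketch below uses two hypothetical helper functions (factor_to_db and mix_scale_db, neither of which is part of SoX) and assumes the usual amplitude convention of 20 times the base-10 logarithm of the factor; it handles positive factors only, so an inverting factor such as -1 is out of scope.

```shell
#!/bin/sh
# factor_to_db FACTOR   (hypothetical helper, not a SoX command)
# Convert a positive linear amplitude factor to dB: 20 * log10(FACTOR).
factor_to_db() {
    awk -v f="$1" 'BEGIN { printf "%.2f\n", 20 * log(f) / log(10) }'
}

# mix_scale_db N [power]   (hypothetical helper, not a SoX command)
# Level change applied to each of N mixed input channels under 1/N
# scaling, or under 1/sqrt(N) scaling if the second argument is "power".
mix_scale_db() {
    awk -v n="$1" -v mode="${2:-linear}" 'BEGIN {
        scale = (mode == "power") ? 1 / sqrt(n) : 1 / n
        printf "%.2f\n", 20 * log(scale) / log(10)
    }'
}

factor_to_db 0.5       # prints -6.02
factor_to_db 2         # prints 6.02
mix_scale_db 2         # prints -6.02
mix_scale_db 2 power   # prints -3.01
```

For a two-channel mix, 1/n scaling lowers each input by about 6dB, whereas square-root scaling lowers it by only about 3dB.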
2875 The \-a (automatic)
2883    sox input.wav output.wav remix \-a 1,2 3,4v0.8
2887 The \-m (manual) option disables all automatic volume adjustments, so
2889    sox input.wav output.wav remix \-m 1,2 3,4v0.8
2907 If the \fB\-p\fR option is given, then any automatic \(S1/\s-2n\s+2 scaling
2908 is replaced by \(S1/\s-2\(srn\s+2 (`power') scaling; this gives a louder mix
2926 chans=\`soxi \-c "$1"\`
2927 while [ $chans \-ge 1 ]; do
2929    out=\`echo "$1"|sed "s/\\(.*\\)\\.\\(.*\\)/\\1\-$chans0.\\2/"\`
2931    chans=\`expr $chans \- 1\`
2938 .IR input-01.wav ,
2939 \fIinput-02.wav\fR, ...,
2940 .IR input-06.wav .
2944 \fBrepeat\fR [\fIcount\fR(1)|\fB\-\fR]
2946 The special value \fB\-\fR requests infinite repetition.
2951 \fBreverb\fR [\fB\-w\fR|\fB\-\-wet-only\fR] [\fIreverberance\fR (50%) [\fIHF-damping\fR (50%)
2952 [\fIroom-scale\fR (100%) [\fIstereo-depth\fR (100%)
2954 [\fIpre-delay\fR (0ms) [\fIwet-gain\fR (0dB)]]]]]]
2967    play dry.wav gain \-3 pad 0 3 reverb
2970 .B \-w
2974    play \-m voice.wav "|sox voice.wav \-p reverse reverb \-w reverse"
2986 This effect supports the \fB\-\-plot\fR global option.
2988 \fBsilence \fR[\fB\-l\fR] \fIabove-periods\fR [\fIduration threshold\fR[\fBd\fR\^|\^\fB%\fR]
2989 [\fIbelow-periods duration threshold\fR[\fBd\fR\^|\^\fB%\fR]]
2994 The \fIabove-periods\fR value is used to indicate if audio should be
2997 non-zero \fIabove-periods\fR, it trims audio up until it finds
2998 non-silence. Normally, when trimming silence from the beginning of audio
2999 the \fIabove-periods\fR will be 1 but it can be increased to higher
3000 values to trim all audio up to a specific count of non-silence
3003 an \fIabove-periods\fR of 2 to strip out both silence periods and the
3006 When \fIabove-periods\fR is non-zero, you must also specify a
3008 amount of time that non-silence must be detected before it stops
3012 \fIthreshold\fR is used to indicate what sample value you should treat as
3018 a \fIbelow-periods\fR count.  In this case, \fIbelow-periods\fR means
3023 at the end, you could set \fIbelow-periods\fR to a value of 2 to skip over the
3026 For \fIbelow-periods\fR, \fIduration\fR specifies a period of silence
3036 By first reversing the audio, you can use the \fIabove-periods\fR
3041 \fIbelow-periods\fR that is negative.  This value is then
3044 \fIabove-periods\fR, making it suitable for removing periods of
3048 .B \-l
3049 indicates that \fIbelow-periods\fR \fIduration\fR length of audio
3055 number is interpreted as a sample count, not as a number of seconds.
3063 to indicate a percentage of maximum value of the sample value
3070    rec \fIparameters filename other-effects\fR silence 1 5 2%
3074 …-a\fI att\fR\^|\^\fB\-b\fI beta\fR] [\fB\-p\fI phase\fR\^|\^\fB\-M\fR\^|\^\fB\-I\fR\^|\^\fB\-L\fR]…
3076 Apply a sinc Kaiser-windowed low-pass, high-pass, band-pass, or band-reject filter
3079 6dB points of a high-pass and low-pass filter that may be invoked
3081 given, then \fIfreqHP\fR less than \fIfreqLP\fR creates a band-pass filter,
3082 \fIfreqHP\fR greater than \fIfreqLP\fR creates a band-reject filter.
3086    sinc \-4k
3087    sinc 3k\-4k
3088    sinc 4k\-3k
3090 create a high-pass, low-pass, band-pass, and band-reject filter
3093 The default stop-band attenuation of 120dB can be overridden with
3094 \fB\-a\fR; alternatively, the Kaiser-window `beta' parameter can be
3095 given directly with \fB\-b\fR.
3097 The default transition band-width of 5% of the total band can be
3098 overridden with \fB\-t\fR (and \fItbw\fR in Hertz); alternatively, the
3099 number of filter taps can be given directly with \fB\-n\fR.
3101 If both \fIfreqHP\fR and \fIfreqLP\fR are given, then a \fB\-t\fR or
3102 \fB\-n\fR option given to the left of the frequencies applies to both
3111 .B \-L
3112 options control the filter's phase response; see the \fBrate\fR effect
3115 This effect supports the \fB\-\-plot\fR global option.
3120 \fBsox \-\-help\fR and check the list of supported effects to see if
3124 and shows time in the X-axis, frequency in the Y-axis, and audio
3125 signal magnitude in the Z-axis.  Z-axis values are represented by the
3126 colour (or optionally the intensity) of the pixels in the X-Y plane.
3133    sox my.wav \-n spectrogram
3139    sox my.wav \-n remix 2 trim 20 30 spectrogram
3147    sox my.wav \-n rate 6k spectrogram
3153    sox my.wav \-n trim 0 10 spectrogram \-x 600 \-y 200 \-z 100
3157 by 200 pixels in size and the Z-axis range will be 100 dB).  Note that
3161    sox \-n \-n synth 6 tri 10k:14k spectrogram \-z 100 \-w kaiser
3170    rate 2k spectrogram \-X 200 \-Z \-10 \-w kaiser
3172 Options are also available to control the appearance (colour-set,
3175    sox my.wav \-n spectrogram \-m \-l \-o print.png
3182 .IP \fB\-x\ \fInum\fR
3183 Change the (maximum) width (X-axis) of the spectrogram from its default
3185 See also \fB\-X\fR and \fB\-d\fR.
3186 .IP \fB\-X\ \fInum\fR
3187 X-axis pixels/second; the default is auto-calculated to fit the given
3188 or known audio duration to the X-axis size, or 100 otherwise.  If
3189 given in conjunction with \fB\-d\fR, this option affects the width of
3198 .B \-V
3200 See also \fB\-x\fR and \fB\-d\fR.
3201 .IP \fB\-y\ \fInum\fR
3202 Sets the Y-axis size in pixels (per channel); this is the number of
3205 number is not one more than a power of two (e.g. 129).  By default the
3206 Y-axis size is chosen automatically (depending on the number of
3208 .B \-Y
3210 .IP \fB\-Y\ \fInum\fR
3211 Sets the target total height of the spectrogram(s).  The default value
3212 is 550 pixels.  Using this option (and by default), SoX will choose a
3218 .B \-y
3220 .IP \fB\-z\ \fInum\fR
3221 Z-axis (colour) range in dB, default 120.  This sets the dynamic-range
3222 of the spectrogram to be \-\fInum\fR\ dBFS to 0\ dBFS.
3224 may range from 20 to 180.  Decreasing dynamic-range effectively
3226 .IP \fB\-Z\ \fInum\fR
3227 Sets the upper limit of the Z-axis in dBFS.
3232 .IP \fB\-n\fR
3234 are shown using the brightest colour in the palette - a kind of
3235 automatic \fB\-Z\fR flag.
3236 .IP \fB\-q\ \fInum\fR
3237 Sets the Z-axis quantisation, i.e. the number of different colours (or
3238 intensities) in which to render Z-axis
3239 values.  A small number (e.g. 4) will give a `poster'-like effect making
3243 colours to use inside the Z-axis range; two colours are reserved to
3244 represent out-of-range values.
3245 .IP \fB\-w\ \fIname\fR
3246 Window: Hann (default), Hamming, Bartlett, Rectangular, Kaiser or Dolph.  The
3249 `window function'.  By default, SoX uses the Hann window which has good
3250 all-round frequency-resolution and dynamic-range properties.  For better
3251 frequency resolution (but lower dynamic-range), select a Hamming window;
3252 for higher dynamic-range (but poorer frequency-resolution), select a
3254 .IP \fB\-W\ \fInum\fR
3258 .IP \fB\-s\fR
3262 .B \-x
3264 .IP \fB\-m\fR
3265 Creates a monochrome spectrogram (the default is colour).
3266 .IP \fB\-h\fR
3267 Selects a high-colour palette\*mless visually pleasing than the default
3272 .IP \fB\-p\ \fInum\fR
3276 parameter, from 1 (the default) to 6, selects the permutation.
3277 .IP \fB\-l\fR
3279 default has a dark background).
3280 .IP \fB\-a\fR
3283 .IP \fB\-r\fR
3285 .IP \fB\-A\fR
3286 Selects an alternative, fixed colour-set.  This is provided only for
3289 differentiation at the bottom end which results in masking of low-level
3291 .IP \fB\-t\ \fItext\fR
3293 .IP \fB\-c\ \fItext\fR
3296 .IP \fB\-o\ \fIfile\fR
3297 Name of the spectrogram output PNG file, default `spectrogram.png'.
3298 If `\-' is given, the spectrogram will be sent to standard output
3310 .IP \fB\-d\ \fIduration\fR
3311 This option sets the X-axis resolution such that audio with the given
3313 (a time specification) fits the selected (or default) X-axis width.  For
3316    sox input.mp3 output.wav \-n spectrogram \-d 1:00 stats
3325 .B \-X
3326 for an alternative way of setting the X-axis resolution.
3327 .IP \fB\-S\ \fIposition(=)\fR
3331    sox input.aiff output.wav spectrogram \-S 1:00
3338 For the ability to perform off-line processing of spectral data, see the
3351 Technically, the speed effect only changes the sample rate information,
3353 automatically to resample to the output sample rate, using its default
3363 \fBsplice \fR [\fB\-h\fR\^|\^\fB\-t\fR\^|\^\fB\-q\fR] { \fIposition(=)\fR[\fB,\fIexcess\fR[\fB,\fIl…
3365 simple audio concatenation: a (usually short) cross-fade is applied at
3373 .B \-q
3374 may be given to select the fade envelope as half-cosine wave (the default),
3375 triangular (a.k.a. linear), or quarter-cosine wave respectively.
3411     -----------><--->
3416                 *   : :   * - - *
3421                       <--->   <----->
3435    sox too-long.wav part1.wav trim 0 30.130
3439    sox too-long.wav part2.wav trim 1:03.422
3443    sox part1.wav part2.wav just-right.wav splice 30.130
3447    play "|sox \-n \-p synth 1 sin %1" "|sox \-n \-p synth 1 sin %3"
3462 # acpo infile copy-start copy-stop paste-over-start outfile
3464 # (i.e. those that contain +/\-).
3465 e=0.005                      # Using default excess
3467 sox "$1" piece.wav trim $2\-$e\-$l =$3+$e
3469 sox "$1" part2.wav trim $4+$3\-$2\-$e\-$l
3471    splice $4+$e +$3\-$2+$e+$l+$e
3482 It is also possible to use this effect to perform general cross-fades,
3486 .B \-q
3487 option would typically be given (to select an `equal power' cross-fade), and
3489 should be zero (which is the default if
3490 .B \-q
3492 to be cross-faded, then
3494    sox f1.wav f2.wav out.wav splice \-q $(soxi \-D f1.wav),3
3496 cross-fades the files where the point of equal loudness is 3 seconds
3497 before the end of f1.wav, i.e. the total length of the cross-fade is
3500 \fBstat\fR [\fB\-s \fIscale\fR] [\fB\-rms\fR] [\fB\-freq\fR] [\fB\-v\fR] [\fB\-d\fR]
3511 is the audio sample rate, and
3512 .I x\s-2\dk\u\s0
3513 represents the PCM value (in the range \-1 to +1 by default) of each successive
3514 sample in the audio,
3521 Scaled by	\ 	See \-s below.
3522 Maximum amplitude	max(\fIx\s-2\dk\u\s0\fR)	T{
3523 The maximum sample value in the audio; usually this will be a positive number.
3525 Minimum amplitude	min(\fIx\s-2\dk\u\s0\fR)	T{
3526 The minimum sample value in the audio; usually this will be a negative number.
3528 Midline amplitude	\(12\^min(\fIx\s-2\dk\u\s0\fR)\^+\^\(12\^max(\fIx\s-2\dk\u\s0\fR)
3529 Mean norm	\(S1/\s-2n\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\(br\^	T{
3530 The average of the absolute value of each sample in the audio.
3532 Mean amplitude	\(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR	T{
3533 The average of the sample values in the audio.  If this figure is non-zero, then it indicates the
3538 RMS amplitude	\(sr(\(S1/\s-2n\s+2\^\(*S\^\fIx\s-2\dk\u\s0\fR\(S2)	T{
3542 Maximum delta	max(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
3543 Minimum delta	min(\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^)
3544 Mean delta	\(S1/\s-2n\-1\s+2\^\(*S\^\^\(br\^\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR\^\(br\^
3545 RMS delta	\(sr(\(S1/\s-2n\-1\s+2\^\(*S\^(\fIx\s-2\dk\u\s0\fR\^\-\^\fIx\s-2\dk\-1\u\s0\fR)\(S2)
3558 Note that the delta measurements are not applicable for multi-channel audio.
3561 .B \-s
3563 The default value of
3565 is 2147483647 (i.e. the maximum value of a 32-bit signed integer).
3571 .B \-rms
3576 .B \-v
3580 .B \-freq
3586 .B \-d
3588 displays a hex dump of the 32-bit signed PCM data
3591 sometimes occur in cross-platform versions of SoX.
3597 \fBstats\fR [\fB\-b \fIbits\fR\^|\^\fB\-x \fIbits\fR\^|\^\fB\-s \fIscale\fR] [\fB\-w \fIwindow-time…
3603 For example, for a typical well-mastered stereo music file:
3609 DC offset   0.000803 \-0.000391  0.000803
3610 Min level  \-0.750977 \-0.750977 \-0.653412
3612 Pk lev dB      \-2.49     \-2.49     \-3.69
3613 RMS lev dB    \-19.41    \-19.13    \-19.71
3614 RMS Pk dB     \-13.82    \-13.82    \-14.38
3615 RMS Tr dB     \-85.25    \-85.25    \-82.66
3616 Crest factor       \-      6.79      6.32
3619 Bit-depth      16/16     16/16     16/16
3632 are shown, by default, in the range \(+-1.
3634 .B \-b
3636 with the given number of bits; for example, for 16 bits, the scale would be \-32768 to +32767.
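The 16-bit figures quoted above follow from the range of a signed integer. As a sketch, the hypothetical helper bit_scale below (not part of SoX) prints the corresponding minimum and maximum for any given number of bits, assuming the same two's-complement pattern holds for other bit counts.

```shell
#!/bin/sh
# bit_scale BITS   (hypothetical helper, not a SoX command)
# Print the minimum and maximum values of a signed integer of BITS bits.
bit_scale() {
    max=$(( (1 << ($1 - 1)) - 1 ))
    min=$(( -max - 1 ))
    echo "$min $max"
}

bit_scale 16   # prints -32768 32767
bit_scale 8    # prints -128 127
```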
3638 .B \-x
3640 .B \-b
3643 .B \-s
3644 option scales the three measurements by a given floating-point number.
3653 are peak and trough values for RMS level measured over a short window (default 50ms).
3670 The right-hand
3671 .I Bit-depth
3672 figure is the standard definition of bit-depth i.e. bits less
3673 significant than the given number are fixed at zero.  The left-hand
3675 one for negative numbers) subtracted from the right-hand figure (the
3679 For multi-channel audio, an overall figure for each of the above
3686 .IR Bit-depth :
3701 is equal to the sample-rate multiplied by
3731 it is retained as it can sometimes out-perform
3739 size is in ms.  Default is 20ms.  The
3743 ratio, in [0 1].  Default depends on stretch factor. 1
3746 ratio, in [0 0\*d5].  The default fade amount depends on
3755 …-j \fIKEY\fR] [\fB\-n\fR] [\fIlen\fR [\fIoff\fR [\fIph\fR [\fIp1\fR [\fIp2\fR [\fIp3\fR]]]]]] {[\f…
3758 with various wave shapes, or to generate wide-band noise of various
3764 Audio for each channel in a multi-channel audio file can be synthesised
3771 file' (with the special name \fB\-n\fR) is often given instead (and the
3776 audio file containing a sine-wave swept from 300 to 3300\ Hz:
3778    sox \-n output.wav synth 3 sine 300\-3300
3782    sox \-r 8000 \-n output.wav synth 3 sine 300\-3300
3789    sox \-n output.wav synth 3 sine 300\-3300 brownnoise
3795    play \-n synth 0.5 sine 200\-500 synth 0.5 sine fmod 700\-100
3802    play \-n synth 4 pluck %\-29
3808 	play \-n synth 4 pluck $n repeat 2; done
3824 Note that, by default, the
3826 effect incorporates the functionality of \fBgain \-h\fR (see the
3830 .B \-n
3838 a value of 0 indicates that the input length should be used, which is also the default.
3841 [white]noise, tpdfnoise, pinknoise, brownnoise, pluck; default=sine.
3844 (frequency modulation); default=create.
3849 be used.  The default frequency is 440Hz.  By default, the tuning used
3851 .B \-j
3855 is an integer number of semitones relative to A (so for example, \-9
3864 one of the characters `:', `+', `/', or `\-'.  This character is used to
3870 Square: a second-order function is used to change the tone.
3873 .IP \fB\-\fR
3874 Exponential: as `/', but initial phase always zero, and stepped (less
3881 \fIoff\fR is the bias (DC-offset) of the signal in percent; default=0.
3883 \fIph\fR is the phase shift in percentage of 1 cycle; default=0.  Not
3887 `rising' (triangle, exp, trapezium); default=50 (square, triangle, exp),
3888 default=10 (trapezium), or sustain (pluck); default=40.
3891 begins; default=50. exp: the amplitude in multiples of 2dB; default=50,
3892 or tone-1 (pluck); default=20.
3895 ends; default=60, or tone-2 (pluck); default=90.
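To get a feel for the frequencies produced by the `%' note notation, the semitone offset can be converted to Hertz by hand. The sketch below uses a hypothetical helper, note_hz, which is not part of SoX; it assumes equal temperament and a reference A of 440Hz, so each semitone multiplies the frequency by the twelfth root of two. Other tunings would give slightly different values.

```shell
#!/bin/sh
# note_hz SEMITONES   (hypothetical helper, not a SoX command)
# Frequency in Hz of the note SEMITONES semitones away from A = 440 Hz,
# assuming equal temperament: 440 * 2^(SEMITONES / 12).
note_hz() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * exp(log(2) * n / 12) }'
}

note_hz 0     # prints 440.00
note_hz -9    # prints 261.63
note_hz -12   # prints 220.00
```

An offset of 12 semitones in either direction doubles or halves the frequency, i.e. it moves the note by one octave.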
3897 \fBtempo \fR[\fB\-q\fR] [\fB\-m\fR\^|\^\fB\-s\fR\^|\^\fB\-l\fR] \fIfactor\fR [\fIsegment\fR [\fIsea…
3900 shifted in the time domain and overlapped (cross-faded) at points where
3904 By default, linear searches are used to find the best overlapping
3906 .B \-q
3913 .B \-m
3914 option is used to optimize default values of segment, search and
3918 .B \-s
3919 option is used to optimize default values of segment, search and
3923 .B \-l
3924 option is used to optimize default values of segment, search and
3928 If \-m, \-s, or \-l is specified, the default value of segment will be
3929 calculated based on factor, while default search and overlap values are
3930 based on segment. Any values you provide still override these default
3940 flags are specified, the default value is 82 and is typically suited to
3942 of 2), 41\ ms may give a better result.  The \-m, \-s, and \-l flags will cause
3943 the segment default to be automatically adjusted based on factor.
3944 For example, using \-s (for speech) with a tempo of 1.25 will calculate a
3945 default segment value of 32.
3951 flags are specified, the default value is 14.68.  Larger values use
3955 quality. The \-m, \-s, and \-l flags will cause
3956 the search default to be automatically adjusted based on segment.
3961 Default value is 12, but \-m, \-s, or \-l flags automatically
3976 Apply a treble tone-control effect.
4000    play infile trim 12:34 =15:00 \-2:00
4004    play infile trim 12:34 2:26 \-2:00
4011 Upsample the signal by an integer factor: \fIfactor\fR\-1 zero-value
4018 For a general resampling effect with anti-imaging, see \fBrate\fR.  See
4024 i.e. 16-bit, 44\-48kHz) recordings of speech.  The algorithm currently
4051 Default values are shown in parentheses.
4053 .IP \fB\-t\ \fInum\fR\ (7)
4057 .IP \fB\-T\ \fInum\fR\ (0.25)
4060 .IP \fB\-s\ \fInum\fR\ (1)
4063 .IP \fB\-g\ \fInum\fR\ (0.25)
4066 .IP \fB\-p\ \fInum\fR\ (0)
4076 .IP \fB\-b\ \fInum\fR
4080 .IP \fB\-N\ \fInum\fR
4083 .IP \fB\-n\ \fInum\fR
4086 .IP \fB\-r\ \fInum\fR
4089 .IP \fB\-f\ \fInum\fR
4091 .IP \fB\-m\ \fInum\fR
4092 Measurement duration; by default, twice the measurement period; i.e.
4094 .IP \fB\-M\ \fInum\fR
4096 .IP \fB\-h\ \fInum\fR
4097 `Brick-wall' frequency of high-pass filter applied at the input to the
4099 .IP \fB\-l\ \fInum\fR
4100 `Brick-wall' frequency of low-pass filter applied at the input to the
4102 .IP \fB\-H\ \fInum\fR
4103 `Brick-wall' frequency of high-pass lifter used in the detector
4105 .IP \fB\-L\ \fInum\fR
4106 `Brick-wall' frequency of low-pass lifter used in the detector
4118 .B \-v
4132 if \fBpower\fR, then a power (i.e. wattage or voltage-squared) ratio,
4178 for a volume-changing effect with different capabilities, and
4180 for a dynamic-range compression/expansion/limiting effect.
4183 command-line parameters, or 2 if an error occurs during file processing.
4186 (sox-users@lists.sourceforge.net).
4203 R. Bristow-Johnson,
4205 https://webaudio.github.io/Audio-EQ-Cookbook/audio-eq-cookbook.html
4209 .IR "Q-factor" ,
4215 https://web.archive.org/web/20070320114719/http://www.harmony-central.com/Effects/effects-explained…
4237 Copyright 1998\-2013 Chris Bagwell and SoX Contributors.