High-Level Audio Editing Language?
Bill Freeman
f at ke1g.mv.com
Mon Jun 30 10:11:24 EDT 2008
> Bill McGonigle wrote:
> Unless you're merging audio tracks, editing audio tends to defy
> scripting because you need an "ear" to listen for the pauses, pops and
> other cutting/splicing points.
But the case in point, looking for touch tones, is different. Touch tones
are designed to be automatically recognizable. Especially since Bill knows
which touch tone he's using, he wouldn't necessarily need an FFT (though it
may, in practice, be the easiest way to do it, given pre-existing software).
During a touch tone, the bulk of the energy is in two narrow bands. The
way that the phone company first detected them was to AGC the signal so that
it had a defined total energy, then feed it to a bank of band pass filters
tuned to the 7 or 8 frequencies that are used for touch tones (lots of
government/military phones have/had a fourth column). Thresholds are applied
to the outputs of the filters; a filter whose output exceeds its threshold
marks the corresponding frequency as detected. The thresholds are set high
enough that there is no way for more than two filters to receive enough of
the limited energy to be detected. The detector then waits for one frequency
from group A and one from group B (one group holds the row tones, the other
the column tones) to be simultaneously detected for a substantial number of
milliseconds.
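
The per-block part of that scheme can be sketched in Python. This is only an
illustration under stated assumptions, not the phone company's circuit: the
Goertzel recurrence stands in for the analog band pass filters, dividing by
the block's total energy stands in for the AGC, and the names (detect_key,
goertzel_power), the 8000 Hz rate, and the 0.35 threshold are all made up
here. The eight frequencies are the standard touch tone values, which the
text above does not list. The threshold is kept above one third so that,
roughly speaking, no more than two filters can clear it at once.

```python
import math

LOW_GROUP = (697.0, 770.0, 852.0, 941.0)       # row tones, Hz
HIGH_GROUP = (1209.0, 1336.0, 1477.0, 1633.0)  # column tones, Hz
KEYS = ("123A", "456B", "789C", "*0#D")        # KEYS[row][column]


def goertzel_power(samples, freq, rate):
    """Squared magnitude of the signal at one frequency (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(block, rate=8000, threshold=0.35):
    """Return the key heard in one block of samples, or None.

    Hypothetical sketch: `threshold` is a fraction of the block's total
    energy, so a pure tone scores about 1.0 and each half of a clean
    tone pair scores about 0.5.
    """
    total = sum(x * x for x in block)
    if total == 0.0:
        return None  # silence: nothing to normalise against
    scale = total * len(block) / 2.0  # crude stand-in for the AGC

    def hits(group):
        return [i for i, f in enumerate(group)
                if goertzel_power(block, f, rate) / scale > threshold]

    rows, cols = hits(LOW_GROUP), hits(HIGH_GROUP)
    # Require exactly one frequency from each group.
    if len(rows) == 1 and len(cols) == 1:
        return KEYS[rows[0]][cols[0]]
    return None
```

Calling detect_key on successive blocks of a couple of hundred samples, and
requiring the same answer over several consecutive blocks, would cover the
"substantial number of milliseconds" condition.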
There are tradeoffs: Set the threshold too high, and a small amount of noise
will render a perfectly fine touch tone undetectable. Also, bad dials with
too much "twist" (difference in amplitude between the two tones) won't be
detected. (And twist is sometimes the fault of the transmission path, rather
than the dial.) Set the threshold too low, and the chances of misinterpreting
human speech as a touch tone increase dramatically.
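
To put rough numbers on the threshold/twist tradeoff, here is a small
calculation under a simplifying assumption that is mine, not the original
research's: the signal is noise free, all of its energy is in the two tones,
and the threshold is expressed as a fraction of the total energy. The
function name max_twist_db is made up for the illustration.

```python
import math


def max_twist_db(threshold):
    """Largest twist, in dB, at which the weaker tone still clears `threshold`.

    Assumes a noise-free signal whose energy is entirely in the two tones,
    and a threshold given as a fraction of total energy.
    """
    if not 0.0 < threshold <= 0.5:
        raise ValueError("threshold must be in (0, 0.5]")
    # At the limit the weaker tone holds exactly `threshold` of the energy
    # and the stronger tone holds the rest.
    return 10.0 * math.log10((1.0 - threshold) / threshold)


for t in (0.2, 0.35, 0.5):
    print(f"threshold {t:.2f}: tolerates up to {max_twist_db(t):.1f} dB of twist")
```

Under those assumptions a threshold of 0.35 tolerates a bit under 3 dB of
twist, and any noise in the signal eats into that margin further.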
There's lots of fine research in old issues of the Bell System Technical
Journal (BSTJ; around 1963, I think, if your library goes back that far),
should you want to roll your own. But I'd be surprised if there isn't already
a suitable software decoder somewhere inside Asterisk that you could apply in
order to get a list of times, or sample numbers, at which touch tone 2 starts
and ends in a recording. Then you could use sox to carve up the file, driven
by your choice of scripting language.
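
As a sketch of that last step, assume some decoder has already produced one
result per fixed-size block of samples (a key such as "2", or None). The
helper names tone_runs and sox_trim_commands, the block size, and the output
file names are all hypothetical. The only sox feature relied on is the trim
effect with a start and a length, each given in samples via the "s" suffix;
the code just builds the command lines and does not run them.

```python
def tone_runs(block_keys, key, block_len, min_blocks=2):
    """Return (start_sample, end_sample) pairs for runs of `key`.

    A run counts only if it lasts at least `min_blocks` consecutive blocks.
    """
    runs, start = [], None
    # The trailing None is a sentinel that closes a run at end of input.
    for i, k in enumerate(list(block_keys) + [None]):
        if k == key and start is None:
            start = i
        elif k != key and start is not None:
            if i - start >= min_blocks:
                runs.append((start * block_len, i * block_len))
            start = None
    return runs


def sox_trim_commands(infile, runs, prefix="part"):
    """Build one sox command per stretch of audio between adjacent tone runs."""
    cmds = []
    for n, ((_, seg_start), (seg_end, _)) in enumerate(zip(runs, runs[1:])):
        cmds.append(["sox", infile, f"{prefix}{n:02d}.wav",
                     "trim", f"{seg_start}s", f"{seg_end - seg_start}s"])
    return cmds


keys = [None, "2", "2", None, None, "2", "2", "2", None]
for cmd in sox_trim_commands("in.wav", tone_runs(keys, "2", block_len=205)):
    print(" ".join(cmd))
```

Each command list could be handed to subprocess.run once the cut points look
right; printing them first makes it easy to check a few by ear.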
Not a canned solution, but could be fun, especially if you can charge someone
for it.
Bill