During one of our panels at Balticon, Thomas Gideon and I were speculating about automated transcription, and the idea came up to try Google Voice (previously called GrandCentral). I happen to have a GV account, so I agreed to give it a shot.
I had time this morning, so I made some attempts. In short, it didn’t work.
I’m not entirely sure why it didn’t work. As I play back the recordings that were made, the volume seems ample. The podcast recordings come across like somebody on a speakerphone, which I think would be a common practice for somebody calling you from a car.
It may be that Google hasn’t tackled multiple voices yet and simply cancels the transcription process when a certain percentage of the attempt fails. My control recordings show that a short message transcribes well (even if not every word comes out right), but when I make up a bunch of words the transcription is “not available”.
It is also possible that it is just the phone I was using. I’ve received a few voicemails from people “in the wild” and the transcription was much more accurate, I’d say around 95%, whereas even my successful control test totally munged the end of the quote.
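For anyone who wants to put a number on an estimate like that instead of eyeballing it, here is a rough sketch in Python. It assumes you have the text of what was actually said to compare against; the function names and the sample voicemail are made up for illustration, and this has nothing to do with how Google scores its own transcriptions.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_accuracy(reference, transcript):
    """Rough share of the reference words that the transcript got right.

    Counts the words that difflib lines up, in order, between the two
    texts and divides by the number of words in the reference.
    """
    ref, hyp = words(reference), words(transcript)
    if not ref:
        return 0.0
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


if __name__ == "__main__":
    # Hypothetical voicemail, not one of the recordings from this post.
    said = "Please call me back when you get this message"
    transcribed = "please call me back when you get his massage"
    print(f"{word_accuracy(said, transcribed):.0%}")
```

It only counts matching words, so it is a ballpark figure rather than a proper word error rate, but it is enough to compare one recording against another.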
Here are the files and results:
- Control recording which was successful
  - Transcription text: “hi presented quotes by thomas jefferson a government big enough to supply you with everything you need is the government big enough to takeaway everything that you have the courses history shows that i got a very close to pretty decreases”
- Control recording which was unsuccessful
- ITB recording which was unsuccessful
- TheCommandLine recording which was unsuccessful (excerpted from his show released May 7, 2009)
If you have an idea for a different approach, let me know and I’ll be happy to try it!
With all that being said:
As a traditional voicemail service, Google Voice absolutely rocks and the transcription is really cool. Above and beyond the actual coolness of having a transcription of people’s voicemails, the web interface is very slick in how it presents them. The text of the transcript is just below the audio transport bar.
Words that Google thinks it got correct are in traditional black text. Words that it wasn’t sure of are gray. As you play the voicemail it actually highlights each word as it’s spoken, and if you click and drag the progress pointer the word highlights follow the movement. Very slick.
Perhaps Google will spin off an automated transcription service. I’m sure it would be in support of audio web searches, with the inevitable AdWords, of course. It sure would make a lot of podcasters’ lives easier!