A Second Test of YouTube’s Captioning

By Larry B, March 12, 2010

Because audio quality matters so much to the success of speech recognition, I’ve re-recorded the video from my YouTube speech-recognition auto-captioning test. This time I used a high-quality condenser mic plugged into a good mixer, and recorded in an acoustically good space.

With good-quality audio, YouTube made a much better caption file. To be fair, in the beginning I throw around a few company names that aren’t real words, and I didn’t expect those to be right in the captions. But YouTube seems to be unable to recognize “YouTube”, which is kind of funny in its own way.

The other issue is the awful audio/video sync problem I’ve had recording directly from a webcam into YouTube. I downloaded the video and corrected the problem using QT Sync. Oddly, when I re-uploaded the corrected file to YouTube, the sync was off again.

Anyway, the captions are the interesting part. Here’s the clip:

YouTube Offers Speech-Recognition Captioning

By Larry B, March 5, 2010

It was only a matter of time. YouTube is bringing the speech recognition technology from Google Voice to bear on all the video in its vast library.

The industry has seen a variety of solutions for using speech recognition to create a transcript of a video or podcast. Virage, Pictron, Streamsage, and Podzinger have all done this. Only Pictron is more or less the same company it was at the start. Virage was acquired by Autonomy and has languished there as a Web product. Streamsage was acquired by Comcast and turned into an internal division. Podzinger has become Ramp, and I’m not sure what they do at this point, but it’s not the podcast transcription service they used to be. Virage and Streamsage go back almost ten years in this space, and their systems are still running in various enterprise and educational settings.

But back to YouTube… I use Google Voice, and the speech recognition is pretty good. I rarely have to actually listen to a voice mail, since it shows up in my email as a text message that’s almost always easily decipherable, if not perfect. So just for fun, I tried YouTube’s captioning. Here’s the result.

Usually, speech recognition provides a good set of words for searching, if nothing else. I’ve used speech-to-text to create searchable text from a video with very good results. It turns the video file, which is essentially opaque to a search engine, into something transparent. OK…in this case, maybe translucent.
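
To make the "searchable text" idea concrete, here is a minimal sketch in Python of a word index built from machine transcripts. Everything in it is an assumption for illustration: the clip ids, the sample transcripts, and the `build_index` and `search` helpers are hypothetical, and a real search engine does far more than this.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")

def build_index(transcripts):
    """Map each lowercase word to the set of clip ids whose transcript contains it."""
    index = defaultdict(set)
    for clip_id, text in transcripts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(clip_id)
    return index

def search(index, query):
    """Return the ids of clips whose transcript contains every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical machine transcripts; the first includes a recognition error.
transcripts = {
    "clip-1": "you tube is bringing speech recognition to video",
    "clip-2": "the mixer and the condenser mic improved the audio",
}
index = build_index(transcripts)

print(search(index, "speech recognition"))  # finds clip-1
print(search(index, "youtube"))             # finds nothing: the misheard word is lost
```

The second query shows the "translucent" part: a word the recognizer gets wrong simply isn't there to be found, while the correctly recognized words around it still are.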

I’m sure this would do better with better audio, and I will test that. In the meantime, YouTube does provide the means to download and edit the caption file, which is probably what this is best suited for, anyway. It’s a head start on a caption file, complete with time markers already in place. For those of us who are not professional transcriptionists, that has to beat making one from scratch.
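
For anyone who would rather clean up the downloaded file with a script than by hand, here is a rough sketch in Python. It assumes the caption file is plain text in which each caption is a `start,end` timing line (such as `0:00:00.500,0:00:03.250`) followed by the caption text and a blank line. That layout is my assumption, so check it against the file you actually download. The sample captions and the `parse_captions` and `fix_words` helpers are hypothetical.

```python
import re

# Assumed timing line: H:MM:SS.mmm,H:MM:SS.mmm
TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)

def to_seconds(h, m, s, ms):
    """Convert timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_captions(text):
    """Return a list of (start_seconds, end_seconds, caption_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        match = TIMING.match(lines[0].strip())
        if not match:
            continue  # skip anything that is not a timed caption
        g = match.groups()
        cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), " ".join(lines[1:])))
    return cues

def fix_words(cues, corrections):
    """Apply simple find-and-replace corrections, leaving the time markers alone."""
    fixed = []
    for start, end, caption in cues:
        for wrong, right in corrections.items():
            caption = caption.replace(wrong, right)
        fixed.append((start, end, caption))
    return fixed

# Made-up sample in the assumed format.
SAMPLE = """0:00:00.500,0:00:03.250
this is a test of you tube

0:00:03.250,0:00:06.000
speech recognition captioning
"""

cues = fix_words(parse_captions(SAMPLE), {"you tube": "YouTube"})
for start, end, caption in cues:
    print(start, end, caption)
```

The point of keeping the times and the text separate is that the time markers, which are the tedious part to create by hand, survive untouched while the words get corrected.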