YouTube Offers Speech-Recognition Captioning

By Larry B, March 5, 2010

It was only a matter of time. YouTube is bringing the speech recognition technology from Google Voice to bear on all the video in its vast library.

The industry has seen a variety of solutions for using speech recognition to create a transcript of a video or podcast. Virage, Pictron, Streamsage, and Podzinger have all done this. Only Pictron is more or less the same company it was at the start. Virage was acquired by Autonomy and has languished there as a Web product. Streamsage was acquired by Comcast and turned into an internal division. Podzinger has become Ramp; I’m not sure what they do at this point, but it’s not the podcast transcription service they used to be. Virage and Streamsage go back almost ten years in this space, and their systems are still running in various enterprise and educational settings.

But back to YouTube… I use Google Voice, and the speech recognition is pretty good.  I rarely have to actually listen to a voice mail, since it shows up in my email as a text message that’s almost always easily decipherable, if not perfect. So just for fun, I tried YouTube’s captioning. Here’s the result.

Usually, speech-recognition provides a good set of words for searching, if nothing else. I’ve used speech-to-text to create searchable text from a video with very good results. It makes the video file, which is essentially opaque to a search engine, into something transparent. OK…in this case, maybe translucent.

I’m sure this would do better with better audio, and I will test that. In the meantime, YouTube does provide the means to download and edit the caption file, which is probably what this is best suited for, anyway. It’s a head start on a caption file, complete with time markers already in place. For those of us who are not professional transcriptionists, that has to beat making one from scratch.
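
For anyone who wants to try the "searchable text" idea on a downloaded caption file, here is a minimal sketch. It assumes the captions have been saved in SubRip (.srt) style, which may not be exactly what YouTube exports; the sample text and the function names (parse_cues, find_word) are made up for illustration.

```python
import re

# Hypothetical caption text in SubRip (.srt) style; a real export may differ.
SAMPLE = """\
1
00:00:00,883 --> 00:00:02,485
Welcome to the streaming
media show.

2
00:00:02,585 --> 00:00:04,982
Today we talk about captions.

3
00:00:05,100 --> 00:00:07,250
Captions make video searchable.
"""

# Matches a timing line such as "00:00:02,585 --> 00:00:04,982".
TIMING = re.compile(
    r"(\d+):(\d\d):(\d\d)[,.](\d{3})\s*-->\s*(\d+):(\d\d):(\d\d)[,.](\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert a split timestamp to integer milliseconds."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_cues(text):
    """Return a list of (start_ms, end_ms, caption_text) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                nums = [int(g) for g in match.groups()]
                # Everything after the timing line is the caption text.
                caption = " ".join(lines[i + 1:])
                cues.append((to_ms(*nums[:4]), to_ms(*nums[4:]), caption))
                break
    return cues


def find_word(cues, word):
    """Return start times (ms) of cues containing the whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [start for start, _end, caption in cues if pattern.search(caption)]
```

On the sample above, find_word(parse_cues(SAMPLE), "captions") returns [2585, 5100], the millisecond offsets of the two cues that mention the word. Even a rough machine transcript gives a search box something to jump to.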

Low-tech high-value instructional video

By Larry B, March 4, 2010

Lots of us involved in instructional technology content development are rightly cognizant of high production values and a carefully edited script.  In my prior job as director of educational technology development at Harvard Business School, we were very focused on making highly-designed instructional products that looked great, sounded great, and didn’t waste a syllable in their tightly edited, word-crafted voiceover.

Nothing wrong with that, if that’s your target market and you’re planning to productize the content at a high price.  But there’s another way, as Jon Udell highlights in his conversation with Sal Khan, principal of Khan Academy.org.  Khan (interestingly, an MBA graduate of Harvard Business School) has created over one thousand instructional videos aimed primarily at middle/high school and college students on topics from Biology to Physics to Economics to Mathematics.

Khan uses nothing more complicated than a screen capture program like Camtasia, a Wacom tablet and a $20 headset to create powerful, explanatory tutorials that give the feel of looking-over-the-expert’s-shoulder.  Khan’s videos are posted to YouTube, which has granted khanacademy an exception to the ten-minute limit that applies to conventional YouTube channels.

What’s amazing is the scalability of this approach. Khan has been able to create this vast collection of material because he’s found the right combination for effective teaching while having a scalable process. You might think that reaching kids today means competing with video games, high-def TV, sophisticated animations and graphics by trying to beat those formats on their terms.  Khan’s gone the other way, and hit a home run, as evidenced by the popularity of his site and the feedback coming from kids, parents, and teachers.

There’s lots of rich detail in Jon’s interview with Sal on IT Conversations and it’s worth a listen (even if you don’t usually find podcast interviews compelling – this one is worth the download). Khan is working on analytics, assessment, and other innovation around the library of content he’s creating.

But at its heart, the lesson I see in this is that it’s not always about having the most advanced technology and picture-perfect production. Figuring out how to reach your audience and be effective might mean going decidedly low-tech.

Flash Video Performance on the Mac – Finally Some Real Data

By Larry B, March 1, 2010

Is Flash video on a Mac a CPU hog?  More than on Windows? If so, why?

Thankfully, someone’s finally done a test to put some data behind the anecdotes.  (Doh!  Why didn’t I think of doing that?!?) Jan Ozer over at the Streaming Learning Center has tested Flash video vs. HTML5 video, covering all the browsers on both Windows and (Intel) Mac, and Flash versions 10.0 and (the new, performance-optimized) 10.1.

It’s hard to summarize the findings without leaving out important detail, so I recommend looking at Jan’s data directly.  The tables are revealing.  But in a nutshell, Jan found that where the video decoder can access hardware acceleration, performance is excellent, and where it can’t…not so much.  This means that on Windows, Flash is actually slightly more CPU-efficient than HTML5. On the Mac, where Apple has not made API hooks to its graphics hardware acceleration available to software developers, Flash and HTML5 are both hogs – unless you’re using HTML5 in Safari.  It suggests that Apple is using graphics acceleration APIs that it’s keeping from others who are developing applications for the Mac. (Kinda smells like what Microsoft was accused of years ago – keeping various Windows APIs secret so that its non-OS products would always have an advantage over competitors. Microsoft has denied this.)

Is it fair – or smart – to withhold powerful APIs from the developers who create the applications that make your computer useful and relevant to users?  At best, it’s disingenuous for Apple to criticize Adobe for Flash performance on the Mac while keeping access to hardware acceleration under wraps.

In any case, Jan’s tests show that Adobe is continuing to work on this (to the extent that it can).  Video performance in Flash 10.1 is improved over 10.0 on both Mac and Windows. On Windows, the difference is dramatic.

Open Video and the “Flash Problem”

By Larry B, February 26, 2010

I had the distinct pleasure of attending Lawrence Lessig’s talk at Harvard Law School on behalf of the Open Video Alliance.  It was a terrific event, simulcast worldwide to dozens of screening locations using entirely open technologies; in particular, HTML5 and the Ogg Theora video codec.

An interesting side note: the session was funded in part by iCommons, the open standards/knowledge/software advocate, while I work at iCommons, Harvard University’s academic computing team.  I kept hearing “iCommons” mentioned, and it took a moment to recognize that it was another iCommons.  But I digress….

Lessig’s talk was great. The parts about open software, open standards, and the architecture (both legal and technical) of the read/write culture mirror closely the points in his books, and are eye-opening.  If you’ve not read Remix: Making Art and Commerce Thrive in the Hybrid Economy, you ought to.

But what I was really wondering about most during this talk was the open video concept and mention of the “Flash problem”.  Seems the Ogg Theora codec and HTML5 are seen as potential resolutions of a huge problem: proprietary video codecs and players. But as someone who builds, buys, deploys, and manages streaming video platforms and content, I couldn’t quite come to terms with all that I’d have to give up if I replaced the Flash Player in my solutions with HTML5 and Ogg.  Flash’s universality has been a tremendous boon to online video. For those of us who remember the format wars (Real vs Windows Media vs Quicktime, platform-specific plugins, single-platform codecs, browser incompatibilities), Flash is a breath of fresh air.  Being able to support a single set of APIs and codecs for all my users has been huge.  And a mature player such as the JW FLV Player gives me stuff like:

  • support for rtmp or http streaming
  • callback event-based client-side scripting
  • playlist support (RSS, ATOM, XSPF)
  • bandwidth switching/adaptive streaming
  • plugins for screengrabs or stats collection
  • subtitles
  • control over buffering

I can create an outstanding user experience using these tools, and do it for more than the degenerate case of simply putting a video in the page.  All sorts of interactive behavior can be easily layered into my video apps, and with no browser dependencies to worry about.

Contrast that with my first experience showing HTML5 video to a non-techie, my wife.  At the end of Lessig’s talk, the Open Video Alliance announced the winners of the Open Video in 60 Seconds contest, which gave contributors 60 seconds to explain open video using video.  One of the entries (not the winner, although IMHO it should have been) was by Raffaella, a teacher from Italy who did an outstanding job showing that all creativity is but a link in a long chain of the creative contributions of others.  I came home eager to show my wife, who I thought would really appreciate it.

At the conference, it was played in Quicktime with English subtitles. At home, I quickly found it on the Web, in the HTML5 player….with no subtitles.  Huh?  You gotta be kidding me!   Helpfully, a download link is provided to the .srt file containing the subtitles. That’s helpful. After all, of course I’d want to read this in an open text editor alongside my video:

1
00:00:00,883 --> 00:00:02,485
I’m Raffaella.
Nice to meet you.

2
00:00:02,585 --> 00:00:04,982
I’m a teacher.
I make animations with kids.

Thankfully, a link was also provided to the original source site for the video, which offered a subtitled version, in…..you guessed it…Flash.

So….the Flash-based video world is seen as proprietary, which it is.  But as an applications guy, what makes a platform proprietary to me?  Vendor lock-in. Platform lock-in. Client-server dependencies.  I don’t really see this as a huge problem in Flash video.  I can deliver videos in Sorenson, On2, or MPEG4 codecs. I can use players by numerous vendors, or roll my own for free with the Flex SDK. I can serve video from any server, from FMS 3.5 to Apache to Wowza. I can switch from rtmp to http, or from Akamai to a free server under my desk. Or I can dump Flash and play the same MP4 content in Quicktime, RealPlayer, or Silverlight.  I’m not getting that proprietary locked-in feeling, really.

So what’s my point?  Not that Open Video is a bad idea….I think it’s wonderful. I love vendors with an open mindset and open products, like Kaltura.  Imagine when browser support for video approaches the support now universal for DOM scripting, Javascript, AJAX, etc. Powerful toolkits like JQuery and ExtJS are only possible because of the support for standards in the browser.  And, crucially, these toolkits have made it possible to do things in the browser that previously could only be done with Flash. There are some demos that show the promise of attractive, plugin-free Web video, although compatibility and functionality are still in a nascent stage.

But as a real working technologist solving problems on the ground every day, I don’t entirely understand the “Flash problem”.  I don’t want to employ closed technologies that narrow my options and lock me in, but I’m not seeing Flash as being that way very much when it comes to video.  But I’m eager to be educated.


Remix

Lawrence Lessig. The Penguin Press HC, 2008, Hardcover, 352 pages, $3.82

4.0

Streaming Flash Video With Amazon Cloudfront

By Larry B, February 22, 2010

Since Amazon’s AWS is now supporting RTMP Flash streaming (on-demand only, so far) through its Cloudfront CDN, I thought it was time to write a quickstart guide for those who hadn’t tried it yet.

You’ll find How to Get Started with Amazon Cloudfront Streaming on streamingmedia.com. The article walks through the steps for getting signed up and streaming in no time. A working example of the Cloudfront configuration created in the article is on this site at Flash Streaming With Amazon Cloudfront.

There’s lots of room for analysis of Cloudfront’s costs, performance, degree of control, and ease of use compared to other, more traditional CDNs.  This article doesn’t talk about any of that. What it covers is what you need to know to get signed up, get it configured, and start streaming.

Disrupting Class – Takeaways for Parents (and Instructional Technologists)

By Larry B, February 10, 2010

Clay Christensen’s latest book, Disrupting Class: How Disruptive Innovation Will Change the Way the World Learns, applies the rules of Disruptive Innovation to the landscape of public K-12 education, and describes the ways in which the writing is already on the wall: public education as we know it will be transformed by application of computer-based instruction.  Boldly, Clay and his co-authors make a startling prediction: by 2019, 50% of high school courses will be delivered primarily online. That will require some changes and advances in instructional technology: authoring, scalability, instructional design, personalization, assessment. I’ll be following some of these advances and writing about them in the coming months. But the technology predictions are not the most startling thing in the book.

What’s most startling to me is their discussion of what researchers say is THE most important predictor of cognitive capacity in children.  The LENA Foundation sums up the findings of Todd Risley and Betty Hart succinctly:

  1. The variation in children’s IQs and language abilities is relative to the amount parents speak to their children.
  2. Children’s academic successes at ages nine and ten are attributable to the amount of talk they hear from birth to age three.
  3. Parents of advanced children talk significantly more to their children than parents of children who are not as advanced.

More importantly, the amount of talking that parents do to babies under one year in age, before they have demonstrated any meaningful language ability, is crucial.

The children whose parents did not begin speaking seriously to their children until their children could speak, at roughly age 12 months, suffered a persistent deficit in intellectual capacity, compared to those whose parents were talkative from the beginning.

Risley and Hart call it “language dancing” – ongoing, sophisticated speaking to a pre-talking baby – as distinct from “business talk” (“now roll over”, “do you want a bottle?”, and other more “utilitarian” talk). Again, from Disrupting Class (italics exactly as in the original):

One of the most important findings of the Risley-Hart study was that the level of income, ethnicity, and level of parents’ education had no explanatory power in determining the level of cognitive capacity that the children achieved. It is all explained by the amount of language dancing, or extra talk, over and above business talk, that the parents engaged in.  It accounted literally for all of the variance in outcomes.

I think we all knew that talking to kids is important, but showing that extra talking to babies this early has THE dominant effect on cognitive ability later in life is simply stunning.


Disrupting Class

Clayton Christensen. McGraw-Hill 2008, Hardcover, 288 pages, $19.01

4.0


Meaningful Differences in the Everyday Experience of Young American Children

Todd R. Risley. Paul H Brookes Pub Co 1995, Hardcover, 304 pages, $23.03

4.5

Kaltura API Testing

By Larry B, January 6, 2010

A test of the WordPress Kaltura Plugin. Interesting how they’ve implemented this. I love the idea, and the design of the API.

Interesting, too, that my first upload, an h.264/AAC Quicktime file (.mov) will not work. Kaltura lets me upload the file, name it, and insert into the page. At that point, what you see here is what there is.  No error message, no indication of what’s wrong. Just no video.

With a little more digging, it appears to be an issue with the Kaltura WordPress plugin rather than the Kaltura server-side app. I can log in to the Kaltura Management Console and play the video there, but the EMBED statement Kaltura plugged into this page is lacking some necessary parameters. It has worked in the past….so it’ll take some investigation to see what’s broken, exactly.

[ADDITION: 2.26.2010]

OK…so it turns out not to be a Kaltura problem at all. The problem was the WYSIWYG editor in WordPress itself. When Kaltura inserts the video player tag into the blog entry editor, it does so in raw HTML mode, and it does so correctly.  Problem is that if you happen to switch back to WYSIWYG, the editor clobbers the video tag Kaltura inserted, clearing out a bunch of important tag attributes. Mystery solved. Kaltura is not at fault!
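
If you hit something similar, one quick way to confirm that an editor is eating attributes is to diff the tag before and after the round trip. This is only a sketch: the two markup strings are hypothetical stand-ins (not the actual tag Kaltura generates), and lost_attributes is a helper name I made up.

```python
from html.parser import HTMLParser


class TagAttrCollector(HTMLParser):
    """Collect the attributes of every start tag with a given name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == self.tag_name:
            self.found.append(dict(attrs))


def first_tag_attrs(html, tag_name):
    """Return the attributes of the first matching tag, or {} if none is found."""
    collector = TagAttrCollector(tag_name)
    collector.feed(html)
    collector.close()
    return collector.found[0] if collector.found else {}


def lost_attributes(before_html, after_html, tag_name="object"):
    """List attribute names present before the round trip but missing after it."""
    before = first_tag_attrs(before_html, tag_name)
    after = first_tag_attrs(after_html, tag_name)
    return sorted(name for name in before if name not in after)


# Hypothetical markup, for illustration only.
BEFORE = (
    '<object id="player1" width="400" height="330" '
    'data="http://example.com/player.swf" '
    'type="application/x-shockwave-flash"></object>'
)
AFTER = '<object width="400" height="330"></object>'

print(lost_attributes(BEFORE, AFTER))
```

Paste the raw-HTML version in as BEFORE and whatever the editor leaves behind as AFTER, and the printed list shows which attributes got dropped.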
