LearningAPI has moved to a new blog!

The learningapi blog has moved to a new URL. These posts will remain here, but all new content has moved to learningAPI.com: Digital Media, Streaming Video & Educational Technology. You may also subscribe to the RSS feed for the new learningAPI.com blog.

April 13, 2007

Image, Audio & Video Search - Reading Content and Context

In his article, Improving Image Search, Harvard's Michael Hemment writes about a research project at UC San Diego that uses human-generated sample data to train an engine that analyses images to extract searchable metadata. 

 Supervised Multiclass Labeling (SML), automatically analyses the content of images, compares it to various “learned” objects and classes, and then assigns searchable labels or keywords to the images. SML can also be used to identify content and generate keywords for different parts of the same image.
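To make the idea concrete, here's a minimal sketch of the general pattern SML describes: human-labeled examples train a model, and the model then assigns searchable keywords to new images. The actual SML research models class-conditional feature distributions; the nearest-centroid classifier and the toy "color histogram" features below are my own stand-ins for illustration, not the paper's method.

```python
def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(labeled_examples):
    """labeled_examples: {label: [feature_vector, ...]} -> {label: centroid}."""
    return {label: centroid(vecs) for label, vecs in labeled_examples.items()}

def assign_labels(model, features, top_k=2):
    """Rank candidate labels by squared distance to each class centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(model, key=lambda label: dist(model[label], features))
    return ranked[:top_k]

# Hypothetical 3-bin "color histogram" features for two learned classes.
examples = {
    "sky":   [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8]],
    "grass": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.2]],
}
model = train(examples)
labels = assign_labels(model, [0.15, 0.15, 0.85], top_k=1)  # a "sky"-like image
```

Running the same classifier over crops of a single image is, in miniature, how keywords could be generated for different parts of the same picture.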

This is an interesting topic. It reminds me of several related efforts -- all involved in extracting useful metadata from binary media objects:
  • The Music Genome Project and its Pandora site, which use human-generated metadata to describe music -- fields very similar in concept to the data in VIA or the seed data used in SML. 
  • Using OCR tools to identify and index text that appears in an image. Google's OCRopus project is an open-source way to do this, while commercial products like Pictron do it for both images and video. 
  • Speech recognition on audio/video content is similarly a way to index the otherwise opaque contents of a binary media file. What's odd is how little use it has seen in the real world, even though the technology has been around for many years.

I recently read somewhere on the web (I can't recall the source) the apt observation that hugely popular video sites like YouTube make video findable by combining very primitive metadata with the all-important context. Who else likes this? What else has this person created, bookmarked, or shared? What comments and tags have users applied? All of these have turned out to be far more useful than a full transcript or speech-recognition search. 
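That combination -- primitive metadata plus context signals -- can be sketched as a simple ranking function. The field names and weights below are invented for illustration; no site publishes its actual formula.

```python
import math

def score(video, query_terms):
    """Rank by metadata match, boosted by log-scaled context signals."""
    text = (video["title"] + " " + " ".join(video["tags"])).lower()
    # Primitive metadata match: how many query terms appear in title/tags.
    match = sum(1 for t in query_terms if t.lower() in text)
    if match == 0:
        return 0.0
    # Context boost: likes, comments, etc., damped with a logarithm so a
    # viral video doesn't completely swamp the text match.
    context = math.log1p(video["likes"] + 2 * video["comments"])
    return match + 0.5 * context

videos = [
    {"title": "Cat plays piano", "tags": ["cat", "music"], "likes": 900, "comments": 40},
    {"title": "Piano tutorial", "tags": ["music", "lesson"], "likes": 3, "comments": 0},
]
ranked = sorted(videos, key=lambda v: score(v, ["piano"]), reverse=True)
```

Notice that no transcript is involved anywhere: a title, a few tags, and who-liked-what do all the work.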
One burning question for me: why is searching inside a PDF massively useful, while searching inside a video doesn't quite hit the mark? What's holding video and image search back? Is it the quality of the metadata we extract and index? Does video simply have lower information density in its transcript than a written article -- have you ever read the transcript of a half-hour program, only to realize you could read or skim it in less than three minutes? Or do people simply use these kinds of assets differently than they use text-based documents, so that different rules and benefits apply when searching?


Listed below are links to weblogs that reference 'Image, Audio & Video Search - Reading Content and Context' from learningAPI.com: Media and Learning Technology - Larry Bouthillier.