LearningAPI has moved to a new blog!

The learningAPI blog has moved to a new URL. These posts will remain here, but all new content has moved to learningAPI.com: Digital Media, Streaming Video & Educational Technology. You can also subscribe to the RSS feed for the new learningAPI.com blog.

January 31, 2006

Contextual Search API from Yahoo - Keyword Extraction for free

I've been playing with some of Yahoo's search APIs lately. I was especially intrigued by the Content Analysis Service, which takes a block of text, along with an optional "helper phrase" that points to the context of the subject matter, and extracts keywords from it. I'm always on the lookout for technologies that can help categorize or 'gist' content. One case in point: the speech-to-text data that voice recognition extracts from podcasts, videos and lectures is not accurate enough to read as a transcript, but it is usually good enough to search. Is keyword extraction a useful tool for getting the topics out of a blob of text? Try it and see!
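
To make the request concrete, here's a minimal TypeScript sketch of the query string a term extraction call might carry. It assumes the V1 REST endpoint http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction with the parameters appid, context (the text), query (the optional helper phrase) and output=json. Treat those names as assumptions based on Yahoo's documentation of the time rather than something this post confirms; YOUR_APP_ID is a placeholder for your own application ID.

```typescript
// Assumed endpoint for Yahoo's V1 term extraction (Content Analysis) service.
const TERM_EXTRACTION_URL =
  "http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction";

// One extraction request; comments give the assumed query-string parameter names.
interface TermExtractionRequest {
  appId: string;         // -> appid (your Yahoo application ID)
  text: string;          // -> context (the block of text to analyze)
  helperPhrase?: string; // -> query (optional hint about the subject matter)
}

// Percent-encode each key/value pair and join them into a query string.
function toQueryString(params: Record<string, string>): string {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
}

// Map a request onto the assumed parameters, asking for JSON output.
function termExtractionParams(req: TermExtractionRequest): Record<string, string> {
  const params: Record<string, string> = {
    appid: req.appId,
    context: req.text,
    output: "json",
  };
  if (req.helperPhrase) {
    params["query"] = req.helperPhrase; // left out entirely when no helper phrase is given
  }
  return params;
}

const exampleQuery = toQueryString(
  termExtractionParams({
    appId: "YOUR_APP_ID", // placeholder
    text: "Digital asset management systems store video & metadata.",
    helperPhrase: "media",
  }),
);
console.log(TERM_EXTRACTION_URL + "?" + exampleQuery);
```

Building the parameters as a plain object keeps the helper phrase optional and lets the same encoding step serve either a GET URL or a POST body.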

The folks at the BBC certainly found the Content Analysis Service useful for researching the connections among public figures and politicians. By using the service to extract people's names from public documents, the team was able to build "six degrees of separation"-style graphs of "who-knows-whom" (or at least "who-is-associated-with-whom") quickly and at low cost.
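
The general idea behind a "who-knows-whom" graph can be sketched as a co-occurrence graph: treat the names extracted from each document as a set, link every pair of names that appear together, and count how many documents each pair shares. This TypeScript sketch illustrates that technique under the assumption that the names have already been extracted; it is not a description of how the BBC team actually built their graphs, and the sample names are made up.

```typescript
// Undirected co-occurrence graph: key "NameA|NameB" (names sorted alphabetically)
// -> number of documents in which both names appear.
type AssociationGraph = Record<string, number>;

function buildAssociationGraph(docs: string[][]): AssociationGraph {
  const edges: AssociationGraph = {};
  for (const names of docs) {
    // De-duplicate and sort so each pair is counted once per document, in a stable order.
    const unique = names.filter((name, i) => names.indexOf(name) === i).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = unique[i] + "|" + unique[j];
        edges[key] = (edges[key] || 0) + 1;
      }
    }
  }
  return edges;
}

// Hypothetical names extracted from three documents.
const associationGraph = buildAssociationGraph([
  ["Alice Smith", "Bob Jones", "Carol White"],
  ["Bob Jones", "Alice Smith", "Bob Jones"],
  ["Carol White", "Dan Brown"],
]);
Object.keys(associationGraph).forEach((pair) => {
  console.log(pair + ": " + associationGraph[pair]);
});
```

Higher counts suggest stronger associations; two people who never share a document get no edge, but may still be connected through a chain of intermediaries.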

It took some time to work out the code and get it all working, but here's an example of it in action. I've used the text from my recent post, "Digital Asset Management - Some Advice", but you can paste in your own text to try it out. When you click Run Query, the text is submitted to Yahoo's Content Analysis Service via a PHP proxy on my website (to get around the browser's cross-domain scripting restrictions), and the results appear below the form, AJAX-style. This query uses the JSON output of Yahoo's API; JSON is a simple, lightweight format for data exchange.
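
Here's a rough TypeScript sketch of the client side of the proxy approach: the browser POSTs the form fields to a script on its own domain, and that script forwards them to Yahoo and relays the JSON back. The proxy path and the field names text and helper are hypothetical (the post doesn't give them), and the reply shape {"ResultSet":{"Result":[...]}} is an assumption about Yahoo's JSON output.

```typescript
// Assumed shape of the JSON reply: { "ResultSet": { "Result": ["term one", ...] } }
interface TermExtractionReply {
  ResultSet?: { Result?: string[] | string };
}

// Hypothetical path of the same-origin proxy (the real script name isn't given).
const PROXY_PATH = "/yahoo-proxy.php";

// Form-encoded POST body for the proxy, which is assumed to forward
// `text` and `helper` to Yahoo and return Yahoo's JSON unchanged.
function proxyRequestBody(text: string, helper: string): string {
  return "text=" + encodeURIComponent(text) + "&helper=" + encodeURIComponent(helper);
}

// Extract the keyword list from the JSON text the proxy returns.
function parseTerms(json: string): string[] {
  const reply = JSON.parse(json) as TermExtractionReply | null;
  const result = reply?.ResultSet?.Result;
  if (result == null) {
    return []; // missing (or null) Result: treat as "no terms found"
  }
  // Defensive: accept a single string as a one-element list.
  return typeof result === "string" ? [result] : result;
}

const sampleReply = '{"ResultSet":{"Result":["digital asset management","metadata"]}}';
console.log("POST " + PROXY_PATH + ": " + proxyRequestBody("some text", "media"));
console.log(parseTerms(sampleReply));
```

In a browser, the body would be sent with XMLHttpRequest (Content-Type application/x-www-form-urlencoded) and the responseText handed to parseTerms. Because the request goes to the page's own host, the same-origin restriction never comes into play; the proxy is the only thing that talks to Yahoo.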

FORM-BASED PROXY VERSION (any browser)
[Interactive form: Helper Phrase, Text to process]


There's another technique for making these AJAX calls that does not require a proxy: it dynamically adds SCRIPT elements to the page (inserting them into the DOM) with SRC attributes that call the Yahoo API. The puzzling problem I found is that this version works in Firefox/Netscape but not in IE, and I can't figure out why, since other sites using the very same code work fine. The SCRIPT element is written to the DOM with a SRC URL that works if I copy and paste it directly into the browser. But when I write the SCRIPT element to the page, IE never makes the HTTP request to retrieve it. Unfortunately, IE's developer and debugging tools are so poor that it's difficult to find out what's going on. If anyone has a suggestion, please share it with me.
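
The SCRIPT tag technique (often called JSONP) can be sketched in TypeScript as below: the browser loads the SRC URL as a script, and the reply calls a global function whose name was passed in the request. This assumes Yahoo's JSON output accepts a callback parameter that wraps the result as callbackName({...}); the endpoint, the parameter names and the termExtractionCallback naming scheme are assumptions for illustration.

```typescript
// Assumed V1 endpoint; the JSON output is assumed to honor a `callback`
// parameter, so the reply arrives as a script like: callbackName({...});
const JSONP_ENDPOINT =
  "http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction";

let jsonpSequence = 0;

// Register a one-shot global function for the reply to call; returns its name.
function registerJsonpCallback(handler: (data: unknown) => void): string {
  const name = "termExtractionCallback" + jsonpSequence++;
  const scope = globalThis as any; // `window` in a browser
  scope[name] = (data: unknown) => {
    delete scope[name]; // remove the global once the reply has arrived
    handler(data);
  };
  return name;
}

// Build the SRC URL: the whole request, text included, goes in the query string.
function buildJsonpUrl(appId: string, text: string, helper: string, callback: string): string {
  return (
    JSONP_ENDPOINT +
    "?appid=" + encodeURIComponent(appId) +
    "&context=" + encodeURIComponent(text) +
    "&query=" + encodeURIComponent(helper) +
    "&output=json&callback=" + encodeURIComponent(callback)
  );
}

// Append a SCRIPT element; the browser fetches `src` and runs the reply.
// `doc` is typed `any` so this compiles without DOM typings; pass `document`.
function injectScript(doc: any, src: string): void {
  const script = doc.createElement("script");
  script.type = "text/javascript";
  script.src = src;
  doc.getElementsByTagName("head")[0].appendChild(script);
}

// Browser usage (not run here):
// const cb = registerJsonpCallback((data) => console.log(data));
// injectScript(document, buildJsonpUrl("YOUR_APP_ID", text, helperPhrase, cb));
```

Because this is a plain GET, every character of the submitted text ends up in the SRC URL, so the URL grows with the length of the text.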

Update: thanks to my colleague Jeff Griffith at HBS, who discovered that IE was failing because the block of text submitted in the form was too long and exceeded a character limit that IE apparently has for SCRIPT SRC attributes. Shortening the text solved the problem.
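
One way to automate the "shorten the text" fix is to trim the text until the generated SRC URL fits under a limit. This TypeScript sketch assumes the limit in question is the 2,083-character maximum URL length that Microsoft documents for Internet Explorer (the post doesn't state the exact figure), and it takes the URL builder as a parameter so it works with any endpoint. It binary-searches for the longest prefix whose percent-encoded URL fits, because encoding can make a single character several characters long, and it avoids splitting a UTF-16 surrogate pair, which would make encodeURIComponent throw.

```typescript
// Maximum URL length Microsoft documents for Internet Explorer, in characters.
const IE_MAX_URL_LENGTH = 2083;

// First n UTF-16 code units of s, backing off one unit if that would leave a
// lone high surrogate (encodeURIComponent throws a URIError on lone surrogates).
function prefixOf(s: string, n: number): string {
  if (n > 0 && n < s.length) {
    const code = s.charCodeAt(n - 1);
    if (code >= 0xd800 && code <= 0xdbff) {
      n -= 1;
    }
  }
  return s.slice(0, n);
}

// Longest prefix of `text` whose URL (from `buildUrl`) is at most `maxLength` characters.
// The URL length never shrinks as the prefix grows, so binary search applies.
function fitTextToUrlLimit(
  text: string,
  buildUrl: (text: string) => string,
  maxLength: number = IE_MAX_URL_LENGTH,
): string {
  if (buildUrl(text).length <= maxLength) {
    return text; // already short enough
  }
  let lo = 0; // longest prefix length known to fit ("" is returned if nothing fits)
  let hi = text.length; // no prefix longer than this can fit
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (buildUrl(prefixOf(text, mid)).length <= maxLength) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return prefixOf(text, lo);
}

// Hypothetical URL builder for illustration.
const exampleBuildUrl = (t: string) => "http://example.com/x?context=" + encodeURIComponent(t);
console.log(fitTextToUrlLimit("hello world again", exampleBuildUrl, 39)); // prints "hello wo"
```

The cut can land mid-word; trimming back to the last space would be a small refinement if partial words confuse the extractor.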


SCRIPT TAG VERSION (seems not to work in IE, although it should)
[Interactive form: Helper Phrase, Text to process]



Listed below are links to weblogs that reference 'Contextual Search API from Yahoo - Keyword Extraction for free' from learningAPI.com: Media and Learning Technology - Larry Bouthillier.

Comments

Yahoo Term Extractor is great. We have based our free API on Yahoo but integrated it with other services, creating an API that is a mashup of Yahoo!, Google AdWords, Wikipedia and Wordtracker. In more detail, it uses the power of Yahoo Term Extractor and Google's "Get Keywords From Site" to extract keywords and feed them to the Wikipedia and Wordtracker engines to get better, more search-friendly results. Check it out on our site: wordsfinder.com


Check out AlchemyAPI for term extraction in 8 languages. Commercial support / SLA also available.
