January 31, 2006
Contextual Search API from Yahoo - Keyword Extraction for free
I've been playing with some of Yahoo's search APIs lately. In particular, I was intrigued by the Content Analysis Service, which takes a block of text, along with an optional "helper phrase" that points to the context of the subject matter, and extracts keywords from it. I'm always on the lookout for technologies that can help categorize or 'gist' content. For example, the speech-to-text data extracted via voice recognition from podcasts, videos and lectures is not a good enough transcript to read, but it is usually good enough to search. Is keyword extraction a useful tool for getting the topics from a blob of text? Try it and see!
The folks at the BBC certainly found ContextualAnalysis useful for doing research into the connections and relations among public figures and politicians. Using this service to extract people's names from public documents, the team was able to create "six degrees of separation"-type graphs of "who-knows-whom" (or at least "who-is-associated-with-whom") very quickly and at low cost.
It took some time to figure out the code for this and get it all to work, but here's an example of it in action. Here I used the text from my recent post, "Digital Asset Management - Some Advice", but you can paste your own text in to try it out. When you click on Run Query, the form data is submitted to Yahoo's ContextualAnalysisService via a PHP proxy on my website (to get around the browser's cross-domain scripting security restrictions), and the results pop up under the form, AJAX-style. This query uses Yahoo's JSON API, a simple and lightweight protocol for data exchange.
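For anyone who wants to see roughly what the proxy has to do, here's a minimal sketch. It's in Python rather than the PHP my proxy uses, and it is not the actual code running on this site. The endpoint URL, the parameter names (`appid`, `context`, `query`, `output`) and the shape of the JSON response are assumptions based on how I understand Yahoo's term extraction service to work, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; the post itself does not spell out the URL.
TERM_EXTRACTION_URL = (
    "http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction"
)


def build_extraction_request(text, helper_phrase=None, app_id="YOUR_APP_ID"):
    """Build the POST request a server-side proxy would forward.

    Parameter names are assumptions: 'context' carries the block of text,
    'query' carries the optional helper phrase.
    """
    params = {"appid": app_id, "context": text, "output": "json"}
    if helper_phrase:
        params["query"] = helper_phrase
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(TERM_EXTRACTION_URL, data=body, method="POST")


def parse_terms(json_body):
    """Extract the keyword list from a response assumed to look like
    {"ResultSet": {"Result": ["term one", "term two"]}}."""
    payload = json.loads(json_body)
    return list(payload.get("ResultSet", {}).get("Result", []))
```

A proxy would send the request with `urllib.request.urlopen(request)`, read the body, and hand it back to the page (or pass it through `parse_terms` first). Because the browser only ever talks to the proxy on the same domain, the cross-domain restriction never comes into play.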
FORM-BASED PROXY VERSION (any browser)
There's another technique for making these AJAX calls that does not require a proxy: it dynamically adds SCRIPT tags to the page (inserting DOM elements) whose "SRC=" attributes call the Yahoo API. The inexplicable problem I found is that this version works in Firefox/Netscape but not in IE. I'm unable to figure out why, since other sites using the very same code work fine. The SCRIPT element is written to the DOM with a SRC URL which works if I copy and paste it directly into the browser. But when I write the SCRIPT element to the page, IE never makes the HTTP call to retrieve it. Unfortunately, IE's developer and debugging tools are so poor that it's difficult to find out what's going on. If anyone has a suggestion, please share it with me.
Update - thanks to colleague Jeff Griffith at HBS, who discovered that the reason IE is not working is that the block of text submitted in the form was too long and violated a character limit that IE apparently has for SCRIPT SRC attributes. Shortening the text solved the problem.
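Since the whole block of text travels in the SRC URL with this technique, the length can be checked before the SCRIPT element is ever written. Here's a rough sketch of that check, in Python rather than the browser JavaScript the page uses. The 2,083-character figure is the limit Microsoft documents for URLs in IE in general; I'm assuming it is the limit that applies here. The endpoint, the `callback` parameter name and the function names are likewise assumptions for illustration.

```python
import urllib.parse

# Assumed endpoint, as in the proxy version; not taken from the post.
TERM_EXTRACTION_URL = (
    "http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction"
)
# Assumed limit: Microsoft's documented maximum URL length for IE.
IE_MAX_URL_LENGTH = 2083


def build_script_src(text, callback="handleTerms", app_id="YOUR_APP_ID"):
    """Build the GET URL that would go into the SCRIPT tag's SRC attribute.

    'callback' names the page function the JSON response is wrapped in
    (parameter name assumed).
    """
    params = {
        "appid": app_id,
        "context": text,
        "output": "json",
        "callback": callback,
    }
    return TERM_EXTRACTION_URL + "?" + urllib.parse.urlencode(params)


def fits_in_script_src(url, limit=IE_MAX_URL_LENGTH):
    """True if the encoded URL is short enough for the assumed limit."""
    return len(url) <= limit


def trim_to_fit(text, limit=IE_MAX_URL_LENGTH, **url_options):
    """Drop trailing words until the encoded URL fits under the limit.

    Whitespace is normalized to single spaces along the way.
    """
    words = text.split()
    while words and not fits_in_script_src(
        build_script_src(" ".join(words), **url_options), limit
    ):
        words.pop()
    return " ".join(words)
```

Note that the check is on the encoded URL, not the raw text: punctuation and non-ASCII characters expand to several characters each once URL-encoded, so a block of text can blow past the limit well before it looks long in the form.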
SCRIPT TAG VERSION (seems to not work in IE, although it should)


Posted by larryb at 06:51 AM | Comments (1) | TrackBacks (2)
Category: Innovative Technology , Web and Software Development
Comments
Yahoo Term Extractor is great. We have based our free API on Yahoo's, but integrated it with other services to create an API that is a mashup of Yahoo!, Google Adwords, Wikipedia and Wordtracker. In more detail, it uses the power of Yahoo Term Extractor and Google's Get Keywords From Site to extract keywords, then feeds them to the Wikipedia and Wordtracker engines to get better and more search-friendly results. Check it out on our site: wordsfinder . com
Posted by: frank vitetta | November 24, 2007 09:22 AM