Experimenting with Reuters' Calais Automatic Tagging Tool
Monday, February 11, 2008 at 12:05PM
Reuters recently released Calais to developers. Calais is a set of software tools and rules that read text and automatically assign various tags based on an analysis of the text. Calais outputs tags in the following categories:
- Company
- IndustryTerm
- Organization
- Person
Calais itself does not yet have a user interface, but several developers have prepared one. I decided to try the interface provided by Abhay Kumar to test out the system.
First I selected source text from my blog, a recent post titled Cognitive Enhancement and Scientific Collaboration, Working Together. It is short and contains references to a variety of things, including people, institutions, topics, and, for good measure, at least one alien race. The tags I had manually assigned to the post included Collaboration, Social Networking, Expertise Management, Social Media, sustainability, and Cognitive Enhancement.
Next I copied the text and title into Kumar’s tool and pressed the “submit” button. This is what I got back:
- Organization: Oxford University, Humanity Institute (Comment: Oxford University is correct, but Humanity Institute is only partially correct; the actual institution referenced in the blog is Future of Humanity Institute.)
- IndustryTerm: pure technical solutions, collaborative technologies, expertise management systems, social networks, energy (Comment: The list is OK. I would have liked to have seen “collaboration” and “cognitive enhancement” included, though.)
- Company: Google (Comment: This is correct; I did not mention any other companies. I wonder, though, if Google would still have been extracted had I used it as a verb?)
- Person: Ostrow (Comment: OK, this was a trick. I mentioned three names in the post, two of which are fictional. Calais missed Nick Bostrom (real) and Thufir Hawat (a fictional character in Dune), but it did extract Ostrow (a fictional character from the movie Forbidden Planet).)
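To make the comparison above a little more systematic, here is a minimal Python sketch that scores the returned tags against the names I expected to see. The `extracted` and `expected` sets are typed in by hand from the results and comments above; the `score` function is my own illustration, not part of Calais or Kumar's tool. IndustryTerm is left out because I have no objective list of what should have been returned there.

```python
# Entities Calais returned for the post, copied by hand from the results above.
extracted = {
    "Organization": {"Oxford University", "Humanity Institute"},
    "Company": {"Google"},
    "Person": {"Ostrow"},
}

# Entities a human reader would expect, based on the comments above.
expected = {
    "Organization": {"Oxford University", "Future of Humanity Institute"},
    "Company": {"Google"},
    "Person": {"Nick Bostrom", "Thufir Hawat", "Ostrow"},
}


def score(extracted, expected):
    """Return {category: (hits, missed, spurious)} using exact string matches."""
    report = {}
    for category, wanted in expected.items():
        found = extracted.get(category, set())
        report[category] = (
            sorted(found & wanted),   # returned and expected
            sorted(wanted - found),   # expected but not returned
            sorted(found - wanted),   # returned but not expected
        )
    return report


report = score(extracted, expected)
for category, (hits, missed, spurious) in report.items():
    total = len(hits) + len(missed)
    print(f"{category}: {len(hits)}/{total} found, missed={missed}, spurious={spurious}")
```

Exact string matching is deliberately strict here: it counts "Humanity Institute" as both a miss and a spurious result, which matches my "only partially correct" note.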
Despite the issues, I’m impressed and looking forward to tools like this making their way into more products and services. The addition of features such as learning, training, and authority lists will provide significant aids to both manual and automated use of such tools.
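As a rough illustration of what an authority list could add, the Python sketch below maps an extracted name onto a canonical entry when there is an exact match or exactly one canonical name that contains it. The `authority` list and the `normalize` function are hypothetical and written only for this post; I am not describing how Calais works or what it will offer.

```python
def normalize(name, authority):
    """Map an extracted name to a canonical entry, or return it unchanged."""
    lowered = name.lower()
    # Prefer an exact (case-insensitive) match.
    for canonical in authority:
        if canonical.lower() == lowered:
            return canonical
    # Otherwise accept a partial match only if it is unambiguous.
    candidates = [c for c in authority if lowered in c.lower()]
    return candidates[0] if len(candidates) == 1 else name


# Hypothetical authority list of canonical names for this blog.
authority = ["Oxford University", "Future of Humanity Institute", "Google"]

raw = ["Oxford University", "Humanity Institute", "Ostrow"]
normalized = [normalize(n, authority) for n in raw]
print(normalized)
```

With a list like this, the partial "Humanity Institute" resolves to the full institution name, while names that are not on the list, such as Ostrow, pass through untouched.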
Calais, How To, Semantic Web, Software, Tagging
Reader Comments (4)
As I write more now for the new cancer story Boobs on Ice http://boobsonice.com and for our initiative with the American Cancer Society, I'm putting more and more thought into tagging. So please, keep looking. I know you'll find us something simple, available, and at least fairly reliable. Whatever it is, it'll do better than I do on my own.
At this stage of my life and my career I have finally learned what I believe is a great truth: some people are born to tag, and some people are not.
I have also decided, after much thought and soul searching, that I was not born to tag.
I shall therefore avail myself of available tools to make my stuff retrievable and shall use tools to retrieve stuff no matter what crazy tags other people assign to their own stuff!
Dennis
http://traction.tractionsoftware.com/traction/permalink/Blog576
Wanted to let you know that Calais 2.0 is now live on OpenCalais.com, as well as plugins for WordPress, Drupal and Yahoo! Search's new SearchMonkey developer platform.
Check it out when you get a chance and let us know your thoughts. Thanks.