Monday, February 11, 2008

Experimenting with Reuters' Calais Automatic Tagging Tool

By Dennis D. McDonald

Reuters recently released Calais to developers. Calais is a set of software tools and rules that read text and automatically assign various tags based on an analysis of the text. Calais outputs tags in the following categories:

  • Company
  • IndustryTerm
  • Organization
  • Person

Calais itself does not yet have a user interface, but several developers have built front ends for it. I used the one provided by Abhay Kumar to test the system.

First I selected source text from my blog: a recent post titled “Cognitive Enhancement and Scientific Collaboration, Working Together.” It is short and refers to a variety of things, including people, institutions, topics, and, for good measure, at least one alien race. The tags I had manually assigned to the post were Collaboration, Social Networking, Expertise Management, Social Media, Sustainability, and Cognitive Enhancement.

Next I copied the text and title into Kumar’s tool and pressed the “submit” button. This is what I got back:

  • Organization: Oxford University, Humanity Institute (Comment: Oxford University is correct, but Humanity Institute is only partially correct; the actual institution referenced in the blog is Future of Humanity Institute.)
  • IndustryTerm: pure technical solutions, collaborative technologies, expertise management systems, social networks, energy (Comment: The list is OK. I would have liked to have seen “collaboration” and “cognitive enhancement” included, though.)
  • Company: Google (Comment: This is correct; I did not mention any other companies. I wonder, though, if Google would still have been extracted had I used it as a verb?)
  • Person: Ostrow (Comment: OK, this was a trick. I mentioned three names in the post, two of which are fictional. Calais missed Nick Bostrom (real) and Thufir Hawat (a fictional character in Dune), but it did identify Ostrow (a fictional character from the movie Forbidden Planet).)

Despite these issues, I’m impressed and look forward to seeing tools like this make their way into more products and services. Features such as learning, training, and authority lists would significantly aid both manual and automated use of such tools.


Reader Comments (4)

I'm here to admit I'm a terrible tagger. I don't think in the terms that people searching for what I write about might use. So far I've not found a gnome who's interested in doing this job, so a tool would be SO welcome.

As I write more for the new cancer story Boobs on Ice (http://boobsonice.com) and for our initiative with the American Cancer Society, I'm putting more and more thought into tagging. So please keep looking; I know you'll find us something simple, available, and at least fairly reliable. Whatever it is, it'll do better than I do on my own.
February 11, 2008 | Unregistered Commenter Susan Reynolds
Susan,

At this stage of my life and my career I have finally learned what I believe is a great truth: some people are born to tag, and some people are not.

I have also decided, after much thought and soul searching, that I was not born to tag.

I shall therefore avail myself of available tools to make my stuff retrievable, and I shall use tools to retrieve stuff no matter what crazy tags other people assign to their own stuff!

Dennis
February 11, 2008 | Registered Commenter Dennis D. McDonald
There are key distinctions between tagging for categorization and tagging for other purposes, such as action, priority, or content type. I just wrote a full post on the distinction over at my blog:

http://traction.tractionsoftware.com/traction/permalink/Blog576
February 19, 2008 | Unregistered Commenter Jordan Frank
Krista Thomas from Thomson Reuters here.

Wanted to let you know that Calais 2.0 is now live on OpenCalais.com, along with plugins for WordPress, Drupal, and Yahoo! Search's new SearchMonkey developer platform.

Check it out when you get a chance and let us know your thoughts. Thanks.
May 19, 2008 | Unregistered Commenter Krista Thomas
