UIMA Papers

The Apache UIMA™ Project website includes detailed documentation about the UIMA framework, but the papers below are better sources for a high-level overview.

“Accelerating corporate research in the development, application and deployment of human language technologies” is an early paper on UIMA that describes the project’s goals, the system architecture, developer roles, and some early experiences. The section on distinct developer roles, and how the different interfaces support them, is particularly interesting. As a lone developer, I find it useful to decide which roles I want to play and which I can avoid by reusing existing components.

“Encoding Extraction as Inferences” and “PMML and UIMA Based Frameworks for Deploying Analytic Applications and Services” both describe applications built by integrating existing inference applications with the UIMA framework to perform semantic analysis over a document collection.

Learning by Reading

The Internet provides a vast storehouse of knowledge, but this information is primarily formatted for human readers. Learning by Reading, or Machine Reading, is the process of extracting knowledge from text documents and converting it to a form that a machine can use for inference or reasoning. Machine Reading is still an open area of research. The problem brings together different facets of AI research, including Natural Language Processing, Knowledge Representation and Reasoning, and Machine Learning. In 2010, the North American Chapter of the Association for Computational Linguistics (NAACL) held the First International Workshop on Formalisms and Methodology for Learning by Reading. The introduction to the proceedings noted that “Such systems directly build on relatively mature areas of research, including Information Extraction (for picking out relevant information from the text), Commonsense and AI Reasoning (for deriving inferences from the knowledge acquired), Bootstrapped Learning (for using the learned knowledge to expand the knowledge base) and Question Answering (for providing evaluation mechanisms for Learning by Reading systems).”

In my last post, I listed several abilities that a robot brain should possess. These included Natural Language Understanding, Common Sense Reasoning, and Encyclopedic Knowledge. It’s not really possible to isolate these abilities because each depends on the others, but I need to start with some subset of the problem. I’ve decided to work on developing Encyclopedic Knowledge through Machine Reading as my first step.

I’m building my Machine Reading application using open source software from the Apache UIMA™ project. UIMA stands for Unstructured Information Management Architecture and was developed to provide a bridge between the unstructured information in text documents and structured data formats such as databases or knowledge bases. UIMA defines an architecture for building text processing pipelines, and the UIMA project provides a number of text processing components that can be used within this architecture. UIMA also provides development tools as plugins for the Eclipse IDE. It is well documented and includes getting started guides with examples. Over the next few weeks I’ll be getting familiar with UIMA, and since UIMA is primarily written in Java, I’ll be resurrecting my dormant Java skills as well.
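To make the pipeline idea concrete, here is a minimal sketch in plain Java. It does not use the UIMA API; the names `Document`, `Annotation`, `Annotator`, and `RegexAnnotator` are hypothetical stand-ins I made up for illustration. The idea it tries to capture is the one described above: each component reads the document text, adds structured annotations to a shared container, and passes that container on to the next component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {

    /** A typed span over the document text (hypothetical, not a UIMA class). */
    static class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Shared container: the raw text plus every annotation added so far. */
    static class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }

        /** Returns the text covered by an annotation. */
        String covered(Annotation a) {
            return text.substring(a.begin, a.end);
        }
    }

    /** One pipeline stage: inspects the document and adds annotations to it. */
    interface Annotator {
        void process(Document doc);
    }

    /** Marks every regex match in the text with the given annotation type. */
    static class RegexAnnotator implements Annotator {
        private final String type;
        private final Pattern pattern;

        RegexAnnotator(String type, String regex) {
            this.type = type;
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public void process(Document doc) {
            Matcher m = pattern.matcher(doc.text);
            while (m.find()) {
                doc.annotations.add(new Annotation(type, m.start(), m.end()));
            }
        }
    }

    /** An assumed two-stage pipeline: tokens first, then four-digit years. */
    static List<Annotator> defaultPipeline() {
        List<Annotator> stages = new ArrayList<>();
        stages.add(new RegexAnnotator("Token", "\\w+"));
        stages.add(new RegexAnnotator("Year", "\\b(19|20)\\d{2}\\b"));
        return stages;
    }

    /** Runs each stage in order over the same document. */
    static Document run(String text, List<Annotator> pipeline) {
        Document doc = new Document(text);
        for (Annotator stage : pipeline) {
            stage.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("Robby met Watson in 2010.", defaultPipeline());
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + " [" + a.begin + "," + a.end + ") " + doc.covered(a));
        }
    }
}
```

The annotations that come out the other end are the structured side of the bridge: typed spans that could be loaded into a database or knowledge base. A real UIMA application would use the framework’s own annotator and analysis structure classes instead of these toy ones.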

The Project “Robby the Robot”

Can I build a system that can have an interesting conversation with a human being? That is the quest of my project, Robby the Robot.

I always loved SciFi robots: Robby the Robot in Forbidden Planet and the robot B-9 in the Lost in Space TV series. I devoured SciFi books about robots, like Isaac Asimov’s I, Robot, a collection of short stories that bears no resemblance to the film of the same name. To me, the most interesting aspect of these robots was that they could talk, and not just talk: they could answer questions, ask for clarification, and infer your intentions. They understood natural language and had encyclopedic knowledge and common sense reasoning.

Building robot brains is what drew me to computers, but real life intruded. I have to make a living, so I’ve spent the last few decades writing software for network equipment instead of building robot brains. I stopped paying attention to things like Artificial Intelligence, but then I saw the PBS NOVA episode “Smartest Machine on Earth” about IBM’s Jeopardy!-playing computer, Watson. It seems that engineers at IBM have made a lot of progress toward building the kind of robot brains I was interested in.

I believe that the web and open source software make projects like this worth pursuing. The Internet provides access to encyclopedic knowledge, but not in a form that a computer can use for knowledge representation and reasoning. There are open source software tools and frameworks for text and natural language processing, as well as freely available electronic lexicons and knowledge bases. The key to building Robby will be integrating these free resources in new and interesting ways, just as IBM created Watson by integrating existing natural language processing and question answering systems with web-scale knowledge and machine learning. This is an engineering approach to Artificial Intelligence: explore existing tools and use what fits the application whenever possible. Only write software from scratch if what you need doesn’t exist.