Wednesday, May 22, 2013

Why The "Star Trek Computer" will be Open Source and Released Under Apache License v2

If you remember the television series Star Trek: The Next Generation, then you know exactly what someone means when they use the expression “the Star Trek Computer”. On TNG, “the computer” had abilities so far ahead of the real-world tech of the time that it took on an almost mythological status. Even to this day, people reference “The Star Trek Computer” as a sort of shorthand for the goal of advances in computing technology. We are mesmerized by the idea of a computer which can communicate with us in natural, spoken language, answering questions, locating data and calculating probabilities in a conversational manner, and - seemingly - with access to all of the data in the known Universe.

And while we still don’t have a complete “Star Trek Computer” to date, there is no question that amazing progress is being made. The performance of IBM’s Watson supercomputer on the game show Jeopardy! is one of the most astonishing recent demonstrations of how far computing has come.

So given that, what can we say about the eventual development of something we can call “The Star Trek Computer”? Right now, I’d say we can predict at least two things: it will be Open Source, and it will be licensed under the Apache License v2. There’s a good chance it will also be a project hosted by the Apache Software Foundation.

This might seem like a surprising declaration to some, but if you’ve been watching what’s going on around the ASF the past couple of years, it actually makes a lot of sense. A number of projects related to advanced computing technologies, of the sort which would be needed to build a proper “Star Trek Computer”, have migrated to the Apache Incubator, launched within it, or are long-standing ASF projects. We’re talking about Semantic Web projects, Big Data / cluster computing projects, Natural Language Processing projects, and Information Retrieval projects. All of these represent elements which would go into a computing system like the Star Trek one, and work in this area has been slowly coalescing around the Apache Software Foundation for some time now.

Apache Jena, for example, is foundational technology for the “Semantic Web”, which creates a massively interlinked, “database of databases” world of Linked Data. When we talk about how the Star Trek computer had “access to all the data in the known Universe”, what we really mean is that it had access to something like the Semantic Web and the Linked Data cloud. Jena provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine. Jena moved into the Apache Incubator on 2010-11-23, and graduated as a Top Level Project (TLP) on 2012-04-18. Since then, the Jena team has continued to push out new releases and advance the state of Jena on a continual basis.
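To make the idea of triples plus rule-based inference a little more concrete, here is a toy sketch in plain Python. To be clear, this is not Jena's API (Jena is a Java library); the `match` and `infer_types` helpers and the `ex:` facts are made up for illustration. It only shows the flavor of the thing: facts stored as subject/predicate/object triples, a wildcard pattern match, and one RDFS-style rule applied until nothing new can be derived.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and \
           (p is None or t[1] == p) and \
           (o is None or t[2] == o):
            yield t


def infer_types(triples: Iterable[Triple]) -> Set[Triple]:
    """Apply one rule to a fixed point:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # Materialize matches as lists so we can add to `closed` while looping.
        for (x, _, c) in list(match(closed, p="rdf:type")):
            for (_, _, d) in list(match(closed, s=c, p="rdfs:subClassOf")):
                new = (x, "rdf:type", d)
                if new not in closed:
                    closed.add(new)
                    changed = True
    return closed


# Hypothetical facts, invented for this example.
FACTS = {
    ("ex:Enterprise", "rdf:type", "ex:Starship"),
    ("ex:Starship", "rdfs:subClassOf", "ex:Spacecraft"),
    ("ex:Spacecraft", "rdfs:subClassOf", "ex:Vehicle"),
}

if __name__ == "__main__":
    for triple in sorted(match(infer_types(FACTS), s="ex:Enterprise")):
        print(triple)
```

Nobody told the system that the Enterprise is a vehicle; it derives that from the class hierarchy. A real triple store with a real reasoner does the same kind of thing at vastly larger scale and with far richer rules.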

Another Apache project, OpenNLP, could provide the essential “bridge” that allows the computer to understand questions, commands and requests which are phrased in normal English (or some other human language). In addition to supporting the natural language interface with the system, OpenNLP is a powerful library for extracting meaning (semantics) from unstructured data - specifically textual data in an unstructured (or semi-structured) format. An example of unstructured data would be a blog post, an article in the New York Times, or a Wikipedia article. OpenNLP, combined with Jena and other technologies, allows “the computer” to “read” the Web, extracting meaningful data and saving valid assertions for later use. OpenNLP entered the Apache Incubator on 2010-11-23 and graduated as a Top Level Project on 2012-02-15.

Apache Stanbol is another newish project within the ASF, which describes itself as “a set of reusable components for semantic content management.” Specifically, Stanbol provides components to support reasoning, content enhancement, knowledge models and persistence for semantic knowledge found in “content”. You can pipe a piece of text (this blog post, for example) through Stanbol and have it extract Named Entities, create links to DBpedia, and otherwise attach semantic meaning to “non semantic” content. To accomplish this, Stanbol builds on top of other projects, including OpenNLP and Jena. Stanbol joined the Apache Incubator on 2010-11-15 and graduated as a TLP on 2012-09-19.
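As a rough picture of what “extract Named Entities and create links to DBpedia” means, here is a deliberately naive sketch in Python. It is not how Stanbol works internally and does not use Stanbol's API; it just looks names up in a tiny hand-written dictionary. The `GAZETTEER` table and the `link_entities` function are hypothetical, and the URIs are illustrative examples of the DBpedia resource naming style.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer: lowercase surface form -> DBpedia-style resource URI.
GAZETTEER = {
    "ibm": "http://dbpedia.org/resource/IBM",
    "star trek": "http://dbpedia.org/resource/Star_Trek",
    "apache software foundation":
        "http://dbpedia.org/resource/Apache_Software_Foundation",
}

Hit = Tuple[int, int, str, str]  # (start, end, surface text, URI)


def link_entities(text: str) -> List[Hit]:
    """Find gazetteer names in text, preferring longer names on overlap."""
    hits: List[Hit] = []
    taken: List[Tuple[int, int]] = []  # character spans already claimed
    for name in sorted(GAZETTEER, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip a match that overlaps a span claimed by a longer name.
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append((m.start(), m.end()))
            hits.append((m.start(), m.end(), m.group(0), GAZETTEER[name]))
    return sorted(hits)  # order by position in the text


if __name__ == "__main__":
    sample = "IBM donated code to the Apache Software Foundation."
    for start, end, surface, uri in link_entities(sample):
        print(f"{surface!r} at {start}-{end} -> {uri}")
```

A real enhancement pipeline uses statistical NLP models and disambiguation rather than exact string lookup, but the output has the same shape: spans of plain text annotated with links into the Linked Data cloud.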

If we stopped here, we could already support the claim that the ASF is a key hub for development of the kinds of technologies which will be needed to construct the “Star Trek Computer”, but there’s no need to stop. It gets better...

Apache UIMA is similar to Stanbol in some regards, as it represents a framework for building applications which can extract semantic meaning from unstructured data. Part of what makes UIMA of special note, however, is that the technology was originally a donation from IBM to the ASF, and also that UIMA was actually a part of the Jeopardy-winning Watson supercomputer[1]. So if you were wondering, yes, Open Source code is advanced enough to constitute one portion of the most powerful demonstration seen to date of the potential of a Star Trek Computer.

Lucene is probably the best-known and most widely deployed Open Source information retrieval library in the world, and for good reason. Lucene is lightweight, powerful, and performant, and makes it fairly straightforward to index massive quantities of textual data and search across that data. Apache Solr layers on top of Lucene to provide a more complete “search engine” application. Together, Lucene/Solr constitute a very powerful suite of tools for doing information retrieval.
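If “index text, then search across it” sounds abstract, the core data structure behind this kind of search is an inverted index: a map from each term to the documents that contain it. The Python below is a bare-bones sketch of that idea only. It is not Lucene's API or implementation, the `InvertedIndex` class and `tokenize` helper are invented for this post, and it leaves out everything that makes a real engine useful (relevance scoring, analyzers, on-disk segments, and so on).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy index: term -> set of ids of the documents containing that term."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        """Record every term of `text` as occurring in document `doc_id`."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of documents containing ALL query terms (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so the intersection never mutates the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Computer, locate Commander Data.")
    index.add(2, "The computer calculates probabilities.")
    index.add(3, "Data is in Engineering.")
    print(index.search("computer data"))
```

Because a query only touches the posting sets for its own terms, lookups stay fast even when the collection is huge, which is a big part of why this structure sits at the heart of search engines.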

Mahout is a machine learning library which builds on top of Apache Hadoop to enable massively scalable machine learning. Mahout includes pre-built implementations of many important machine learning algorithms, but is particularly notable for its capabilities for processing textual data and performing clustering and classification operations. Mahout-provided algorithms will probably be part of an overall processing pipeline, along with UIMA, Stanbol, and OpenNLP, which gives “the computer” the ability to “read” large amounts of text data and extract meaning from it.

And while we won’t try to list every ASF project which could be a component of such a system, we would be remiss if we failed to mention, at least briefly, a number of other projects which relate to this overall theme of information retrieval, text analysis, semantic web, etc. In terms of “Big Data” or “cluster computing” technology, you have to look at the Hadoop, Mesos and S4 projects. Other Semantic Web related projects at the ASF include Clerezza and Marmotta. And from a search, indexing and information retrieval perspective, one must consider Nutch, ManifoldCF and Droids.

As you can see, the Apache Software Foundation is home to a tremendous amount of activity which is creating the technology which will eventually be required to make a true “Star Trek Computer”. For this reason, we posit that when we finally have a “Star Trek Computer” it will be Open Source and ALv2 licensed. And there’s a good chance it will find a home at the ASF, along with these other amazing projects.

Of course, you don't necessarily need a full-fledged "Star Trek Computer" to derive value from these technologies. You can begin utilizing Semantic Web tech, Natural Language Processing, scalable machine learning, and other advanced computing techniques to derive business value today. For more information on how you can build advanced technological capabilities to support strategic business initiatives, contact us at Fogbeam Labs today. For all the latest updates from Fogbeam Labs, follow us on Twitter.