
Wednesday, August 27, 2014

Starting Points For Learning About Open Source

I (Phil) have recently been asked to speak on a panel discussing Open Source Software and issues regarding intellectual property, OSS licensing, patents, how recent changes have affected the Open Source world, etc. This makes sense, given that everything we do at Fogbeam Labs is Open Source, and we make participating in the OSS community part of our mission and core values. But I'm no legal expert; there's plenty I don't know about the legal issues in this sphere, and there are licenses I know little about (especially the lesser-used ones). So I decided to do some "boning up" in advance, and remembered that there are quite a few resources dedicated to the topic which are themselves "open source" (or at least freely available).

So, I thought I'd throw together a list quickly, which may be useful to anyone who wants to get an overview of what this "Open Source" thing is all about, or who wants to deepen their understanding of OSS licenses and related topics.

First, we have the absolute classic The Cathedral and the Bazaar by Eric S. Raymond. This book deals with the fundamental dichotomy between how software is produced in the decentralized, distributed "Open Source" model, and how it is produced in a rigid, top-down, bureaucratic organization (like most software companies). Note that the linked page includes the text of the book (including foreign-language translations), comments by the author, and links to discussions and commentary by other observers.

Fundamentally, if you want to understand the Open Source world and the mindset of the people who populate it, this is required reading. No, not everybody agrees with everything esr has to say, and yes, this book is somewhat dated now. But it has been so amazingly influential that it's become part of the very fabric of this movement.

Next up we have Understanding Open Source and Free Software Licensing by Andrew M. St. Laurent. This book focuses specifically on OSS and Free Software licenses, and includes a comprehensive analysis / explanation of all of the important and widely used licenses that you will encounter. If you have ever wondered "what do they mean when they say that the GPL is 'viral'" or "what's the problem with mixing code that's released under different licenses" or something similar, this is your book. It's not a law textbook, but it covers the legal implications of OSS licensing for the layman quite well.

Another excellent title covering the legal nuts and bolts of Open Source licensing is Open Source Licensing: Software Freedom and Intellectual Property Law by Lawrence Rosen. Rosen has been a high-profile participant in the legal aspects of Open Source for years, and has written a great book to help people understand the interaction of law and software. This book and the aforementioned Understanding Open Source and Free Software Licensing collectively cover pretty much everything you could want to know about licensing and legal issues (to the extent that such a thing is possible; there is still a lack of case law and legal clarity in certain areas).

Another excellent book, especially for those leading - or who would lead - Open Source projects, is Karl Fogel's Producing Open Source Software: How to Run a Successful Free Software Project, or "Producing OSS" as it's known. "Producing OSS" covers the nuts and bolts of running an Open Source project and actually shipping software. Surprisingly (or perhaps not so surprisingly), there is a lot more to running a successful project than dealing with code and tech issues. Karl's book deals with the various "soft" issues that projects face - dealing with volunteers, creating a meritocracy, understanding how money affects the project, etc. I highly recommend this book to anyone who is, or wants to be, an active participant in any Open Source community.

And last, but certainly not least, we have the Architecture of Open Source Applications series. In these two books, the creators of dozens of popular Open Source projects explain the inner workings of their projects, and reveal the architectural details that made them successful. If you value learning via emulation, this is an amazing series of case studies to learn from.

And there you have it, folks - a virtual cornucopia of Open Source wisdom collected over the years. If you have ever wanted to develop a solid understanding of how Open Source works and what it's all about, this is a great place to kick off your journey. And, of course, feel free to post any questions or comments here.

Thursday, February 13, 2014

Why We Don't Want To Be "The Next Red Hat"

Earlier today I read an interesting article at TechCrunch by Peter Levine, in which he asserts that "there will never be another Red Hat" and more or less lambasts the notion of a company based on Open Source.

We are a company based on Open Source.

So, I guess my first thought should have been "Oh, shit. We're doing this all wrong. Let's yank all of our repositories off of GitHub and close everything immediately."

Yeah.... no.

The truth is, Peter makes an interesting point or two in his article, and some of what he says at the end is moderately insightful. In fact, it reflects some decisions we made a few months ago about how we're going to position some new product offerings in 2014. But nothing in his article really provides any support for the idea that there is one, and only one, successful "Open Source Company".

OK, to be fair, I'll take his word that Red Hat is the only public company whose primary foundation is Open Source. But I'll counter that by pointing out that "going public" is not the sole measure of success for a firm. I'll also grant you that even Red Hat, seemingly the most successful "Open Source company" to date, is much smaller than Microsoft, Oracle, and Amazon.com.

Guess what? Almost every company is much smaller than Microsoft, Oracle and Amazon.com. Comparing a company to those outliers hardly damns it. Truth is, RH is an $11 billion company - nothing to sneeze at. And yes, we have been known, on occasion, to use the phrase "the next Red Hat" when trying to describe to people what we're out to do here at Fogbeam.

Let's look at something else while we're at it... Red Hat are hardly the only successful Open Source company in the world anyway. They are probably the biggest and the most well known, but stop and consider a few other names you may have heard: Alfresco, Jaspersoft, Bonitasoft, SugarCRM, Cloudera, Hortonworks, Pivotal, Pentaho... Yeah, you get the drift.

And then there's this jewel of a quote from the article: "If you’re lucky and have a super-successful open source project, maybe a large company will pay you a few bucks for one-time support, or ask you to build a “shim” or a “foo” or a “bar.”" Unless I'm misinterpreting Peter here, he seems to be suggesting that companies do not want to, or are not willing to, pay for support for the Open Source solutions they use. All I can say is that this does not match my experience at all. Oh, don't get me wrong... there will always be some percentage of "freeloaders" who use the OSS code and never buy a support subscription. Red Hat know that, and we know that. But what we also know is that most businesses that are using a product for a mission critical purpose want a vendor behind the product, and they are willing to pay for that (as long as the value is there). The fact is, companies want to know that if a system breaks, there is somebody to call who will provide support with an SLA. They want to know that if they need training, there is somebody to call to provide that training. They want to know that if professional services are needed for integrations or customizations, there is somebody they can call who knows their shit. And, more prosaically, they want to know that there is a company there to sue if the shit really hits the fan.

So when I read Peter's article, I really don't hear a strong argument that there can't be other successful Open Source companies. In fact, I can't help but think that all he's really saying is "It's hard to build an Open Source company that will generate returns at a scale, and in a timeframe, that's compatible with the goals of Andreessen Horowitz." And that's a perfectly fine thing to say. Maybe an Open Source company would be a bad investment for A16Z. But that isn't even close to the same thing as suggesting that you can't be successful using Open Source - if your goals and success criteria are different.

Anyway, as far as the whole "next Red Hat" thing goes - the thing is, we don't actually aspire to be "the next Red Hat". We've just used that term because it's a simplification and it's illustrative. But as far as aspirations for where we are going? Nah... In fact, here's the thing. We aren't out to be "the next Microsoft" either. Or "the next IBM", or "the next Oracle", or "the next Amazon.com", and so on, ad infinitum.

No, fuck all that. Our aspirations are far bigger than that. Wait, did I say "bigger"? Maybe I really just meant "different". Bigger isn't always better, and there are other ways to distinguish yourself besides size. Will we be an $11 billion company one day? I don't know. Maybe we'll actually be a $221 billion company. Maybe we'll be a $2 million company. Maybe we'll never make a dime at all.

What I do know is that our plan is this: We are working to build a company that is so fucking awesome that in a few years, people doing startups will go to people and say "We plan to be the next Fogbeam Labs"...

Thursday, February 6, 2014

On Solving The Social Aspect Of BPM

Over at the BPM.com forums, Peter Schooff has posed a very interesting question: "What Is the Key to Solving the Social Aspect of BPM?" This is a topic we've thought a lot about, and "social BPM" is very core to us here at Fogbeam Labs, so I wanted to take a moment and share some thoughts on this very important topic.

The discussion there is focused on this factoid from a recent Aberdeen survey:

Thirty-four percent (34%) of respondents in Aberdeen’s Solving Collaboration Challenges with Social ERP indicated that they have difficulty converting collaborative data into business execution. This is unnerving because, for many processes, the ability for people working together collaboratively is essential for process effectiveness.

To really understand this, you have to consider what exactly the collaborative aspects of a BPM process are. And, in truth, many processes (perhaps most) are inherently collaborative, even if the collaborative aspect is not explicitly encoded into a BPMN2 diagram. Think of any time you've been involved in a process of some sort (whether BPM software or workflow engines were involved or not) and you have to make a decision or take some action... and you needed information or input from someone else first. If you picked up a telephone and made a call, or sent an email or an IM, then you are doing "social BPM" whether you use the term or not.

The first factors, then, in really taking advantage of collaboration in BPM are the exact same things involved in fostering collaboration in any fashion. It's not really a technology issue; it's an issue of culture, organization design, and incentives. Do people in your organization fundamentally trust each other? Is information shared widely, or hoarded? Does the DNA of your firm encourage intra-firm competition between staff members, or widespread collaboration which puts the good of the firm first? Sadly, in too many firms the culture is simply inherently not collaborative, and nothing you do in terms of BPM process design, or deploying "enterprise social software" or BPM technology, is going to fix your broken culture.

Next, we have to look at these questions: Does your firm actually empower individual employees to make decisions and use their judgment? Can an employee deviate from the process? No? Well, what if the process is broken? Can your staff "route around" badly designed process steps, involve other people as necessary, inject new information, reroute tasks and otherwise take initiative? If the answers to most or all of these questions are "no", then you aren't going to have collaborative processes. If your organization is a rigid, top-down hierarchy that embraces a strict "command and control" philosophy, you're never going to get optimal effect from encouraging people to collaborate on BPM processes - or anything else.

It's only once you have the cultural and structural issues taken care of that technology even comes into play. Can some BPM software do more than others to encourage and facilitate social collaboration? Absolutely. That's why we are developing our Social BPM offering with specific capabilities that help cultivate knowledge sharing and collaboration. Using semantic web technology to tie context to tasks and content (where "context" includes things like "Bob in France is the expert on this topic and here's his contact info"), and exploiting "weak ties" and Social Network Analysis to provide suggested sources for consultation, are crucial technical capabilities for making BPM more "social". Additionally, if you have the cultural and structural alignment in place to really foster collaboration and knowledge sharing, then enterprise social software is an amazingly powerful tool for cultivating knowledge transfer, fostering engagement, and driving alignment throughout your organization.
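
To make the "context" idea concrete, here is a minimal sketch of the kind of SPARQL query such a system might run when a task lands in someone's queue. Everything in the fb: namespace is a made-up vocabulary for illustration, not an actual Fogcutter schema:

    PREFIX fb: <http://example.org/fogbeam#>

    SELECT ?person ?contact WHERE {
      # The task at hand has been tagged with one or more topics...
      <http://example.org/tasks/42> fb:aboutTopic ?topic .
      # ...so find the people recorded as experts on those topics.
      ?person fb:expertOn    ?topic ;
              fb:contactInfo ?contact .
    }

The BPM engine can then surface the results ("Bob in France, here's his contact info") right alongside the task, instead of leaving the employee to hunt down the expert by hand.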

Done well, combining social software and BPM can provide tremendous benefits. But no technology is going to help if your culture is wrong. If you're having trouble with collaboration, I strongly encourage you to examine the "soft" issues before you spend a dime on additional technological tooling.

Wednesday, November 13, 2013

Dominiek ter Heide is Dead Wrong. The Semantic Web Has Not "Failed"

There is an interesting article at Gigaom right now, by Dominiek ter Heide of Bottlenose, in which the author asserts that the Semantic Web has failed, and purports to give the three reasons why.

This is, of course, utter bollocks. I want to take this opportunity to explain why, and to provide the counterpoint to Dominiek's piece.

For starters, there is simply no legitimate basis for saying that "the Semantic Web has failed" to begin with. Given that his initial assertion is flat out wrong, there's almost no reason for a point-by-point rebuttal to the rest of his piece, but we'll work our way through it anyway, as the process may be educational.

So, if I'm going to say that the Semantic Web has not failed, then how might I substantiate or justify that claim? OK, easy enough... you probably use the Semantic Web every. single. day. And so do most of your friends. You just don't know it. And that is kind of the point. The Semantic Web isn't something that's really meant for end users to interact with directly. The essence of the Semantic Web is to enable machine readable data with explicitly defined semantics. Doing that allows the machines to do a better job of helping the humans do whatever it is they are trying to do. A typical user could easily use an application backed by the Semantic Web without ever knowing about it.

And here's the thing - they do. I said before that you probably use the Semantic Web every day. You might have thought "Yeah, right Phil, no way do I use anything like that". Well, if you use Google[1][3], Yahoo[2][3], or Bing[3], then guess what - you're using the Semantic Web. Have you seen those Google Rich Snippets around results for restaurants and the like? Those are powered by the Semantic Web. Aside: For the sake of this article, I treat RDFa, Microdata, Microformats, RDF/XML, JSON-LD, etc., as being functionally equivalent, as the distinction is not relevant to the overall point I'm making.
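
To show what "machine readable data with explicitly defined semantics" looks like in practice, here is a minimal sketch that uses Apache Jena to parse the kind of schema.org JSON-LD markup a restaurant page might embed. The restaurant itself is invented for the example:

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class RichSnippetDemo {
        public static void main(String[] args) {
            // The kind of schema.org JSON-LD a restaurant page might embed;
            // this is the markup that search engine crawlers pick up.
            String jsonld = "{"
                + "\"@context\": \"http://schema.org\","
                + "\"@type\": \"Restaurant\","
                + "\"name\": \"Mama's Pizza\","
                + "\"servesCuisine\": \"Italian\","
                + "\"telephone\": \"+1-919-555-0100\""
                + "}";

            // Parse the embedded markup into an RDF graph.
            Model model = ModelFactory.createDefaultModel();
            model.read(new ByteArrayInputStream(jsonld.getBytes(StandardCharsets.UTF_8)),
                       null, "JSON-LD");

            // Every statement now has explicit, machine readable semantics.
            model.write(System.out, "TURTLE");
        }
    }

Once that markup is parsed into an RDF graph, the crawler knows it is looking at a Restaurant that serves Italian cuisine and has a telephone number - not just a blob of text - and that is exactly what powers those rich results.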

I could stop here and say that we've already proven that Dominiek ter Heide is wrong, but let's dig a little deeper.

The first reason that Dominiek gives reduces to an argument that everything on the Semantic Web is "obsolete knowledge" or Obsoledge.

This has the effect of making the shelf-life of knowledge shorter and shorter. Alvin Toffler has – in his seminal book Revolutionary Wealth – coined the term Obsoledge to refer to this increase of obsolete knowledge.
If we want to create a web of data we need to expand our definition of knowledge to go beyond obsolete knowledge and geeky factoids. I really don’t care what Leonardo DaVinci’s height was or which Nobel prize winners were born before 1945. I care about how other people feel about last night’s Breaking Bad series finale.

This is simply a factually incorrect view of the Semantic Web. Again, the goal of the Semantic Web is to provide machine readable, defined semantics along with data on the web. It does not matter one bit if that data is as old as a reference to Leonardo Da Vinci or as recent as a reference to last night's episode of Grimm. The Semantic Web is just as relevant to the kind of up-to-date, trending data that Dominiek seems so obsessed with as it is to "historical" data. And let me also point out that history remains amazingly important; as the old saw goes, "Those who fail to learn from the past are doomed to repeat it". To suggest that knowledge loses all value simply because it is old is absurd.

His second argument simply states that "Documents are dead". I could just point out that this blog post, which you are currently reading, Faithful Reader, and his own article at Gigaom are both "documents". You do the math.

It goes deeper than that, however. His argument, again, fails for extremely obvious reasons which betray a total misunderstanding of the Semantic Web and the state of the Web in general. His argument is that "now" data is encapsulated in tweets and other "streaming", social-media, real-time data sources. While it is a fair point that more and more data is being passed around in tweets and their ilk, the factually incorrect part is the claim that those sources are not valid components of the Semantic Web, just like everything else on the web. Case in point: one of our products here at Fogbeam Labs, Neddick, consumes data from RSS feeds, IMAP email accounts, AND Twitter, performs semantic concept extraction on all of those various data sources (with more coming, including G+, Facebook, LinkedIn, etc.), and can find the connections between, say, a Tweet and a related blog post! That's the power of the Semantic Web, and the point that Mr. ter Heide seems to be missing.
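
To be concrete about what "finding the connections" means, here is a toy sketch of entity-based linking. The hard-coded dictionary stands in for real concept extraction (Stanbol, OpenNLP, and friends); this is an illustration of the idea, not Neddick's actual pipeline:

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class SemanticLinker {
        // Toy stand-in for a real entity extractor: map surface forms
        // to canonical entity URIs.
        static final Map<String, String> DICT = Map.of(
            "hadoop", "http://dbpedia.org/resource/Apache_Hadoop",
            "cloudera", "http://dbpedia.org/resource/Cloudera");

        static Set<String> extractEntities(String text) {
            Set<String> uris = new HashSet<>();
            for (String token : text.toLowerCase().split("\\W+")) {
                String uri = DICT.get(token);
                if (uri != null) uris.add(uri);
            }
            return uris;
        }

        public static void main(String[] args) {
            String tweet = "Just upgraded our Hadoop cluster, huge win";
            String blogPost = "In this post we benchmark Hadoop on 40 nodes...";

            // Two items are "connected" if they mention the same entity URI,
            // regardless of which channel each one arrived on.
            Set<String> shared = extractEntities(tweet);
            shared.retainAll(extractEntities(blogPost));
            System.out.println("Shared entities: " + shared);
        }
    }

Because the tweet and the blog post both resolve to the same DBpedia URI, the system can assert a link between them - a tweet is just as much a first-class citizen of the Semantic Web as any "document".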

His final argument is that "Information should be pushed, not pulled". Again, this betrays a complete misunderstanding of the Semantic Web. The knowledge extracted from Semantic Web sources can be used in either "push" or "pull" modalities. Again, one of our products can leverage Semantic Web data to generate real-time alerts using Email, XMPP, or HTTP POST, based on identifying a relevant bit of knowledge in a piece of content - whether that piece of content is a Tweet, a real-time Business Event extracted from a SOA/ESB backbone, or a Blog post.

Nearing the end of this piece, let me just say that the Semantic Web is becoming more and more important with every passing day. As tools like Apache Stanbol for automating the process of extracting rich semantics from unstructured data mature and become better and more widely available, the number of applications for explicit semantics is just going to mushroom.

To finish up, let's look at a quick example of what I'm talking about... let's say you have deployed our Enterprise Social Network, Quoddy, and your company does something with musicians. Your Quoddy status update messages occasionally mention, say, Jon Bon Jovi, Bob Marley, Richard Marx, and Madonna. How would you do a search without SW tech that says "show me all posts that mention musicians"? Not gonna happen. But by using Stanbol for semantic extraction and storing that knowledge in a triplestore, we can make that kind of query trivial.
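
For the curious, the query could look something like the following sketch. The fb:mentions property is hypothetical, standing in for whatever link the enhancement step asserts; dbo:MusicalArtist is a real DBpedia ontology class:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX fb:  <http://example.org/fogbeam#>

    SELECT DISTINCT ?post WHERE {
      ?post   fb:mentions ?entity .            # link asserted during enhancement
      ?entity rdf:type    dbo:MusicalArtist .  # class supplied by DBpedia
    }

Note that a plain keyword engine can't answer this, because the word "musician" never appears in the posts; the knowledge that Madonna is a musician comes from the linked entity, not from the text.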

It gets better though... Stanbol comes "out of the box" with the ability to dereference entities that are in DBpedia and other knowledge bases, which is cool enough in its own right... but you can also easily add local knowledge and your own custom enhancement engines. So now entities that are meaningful only in your local domain (part numbers, SKUs, customer numbers, employee ID numbers, whatever) can be semantically interlinked and queried as part of the overall knowledge graph.

Hell, I'd go so far as to say that Apache Stanbol (along with Apache OpenNLP and a few related projects... Apache UIMA, Apache Clerezza, and Apache Marmotta, etc.) might just be the most important open source project around right now. And nobody has heard of it. Again, the Semantic Web is largely not something that the average end user needs to know or think about. But they'll benefit from the capabilities that semantic tech brings to the table.

At the end of the day, the Semantic Web is just a step on the road to having something like the Star Trek Computer or a widely available and ubiquitous IBM Watson. Saying that the Semantic Web has failed is to ignore all of the facts and deny reality.

Wednesday, September 25, 2013

Fogbeam Status Update - September 2013

Dear Friends of Fogbeam:

Just to be clear, no, we are not about to be acquired by LinkedIn. But I'll come back to why I say that, in a few moments.

On to the news and important stuff. It's been a lot longer than normal since our last status update email. If you follow the writings of Paul Graham, you may recall his famous "How Not To Die" essay[1], where he talks about how startups usually succeed if they can just avoid dying long enough. In that essay, he makes another interesting point, in these lines:

For us the main indication of impending doom is when we don't hear from you. When we haven't heard from, or about, a startup for a couple months, that's a bad sign. If we send them an email asking what's up, and they don't reply, that's a really bad sign. So far that is a 100% accurate predictor of death. Whereas if a startup regularly does new deals and releases and either sends us mail or shows up at YC events, they're probably going to live.

Given that, you might wonder if you should take it as a bad sign that we haven't emailed you in some time. As it happens, nothing could be further from the truth. While we haven't been sending a lot of emails, we have been blogging[2], tweeting[3], sharing content on Facebook and Google+, etc. But far more important than all of that: we've been heads down, grinding away, working on moving things forward.

As a result of that hard work, we were recently able to proudly announce three new project releases[4], including our first ever "simultaneous release" of three components of the Fogcutter project. We also launched our brand new website at http://www.fogbeam.com at the same time. We now consider our Enterprise Social Network, Quoddy, and our Information Discovery Platform, Neddick, to be in Limited Availability status. This means we have two products available for sale, with the caveat that we are only looking to make sales to customers that fit certain criteria, and who will engage with us in a "co-creation" scenario as we move towards a "GA" release.

We have also been hard at work on market research. We have chosen a target to pursue as a "beach-head market" and have identified approximately 160 companies in North Carolina that we will be attempting to gain access to, in hopes of landing those first few alpha customers. Also on the sales and marketing front, we are starting to see results from our content marketing strategy and are receiving inbound leads via email and Twitter.

Things have not been all "sunshine and roses" since last time, however. Sadly, one member of our founding team, Robert Fischer, chose to step down due to issues in his personal life. We won't get into details out of respect for his privacy, but he had external situations that were imposing a great deal of stress on him and left him feeling that he was not able to contribute at the level he would want. We certainly will (and do) miss Robert, but we continue to soldier on, despite this setback.

On the other hand, we are fortunate to be able to announce a new member of our team, Eric Stone. While not a "replacement" for Robert per se, Eric brings our team back to three, and adds another wicked smart member who is going to be a tremendous asset for us. Eric received his Computer Science degree from UNC Chapel Hill, and is currently pursuing graduate studies in Statistics & Operations Research, also at UNC-CH. Eric interned with us this summer, and did such a bang-up job that we asked him to stay on as a permanent member of the team.

The other adversity we had to fight in 2012 was a serious health issue that I (Phil) encountered, when I was initially diagnosed as diabetic. Prior to being diagnosed, my blood sugar reached a level that caused a potentially fatal condition known as DKA (diabetic ketoacidosis), and left me in the hospital for three days, almost exactly one year ago. Thankfully the condition is very survivable with modern medical technology, and I'm still here and kicking. My diabetes is now well controlled and life is back to normal (or what passes for normal for a startup founder).

All of that said, let's get back to why we mentioned LinkedIn earlier on. This is a reference to a recent article[5] that appeared in the San Jose Business Journal, titled The Companies LinkedIn Should Buy With Its $1B Cash Infusion. In this piece, SJBJ listed Fogbeam Labs as one of their suggested purchases for LI. Now, as we said, we don't actually expect LinkedIn to come calling wanting to acquire us anytime soon. And, truth be told, we probably don't *want* to be acquired this early, as the valuation we would receive right now would not come close to meeting our expectations and goals (just to be clear, we plan on building a company here that can go public with a multi-billion dollar valuation). This mention is notable, however, as it demonstrates that people as far away as Silicon Valley are aware of what we're doing, and are paying some attention to us. And this despite the fact that we really haven't done any publicity or PR work targeted specifically at the West Coast.

So, to wrap this up: We are making great progress on the product front, we are receiving some recognition from media as far away as Silicon Valley, we have overcome some serious adversity, and we refuse to die - in more ways than one! As 2013 draws to a close, our focus starts to shift to engaging with our chosen "beach-head market" and trying to generate some initial revenue and clarify our short-term product roadmap.

Thanks for listening, and please feel free to ping us with any questions or comments.

Phil, Sarah and Eric
Fogbeam Labs

Tuesday, May 28, 2013

Social, Events, BPM... oh my! But what about Knowledge and Context?

There is a very good article at ZDNet which speaks to the importance of the "trinity" of event-driven architectures, social software and BPM. And while the basic point is sound (all of those technologies certainly are more valuable when integrated and used together), the article leaves out an important element: Knowledge.

Integrating social software with BPM and an event-based architecture is, of course, part of what we are giving you the power to do with Quoddy, our open source Enterprise Social Network product. But we believe you need to go beyond providing a social front-end for subscribing to, sharing, discussing and acting on business events and tasks... you need to provide the context and knowledge - existing within the firm and outside its walls - that supports decision making. And that's what we are developing with Quoddy and the rest of our Fogcutter Suite of products. All of the pieces aren't quite finished yet, but we are evolving a system which will allow you to subscribe to, for example, business events from your ESB/SOA infrastructure, render relevant events into your event stream, and then find the users, documents, applications, databases and other knowledge sources - within your firm or on the 'net - which are relevant to learning about and acting on that event.

We posit that it is this combination of events, tasks, users and knowledge / context which will fully unleash the vision of the Digital Nervous System. When all of the people in your organization have fingertip access to the events which are occurring - in real time, or near real time - within your organization, and convenient access to the related contextual knowledge surrounding those events, then you have the foundation for serious enterprise agility and responsiveness.

To this end, we are working on new features across our product line which allow semantic concept extraction and automatic linking and referencing of entities with defined semantics from within your enterprise content, and which then support semantic queries against, and reasoning and inference over, that knowledge. Follow this blog, or follow our Twitter feed, for all the latest news and announcements as we continue down this amazingly exciting path. We can't quite give you the Star Trek Computer yet, but with Semantic Web tech applied in the enterprise, and combined with BPM, Business Events and Social Software, we will be giving you the most powerful tools yet for managing knowledge and information within your enterprise.


For more information on how you can begin to integrate Social, Events, BPM and the Semantic Web in your organization, contact us today.

Wednesday, May 22, 2013

Why The "Star Trek Computer" will be Open Source and Released Under Apache License v2

If you remember the television series Star Trek: The Next Generation, then you know exactly what someone means when they use the expression “the Star Trek Computer”. On TNG, “the computer” had abilities so far ahead of real-world tech of the time that it took on an almost mythological status. And even to this day, people reference “The Star Trek Computer” as a sort of shorthand for the goal of advances in computing technology. We are mesmerized by the idea of a computer which can communicate with us in natural, spoken language, answering questions, locating data and calculating probabilities in a conversational manner, and - seemingly - with access to all of the data in the known Universe.

And while we still don’t have a complete “Star Trek Computer” to date, there is no question that amazing progress is being made. The performance of IBM’s Watson supercomputer on the game show Jeopardy is one of the most astonishing recent demonstrations of how far computing has come.

So given that, what can we say about the eventual development of something we can call “The Star Trek Computer”? Right now, I’d say at least two things: it will be Open Source, and it will be licensed under the Apache Software License v2. There’s a good chance it will also be a project hosted by the Apache Software Foundation.

This might seem like a surprising declaration to some, but if you’ve been watching what’s going on around the ASF the past couple of years, it actually makes a lot of sense. A number of projects related to advanced computing technologies, of the sort which would be needed to build a proper “Star Trek Computer”, have migrated to, or launched within, the Apache Incubator, or are long-standing ASF projects. We’re talking about projects which develop Semantic Web technologies, Big Data / cluster computing projects, Natural Language Processing projects, and Information Retrieval projects. All of these represent elements which would go into a computing system like the Star Trek one, and work in this area has been slowly coalescing around the Apache Software Foundation for some time now.

Apache Jena, for example, is foundational technology for the “Semantic Web”, which creates a massively interlinked, “database of databases” world of Linked Data. When we talk about how the Star Trek computer had “access to all the data in the known Universe”, what we really mean is that it had access to something like the Semantic Web and the Linked Data cloud. Jena provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine. Jena moved into the Apache Incubator back on 2010-11-23, and graduated as a TLP on 2012-04-18. Since then, the Jena team have continued to push out new releases and advance the state of Jena on a continual basis.
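
As a small sketch of what that rule-based inference buys you (using current Jena package names; the trek# vocabulary is invented for the example):

    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.RDF;
    import org.apache.jena.vocabulary.RDFS;

    public class JenaInferenceDemo {
        public static void main(String[] args) {
            String ns = "http://example.org/trek#";

            // Tiny ontology: every Captain is an Officer.
            Model schema = ModelFactory.createDefaultModel();
            Resource captain = schema.createResource(ns + "Captain");
            Resource officer = schema.createResource(ns + "Officer");
            schema.add(captain, RDFS.subClassOf, officer);

            // One asserted fact: Picard is a Captain.
            Model data = ModelFactory.createDefaultModel();
            Resource picard = data.createResource(ns + "Picard");
            data.add(picard, RDF.type, captain);

            // The RDFS reasoner derives a fact we never stated:
            // Picard is also an Officer.
            InfModel inf = ModelFactory.createRDFSModel(schema, data);
            System.out.println("Picard is an Officer: "
                    + inf.contains(picard, RDF.type, officer));  // true
        }
    }

Trivial at two triples, but the same machinery scales up to real ontologies with thousands of classes and rules.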

Another Apache project, OpenNLP, could provide the essential “bridge” that allows the computer to understand questions, commands and requests which are phrased in normal English (or some other human language). In addition to supporting the natural language interface with the system, OpenNLP is a powerful library for extracting meaning (semantics) from unstructured data - specifically textual data in an unstructured (or semi-structured) format. Examples of unstructured data would be this blog post, an article in the New York Times, or a Wikipedia article. OpenNLP, combined with Jena and other technologies, allows “the computer” to “read” the Web, extracting meaningful data and saving valid assertions for later use. OpenNLP entered the Apache Incubator on 2010-11-23 and graduated as a Top Level Project on 2011-02-15.
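
Here is a minimal sketch of that extraction step, using OpenNLP's pre-trained tokenizer and person-name-finder models (the .bin files are downloaded separately; the names used here are the standard distribution names):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.Span;

    public class NerDemo {
        public static void main(String[] args) throws Exception {
            // Pre-trained models, downloaded separately from the OpenNLP site.
            try (InputStream tok = new FileInputStream("en-token.bin");
                 InputStream ner = new FileInputStream("en-ner-person.bin")) {

                TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tok));
                NameFinderME nameFinder = new NameFinderME(new TokenNameFinderModel(ner));

                String[] tokens = tokenizer.tokenize(
                        "Tim Berners-Lee proposed the World Wide Web in 1989.");

                // Each Span marks a run of tokens identified as a person name.
                for (Span span : nameFinder.find(tokens)) {
                    StringBuilder name = new StringBuilder();
                    for (int i = span.getStart(); i < span.getEnd(); i++) {
                        name.append(tokens[i]).append(' ');
                    }
                    System.out.println("PERSON: " + name.toString().trim());
                }
            }
        }
    }

Entities found this way can then be handed off to Jena and asserted as RDF statements for later use.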

Apache Stanbol is another newish project within the ASF, which describes itself as “a set of reusable components for semantic content management.” Specifically, Stanbol provides components to support reasoning, content enhancement, knowledge models and persistence for semantic knowledge found in “content”. You can pipe a piece of text (this blog post, for example) through Stanbol and have it extract Named Entities, create links to DBpedia, and otherwise attach semantic meaning to “non-semantic” content. To accomplish this, Stanbol builds on top of other projects, including OpenNLP and Jena. Stanbol joined the Apache Incubator on 2010-11-15 and graduated as a TLP on 2012-09-19.
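
Since Stanbol exposes its enhancer over HTTP, a sketch of calling it is short. This assumes a local Stanbol instance on its default port; the exact endpoint path can vary by version and configuration:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class StanbolEnhanceDemo {
        public static void main(String[] args) throws Exception {
            // POST raw text to the enhancer; Stanbol answers with RDF.
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/enhancer"))
                .header("Content-Type", "text/plain")
                .header("Accept", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "Paris is the capital of France."))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

            // The Turtle response carries TextAnnotations ("Paris" was
            // spotted in the text) and EntityAnnotations (links to
            // dbpedia:Paris, each with a confidence score).
            System.out.println(response.body());
        }
    }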

If we stopped here, we could already support the claim that the ASF is a key hub for development of the kinds of technologies which will be needed to construct the “Star Trek Computer”, but there’s no need to stop. It gets better...

Apache UIMA is similar to Stanbol in some regards, as it represents a framework for building applications which can extract semantic meaning from unstructured data. Part of what makes UIMA of special note, however, is that the technology was originally a donation from IBM to the ASF, and also that UIMA was actually a part of the Jeopardy-winning Watson supercomputer[1]. So if you were wondering: yes, Open Source code is advanced enough to constitute one portion of the most powerful demonstration seen to date of the potential of a Star Trek Computer.

Lucene is probably the most well known and widely deployed Open Source information retrieval library in the world, and for good reason. Lucene is lightweight, powerful, and performant, and makes it fairly straightforward to index massive quantities of textual data, and search across that data. Apache Solr layers on top of Lucene to provide a more complete “search engine” application. Together, Lucene/Solr constitute a very powerful suite of tools for doing information retrieval.
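
To show how little ceremony Lucene requires, here is a minimal index-and-search sketch against a recent Lucene release (the two documents are invented):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class LuceneDemo {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory();  // in-memory index

            // Index two tiny documents.
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String text : new String[] {
                        "The Enterprise is a Galaxy-class starship",
                        "Lucene is an information retrieval library"}) {
                    Document doc = new Document();
                    doc.add(new TextField("body", text, Field.Store.YES));
                    writer.addDocument(doc);
                }
            }

            // Search it.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("body", new StandardAnalyzer())
                        .parse("information retrieval");
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("body"));
                }
            }
        }
    }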

Mahout is a Machine Learning library which builds on top of Apache Hadoop to enable massively scalable machine learning. Mahout includes pre-built implementations of many important machine learning algorithms, but is particularly notable for its capabilities for processing textual data and performing clustering and classification operations. Mahout-provided algorithms will probably be part of an overall processing pipeline, along with UIMA, Stanbol, and OpenNLP, which supports giving “the computer” the ability to “read” large amounts of text data and extract meaning from it.

And while we won’t try to list every ASF project that could be a component of such a system, we would be remiss if we failed to mention, at least briefly, a number of other projects which relate to this overall theme of information retrieval, text analysis, the semantic web, etc. In terms of “Big Data” or “cluster computing” technology, you have to look at the Hadoop, Mesos and S4 projects. Other Semantic Web related projects at the ASF include Clerezza and Marmotta. And from a search, indexing and information retrieval perspective, one must consider Nutch, ManifoldCF and Droids.

As you can see, the Apache Software Foundation is home to a tremendous amount of activity creating the technology that will eventually be required to make a true “Star Trek Computer”. For this reason, we posit that when we finally have a “Star Trek Computer”, it will be Open Source and ALv2 licensed. And there’s a good chance it will find a home at the ASF, along with these other amazing projects.

Of course, you don't necessarily need a full-fledged "Star Trek Computer" to derive value from these technologies. You can begin utilizing Semantic Web tech, Natural Language Processing, scalable Machine Learning, and other advanced computing techniques to derive business value today. For more information on how you can build advanced technological capabilities to support strategic business initiatives, contact us at Fogbeam Labs today. For all the latest updates from Fogbeam Labs, follow us on Twitter.