
Wednesday, November 13, 2013

Dominiek ter Heide is Dead Wrong. The Semantic Web Has Not "Failed"

There is an interesting article at Gigaom right now by Dominiek ter Heide of Bottlenose, in which the author asserts that the Semantic Web has failed and purports to give three reasons why.

This is, of course, utter bollocks. I want to take this opportunity to explain why and to offer a counterpoint to Dominiek's piece.

For starters, there is simply no legitimate basis for saying that "the Semantic Web has failed." Given that his central assertion is flat out wrong, there's hardly any need for a point-by-point rebuttal of the rest of his piece, but we'll work through it anyway, as the process may be educational.

So, if I'm going to say that the Semantic Web has not failed, how might I substantiate that claim? Easy enough... you probably use the Semantic Web every. single. day. And so do most of your friends. You just don't know it. And that is kind of the point. The Semantic Web isn't really meant for end users to interact with directly. The essence of the Semantic Web is to enable machine-readable data with explicitly defined semantics. That lets machines do a better job of helping humans do whatever it is they are trying to do. A typical user could easily use an application backed by the Semantic Web without ever knowing it.

And here's the thing: they do. I said before that you probably use the Semantic Web every day. You might have thought, "Yeah, right, Phil, no way do I use anything like that." Well, if you use Google [1][3], Yahoo [2][3], or Bing [3], then guess what: you're using the Semantic Web. Have you seen Google's Rich Snippets on results for things like restaurants? Those are powered by the Semantic Web. Aside: for the sake of this article, I treat RDFa, Microdata, Microformats, RDF/XML, JSON-LD, etc., as functionally equivalent, since the distinction is not relevant to the overall point I'm making.
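To make that a little more concrete, here is a minimal sketch of the kind of markup that sits behind a restaurant result, written as JSON-LD (one of the formats lumped together in the aside) using schema.org's Restaurant type. The restaurant details are made up, the helper function is purely illustrative, and nothing here guarantees that any particular search engine will render a snippet from it.

```python
import json

# Made-up restaurant details; real values would come from your own site data.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Bistro",
    "servesCuisine": "Italian",
    "priceRange": "$$",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Springfield",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.5",
        "reviewCount": "87",
    },
}


def to_jsonld_script(data: dict) -> str:
    """Wrap structured data in a JSON-LD <script> element for embedding in a page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(to_jsonld_script(restaurant))
```

The page looks the same to a human visitor; the difference is that a crawler no longer has to guess which string is the cuisine and which number is the rating.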

I could stop here and say that we've already proven that Dominiek ter Heide is wrong, but let's dig a little deeper.

The first reason Dominiek gives reduces to an argument that everything on the Semantic Web is "obsolete knowledge," or Obsoledge. In his words:

"This has the effect of making the shelf-life of knowledge shorter and shorter. Alvin Toffler has – in his seminal book Revolutionary Wealth – coined the term Obsoledge to refer to this increase of obsolete knowledge.

"If we want to create a web of data we need to expand our definition of knowledge to go beyond obsolete knowledge and geeky factoids. I really don’t care what Leonardo DaVinci’s height was or which Nobel prize winners were born before 1945. I care about how other people feel about last night’s Breaking Bad series finale."

This is simply a factually incorrect view of the Semantic Web. Again, the goal of the Semantic Web is to provide machine-readable, explicitly defined semantics along with data on the web. It does not matter one bit whether that data is as old as a reference to Leonardo da Vinci or as recent as a reference to last night's episode of Grimm. The Semantic Web is just as relevant to the kind of up-to-date, trending data that Dominiek seems so obsessed with as it is to "historical" data. And let me also point out that history remains enormously important; as the old saw goes, "Those who fail to learn from the past are doomed to repeat it." To suggest that knowledge loses all value simply because it is old is absurd.

His second argument simply states that "Documents are dead." I could just point out that this blog post, which you are currently reading, Faithful Reader, and his own article at Gigaom are both "documents." You do the math.

It goes deeper than that, however. His argument, again, fails for obvious reasons that betray a fundamental misunderstanding of the Semantic Web and of the state of the Web in general. His argument is that "now" data is encapsulated in tweets and other streaming, social-media, real-time data sources. It is a fair point that more and more data is being passed around in tweets and their ilk; what is factually incorrect is the claim that those sources are not valid components of the Semantic Web, just like everything else on the web. Case in point: one of our products here at Fogbeam Labs, Neddick, consumes data from RSS feeds, IMAP email accounts, AND Twitter, and performs semantic concept extraction on all of those sources (with more coming, including G+, Facebook, LinkedIn, etc.). As a result, we can find the connections between, say, a tweet and a related blog post! That's the power of the Semantic Web, and it is the point that Mr. ter Heide seems to be missing.
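Here's a rough sketch of the "find the connections" idea, assuming concept extraction has already happened: each item, whatever its source, is reduced to a set of entity identifiers, and two items are connected when they share an entity. This is not Neddick's actual code; the item IDs and the abbreviated DBpedia-style entity names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical items from different sources, each already reduced to the set of
# entity identifiers an extractor found in it (DBpedia-style names, abbreviated).
items = {
    "tweet:1": {"dbr:Semantic_Web", "dbr:Twitter"},
    "blog:42": {"dbr:Semantic_Web", "dbr:Linked_data"},
    "email:7": {"dbr:Bob_Marley"},
}


def find_connections(items):
    """Map each pair of items that share at least one entity to the shared entities."""
    # Invert the index: entity -> items that mention it.
    by_entity = defaultdict(set)
    for item_id, entities in items.items():
        for entity in entities:
            by_entity[entity].add(item_id)
    # Every pair of items listed under the same entity is connected through it.
    connections = defaultdict(set)
    for entity, item_ids in by_entity.items():
        for a, b in combinations(sorted(item_ids), 2):
            connections[(a, b)].add(entity)
    return dict(connections)


if __name__ == "__main__":
    for (a, b), shared in find_connections(items).items():
        print(a, "<->", b, "via", sorted(shared))
```

Once tweets, emails, and blog posts all land in the same representation, the original source format stops mattering.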

His final argument is that "Information should be pushed, not pulled." Once again, this betrays a complete misunderstanding of the Semantic Web. The knowledge extracted from Semantic Web sources can be used in either "push" or "pull" mode. For example, one of our products can leverage Semantic Web data to generate real-time alerts via email, XMPP, or HTTP POST when it identifies a relevant bit of knowledge in a piece of content, whether that content is a tweet, a real-time business event extracted from a SOA/ESB backbone, or a blog post.
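A minimal sketch of the push side, assuming incoming content has already been reduced to a set of entity identifiers. The subscriptions, entity names, and webhook URLs are hypothetical, this is not our product's code, and only the HTTP POST channel is shown; the request is built but not sent, so the example runs without a network.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Subscription:
    """Hypothetical standing interest: alert `subscriber` when content mentions any of `entities`."""
    subscriber: str
    entities: frozenset
    webhook_url: str


def match_alerts(content_entities, subscriptions):
    """Return (subscription, matched entities) for every subscription this content triggers."""
    alerts = []
    for sub in subscriptions:
        hits = sub.entities & content_entities
        if hits:
            alerts.append((sub, hits))
    return alerts


def build_webhook_request(sub, content_id, hits):
    """Build (but do not send) an HTTP POST carrying the alert as JSON."""
    payload = {"subscriber": sub.subscriber, "content": content_id, "entities": sorted(hits)}
    return urllib.request.Request(
        sub.webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually deliver an alert you would hand the request to `urllib.request.urlopen`; email or XMPP delivery would be other senders plugged in at the same point.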

As I near the end of this piece, let me just say that the Semantic Web is becoming more important with every passing day. As tools like Apache Stanbol, which automates the process of extracting rich semantics from unstructured data, mature and become more widely available, the number of applications for explicit semantics is only going to mushroom.
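For a sense of how little ceremony that extraction takes, here is a sketch that submits plain text to Stanbol's enhancer over HTTP and asks for RDF back. It assumes a Stanbol launcher running locally on port 8080 and exposing its `/enhancer` endpoint; check the URL and the supported `Accept` formats against your own deployment.

```python
import urllib.request

# Assumption: a Stanbol launcher running locally on port 8080.
STANBOL_ENHANCER = "http://localhost:8080/enhancer"


def build_enhancement_request(text, rdf_format="text/turtle"):
    """Build a POST that submits plain text to the enhancer and asks for RDF in `rdf_format`."""
    return urllib.request.Request(
        STANBOL_ENHANCER,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8", "Accept": rdf_format},
        method="POST",
    )


def enhance_with_stanbol(text):
    """Send the text and return the enhancement results as RDF text (requires a running server)."""
    with urllib.request.urlopen(build_enhancement_request(text)) as resp:
        return resp.read().decode("utf-8")
```

The response describes the entities the enhancer found, which can then be loaded into a triplestore alongside everything else.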

To finish up, let's look at a quick example of what I'm talking about. Say you have deployed our enterprise social network, Quoddy, and your company does something with musicians. Your Quoddy status updates occasionally mention, say, Jon Bon Jovi, Bob Marley, Richard Marx, and Madonna. How would you run a search that says "show me all posts that mention musicians" without Semantic Web technology? Not gonna happen. But by using Stanbol for semantic extraction and storing the resulting knowledge in a triplestore, we can make that kind of query trivial.
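Here's a toy version of that query in plain Python, so it runs without a real triplestore. `rdf:type` and `dbo:MusicalArtist` are genuine vocabulary terms, but `ex:mentions`, the post IDs, and the abbreviated DBpedia resource names are hypothetical stand-ins for what an extraction step might write. In a real triplestore you would express the same join in SPARQL, as sketched in the docstring.

```python
RDF_TYPE = "rdf:type"
MUSICAL_ARTIST = "dbo:MusicalArtist"
MENTIONS = "ex:mentions"  # hypothetical predicate written by the extraction step

# A toy triplestore: a set of (subject, predicate, object) tuples.
triples = {
    # Types as they might be dereferenced from DBpedia (names abbreviated).
    ("dbr:Jon_Bon_Jovi", RDF_TYPE, MUSICAL_ARTIST),
    ("dbr:Bob_Marley", RDF_TYPE, MUSICAL_ARTIST),
    ("dbr:Richard_Marx", RDF_TYPE, MUSICAL_ARTIST),
    ("dbr:Madonna", RDF_TYPE, MUSICAL_ARTIST),
    ("dbr:Paris", RDF_TYPE, "dbo:City"),
    # Entities found in hypothetical Quoddy status updates.
    ("post:1", MENTIONS, "dbr:Bob_Marley"),
    ("post:2", MENTIONS, "dbr:Paris"),
    ("post:3", MENTIONS, "dbr:Madonna"),
    ("post:4", MENTIONS, "dbr:Jon_Bon_Jovi"),
    ("post:4", MENTIONS, "dbr:Richard_Marx"),
}


def match(pattern, store):
    """Yield triples in `store` that match an (s, p, o) pattern; None is a wildcard."""
    for triple in store:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


def posts_mentioning_type(entity_type, store):
    """Roughly SELECT DISTINCT ?post WHERE { ?post ex:mentions ?e . ?e rdf:type ?t }
    with ?t bound to `entity_type` (prefix declarations omitted)."""
    typed = {s for s, _, _ in match((None, RDF_TYPE, entity_type), store)}
    return sorted({s for s, _, o in match((None, MENTIONS, None), store) if o in typed})


if __name__ == "__main__":
    print(posts_mentioning_type(MUSICAL_ARTIST, triples))
```

Nothing in the posts themselves says "musician"; the type comes from the knowledge base, and the join does the rest.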

It gets better, though... Stanbol comes "out of the box" with the ability to dereference entities in DBpedia and other knowledge bases, which is cool enough in its own right, but you can also easily add local knowledge and your own custom enhancement engines. So entities that are meaningful only in your local domain (part numbers, SKUs, customer numbers, employee ID numbers, whatever) can be semantically interlinked and queried as part of the overall knowledge graph.
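A sketch of what a custom, local-knowledge step might do: recognize domain identifiers with regular expressions and emit triples that tie them into the graph. The `PN-`/`EMP` formats, the `urn:example:` namespace, and the `ex:` terms are all hypothetical, and this is plain Python rather than Stanbol's own enhancement-engine interface, which you would implement as a Java component.

```python
import re

# Hypothetical local identifier formats; substitute whatever your domain uses.
LOCAL_PATTERNS = {
    "ex:PartNumber": re.compile(r"\bPN-\d{5}\b"),
    "ex:EmployeeID": re.compile(r"\bEMP\d{4}\b"),
}
LOCAL_NS = "urn:example:"  # hypothetical namespace for local entities


def enhance_locally(doc_id, text):
    """Return triples linking `doc_id` to every local identifier found in `text`, plus its type."""
    triples = set()
    for entity_type, pattern in LOCAL_PATTERNS.items():
        for token in pattern.findall(text):
            entity = LOCAL_NS + token
            triples.add((doc_id, "ex:mentions", entity))
            triples.add((entity, "rdf:type", entity_type))
    return triples
```

Because the output has the same mentions-plus-type shape as any other extracted entity, asking for every post that mentions a part number works the same way as asking for every post that mentions a musician.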

Hell, I'd go so far as to say that Apache Stanbol (along with Apache OpenNLP and a few related projects, such as Apache UIMA, Apache Clerezza, and Apache Marmotta) might just be the most important open source project around right now. And nobody has heard of it. Again, the Semantic Web is largely not something the average end user needs to know or think about, but they'll benefit from the capabilities that semantic tech brings to the table.

At the end of the day, the Semantic Web is just a step on the road to something like the Star Trek computer or a ubiquitous IBM Watson. Saying that the Semantic Web has failed means ignoring all of the facts and denying reality.