Media Darling Powerset vs. Non-Media Darling Hakia
Over the weekend, the web was abuzz with discussion about Microsoft considering the acquisition of natural language search company Powerset. Some time ago I had heard a rumor that someone was looking at Powerset, but I was relatively uninterested. Hearing that the potential acquirer is Microsoft certainly makes it more interesting, but I have to say the concept leaves me more than a bit incredulous.
From skeptic to user
I became familiar with Powerset’s only competitor, Hakia, initially because they are a New York company. I became intrigued with Hakia because several months ago I tried their search engine, and it worked, really well. This was a surprising result for me, since I have always been a skeptic regarding all things relating to artificial intelligence, speech recognition, natural language processing, and other such fuzzy technologies.
At least in the area of natural language processing, Hakia has changed my mind. In fact, it has become common for me to use the Hakia search engine when Google does not deliver sufficient results.
Hakia and Powerset are part of the same general area of natural language search. The idea with both services is that you can actually ask specific questions and get answers. But there are critical differences between Hakia and Powerset. And those differences bring me back to my incredulity at the idea that Microsoft is taking a serious look at Powerset.
Powerset indexes 750 times slower than Hakia!
I have no expertise in natural language processing or semantic search, or any type of full text search for that matter. But as far as I can tell, Hakia’s technology is *far* superior to Powerset’s. Why would I say that?
Well, first, as I have already said, it works. It is a real live search engine. I use it. I can’t say the same for Powerset. Powerset has yet to show anything but a search engine for Wikipedia. A big part of the reason Powerset doesn’t seem able to offer a real search engine is that, according to their own reports, it takes them about 25 seconds to index a page, based on an average of 25 sentences per page. According to Hakia, it takes them 1/30th of a second to index a page. Twenty-five seconds divided by 1/30th of a second is 750, so Powerset is seven hundred fifty times slower than Hakia! Essentially this means that Powerset cannot scale.
Now you might assume that Powerset is slower because it’s applying some serious and superior indexing mojo, and therefore what it is doing is much more valuable than what Hakia is doing. But alas, that is also not true.
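To make the arithmetic concrete, here is a minimal sketch in Python. The two per-page timings are the ones quoted above; the corpus size of one billion pages and the single-machine assumption are hypothetical, chosen only to show what a 750x gap does at web scale.

```python
from fractions import Fraction

# Per-page indexing times as reported by each company (figures from the text).
POWERSET_SECONDS_PER_PAGE = Fraction(25)
HAKIA_SECONDS_PER_PAGE = Fraction(1, 30)

# How many times slower Powerset is per page.
slowdown = POWERSET_SECONDS_PER_PAGE / HAKIA_SECONDS_PER_PAGE

SECONDS_PER_DAY = 24 * 60 * 60


def single_machine_days(seconds_per_page: Fraction, pages: int) -> Fraction:
    """Days one machine would need to index `pages` pages, with no parallelism."""
    return seconds_per_page * pages / SECONDS_PER_DAY


# Hypothetical corpus size, for illustration only.
HYPOTHETICAL_PAGES = 1_000_000_000

powerset_days = single_machine_days(POWERSET_SECONDS_PER_PAGE, HYPOTHETICAL_PAGES)
hakia_days = single_machine_days(HAKIA_SECONDS_PER_PAGE, HYPOTHETICAL_PAGES)

print(f"Powerset is {slowdown}x slower per page")
print(f"Powerset: about {float(powerset_days):,.0f} machine-days")
print(f"Hakia:    about {float(hakia_days):,.0f} machine-days")
```

Whatever corpus size you plug in, the ratio between the two totals stays at 750, which is the whole point: the gap has to be bought back with 750 times the hardware.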
Hakia really knows how to read
Hakia is doing something called “ontological semantics”. What this means is that over the last four years, Hakia has developed an “ontology” for human expression. In layman’s terms, when Hakia indexes a page it looks at each sentence and figures out which *questions* that sentence answers. Any given sentence usually answers 3 or 4 questions. These questions are coded and go into what Hakia calls their Qdex, or question index.
In order to be able to figure out what the relevant questions are for a given sentence, Hakia’s indexer has to literally read the sentence. By “read” I mean it has to understand the actual meaning of the sentence semantically. This is a big deal.
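For readers who think in code, the shape of a question index can be sketched in a few lines of Python. This is only a toy illustration of the idea as described above, not Hakia's implementation: the hard part, reading a sentence and working out which questions it answers, is replaced here by hand-written questions, and lookup is a plain normalized string match. The function names and sample sentences are all made up for the example.

```python
import re
from collections import defaultdict


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so trivially different phrasings match."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


def build_question_index(annotated_sentences):
    """Map each normalized question to the sentences that answer it.

    `annotated_sentences` is a list of (sentence, questions) pairs. In a real
    system the questions would come from semantic analysis of the sentence;
    here they are supplied by hand.
    """
    index = defaultdict(list)
    for sentence, questions in annotated_sentences:
        for question in questions:
            index[normalize(question)].append(sentence)
    return dict(index)


def answer(index, question: str):
    """Return the sentences indexed under this question, or an empty list."""
    return index.get(normalize(question), [])


# Hand-annotated sample data, for illustration only.
sample = [
    ("Hakia is based in New York.",
     ["Where is Hakia based?", "Which company is based in New York?"]),
    ("Powerset has shown a search engine for Wikipedia.",
     ["What has Powerset shown?", "Who has a search engine for Wikipedia?"]),
]

qdex = build_question_index(sample)
print(answer(qdex, "where is Hakia based"))
```

The point of the structure is that the expensive understanding happens once, at indexing time, so that answering a question later is just a lookup.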
Powerset uses statistics + syntax but can’t actually read
So, while Hakia is actually reading, Powerset does not actually attempt to understand what sentences mean. It uses a system that parses the syntax of the sentence and guesses matches based on statistics. But this approach means that for questions that do not match previously encountered syntactical patterns, the system will not be able to find answers, even if there are in fact answers in the database.
Powerset benefits from the Silicon Valley echo chamber
Now, if, for a moment, you presume that it is true, or even *possibly* true that Hakia is the superior service and technology, or if you even assume that Hakia is just equivalent to Powerset, why would Powerset be so continuously celebrated while Hakia is overshadowed?
The only answer I can come up with is that the west coast is such an echo chamber that very little sound gets in or out. And so it must be shocking when a New York company develops a technology that seems to beat the pants off something that should be pure Silicon Valley. Just a thought.
In any case, it seems, for the record, worth noting that we have the clear leader in natural language processing and search technology right here. And, as an admitted New York partisan, after a while it does get a little annoying to hear such continued fawning over a west coast company that is very likely, at the end of the day, just another Silicon Valley also-ran.
This article was authored by Hank Williams, a New York-based entrepreneur who recently launched a new blog, Why Does Everything Suck?, exploring the tech marketplace from 10,000 feet.