Web 3.0 Archive
A big part of what I do professionally is focused on thinking about how to improve the usefulness of the web. Tied into that is the additional question of how to empower developers to create more useful applications.
Much of this exploration has led me to believe that the most powerful “pregnant” web concept is the simple idea that the web should be a web of objects, and should become less a web of text or pages. Indeed the web has been moving in that direction, but the road map has not been entirely clear.
Web inventor Sir Tim Berners-Lee and the W3C have pioneered the broad outlines of the concept of objectifying the web with the ideas embodied in the W3C semantic web specifications for RDF, OWL, and SPARQL technologies. But in truth, most developers have no idea what the term “semantic web” means and are totally unfamiliar with RDF, OWL, and SPARQL.
Despite the fact that the officially proposed terminology and methodologies have not quite taken hold, the idea of “objects not pages” most definitely has. Application developers are creating APIs to allow people to access their data objects, and other application developers are using those APIs to consume data objects. And because the need is so great, when developers do not make their data objects easily accessible, other applications are going as far as scraping web pages, in effect manually objectifying source sites.
And so, while the most common term for the idea of “objects not pages” has been the “semantic web”, I would really like to rally everyone around the lesser-known but more encompassing term, Web 3.0.
I know the idea of glomming onto the Web 2.0 bandwagon rubs some people the wrong way, but we need a “big tent” term to describe stuff that is so important, and the truth is the term “semantic web” just doesn’t cut it. In fact, in my informal surveys, it almost universally turns people off.
But terminology aside, the concepts here are really important and are building momentum. We must, as a developer/entrepreneur community, begin to focus on best practices for this object-oriented web and to discuss its broader implications. The emerging mashups and semantic applications are compelling, but they are just the beginning. Facebook and its social graph is really the first major Web 3.0 application, so make no mistake, these ideas are powerful.
Because I believe this is such an important mission, and because I strongly believe it needs more shepherding, I have committed to doing my part to move these ideas forward. I am co-chairing the Jupiter Web 3.0 Conference Series, which launches in Santa Clara next month. My co-chair is Dan Grigorovici who writes lots of interesting stuff on this space at web3beat.
The Web 3.0 Conference is the first in what will be a regular series that we hope will become *the* gathering ground for talking about how we can, should, and will approach these next generation issues. And indeed since I have been thinking a lot about these issues I will be writing a lot about them in the next few weeks.
Particularly if you are in the Bay Area, but really no matter where you are, if you want to get a view into where the next generation of the web is going and how you can leverage it, this will be the place to be. But whether you come to the conference or not, I am hoping to spark a discussion about moving the ball forward. Needless to say I have my own ideas, which I will be sharing, both in person at the conference and on these pages in the next few weeks, but this should be a multi-way discussion. If you blog about this issue and let me know I will link to you in upcoming posts, and I will try to respond as well.
Let the Web 3.0 Era begin!
Earlier this week, I attended the NY Social Media Club meeting. The topic of discussion was the Semantic Web and Web 3.0. There were two panelists, moderated by Howard Greenstein. The first panelist was Tim McGuinness, Vice President of Search at Hakia.com. Hakia is a NY-based startup that has a great meaning-based search engine; they just launched a new beta version with some social networking features this week, and they use Natural Language Processing techniques to produce better search results. Nate Westheimer was the other panelist. He is the founder of BricaBox.com, a site that launched its beta this week. Also, Marco Neumann, the leader of the NY Semantic Web Meetup, contributed a lot to the conversation. This post is not a strict summary, but rather some thoughts related to and inspired by the discussion yesterday. I purposely use the term Meaning-Based Web and stay away from the term Semantic Web, since the latter refers more to a specific set of technologies than to the wider concept.
Meaning-Based Web – Motivation
The Semantic Web is really about improving the connections and the meaning that one can glean from the Internet, so that when you search, the tool returns only the results relevant to the meaning of what you are looking for. The goal of meaning-based web technologies is to make the meaning of the pages on the World Wide Web better understood by computers. This will drastically improve our ability to find things and to ask intelligent questions about the world.
To illustrate the difference: today, when somebody does a search for “George Bush”, the search engines are fundamentally looking for the string of characters in the sequence you typed in. The engine does not understand that you are talking about a person, and cannot directly relate that George Bush is the president. You want your search to find all the cases where George Bush is referred to through meaning, i.e. The President, the 41st President’s son, “W,” Kerry’s opponent in 2004, etc. To us humans, these are obvious connections to make; to computers, not so.
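As a toy illustration of this gap, compare a plain substring search with an entity-aware search that expands the query through an alias table before matching. Everything below (the alias table, the documents, the function names) is invented for illustration; no real engine works this simply.

```python
# Hypothetical alias table mapping an entity to ways it is referred to.
ALIASES = {
    "george bush": ["george bush", "the president", "president bush"],
}

def string_search(query, documents):
    """Literal substring matching, as described for classic engines."""
    q = query.lower()
    return [d for d in documents if q in d.lower()]

def entity_search(query, documents):
    """Match any known alias of the queried entity."""
    terms = ALIASES.get(query.lower(), [query.lower()])
    return [d for d in documents if any(t in d.lower() for t in terms)]

docs = [
    "George Bush signed the bill.",
    "The President spoke at the summit.",
    "Magnolias bloom in spring.",
]

print(len(string_search("George Bush", docs)))  # 1: only the literal match
print(len(entity_search("George Bush", docs)))  # 2: also finds "The President"
```

The alias table here is hand-built; the whole point of meaning-based technologies is to derive such connections automatically.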
There are two ways to approach this goal, coming from two philosophically different directions: semantic web techniques and natural language processing. In a way, they are two sides of the same coin.
Approach One – Evolve the Web (Semantic Web, Microformats)
The first approach is to evolve the web by adding more information to it. This means that content producers will add more information to the Web, and thus enhance it to make it more understandable to machines. The Semantic Web is a set of W3C standards that allow data to be added to the web and queried. Very similar to the W3C approach, Microformats are another, more lightweight, way to do the same. The goal is for content to have semantic information attached to it so that computers can read it and form connections just as humans do.
Using this approach, a page with information about the movie “Magnolia” has hooks in the page (possibly invisible to the user) that mark it as such. A page about the magnolia flower has markings that explain that it is about a flower.
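A minimal sketch of how such hooks can be read by a machine. The `class` names below are invented for illustration, not a real microformat vocabulary; the parser simply recovers the declared kind of each annotated item from the markup.

```python
from html.parser import HTMLParser

# Two hypothetical pages, each carrying an invisible semantic hook.
MOVIE_PAGE = '<div class="item movie"><span class="title">Magnolia</span></div>'
FLOWER_PAGE = '<div class="item flower"><span class="title">Magnolia</span></div>'

class SemanticHookParser(HTMLParser):
    """Collects the declared kind of each annotated item on a page."""
    def __init__(self):
        super().__init__()
        self.kinds = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if "item" in classes:
            # Everything besides the generic "item" marker names the kind.
            self.kinds.extend(c for c in classes if c != "item")

def page_kind(html):
    parser = SemanticHookParser()
    parser.feed(html)
    return parser.kinds[0] if parser.kinds else None

print(page_kind(MOVIE_PAGE))   # movie
print(page_kind(FLOWER_PAGE))  # flower
```

Real microformats and RDFa define shared vocabularies for exactly this purpose, so that independent parsers agree on what the hooks mean.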
Approach Two – Extract Meaning (through Natural Language Analysis)
The second approach is to work harder to extract meaning from the web as it is. It assumes that enhancing the data is very cumbersome, and that while some people will do it, not everybody will. Additionally, adding more data means that there will always be holes and things that cannot be expressed easily. It would be great if computers could get closer to the real meaning of what the web pages are talking about on their own.
This movement espouses Natural Language Processing, a set of techniques that try to extract meaning and relationships from text. The algorithms read the text and cull meaning from it, aided by an ontology of relationships defined elsewhere. For example, the ontology will know that a car is a vehicle, that a car has certain actions it can perform, and that it is an inanimate object, which means it cannot speak. As the software “reads” web pages, it applies the ontology to the content and records not where a specific word can be found in a document, but rather where a specific concept is. Additionally, these ontologies are largely language-independent, aside from minor language-specific particularities.
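The ontology described above can be sketched as a tiny “is-a” hierarchy plus per-concept capabilities. All concept and capability names here are invented for illustration; real ontologies are vastly larger and richer.

```python
# Hypothetical is-a links and capabilities for a toy ontology.
IS_A = {"car": "vehicle", "vehicle": "inanimate object", "person": "animate being"}
CAN = {"vehicle": {"drive", "carry"}, "animate being": {"speak"}}

def ancestors(concept):
    """Walk the is-a chain upward, yielding each more general concept."""
    while concept in IS_A:
        concept = IS_A[concept]
        yield concept

def can_do(concept, action):
    """A concept inherits the capabilities of everything it is-a."""
    chain = [concept, *ancestors(concept)]
    return any(action in CAN.get(c, set()) for c in chain)

print(list(ancestors("car")))  # ['vehicle', 'inanimate object']
print(can_do("car", "drive"))  # True: a car is a vehicle
print(can_do("car", "speak"))  # False: nothing in the chain can speak
```

The inheritance walk is what lets a system answer questions about “car” it was never explicitly told, which is the point of recording concepts rather than words.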
To come back to our example: using this approach, the technology will automatically be able to tell that a page is talking about a flower because it sees words like “grows,” “soil,” etc. – the same clues that allow us humans to disambiguate the meaning of words. It is able to figure out meaning from context.
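A toy word-overlap disambiguator makes this concrete. The cue words “grows” and “soil” come from the text above; the rest of the cue lists and the sample sentences are invented, and real NLP systems use far more sophisticated statistical models.

```python
# Hypothetical context cues for each sense of "magnolia".
CONTEXT_CUES = {
    "flower": {"grows", "soil", "bloom", "petals"},
    "movie": {"director", "cast", "scene", "film"},
}

def disambiguate(sentence):
    """Pick the sense whose cue words overlap the sentence the most."""
    words = set(sentence.lower().replace(".", "").split())
    return max(CONTEXT_CUES, key=lambda sense: len(CONTEXT_CUES[sense] & words))

print(disambiguate("The magnolia grows best in moist soil."))       # flower
print(disambiguate("Magnolia was praised for its director and cast."))  # movie
```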
In reality, both approaches have their strengths and weaknesses, although I have to admit to being more partial to the NLP approach over the long term.
The implications here are profound. As this technology improves, searching will become seamless, and Search Engine Optimization will be a thing of the past. Search engines will understand the true meaning of the content, and will be able to direct people towards you. The very cumbersome task of thinking up the various words your content might be searched on will also be a thing of the past.
The ability to understand text at a higher level (natural language processing) means that ads will be targeted even more precisely. A lot of ambiguities will be resolved easily, just by the engine asking you a few disambiguating questions. As a user, you will be rewarded for providing more search terms, since the engine will be able to find the information you are looking for faster. You will be able to have an interactive conversation with your search engine until you zero in on precisely what you are looking for.
As far as search engines are concerned, I see meaning-based searching as the future. However, that cannot happen in isolation. For example, there is a lot of bad information on the web about child vaccinations, a vocal minority of sorts, mostly driven by laypeople. There is also a tremendous amount of authoritative research data that shows the benefits of vaccinations. One of the reasons that Google search has been successful is that they have been able to harness the power of authority – their original PageRank algorithm was based on the assumption that a page that has been linked to a lot is more authoritative than others. Since then, their search algorithms have evolved a thousandfold, but the central concept of authoritative sources is still very important on the internet (and in real life).
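The original PageRank idea can be sketched in a few lines of power iteration. The four-page link graph below is invented for illustration, and this ignores refinements (dangling nodes, convergence checks) that a real implementation needs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Each page keeps a base amount and receives shares from inlinks.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: three pages all link to "b".
graph = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["b"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # b: it collects the most inbound links
```

The assumption the text describes falls out directly: “b” ends up most authoritative because the most pages (including an authoritative one, “c”) link to it.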
On top of natural language techniques and the authority-based approach, the next realm in search is personalization and social networking. The next generation of collaborative filtering technologies will be collaboration-based with personalization mixed in. You’ll receive not just the best content, but the best content targeted to your current interests. If the search engine knows that I am currently interested in dancing and I search for salsa, it will automatically return sites related to dancing, as opposed to cooking. Additionally, if it can mark studios or events that my friends have been to, that would be even more valuable.
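The salsa example above can be sketched as a simple re-ranking step: results tagged with the user’s current interest float to the top. The profiles, tags, and result set are all invented for illustration.

```python
def personalized_rank(results, interest):
    """Sort results so those tagged with the user's interest come first."""
    # False sorts before True, so matching results lead; the sort is
    # stable, so the original order is kept within each group.
    return sorted(results, key=lambda r: interest not in r["tags"])

# Hypothetical result set for the ambiguous query "salsa".
results = [
    {"title": "Salsa recipes", "tags": {"cooking"}},
    {"title": "Salsa dance studios in NYC", "tags": {"dancing"}},
]

print(personalized_rank(results, interest="dancing")[0]["title"])
# Salsa dance studios in NYC
```

A real system would blend this interest signal with relevance and social signals rather than sorting on it alone.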
So what is Web 3.0? Nobody knows yet, and neither do I. Right now, I think it’s emerging as a combination of several technologies – the meaning-based web, social networking, greater personalization, and locale-based information. I think that once you are able to create mashups based on meaning-based information, and to extract that information easily from existing data sources, then we will have Web 3.0. Lastly, many sites will offer not just access APIs, but a way to really integrate your application into them. Therefore, the Facebook API, OpenSocial, and Ning are early precursors of Web 3.0.
New things will become possible. It will be easy to cross-reference unstructured documents with information stored in relational databases. It will be easy to create a personal profile page based on the information already out there on the internet. It will be easy to create something similar to a tumblelog based on your web activities. We are not quite there yet with mashups, at least not based on what I’ve seen. We are close though. When everything becomes a data source, then we will have arrived.
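Cross-referencing unstructured documents with a relational database can be sketched with naive entity spotting against a lookup table. The table, the document, and the spotting method are all invented for illustration; real systems would use proper entity extraction instead of substring checks.

```python
import sqlite3

# Hypothetical structured data in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, role TEXT)")
conn.execute("INSERT INTO people VALUES ('George Bush', 'president')")

# An unstructured document to cross-reference.
document = "Yesterday George Bush spoke about the economy."

# Naive entity spotting: check which known names appear in the text.
known = [row[0] for row in conn.execute("SELECT name FROM people")]
mentions = [name for name in known if name in document]

for name in mentions:
    role = conn.execute(
        "SELECT role FROM people WHERE name = ?", (name,)
    ).fetchone()[0]
    print(f"{name} ({role})")  # George Bush (president)
```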
Since I work for a company, Alfresco, that is focused on bringing Web 2.0 ideas into the enterprise, I am concerned with how this will affect the people behind the firewall. Just like on the public web, I see a great opportunity to transform existing systems and ways of collaborating.
One of the reasons why many content management solutions exist is to add semantic meaning to data. When you create a taxonomy to classify your documents, you are adding semantic meaning on top of unstructured content. Much of the reason we do this is that computers cannot quite do it themselves. As the technology improves, a lot of traditional document management systems will be fundamentally changed. Whole areas of taxonomy analysis and information architecture will be transformed, since semantic web techniques will allow these taxonomies to be extracted automatically from the documents themselves.
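A toy auto-tagging pass hints at what such extraction could look like: pick candidate tags by term frequency after removing stopwords. The stopword list, thresholds, and sample document are invented, and real systems use far richer linguistic and statistical techniques.

```python
from collections import Counter

# Minimal hand-picked stopword list for the sketch.
STOPWORDS = {"the", "a", "of", "and", "to", "is", "in", "are"}

def auto_tags(text, count=2):
    """Return the most frequent non-stopword terms as candidate tags."""
    words = [w.strip(".,").lower() for w in text.split()]
    terms = [w for w in words if w and w not in STOPWORDS]
    return [term for term, _ in Counter(terms).most_common(count)]

doc = ("The contract covers insurance. Insurance claims and contract "
       "renewals are described in the claims appendix.")
print(auto_tags(doc))  # ['contract', 'insurance']
```

The output suggests taxonomy nodes (“contract”, “insurance”) that a human would otherwise assign by hand.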
I also see some great short-term opportunities in Natural Language Processing technologies and services. If a document can be automatically tagged with metadata instead of humans having to do it, that leads to a much better user experience and thus more useful content management systems. Some of this will require better plumbing; some of it will require newer interfaces, the kind that Adobe Flex or Microsoft Silverlight are starting to enable. This is why we are firmly committed to Flex as the future evolution of our user interface.
Since the next generation of the web will feature much better meaning-based technologies, this will also dramatically improve collaboration and information sharing. Tools will be developed that will become agents, searching in the background for information that’s relevant to your work interests, and will automatically notify you of things you didn’t even know you were looking for. As you are working on solving a problem, an agent will also be searching for a solution to your problems, both inside the intranet and on the public web. The agents will also be able to traverse your social network, and connect you with other people in your social network or company who have expertise in the area.
Auto-tagging, auto-classification, and new ways of collaborating – wiki-based, mashup-based – are all transforming the public web. And these superior ways of collaborating are moving rapidly inside the enterprise. Forrester talks about tech populism – the idea that as the web becomes more user-friendly, enterprise users will demand the same simplicity and interactivity they are becoming used to.
This is the future I am excited to be a part of.
Some more resources on Meaning-Based Web and Web 3.0:
- Great article about the Semantic Web from Scientific American by Tim Berners-Lee, the inventor of the World Wide Web.
- Semantic Wave Report from Project 10X
- LingPipe – an advanced NLP Java library. New York-based and partially open source.
- Twine – from Radar Networks, a semantic web startup that just got some funding.
- Hakia.com – NLP-based search engine
This article was authored by Jean Barmash, a New York-based technologist. Jean is the Director of Technical Services at Alfresco, the Open Source Enterprise Content Management company. He also blogs at NY Web Guy.
Now, before we get into this article, I would ask that each of you remain calm, cool and collected. This is not your normal post on CN. This is an exciting moment. I have for you, the first ever, Web 3.0 interface!
Brought to us by CMP and their Internet Evolution team, this Web 3.0 interface is named "The Wisdom of Clouds". So it seems that Web 3.0 is popups, moveable Web pages, and men who pop up and talk to us. And here I thought Web 3.0 was going to be better than Web 2.0.
The man explains that this Wisdom of Clouds is a way to use the tools provided to "float through the clouds to find what you are looking for". It’s also a virtual interface allowing you to navigate graphically. Where is a LOLCAT when you need one?
Here is what Stephen Saunders, Internet Evolution’s Founding Editor had to say on the launch, "Every day, Internet Evolution publishes at least three new blogs about the future of the Internet. After only two months, we found that our users needed a new and more efficient way to tap into the ideas on the site. We’re calling our new interface ‘The Wisdom of Clouds’ as an ironic homage to James Surowiecki’s ‘The Wisdom of Crowds,’ which we happen to think is complete nonsense. And we’re referring to it as a ‘Web 3.0’ interface because we know it will really annoy all the alleged experts who talk pretentious twaddle about the Internet and its various iterations."
I, for one, am excited to be part of this moment in Internet history.
Next time you are in NYC, try to count the number of coffee shops, diners, bodegas, and other establishments selling "The World’s Greatest Coffee!" It’s hard to go just one block without seeing a sign with those words. But how can everyone be selling the greatest? Isn’t there only one champion?
Jason Calacanis has decided that today he would declare the "official" definition of Web 3.0.
Let me break it down for everyone… there is ONE WEB. One. Not 2, not 20, not 50. There is no Web 3.0. It’s just the Web. But naturally a person can sell more books and get more inbounds by declaring Web 21.2B instead of just calling it the Web or the Internet.
Here are some of the reactions from across "Web 2.0" (or is it Web 3.0?):
- Josh seems to agree with me regarding one web, "I’ve recently come to the conclusion that Web 2.0 has no meaning. It now means “any Internet-based company that has launched after 2004”. It is as useless a descriptor as “dot com” was."
- Fred also dislikes the numbering, "I don’t like the term web 2.0 and I sure hope we don’t perpetuate this nonsensical versioning much further."
- Mathew looks deeper into the meaning, "Jason’s definition is also effectively a thumbnail description of Mahalo, the people-powered search/directory service he is trying to build."
So exactly what is Jason’s Web 3.0 definition?
Web 3.0 is defined as the creation of high-quality content and services produced by gifted individuals using Web 2.0 technology as an enabling platform.
Let’s analyze what the Mahalo CEO has defined as the next version of the Web as we know it.
"creation of high-quality content and services produced by gifted individuals" – there is plenty of high-quality content and services today – but "gifted individuals" – so is that to say that today there are no gifted individuals? Or is there a certain demographic that is considered "gifted?" A certain race, sex, bank account?
"using Web 2.0 technology as an enabling platform" – we already do this today
So the only real difference between where we are today and Jason’s Web 3.0 definition is "gifted individuals" – the entrepreneur a-list if you will. Because the common man or woman just couldn’t be considered in this new Web 3.0 mechanism.
A couple of quick notes:
It’s interesting that people actually comment on Jason’s weblog, since he turns comments on and off at will – it’s so "sheep-like"
Jason makes a great statement that I agree with 100%, "I’d also like to thank TechMeme for being the easiest linkbaiting tool in the history of Web 2.0 (can it really be this easy?)." While this story was a good piece for Techmeme, there are ways to game the system (like any system, including Mahalo). I am sure we will soon see a statement from the Mahalo CEO — something like "Mahalo – It’s Techmeme free"