While on the surface this might look like a replacement for Google’s site search or other local search tools, Wikia’s WISE is more of a developer complement. WISE could be used to create some VERY large and powerful mashups. You can combine the search with other APIs including Yahoo’s BOSS search API. There’s a developer’s guide to help you get started working with WISE.
Wikia’s Search remains a community-edited search index. The basic idea is if you know a better result for a search query, you can add it instantly instead of recommending it like you have to on other mainstream search engines.
Launch partners include NY-based Snooth, Digg, Last.fm, AccuWeather and the Washington Post. The Vaynerchuk brothers’ startup PleaseDress.Me is also using this new WISE search application framework for their tshirt search engine. I was able to grab some screenshots of some of the current apps and they are displayed below.
Wikia Search has launched a new Firefox toolbar, which it calls Wikia Evolution. Wikia Evolution is available for download and offers an easy way to add pages to the Wikia Search project.
Here’s how the Wikia Evolution toolbar works once installed. When you search on Google, Yahoo, etc. or view any Web page, a new set of options will be displayed (see the sample below). The options allow you to add the page to the Wikia Search index instantly and tag the page for the appropriate keywords. There’s also a rating option available for each page that is submitted into the search database.
Wikia founder Jimmy Wales said on the Evolution toolbar launch, “This toolbar, like everything we are doing at Wikia Search, is open source. We hope that if you are a toolbar fan and programmer, you will let us know what features need to be added and/or take this and do something surprising and cool with it.”
Wikia Search is announcing today the launch of a suite of statistics and tools for research and reporting on the Wikia Search application. The site statistics tool offers information on queries, contributions and their social network. The statistics can be run by the day or month and there are some Ajaxy sliders for more details. And for those of you with a math calculator, standard deviations are provided.
There are also a few site tools launching today:
- Most Queries
- Most Contributions
- Recent Top Queries
- Recent Top Contributions
- Most Wanted – the difference between top queries and top contributions
- Keyword Trends
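To make the "Most Wanted" idea concrete, here is a minimal sketch of how such a list could be computed: count how often each term is searched, subtract how many contributions it has received, and rank by the gap. This is my own illustration, not Wikia's code; the function name `most_wanted`, the plain lists of terms, and the tie-breaking rule are all assumptions.

```python
from collections import Counter


def most_wanted(queries, contributions, top=5):
    """Rank terms that are searched often but rarely contributed to.

    Hypothetical sketch: `queries` and `contributions` are plain lists of
    search terms, one entry per query or per contribution.
    """
    query_counts = Counter(queries)
    contribution_counts = Counter(contributions)

    # Gap = times searched minus times someone contributed a result.
    gaps = {
        term: count - contribution_counts[term]
        for term, count in query_counts.items()
    }

    # Largest gap first; ties broken alphabetically (an arbitrary choice).
    ranked = sorted(gaps.items(), key=lambda item: (-item[1], item[0]))

    # Keep only terms where demand actually exceeds contributions.
    return [(term, gap) for term, gap in ranked if gap > 0][:top]


if __name__ == "__main__":
    sample_queries = ["mets", "mets", "mets", "wiki", "wiki", "digg"]
    sample_contributions = ["wiki", "wiki", "wiki", "mets"]
    print(most_wanted(sample_queries, sample_contributions))
```

A term that people search for constantly but nobody has improved floats to the top, which is presumably where a contributor's time is best spent.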
Most recently we reported on Wikia Search adding Google AdSense advertising to the search results.
Just hours after we listed all of the major search engines and their advertising partners, we’ve been informed that Wikia Search has added Google AdSense ads this morning. I’ve embedded a screenshot below so you can see the ads. They were not there last night. I am guessing just like the search engine, the ads must be in alpha as they seem to be squished into an iframe box.
Update 11:45am – Looks like they removed the box around the ads.
Back in January, Wikia launched the Alpha version of Wikia Search. We said it "wasn’t ready". Last month they made some major additions to the engine. Today they are rolling out the next Alpha version of the engine. I’ve embedded a video below that explains the updates. So far the company claims that 20,000 users have registered, 25,000 mini-articles have been created, and 60,000 edits have been processed to date.
I spoke with CEO Jimmy Wales who explained the updates to the alpha search engine. Wales noted that many of the new features are "wiki-like" and that you can now edit any search result as you see fit. Add, delete or update any link within a search result. You can also annotate, or preview, a URL in the search result. There’s also the ability to highlight a site in the results if you feel it’s worthy.
Here’s the search result for CenterNetworks. Note the logos up in the header and rated pages. One interesting thing to try out — scroll to the bottom of the page, watch how it adds more results on the fly with no pagination. I’ve never seen this before.
Here are all of the new features inside of the Wikia Search which allow anyone to customize the search results:
- The ability to edit any result, title and summary. The edits are then instantly available to everyone
- The ability to add new results for any search query instantly
- The ability to delete and/or hide any result
- Every result item can be rated 1-5 stars, which will slowly influence the ranking position
- The ability to add suggested and/or related searches for any query
- The ability to add public comments to any result item
- The opportunity to see site previews and annotate text, images, links and forms directly into the results
- The ability to try any search on Google, Yahoo, or any other search engine with a single click
- The ability to customize the background on the header for a more themed result for any search
- The opportunity to view the change history showing all the social actions for any page
I asked Jimmy Wales about the spam that this engine will see. Why wouldn’t I want to "own" every relevant query? And why wouldn’t my competitors want to also own every relevant query? He suggested that he wasn’t nearly as worried about spam as I am and that the community will monitor what’s going on. He said, "I don’t see spam as a big problem" when I asked why anyone wouldn’t want to edit the records they are affected by.
There is a pretty active community forming around the Wikia Search engine. I am on all of the mailing lists and have watched some interesting discussions and debates pop up over the past few months. If you are building anything search related, you should be monitoring the Wikia Search developer groups as well.
In related news, Mahalo CEO Jason Calacanis last week also announced the ability to edit pages on the search-engine friendly content source. Mahalo claims that any edits that are made by a user will be reviewed by one of Jason’s Mahaloians. The changes go live immediately and the review will come later on. This is different from the above updates from Wikia: Wikia isn’t reviewing the changes and is leaving it to the overall community to monitor them.
The real question is whether people will want to edit for-profit sites for free while helping the CEOs generate more revenue that isn’t returned to the user. Will users want to edit Wikia Search/Mahalo the same way they edit Wikipedia? I’ve said for a while now that Jason would open Mahalo up because the more he can get done for free, the more likely Mahalo is to turn a profit. While Jason has claimed that Mahalo will have several thousand full-time paid editors over the next few years, my guess is that more features will open to the public first. Makes sense right — why pay outsiders when Jason/Jimmy’s fans will do it for free?
A few months ago, I started work on ArmchairGM’s New York Mets entry. My own knowledge stems back only until about 1983, but as one could imagine, the Web is a treasure trove of information, including stories about the early Mets teams. A quick search yielded this result — an entry from an old Geocities site. “Dave’s Mets Page”, in fact.
It immediately occurred to me that “Dave” would be a great contributor to ArmchairGM. After all, he had written excellent historical sports content to the point where I was able to base my writing almost exclusively off his. Scrolling down on his page, I found his email address, and sent him an email. It bounced.
Dave’s page admits that it was last updated on May 15, 1999. A quick look at his yearly summaries buttresses this fact, as the last summary written is 1998. And there’s one other interesting thing on Dave’s page: a copyright notice.
So here I have:
- Content I want to use
- An author who I can’t (easily) locate
- A clear indicator that the content is unavailable without permission.
That last bullet is, legally, meaningless, but anecdotally important. Web culture wrongly tends to assume that, in the absence of a copyright notice or the equivalent, all content is available so long as due attribution is given. In this case, even that assumption is clearly false. Unless I have the author’s permission, I cannot use the content. And getting permission is, at best, difficult, as the author’s email address is inactive. (As it turns out, I probably could track the guy down. But let’s assume he had a more common last name; say, like, “Lewis.”)
The side effect of all this? Even if I were willing to pay for rights, I cannot, because there is no one to whom I can write the check. In a very real sense, the content is held captive by an absent rights-owner, and the cost of me either (a) tracking down the rights-owner or (b) ignoring his rights and re-publishing in violation thereof probably exceeds the value of the content altogether. Per a press release from Senator Pat Leahy, "Potential users of orphan works often fail to display or use such works out of concern that they may be found liable for statutory damages, amounting to as much as $150,000."
If you find this silly, rest assured you are not alone. In 2005-06, the U.S. Copyright Office studied the issue, called “orphan works,” and released its report in early 2006. Since then? A whole lot of nothing. A few bills and hearings in Congress, but no final action. Senator Leahy introduced his bill only last month, so there is some hope there, but call me skeptical regarding its odds of passage.
Even if it does pass, it only solves half the problem: it removes liability from those who use “orphan” works when the rights-holder later appears, but it does not remove the burdensome cost of seeking out the rights-holder. Indeed, the bill would require the republisher to “perform and document a good faith – but ultimately unsuccessful – search for the owner of the copyright in the work being used prior to such use.” And even then, “[i]f the owner later emerges and provides notice of infringement to the user, the user must negotiate reasonable compensation in good faith and render any such compensation agreed upon in a timely fashion.” Basically:
- I spend time finding some content I want to use
- I can’t locate the author easily, so
- I have to spend time and/or money to locate the author or risk significant statutory damages, and
- If the author emerges, the author has significant negotiating leverage that he’d not have held if he were easily locatable in the first place.
Wow! The words “perverse incentives” pop into mind. So do the words “not really a solution, Senator Leahy.”
A Better Idea: The Duty to Maintain
In the digital age, with content available over the Web, why put all the burden on the subsequent user of the content? Instead, let’s shift the burden onto the rights-holder to, at the very least, maintain his content and/or contact information. Here are my givens and, therefore, the rubric for the idea:
- It is unreasonable to require that the author divulge his or her identity or contact information in order to receive copyright in the work
I would be surprised if there is anything controversial about that. Necessarily, the author would need to reveal his or her identity in order to enforce the rights associated with the work, but that’s a distinct question.
The corollary to that given is that a work is not considered an orphan merely because it, in the words of Leahy’s bill, “lacks identifying information pertaining to its owner[.]” That seems self-evident; otherwise, anything anonymous or even accidentally unsigned would be immediately available for republication. In short, without this tenet, copyright is eviscerated to a meaningful degree in many cases where it should not be.
- If the author is unknown or unreachable, a certain amount of time must pass before the work is to be considered an orphan.
To a large degree, that is a restatement of the bullet item preceding. Using my Mets history example, though, here’s a distinction: let’s say that today were May 30, 1999, and I sent the same email to the same juno.com email address, and, again, the email bounced. The author is unreachable, sure, but he worked on the content in question (or some related content thereto) just 15 days prior. Would it be acceptable for me to declare the work an orphan and republish it? Hardly.
In the context of blogging, this point is even more meaningful. Imagine an anonymous blogger who posts once every three days. He goes on vacation for two weeks. His last post should, by no means, become susceptible to re-use as an orphan. On the other hand, say he stops blogging for two years. Different result? I think so. Similarly, apply the same logic to a known-but-unreachable blogger: in the case of a two-week vacation, it’s unfair to claim that the work is an orphan, but in the case of a two-year hiatus, it’s fair game.
That seems right to me. The conclusion, then: If an author is unknown and/or unreachable, and he fails to maintain his work, his works are considered orphans, and thereby are available for use by third parties without prior permission.
And, per the problem that Senator Leahy’s bill aims to address, those works should be available for use by third parties. Maintenance can be something as simple as writing an occasional blog post or, in the case of Dave’s Mets Page, changing the date to say “Last updated January 1, 2008”, or something like that. (In this case, please bear in mind that I am really only discussing compilations of Web-based content, such as blogs and other continually growing websites. It’d be more difficult, by leaps and bounds, to continually “maintain” a photograph, for example.) Or even more simply? Make sure you, the author, provide an easy and reliable way for others to contact you. You know, like keeping your email address current.
The question then becomes: How do we effectively put the burden on the rights-holder? My solution: a time-lapse Creative Commons license. For the first n months of unmaintained, anonymous work, the copyright holder would retain all rights. Another n months after that, the content becomes available via a non-commercial, no derivative works (NC-ND) license. And another n months after that, the content becomes available under a pure attribution license — basically, a link back. Perfectly tailored for the Web, as the default way to give attribution is via a hyperlink back to the source of the content. And for our purposes, it meets the initial goal, by shifting the burden onto the original author and not onto the subsequent user.
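For the programmers in the audience, the time-lapse scheme can be sketched in a few lines. This is only an illustration under assumptions of my own: I read the three phases as running from 0 to n months (all rights reserved), n to 2n months (NC-ND), and 2n months onward (attribution only); the function names, the tier labels, and the choice of n = 12 in the usage lines are hypothetical, since the proposal leaves n open.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end (never negative)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month has not fully elapsed yet
    return max(months, 0)


def license_tier(last_maintained: date, today: date, n_months: int) -> str:
    """Return the license tier for an unmaintained, unreachable work.

    Assumed reading of the proposal:
      under n months        -> all rights reserved
      n up to 2n months     -> non-commercial, no derivatives (NC-ND)
      2n months or more     -> attribution only
    """
    elapsed = months_between(last_maintained, today)
    if elapsed < n_months:
        return "all-rights-reserved"
    if elapsed < 2 * n_months:
        return "by-nc-nd"
    return "by"


if __name__ == "__main__":
    last_update = date(1999, 5, 15)  # the "last updated" date on Dave's Mets Page
    n = 12                           # hypothetical value; the proposal does not fix n
    print(license_tier(last_update, date(1999, 5, 30), n))
    print(license_tier(last_update, date(2008, 1, 1), n))
```

Note that any act of maintenance simply resets `last_maintained`, so an author who keeps the page or the contact details current never leaves the first tier.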
Could this be done legislatively? Probably, although the details would be murky and, again, I am skeptical that Congress will ever act on Leahy’s proposal, let alone my admittedly more controversial one. I would not consider Congressional action to be a realistic goal here, or, for that matter, in regard to copyright reform in any meaningful sense.
However, this rubric could be achieved socially. Already, many authors choose, of their own accord, to use “copyleft” licenses such as the Creative Commons menu or the GNU Free Documentation License (used by Wikipedia, to take a famous example). I believe that many authors who do not subscribe to the copyleft dogma would find the time-lapse Creative Commons licensing scheme less controversial because, so long as they maintain either (a) the content itself or (b) their contact information, the copyleft scheme will never come into play. As the risk of losing rights to one’s work only comes from sloth, it would be very hard for an author to socially defend their refusal to adopt this time-lapse license. In short, what appears controversial legislatively should be less so socially, and an effective solution to Web-based orphaned works.
Sure, there are details that need resolution, such as how the hypothetical license would define “maintain,” but those definitions are more for edge cases than the core problem set caused by orphan works. One hopes that the eventual solution for orphan works will include a duty on behalf of the rights-holder to maintain their work — or allow for licensing which permits re-use.
Remember the Wikia Search that launched in January? We had an extensive review of the product and said that it wasn’t ready for primetime yet. Today we learned that the next version (0.2) is now live on a test site and you can test it out as well. It’s still not ready for primetime, but there are a lot of new features worth mentioning and it is a positive step forward (the company asked me to note that it’s still a pre-alpha so be gentle; for example, I was unable to get it to work in IE7).
The biggest feature is the ability to add and delete search results at will. So basically if you aren’t happy with the results for a term, you can add whatever you’d like. And if something is not correct, just delete it. For example, a search for CenterNetworks produced the main URL as the top result then a link on a story on Techcrunch about Yahoo – seemed out of place so I deleted it. It’s easy to say this will be spammed to heck but could actually prove itself similar to Wikipedia with limited spam. Not sure it’s a competitor to SEO Mahalo but it’s certainly more "human" than Google. If the spam was limited, this could be pretty hot over time.
You can also edit the header text and the description to make them more meaningful to future searchers. And you can comment on a search engine result (you know how I feel about this!), again to provide insight for the future.
The other big update is preview and annotations. Click the link and get a preview before the jump. Annotations allow you to pick up parts of the Web page and bring it back to the search engine result. I am not super crazy about this because it may remove the need for a person to visit the actual content creator’s Web page. The key for the search engine should be as a guide not a replacement.
Bottom line for this major update? Everything is editable.
Wikia is building tools that allow the community to actually participate in creating and modifying the search engine, and that foster engagement. I’ve seen good discussion on the Wikia Search mailing lists since January.