Scrape, Scrape, Scrape, Let’s All Scrape Our Way to Profit!
When a writer spends hours, days or weeks on a story, should they be mad when a site scrapes (i.e. steals) their content without permission? It doesn’t matter if the story is news about Apple, a tutorial about Ajax or a recipe for a bacon-stuffed burger. Scraping occurs when a site "lifts" content from site A and places it on site B without site A’s authorization. Many times site B monetizes the scraped content. Note that there are certainly times when you need to grab a bit of the source as a quote or to clarify a point in your story; that is not what this post is about.
I’ve written about this topic many times and will continue to write about it until those who scrape or steal are put out of business or change their model. This past week I wrote about Socialmedian and their scraped content. While I don’t believe they are doing it for malicious reasons like some of the bottom-feeding scrapers, they are still participating. I guess it worked well for Socialmedian – they just got a cashout of a few million dollars. I am still waiting for an answer from their CEO as to why they need to scrape any content for their service to work. Note: "Digg does it" is not an answer.
A year ago the only scrapers were the bottom-of-the-barrel scum who took full content and put ads around it. It seems we’ve moved up the food chain to larger sites living off scraping.
Whet Moser, Chicagoland Editor, wrote a post titled "Grand Theft HuffPo". Basically, the Huffington Post completely scraped one of his writers’ posts without permission.
Ryan Singel at Wired compares some examples of content on site A and on Huffington Post. Singel notes that Gawker publisher Nick Denton also "hates" the Huffington Post.
This morning I see that Huffington Post contributor Silicon Alley Insider (editor Henry Blodget) has jumped up to defend the scraping done over at HuffPo. Let’s get a quick disclosure out of the way: Huffington Post co-founder Ken Lerer is an investor in Silicon Alley Insider.
Silicon Alley Insider has changed their game a few times since they launched. Initially they were all about NYC but then left for the Valley. It appears that shortly after editor Peter Kafka left, they moved to a scraping model. By my estimate, they scrape 70-80% of the content that appears on SAI. It gets even more interesting for their service because many times the scraped content makes its way out to their partner Yahoo Finance.
Interesting note… many startups I meet with in NYC are telling me they are sick of Mr. Blodget’s scraping. I can only hope that these people will stop visiting the site because that’s the only way this game will change. As long as Mr. Blodget is making bank from the scrapes with no penalty, he will continue to do it.
His belief is that if the site "aggregating" the stolen content sends visitors back to the source, then the source should shut their mouth (and keyboard?) and like it. The problem is that the "aggregator" (in this case SAI) is only growing because they are stealing content from others! Henry is basically saying that it’s ok for large sites to scrape but not the bottom feeders. What happens many times is that the scraped story on the "aggregator" will be the one to get the massive traffic from the social news sites (e.g. Digg, Techmeme, etc.) while the source (you know, the one who spent the time to make the story) will get close to nothing. Great for the aggregator, bad for the source. And many times, the reader has no idea that the content didn’t come from the "aggregator". This is a huge issue as well – but not for the thief.
I can only assume that Mr. Blodget has a deal with the New York Times after seeing a story about Outside.In on his "aggregator" (http://www.alleyinsider.com/2008/12/another-cash-infusion-for-outsidein). Here we see what looks like a full story but it actually is a complete scrape from the Times. And to make matters worse, the story has comments!
Mr. Blodget has a team of talented writers and there’s just no reason they can’t write new content about the stories they want to cover and still provide the links out to the other sources discussing the same topic. Since the link back to the source would be the same whether they scraped the story or wrote new content, his argument holds no water.
Update: to clarify the link point - the reason most aggregators want to scrape as much content as they can is that SEO and traffic from Google are so important. Just posting a link to the source won’t get them the traffic from Google.
If you read CN regularly, you know my view on sites and services that steal the conversation. Many of the services that are participating in content scraping are also stealing the conversation. All of the sites we’ve mentioned in this post are contributing to this practice.
So why is scraping becoming more popular? Simple: it’s all about the cash and pageviews. I wonder whether scrapers would still use this method if we no longer relied on a pageview monetization model. It’s so easy to take another’s content, change the story title to grab fresh Google juice, and then sit back and profit.
At the end of the day, the only one who may be able to start to save us is Google. If Google stops indexing the sites that regularly scrape content, we may just start to see real change. I’d love to get Matt Cutts’ take on this topic.