Stolen Content Archive

Scrape, Scrape, Scrape – The Music Video

by Allen - December 20th, 2008

Earlier today I provided commentary on the content scraping/stealing issue that has seen a lot of discussion over the past week. I thought it might be interesting on a Saturday night to take a look at the issue from another perspective. So tonight I bring to you the world premiere of "Scrape, Scrape, Scrape".


Scrape, Scrape, Scrape, Let’s All Scrape Our Way to Profit!

by Allen - December 20th, 2008

When a writer spends hours, days or weeks on a story, should they be mad when a site scrapes (i.e. steals) their content without permission? It doesn’t matter if the story is news about Apple, a tutorial about Ajax or a recipe for a bacon stuffed burger. Scraping occurs when a site "lifts" content from site A and places it on site B without site A's authorization. Site B often monetizes the scraped content. Note that there are certainly times when you need to grab a bit of the source as a quote or to clarify a point in your story.

I’ve written about this topic many times and will continue to write about it until those who scrape or steal are put out of business or change their model. This past week I wrote about Socialmedian and their scraped content. While I don’t believe they are doing it for malicious reasons like some of the bottom-feeding scrapers, they are still participating. I guess it worked well for Socialmedian: they just cashed out for a few million dollars. I am still waiting for an answer from their CEO as to why they need to scrape any content for their service to work. Note: "Digg does it" is not an answer.

A year ago the only scrapers were the bottom-of-the-barrel scum who took full content and put ads around it. It seems we’ve moved up the food chain to larger sites living off scraping.

Whet Moser, Chicagoland Editor, wrote a post titled "Grand Theft HuffPo". In short, the Huffington Post scraped one of his writer’s posts in its entirety without permission.

Ryan Singel at Wired compares several examples of original content with the versions that appeared on the Huffington Post. Singel notes that Gawker publisher Nick Denton also "hates" the Huffington Post.

This morning I see that Huffington Post contributor Silicon Alley Insider (editor Henry Blodget) has jumped up to defend the scraping done over at HuffPo. Let’s get a quick disclosure out of the way: Huffington Post co-founder Ken Lerer is an investor in Silicon Alley Insider.

Silicon Alley Insider has changed their game a few times since they launched. Initially they were all about NYC, but then they left for the Valley. It appears that shortly after editor Peter Kafka left, they moved to a scraping model. By my estimate, they scrape 70-80% of the content that appears on SAI. It gets even more interesting because the scraped content often makes its way out to their partner, Yahoo Finance.

Interesting note: many startups I meet with in NYC are telling me they are sick of Mr. Blodget’s scraping. I can only hope that these people will stop visiting the site, because that’s the only way this game will change. As long as Mr. Blodget is making bank from the scrapes with no penalty, he will continue to do it.

His belief is that if the site "aggregating" the stolen content sends visitors back to the source, then the source should shut their mouth (and keyboard?) and like it. The problem is that the "aggregator" (in this case SAI) is only growing because they are stealing content from others! Henry is basically saying that it’s ok for large sites to scrape but not for the bottom feeders. Often the scraped story on the "aggregator" is the one that gets massive traffic from the social news sites (e.g. Digg, Techmeme, etc.), while the source (you know, the one who spent the time to make the story) gets close to nothing. Great for the aggregator, bad for the source. And often the reader has no idea that the content didn’t come from the "aggregator". This is a huge issue as well, but not for the thief.

After seeing a story about Outside.In on his "aggregator" (http://www.alleyinsider.com/2008/12/another-cash-infusion-for-outsidein), I can only assume that Mr. Blodget has a deal with the New York Times. What looks like a full story is actually a complete scrape from the Times. To make matters worse, the story has comments!

Mr. Blodget has a team of talented writers, and there’s just no reason they can’t write new content about the stories they want to cover and still provide links out to the other sources discussing the same topic. The link back to the source would be the same whether they scraped the story or wrote new content, so his argument holds no water.

Update: To clarify the link point, most aggregators want to scrape as much content as they can because SEO and traffic from Google matter so much to them. Just posting a link to the source won’t get them the traffic from Google.

If you read CN regularly, you know my view on sites and services that steal the conversation. Many of the services that participate in content scraping are also stealing the conversation. All of the sites mentioned in this post contribute to this practice.

So why is scraping becoming more popular? Simple: it’s all about the cash and pageviews. I wonder whether scrapers would still use this method if we were no longer on a pageview-based monetization model. It’s so easy to take another’s content, change the story title to grab fresh Google juice, and then sit back and profit.

At the end of the day, the only one who may be able to start to save us is Google. If Google stops indexing the sites that regularly scrape content, we may just start to see real change. I’d love to get Matt Cutts’ take on this topic.


Stealing Content is Bad Enough, But Submitting It To Digg? That’s Plain Wrong.

by Allen - December 28th, 2007

Stealing content is wrong. Last week I had some fun with a case where someone stole an interview from CN and put it in a Web blender. Today Tamar pointed me to something even worse: stolen content that has been submitted to Digg.

In this case, it was an article on Mashable titled "Flickr Toolbox", originally posted in August 2007. This week it was reposted on the spam blog "tech-2008.blogspot.com" and subsequently submitted to Digg. While it didn’t make the front page, it received 8 votes and, I am sure, some traffic as well. These spam blogs need to go, even if they do boost inbound counts.

If all of that isn’t enough, Digg ranks #2 behind Mashable for the title, and the Digg link points to the spam blog. Would you agree with me that Digg should physically delete this post from their index so that Google will remove the link as well? (I’ve already addressed my views on why Digg should not be indexed in Google results.)
