Content scraping: Someone built a site from content scraped off my blog

17th May UPDATE:

The person identified in this post has now been in touch with me and explained that the issue was not caused by them directly, but by a web designer, who was apparently using my content on this person’s site without the person being aware. In view of this and the damage it would cause to this person’s reputation (and because I am a very nice guy), I have removed the mentions of this person’s name from the post, along with the 2 images that showed how a replica of this blog was made using my content.

I recently had someone build a whole site from content they scraped from this blog! So, I thought I would talk a little about content scraping and show you what it looks like (literally: there are pictures, so make sure you have images turned on if you read this post via email!).

Content scraping

Content scraping is what happens when someone takes content from your site, without permission, and uses it to populate their own sites, often claiming they wrote it. Content scrapers are encouraged to continue because search engines have still not figured out how to block those sites from their search results. So, they fill a site with other people’s content and then use that content to attract traffic, which they use to sell ads and, in some cases, infect visitors with malware.

I have a LOT of sites that use my content every day, without attribution. The example below, called (name removed), is something different though. They not only scraped my blog posts without naming me as the author, they also scraped my pages, including the page that sells my services, my audio program and my “Top Marketing Tips” page. See image below:

Images removed:

Here’s my original. See below:

Before I continue, I would like to point out that I tried to speak with “name removed” last week but only managed to get through to one of their colleagues. They are now aware of the site. I left my contact details; however, name removed has not contacted me. (See update at head of post!)

It seems that this example is not a regular scraping exercise to sell ads, but rather an attempt to build an entire site with my content, which I assume will then be sold to someone as a business opportunity. They may decide to keep it and add advertisements later, of course.

Extremely common

In reality, there is not a great deal you can do. In the above example, I am based in the UK and the site is hosted in the United States. This means you are dealing with international law. Other sites using my stuff without attribution are hosted all over the world.

You can contact the company hosting the site and ask them to remove it. Past experience suggests this is almost always a long-winded waste of time, as the hosting company either refuses to do anything, or the site is simply moved to another host. Equally, it’s hard to identify who is behind content scraping, as they usually steal someone’s identity, which I think has happened with the example above. I don’t believe name removed had anything to do with this.

NB: I will update this post if name removed responds!

Content scraping tip: Insert links

The reason I see so many of these scrapers using my work is that I have at least 1 link in every post I write, which points back to this blog. As soon as my work is scraped and then published, I get a notification of their link. I have found this to be more immediate and a lot more accurate than setting up Google Alerts.
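The notification itself is handled by the blogging software, but the idea behind the tip is easy to show. Here is a minimal Python sketch, assuming you already have the HTML of a page you suspect has copied your post. It simply lists any links on that page that point back to your own domain. The function name, the sample page and the domain "example-blog.com" are all made up for illustration; this is not how any particular notification system works.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html, my_domain):
    """Return the links in `html` whose host is `my_domain` or a subdomain of it."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(href)
    return matches


# Hypothetical scraped page and domain, for illustration only.
scraped_page = """
<html><body>
<p>Great marketing tip! Read <a href="https://www.example-blog.com/tips">more here</a>.</p>
<p><a href="https://elsewhere.example.org/">Unrelated link</a></p>
</body></html>
"""

print(links_back_to(scraped_page, "example-blog.com"))
```

If the list comes back empty, the page either did not copy the post or stripped the links out, which is why a link in every post only catches the lazier, automated scrapers.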

Some scrapers have my posts live on their blogs within 5 minutes of me publishing them. The reason it’s so fast is that they usually do the whole thing using software. As soon as their software picks up one of my posts from the RSS feed, it’s published on their blog as a post.
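To see why software can do this so quickly, it helps to look at what an RSS feed actually contains. The sketch below, in Python, reads a small made-up RSS 2.0 feed and pulls out each post's title, link and description. The feed contents, URLs and the function name read_items are assumptions for illustration only. Notice that any link inside the post travels along with the text, which is what makes the linking tip above work.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with one post; the title and URLs are made up.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Five marketing tips</title>
      <link>https://www.example-blog.com/five-tips</link>
      <description>Read more on &lt;a href="https://www.example-blog.com/"&gt;my blog&lt;/a&gt;.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per <item> in the feed, with its title, link and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


for post in read_items(FEED):
    print(post["title"], "->", post["link"])
    print(post["description"])
```

Everything arrives already structured and machine-readable, so there is nothing for a person to copy and paste. That is how a post can appear elsewhere within minutes.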

Attribution

All the content on this site is licensed under Creative Commons. In other words, you are VERY WELCOME to use my posts on your site or in your newsletters, so long as you attribute them to me under the terms of that license. Most sites that use my content do this. Probably 99%. In such cases, I actually benefit from them sharing my work. They allow me to reach new people, whilst being able to share something they found useful with their readers. Win-win.

Google has recently claimed to be cracking down on scraped content, so that it no longer ranks as highly in their search results. The hope is that this will make it unattractive for people to scrape content. It remains to be seen if this will have any positive impact. Certainly, I am seeing no drop in the number of scraper sites that use my work each day.