Jim's Marketing Blog

Marketing ideas to help you grow your business

Content scraping: Someone built a site from content scraped off my blog

17th May UPDATE:

The person identified in this post has now been in touch with me and explained that the issue was not caused by them directly, but by a web designer who was apparently using my content on this person’s site without the person being aware. In view of this, and the damage it would cause to this person’s reputation (and because I am a very nice guy), I have removed the mentions of this person’s name from the post, along with the 2 images which showed how a replica of this blog was made using my content.

I recently had someone build a whole site from content they scraped from this blog! So I thought I would talk a little about content scraping and show you what it looks like (literally: there are pictures, so make sure you have images turned on if you read this post via email!).

Content scraping

Content scraping is what happens when someone takes content from your site, without permission, and uses it to populate their own sites, often claiming they wrote it. Content scrapers keep going because search engines have still not figured out how to keep those sites out of their search results. So they build a site filled with other people’s content and use that content to attract traffic, which they use to sell ads and, in some cases, to infect visitors with malware.

I have a LOT of sites that use my content every day, without attribution. The example below, called (name removed), is something different, though. They not only scraped my blog posts without naming me as the author; they also scraped my pages, including the page that sells my services, my audio program and my “Top Marketing Tips” page. See image below:

[Images removed: screenshots of the replica site and of my original.]

Before I continue, I would like to point out that I tried to speak with (name removed) last week but only managed to get through to one of their colleagues. They are now aware of the site. I left my contact details; however, (name removed) has not contacted me back. (See the update at the head of this post!)

It seems that this example is not a regular scraping exercise to sell ads, but rather an attempt to build an entire site with my content, which I assume will then be sold to someone as a business opportunity.  They may decide to keep it and add advertisements later, of course.

Extremely common

In reality, there is not a great deal you can do. In the above example, I am based in the UK and the site is hosted in the United States, which means you are dealing with international law. Other sites using my stuff without attribution are hosted all over the world.

You can contact the company hosting the site and ask them to remove it. Past experience suggests this is almost always a long-winded waste of time, as the hosting company either refuses to do anything or the site is simply moved to another host. Equally, it’s hard to identify who is behind content scraping, as they usually steal someone’s identity, which I think has happened with the example above. I don’t believe (name removed) had anything to do with this.

NB: I will update this post if (name removed) responds!

Content scraping tip: Insert links

The reason I see so many of these scrapers using my work is that I have at least one link in every post I write that points back to this blog. As soon as my work is scraped and published, I get a notification of their link. I have found this to be more immediate and a lot more accurate than setting up Google Alerts.
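If you’d like to see the idea behind that tripwire, here is a minimal Python sketch. It assumes you already have the HTML of a suspect page (how you fetch it is up to you), and it uses `example-blog.com` as a hypothetical stand-in for your own domain; the function name `links_back_to` is likewise made up for illustration. It simply lists every link on the page that points back at your site, which is the signal a link notification depends on. Platforms such as WordPress can raise that notification for you automatically via pingbacks, so treat this as an illustration of the principle rather than a replacement.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(page_html: str, my_domain: str) -> list:
    """Return every link on the page that points at my_domain or one of its subdomains."""
    parser = LinkCollector()
    parser.feed(page_html)
    hits = []
    for href in parser.hrefs:
        host = (urlparse(href).hostname or "").lower()  # relative links have no host
        if host == my_domain or host.endswith("." + my_domain):
            hits.append(href)
    return hits


# Hypothetical scraped page: one link back to the original post, one relative link.
page = '<p>Great tip! <a href="https://www.example-blog.com/post-1">source</a> <a href="/about">About</a></p>'
print(links_back_to(page, "example-blog.com"))  # ['https://www.example-blog.com/post-1']
```

Matching on the parsed hostname, rather than searching the raw HTML for your domain name, avoids false hits such as `notexample-blog.com`.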

Some scrapers have my posts live on their blogs within 5 minutes of me publishing them. The reason it’s so fast is that they usually do the whole thing with software: as soon as one of my posts reaches them through my RSS feed, it’s published on their blog as a post.
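Commenters below mention the RSS Footer plugin, which works on the same principle: because scraping software republishes your feed automatically, anything you add to each feed item travels with the copy. Here is a minimal Python sketch of that idea (not the plugin’s actual code), assuming a plain RSS 2.0 feed where each post’s body sits in its `<description>` element. The function name `add_feed_footer`, the footer wording and the `example-blog.com` URL are all hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical footer wording; {link} is filled with each post's own URL.
FOOTER = '<p>This post first appeared at <a href="{link}">{link}</a>.</p>'


def add_feed_footer(rss_xml: str) -> str:
    """Append an attribution footer, linking to the original post, to each RSS 2.0 <item>."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # nothing to link back to
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        # The description holds HTML as text; ElementTree escapes it when serialising.
        desc.text = (desc.text or "") + FOOTER.format(link=html.escape(link))
    return ET.tostring(root, encoding="unicode")


feed = """<rss version="2.0"><channel>
<item><title>Post one</title><link>https://example-blog.com/post-1</link>
<description>&lt;p&gt;Hello&lt;/p&gt;</description></item>
</channel></rss>"""

print(add_feed_footer(feed))
```

Many WordPress feeds also carry the full post in a namespaced `content:encoded` element, which a real version would need to handle as well; in practice, a plugin that adds the footer as the feed is generated is simpler than post-processing the XML afterwards.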

Attribution

All the content on this site is licensed under Creative Commons. In other words, you are VERY WELCOME to use my posts on your site or in your newsletters, so long as you attribute them to me under the terms of that license. Most sites that use my content do this; probably 99%. In such cases, I actually benefit from them sharing my work: they allow me to reach new people, whilst they get to share something they found useful with their readers. Win-win.

Google has recently claimed to be cracking down on scraped content, so that it no longer ranks as highly in their search results.  The hope is that this will make it unattractive for people to scrape content.  It remains to be seen if this will have any positive impact.  Certainly I am seeing no drop in the number of scraper sites that use my work each day.

What are your experiences or opinions?

Do you know of any effective strategies for dealing with content scraping, which you would like to share?  Do sites scrape your content?

Please share your thoughts!


About Jim Connolly: I help small business owners grow their business, make more sales and boost their profits. To see how I can help you and your business, read this.

58 Comments

  1. OMG! I cannot believe the nerve of some people.

    I am so sorry to hear this has happened to you, Jim. I am even more sorry to hear that there is little that can be done about it.

    When we have spent time creating blogs and wording, whether it is for ourselves or our clients, we expect it to stay where it was intended.

    And at the end of the day, it diminishes their reputation as an expert. After all, anyone who reads your blog is unlikely to follow them after this!

    • Hi Nicky. I honestly don’t believe the lady mentioned on that site even knew it existed, until I called her offices last week.

      She’s almost certainly just had her name used, by the guys behind the site.

      This does show, as Howard says in the comments, the importance of tracking your reputation online.

      It’s too easy for people to have this crap happen in their name, without them even knowing.

  2. It’s truly bizarre, isn’t it Jim.

    On visiting this bogus site, I see ALL the links to your pages and products are live; there’s been no effort to redirect. Even the home page currently has the post titled “5 of the secrets behind Jim’s Marketing Blog”.

    Ironically, if you Google “name removed”, this blog post is very prominent.

    I remember seeing a case of another prominent blogger being ripped off and his way of dealing with it was to leave just one, single word comment (with gravatar) that simply said:

    “Dude!!?”

    Elegant.

    We’ve not had anything scraped, but we’ve had YouTube videos copied, and these were talking-head videos watermarked with our site. They tried to get round the site reference by flipping the image horizontally. We got in touch with YouTube and the guy was banned.

    I guess you could call it flattery, but in the end it’s just theft.

  3. Horrific. Can I suggest that we all try to get the word out for Jim? Below are the contact details for (name removed), whom I’ve contacted about this post as well. Are the images on her posts still coming from your server? If so, you can redirect them to a ‘stolen’ GIF to give her site less credibility.

  4. I’ve had my content scraped a few times, with mixed results. In my experience, the author rarely takes it down, so you have to go over their head. You can try sending a cease and desist letter to the webmaster (get a sample here).

    If that doesn’t work, you can then escalate and send the letter to the ISP.

    Finally, each search engine has a Digital Millennium Copyright Act (a US law) policy and a means to submit notice to them in cases like these.

  5. I was just asked to write some content for a dentist’s site, and since I know very little about dentistry, I asked for an interview so I could gather information. Instead, I was told to take content off other sites. I was surprised, but from what you are saying, this happens all the time.

    Are we no longer unique individuals? Do we have to copy other people’s work to be successful? I thought we were out here writing because we had something to say, not to steal.

  6. Icky. I hate it when this happens. Besides being unfair to creative minds who share generously, it raises the noise level with lookalikes, knock-offs and outright fakes. Something can be done about it: including a link back to your own blog is one way.

    How about this? What is so difficult about Google checking two things on a page:
    1. Is it Creative Commons licensed?
    2. Is proper attribution in place?
    If not, rank it down rigorously.

    Found this post of yours through retweets by @AnneMcx @GemmaWent @andrewgerrard.

  7. Good day Jim Connolly

    Remember about 2-3 years ago I had copied one of your articles (at that time you didn’t have your links inserted in the articles) and the software I used didn’t provide any links back to your site. You wrote to me and asked for it to be fixed and I did.

    Yeah, Mike Lopez, I did it too (‘GUILTY’), but I learned, thanks to Jim! I am sure Mrs. G might not even know about it! These days many people aren’t truly involved in creating their own websites; they leave everything to the developer, as Kimba Green said.

    Well, these days there are scripts and software programs that can do this and more. Fortunately, however, every one of these methods has to run from a PC of some sort or from a server, which in turn has an IP address. With that in mind, if you have access to your server, you can use the individual’s IP address to block their whole network from accessing your entire server. They will keep trying, but you will also build up a database of everyone who has latched onto your content.

    Unfortunately, shared hosting does not allow this kind of access; you only get it on a VPS or dedicated server, where YOU are in control.

    “Using a VPS or a dedicated server takes out any middleman when web development or full control is a concern.”

    That said, shared hosting does provide a log report. Using the timestamps on her articles, you can check your logs and find that particular IP address, or you can block every IP address from that whole day and you are bound to catch hers. When you block an IP address, that machine can’t view, read or fetch any data; the page just keeps loading and gives an error after 30 seconds.

    Talk to your shared hosting provider, though I am sure they will not give you access to the server logs. They might provide a downloadable file, but you would still need server access to do the blocking.

    Here is the IP: 204.152.240.34
    This IP is on shared hosting, but you can still block it. Note that this will only have an effect if the scraper (script/software) runs on this server.

    PS: Hope this helped. If you have any concerns, call me or buzz me on FB.

    • Thanks, Norman, though I don’t recall writing to you (or anyone else) about content. It happens to me all day, every day; it goes with the job. I like the idea of the RSS footer plugin and may give that a whirl.

      I think this will need a follow-up post.

  8. Disgraceful Jim!
    Must be done by bots because they’ve left the UK pricing on the MasterClass page and she’s in California!
    “5 Secrets of Jim’s Marketing Blog” page gets you an Error 404.
    She looks legitimate from her LinkedIn site, but what on earth does this all do for her Online Reputation Management? Totally shred it I should think!
    Thanks for providing such a clear cut example of what goes on in the industry we like to think we’re part of.
    Sorry it had to happen to you though.
    Fortunately, transparency rules!

    • Part of the reason for this post is to make it a little harder for someone to sell that site.

      As I said in the post, I feel sure this lady is totally innocent.

  9. Sad to know that people are stealing your content.
    But after reading your post, I have one tip: we should insert links in the post. If someone steals the link, we get a backlink.
    Also, we can use the RSS Footer plugin, because scrapers use RSS feeds for scraping, and you get a link back with this plugin.

    Hope you get a reply from the cheaters soon.

    ATUL

  10. Disgusting,
    this has probably happened to me too, without my knowledge.

    The fake site’s theme looks a bit clearer and nicer, so it might serve as feedback to improve your style. (I always try to see the positive side.)

    Please don’t look at my site: I bought the Thesis theme ages ago but haven’t implemented it yet. My site brings in a lot of clients anyway, and I am dominating the “Berlin Real Estate” keywords on Google.

    Please be assured I only read the original, I only read you :-)
    Thanks for all the insights.

  11. Creative Commons licencing just sort of opens the doors for abuse in the first place right?

    Who has time to read legal notes regarding content?

    “You can use the content, just attribute it to me.” So in what manner, how, etc.?

    It’s not worth it for someone who makes no investment to tarnish their brand.

  12. Now, I have to say that this is extreme scraping. I’ve scraped sites before: being a programmer, I did it (I admit I’m guilty here) during my first years trying to make money online.

    It’s really easy to do with scripts (as you said), and it will run 24/7. One thing worth mentioning, just in case there are scrapers here reading: none of my scraped sites ever ranked in Google. They were indexed but never ranked, not even in the top 100.

  13. As you say, Jim, this has been happening for years but this example is incredible!

    In the career industry, I can remember multiple examples of people who enlisted our help when someone took phrases from their sites or resume examples and put them on their own site. We often banded together and forced the site to go down.

    But this is so obvious it is almost laughable… The part that cracks me up is the Motivation Masters Class… hope she sells some for you… It’s a great value and great content.

    Some people’s kids! :-)

    • Hi Julie. Although she has so far failed to get in touch with me, I do not believe the lady whose name and image are on that site is connected with it.

      As you say, it would make zero sense to copy the pages that sell my services and include my phone number, etc.

  14. The real problem is the outright stealing and claiming it is yours.

    A few years ago, when I started one of my blogs “ScLoHo’s Collective Wisdom”, the site was created to promote others such as yourself, by reposting articles and content from others, with full credit and links to the original source.

    I even use different colors to separate my introductory words from those whom I am reposting.

    Only twice have I had someone request that I not use their material, and I complied. Meanwhile, 5,000+ blog posts and 6 years later, Collective Wisdom lives on.

  15. I wonder what the legal ramifications are for exposing the scrapers as frauds on all possible forums and online review sites?

    I would imagine that, as long as you can prove that yours is the original content, you would be protected. I know that I would want to put the “thief” out there for everyone to see, if they stole my copy and didn’t attribute it to me.

  16. Another question: How can you find out if someone is doing that if you (a) don’t have a link and (b) your Google alert is set up for your name? (If your name has been removed from the content, Google wouldn’t find it, correct?)

  17. I knew people stole content but didn’t know it went on to this extent Jim! Flattering in some respects but must be bloody annoying too.
    When you say you put a link back to your blog in all your posts, is that just an ordinary link, or do you do something clever with it that flags up when it’s published somewhere else?

  18. Hi mate,

    As others have said, you can use plug-ins that insert funky links, or copyright details, etc, into your feed. You don’t see anything on your blog, but scrapers get the full message.

    Wankers.

  19. It’s not surprising that this practice is so widespread: students around the world are comfortable with plagiarism and shamelessly submit projects and theses without an ounce of credit to the creators of the content they used.
    One thing’s for sure, there’s no way they can copy Jim Connolly, the one and only.

  20. I had my whole site copied back when I started in 2002. It wasn’t automated at that time, but a simple copy of the html code. It was a real business in another country trying to use all my copy and business strategy as it was a company website, not a blog with articles.

    One question I do have: how does this affect your site?

    I heard that duplicate content is bad, so do those sites and yourself get penalized?

    • Hi Greg. It doesn’t currently seem to have much impact on the site.

      That site was actually started around Christmas and this is the first time I have even mentioned it.

      I wanted to bring the issue into the open, because as you can see from the comments, a lot of people were either unaware it happened or unaware of the options they have, if it happens to them.

  21. Hi Jim,

    I would like to just point out a very important statement you made in the comments above:

    “It has zero impact on me or my business, Peter. I just wanted to let people know it happens and open it up for a bit of a debate.”

    Keeps things in perspective.

    People are shocked by this? If they are, they’re living under a rock.

    There are 3 things that exist on this planet that absolutely nothing can change… There are good apples, bad apples, and just plain rotten ones.

    I guess this could be broken down into just two categories of apples, but I liked the ‘rotten’ description better than just bad.

    • I like how you got this into perspective, Mark.

      Ultimately, it has no real impact on me. However, as you can see from the comments, some people find it stressful when others steal their work.

      Through this post, a number of ideas have surfaced and I will be sharing those in a separate, follow-up post.

  22. You might try mixing your presentations or articles with video, audio, PDFs and pictures, watermarking them and referring to them often.
    If you give part of the information in text and a lot of it in the other media, jumping back and forth, it becomes hard to steal the content as a whole and use it without a lot of work on their part.
    For example, start an article, then put part 2 in a video, go back to the article for part 3, maybe switch to a PDF for part 4, and have your watermarks all through the parts.

    I know it is a lot of work, but it seems Black Hat will always be out there in this get-rich-quick-for-doing-nothing Internet.

  23. I just went and looked at that site, and the current post was:
    5 of the secrets behind Jim’s Marketing Blog
    January 21, 2011 ·
    Pretty sad.

  24. Thanks, Danny Brown, I will have to check that plugin out. I also dislike it when people take article content, especially for clients, don’t add the links or author resource, and pass it off as their own.

  25. Hi.
    I’m sorry, but I find this situation absurd.
    The supposed woman is totally “fake” if you dig further into the sources (even if, I admit, they’ve done a good job of covering their tracks…).

    People think they can do anything on the Internet.
    There are no “physical” limits, so they assume they are totally protected just because they are based in another country or located far from you!

    The thing is, they don’t realize what a difficult situation they are putting themselves in.
    Everyone has been tempted at least once to “scrape” or “copy” someone else’s content: it’s well known everywhere, in every industry.
    But it doesn’t last. If those people want to succeed, they will need to create their own content. Their readers will soon find out, and the backlash will come from within their own market. Customers won’t let them get that far!

    So we can talk for hours about good and bad methods and about protection tools, SEO and anti-copy systems (I know every one of them; I’m a web developer), so I know exactly what those tools are made of: it feels good to be secure, but that doesn’t last either. And if you’re a beginner, or a small business without a lot of time or money, I’d advise you to focus on your content and make it remarkable.

    I think, Jim, that you’ve made the right choice by stating that your content is original (CC) and by organizing your CMS as well as anyone can today.

    But this is just insane, and I think this person will soon change their policy: Google is watching us!

  26. Having created and marketed my own software product, I was well aware that I had probably sold 1 copy for every 10 people using it.

    My stats were showing waaaaay too many downloads per number of legitimate sales.

    Even with Unlock Keys, the keys were spread all over the chat rooms.

    All I could really do was shrug it off and enjoy my earnings :-/

    Now I see the same thing happening with my blog content, and the content I write for other sites.

    That’s the way the world works, and if you play in the arena, you accept what happens there. Pick your battles. Enjoy your victories, and focus on your business.

    The majority of sites I have caught scraping my content have been low-traffic. But I do like your idea of always including a link back somewhere in the text.

    Rick

  27. My jaw just hit the floor! I am astounded!!

    I once had someone attend a conference where I was the keynote and he took something I said, put it on his website, but with full attribution. The problem? He’s a competitor and it made it look like I was recommending his firm. He got REALLY snotty with me when I asked him to take it down. But he did.

    This, though, is just asinine.

  28. Copying the text off your site may be difficult to fight, but if they are illegally copying the photography without paying the appropriate licensing fees, that can become a serious licensing issue that can quickly escalate into a major court case here in the States. I assume you are using a microstock site for your images. If you inform them of the content scraping, they will be more than happy to start a legal battle. They will get the message very quickly.

  29. What a pain!

    I had some pages copied a few years ago and picked this up with http://www.copyscape.com/ which can be good where it’s not just whole pages but parts of articles etc too.

    I’ve also added a no-right-click script to my blog. All vain attempts, no doubt.

    With another blog, it was photos of a tradesperson’s work that were being nicked, and in the end I went through his entire image collection adding watermarks.

    I’ve seen forums and groups copied too in an attempt to launch a site looking as if it already has lots of contributors.

  30. Jim,

    That is unbelievable and so blatant. I’ve heard other stories like this as well.

    Spamming and scraping: it’s an internet jungle out there.

  31. The Animated Woman

    February 13, 2011 at 00:00

    Whoa! My mouth is hanging open in disbelief.

    But how do I know this is your real site…? Maybe I’m on a scraped version of your real site…

    See? I knew I shouldn’t have watched Inception the other night.
