Jonathan Greenberg (United States)

asked on

Does Google penalize for properly attributed duplicate web content?

My client wants to post on her own site articles from around the web in which her company is mentioned, with attribution and links back to the original sites.  From the perspectives of both Google and the sites from which the articles were originally published, is this acceptable?  If not, is there an acceptable way to do this?

Thanks!

Regards,
Jonathan
Jeffrey Dake (United States)

This is fine, but I would put a noindex meta tag on the page. That way those pages won't be submitted to Google.

No-indexing the page will keep it off Google entirely, which hurts both pages in the long run.  Instead, I would suggest having the duplicate page reference the original page as the canonical source.  You do this by adding a rel=canonical tag to the duplicate page's head section.  Read this Google page for more info:

https://support.google.com/webmasters/answer/139066?hl=en

By setting up the canonical URL back to the original source, the republication/syndicated publication passes link signals back to the source, which improves the ranking of the source.  No duplicate content penalty will apply (win) and the original gets a boost (win).
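For reference, the canonical tag on the republished page would look something like this (the URL here is a placeholder for the original article's address):

```html
<!-- In the <head> of the republished (duplicate) page -->
<link rel="canonical" href="https://www.example.com/original-article" />
```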
You could just bring the original page into an iframe. All the crawler will pick up on the home page is the link to the original.  The content will not be treated as duplicate because it still resides only with the original site, which gets a small boost from the link. As long as the content is not being served from two different places, it is not duplicate.
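A sketch of that approach (the URL and dimensions are placeholders):

```html
<!-- The crawler indexes only this page's own markup, not the framed content -->
<iframe src="https://www.example.com/original-article"
        width="100%" height="600" title="Original article">
  <!-- Fallback link for user agents that don't render iframes -->
  <a href="https://www.example.com/original-article">Read the original article</a>
</iframe>
```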

Cd&
Jeff and Jason are both correct. There are two paths that will avoid a duplicate content penalty:

1. meta robots noindex tag

2. rel=canonical tag


The question of which to use from an SEO standpoint comes down to PageRank.

With the noindex tag, if you want to pass PageRank to the original version, you'll want to keep the follow attribute. However, if there are multiple links in the article, the PageRank will be spread out amongst all of them:
<meta name="robots" content="noindex, follow" />



With the noindex tag, if you don't want to pass PageRank to the original version or to any other links in the article, you'd include nofollow:
<meta name="robots" content="noindex, nofollow" />



Alternatively, you could specify 'follow' in the meta tag and then specify no-follow on any hrefs within the article that you don't want to receive pagerank benefits.
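Combined, that setup might look like this (the URL is a placeholder):

```html
<!-- In the <head>: keep the page out of the index, but let links pass PageRank -->
<meta name="robots" content="noindex, follow" />

<!-- In the article body: withhold PageRank from this specific link only -->
<a href="https://www.example.com/some-other-site" rel="nofollow">Some other site</a>
```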

With the canonical tag, all PageRank is attributed directly to the original article and is not spread out amongst any other links in the article. You basically consolidate the PageRank.

On a related note, if these were "news" articles, you'd use the standout tag on the original and the original-source meta tag on the syndicated version.
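If that applied here, the markup would be roughly as follows; note these were Google News-specific tags that Google has since deprecated, and the URLs are placeholders:

```html
<!-- On the original article's page -->
<link rel="standout" href="https://www.example.com/original-news-article" />

<!-- On the syndicated copy -->
<meta name="original-source" content="https://www.example.com/original-news-article" />
```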
Jonathan Greenberg (Asker)

I think I need to go with the noindex solution.  Even though the rel=canonical method is more appealing to me in general, the problem with that -- as well as with the interesting additional tags suggested by Lucas -- is that, in this case, I have several articles on a single page, each duplicating content from a different website.

For avoiding trouble with Google, there seems to be no reliable way to reference the original page as the canonical source -- or to apply the equivalent of noindex -- for only specific parts of a page.  Iframes could work, but only by putting all articles in a single iframe -- or in multiple iframes for individual articles -- and setting the iframed page or pages to noindex, which just seems way overboard for my purposes.  And I have no access to the external sites, so I'm unable to coordinate with them.

Ah.  In that case, noindex is the better choice.
Might I suggest an alternative?

Instead of duplicating entire articles, simply duplicate the pertinent information as quotes. Set up a page to the effect of "testimonials" or "word around the web" and build out quotes that are individually attributed back to the source.

e.g.

"Some really great quote about your clients business." - John Doe's Blog

"Some other really nice thing said about your client." - Jane Doe's Web Site

You could set these quotes up with nice stylization (web fonts), even potentially putting an image of the author next to them. This would likely provide something that is interesting to look at and click through on, as opposed to lengthy articles that may not be entirely related to your client.
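A minimal sketch of such a quote block (the names and URLs are made up):

```html
<blockquote cite="https://johndoesblog.example.com/great-review">
  <p>Some really great quote about your client's business.</p>
  <footer>- <a href="https://johndoesblog.example.com/great-review">John Doe's Blog</a></footer>
</blockquote>
```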
Thanks to all who contributed to this one.

Lucas, your alternative will actually work quite nicely for another page of content on the same site.  In this case, however, the client actually wants the full text of the original articles on her site.  But thanks for taking the time to make your suggestion.

Regards,
Jonathan