Hi experts,
I have a forum with more than 2 million pages indexed in Google. However, I know that a lot of these pages are duplicate content, because the same forum post opens under 3 different URLs.
For example, the same topic opens at these URLs:
1. viewtopic.php?p=xxxx (view a post)
2. viewtopic.php?t=xxxx (view the entire topic, which of course includes the post above)
3. this_is_example_topic.html (cached version of the same topic)
I normally prefer the SEO version of the URL (no. 3), but unfortunately Google has also indexed the viewtopic versions, more than 500,000 times.
I thought about simply blocking Google's crawler in my robots.txt by disallowing all viewtopic URLs, which would filter out everything except the SEO URL version. BUT I fear this will hurt me and do more harm than good, because it means Google is going to kick hundreds of thousands of URLs out of the index. The content might not be 100% indexed under the SEO URL version, and once I kick the viewtopic URLs out I might lose my traffic.
I would be happy for any suggestions or ideas on how to clean up my URLs in Google without causing a big mess.
Thanks in advance
...leaving out the URLs you consider to be duplicates. This may not stop Google from indexing them, though.
https://support.google.com/webmasters/answer/156184?hl=en
You could use rel="nofollow" on links to that page, but if the page is linked to from another site, Google could still index it.
https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3
The only way to stop a page from being indexed is a meta tag telling Google not to index it. The page has to stay crawlable (not blocked in robots.txt), otherwise Google never sees the tag:
<meta name="robots" content="noindex, nofollow" />
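If you go the meta tag route, the tag should only go on the viewtopic versions, not on the SEO .html pages. Here is a rough sketch in Python of that decision, just to illustrate the logic. The function name robots_meta_for is made up for this example, and it assumes every duplicate is served by a script called viewtopic.php, as in the URLs listed in the question. Your forum software would need the equivalent check in its own template code.

```python
from urllib.parse import urlparse

# The tag from the answer above, emitted only on duplicate URLs.
NOINDEX_TAG = '<meta name="robots" content="noindex, nofollow" />'


def robots_meta_for(url: str) -> str:
    """Return the noindex tag for viewtopic.php URLs, else an empty string.

    Hypothetical helper: assumes all duplicates are served by viewtopic.php
    and that the SEO version is a static-looking .html URL.
    """
    path = urlparse(url).path
    # Compare only the last path segment so /forum/viewtopic.php also matches.
    script = path.rsplit("/", 1)[-1]
    if script == "viewtopic.php":
        return NOINDEX_TAG
    return ""


if __name__ == "__main__":
    for u in ("viewtopic.php?p=1234",
              "/forum/viewtopic.php?t=5678",
              "this_is_example_topic.html"):
        print(u, "->", robots_meta_for(u) or "(no tag, page stays indexable)")
```

This way the .html version stays indexable while both viewtopic variants ask Google to drop them.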