
Website under attack: duplicate content in Google, but the duplicates are URLs on our own domain with a "?"

Posted on 2016-08-10
Last Modified: 2016-08-12
We have noticed that hundreds of our web pages are duplicated in the Google index under URLs that we have never published. The unusual thing is that every duplicate uses our actual domain name in the base part of the URL, as if it were hosted on our web server.

For example, a legitimate URL on our site looks like this:

http://www.our-domain.com/legitimate-page-URL.html

and the duplicates indexed in Google for that page look like this:

http://www.our-domain.com/legitimate-page-URL.html?garbage-text-here-page2
http://www.our-domain.com/legitimate-page-URL.html?garbage-text-page3

and so on: there may be dozens of these page2 and page3 variants, all duplicating the content of the legitimate page.


All of these URLs contain a "?" (question mark). We do not publish any URLs containing a "?". Our website consists only of static HTML pages; we do not use a database or a content management system.

We first noticed a problem a week ago, when our homepage disappeared from the Google index; even a search for that specific page returns no result. Then another of our pages dropped from position 1 to position 6, and other pages started losing ranking positions as well.

Each duplicate URL shows the content of a legitimate page on our website, but no such pages are physically hosted on our site, so we are not clear how they got into the Google index. There are hundreds of URLs that are copies of our homepage, each with a different URL, and all of them are cached in Google, so Google apparently treats them as legitimate pages when in fact they are not. If we click one of these links in Google, we see the legitimate page content under the bogus URL. None of the bogus URLs exist on our server.

Does anyone know what is happening here, and what we can do to stop it and make sure these duplicate pages are not indexed by Google and other search engines?

Thank you,
JohnB
Question by:boltweb

Accepted Solution

by: Jeffrey Dake (earned 500 total points)
ID: 41751453
The first thing I would do is make sure your home page has a rel="canonical" link in the <head> pointing to the main page URL. This helps Google recognize which URL is your main page. If every URL with the junk parameters returns that same canonical link, Google will know where the real page is.
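To illustrate what that canonical link should contain, here is a minimal Python sketch, assuming the duplicates differ from the real page only by their query string. It strips the query (and any fragment) from a requested URL and builds the `<link rel="canonical">` element that belongs in the page's `<head>`. The function names are hypothetical; on a static site the tag has to be written into each HTML file with that page's own URL, because no server-side code runs per request.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page at `url`."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    junk = "http://www.our-domain.com/legitimate-page-URL.html?garbage-text-page3"
    print(canonical_link_tag(junk))
```

A static web server normally returns the same file whatever follows the "?", so once the tag is in the file, every junk variant carries the same canonical URL, which is exactly the signal described above.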

Also make sure you register your domain in Google Webmaster Tools. Once you have verified that you own the domain, you can use Webmaster Tools to monitor any errors on it. There are also tools there for telling Google to ignore specific URL parameters.

Hope this helps

Author Comment

by:boltweb
ID: 41754345
Hello Jeffrey, thank you for your reply. My site is already in Google Webmaster Tools. I can't add rel canonical to the whole site yet because all my pages are static and there are 16,500 of them to tag. I'm having a script written to do this, but until it is ready I can't apply that solution across the site. However, I have added it to the homepage, and the homepage has recovered its position, which was a big relief.
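A tagging script along those lines could look roughly like the Python sketch below. It is only a sketch under assumptions that do not come from the thread: a local copy of the site lives in a folder (hypothetically `site/`) whose file paths mirror the public URLs, pages are UTF-8 encoded and have a closing `</head>` tag, and `index.html` files should map to their folder URL. It inserts a canonical link just before `</head>` and skips any page that already has one, such as the homepage.

```python
import re
from pathlib import Path

# Hypothetical settings: point these at your own domain and local site copy.
BASE_URL = "http://www.our-domain.com"
SITE_ROOT = Path("site")

HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)
HAS_CANONICAL = re.compile(r"""<link[^>]+rel=["']?canonical""", re.IGNORECASE)


def canonical_for(html_file: Path, site_root: Path, base_url: str) -> str:
    """Map a file under site_root to its public URL (assumes paths mirror URLs)."""
    rel = html_file.relative_to(site_root).as_posix()
    # Assumption: directory index files are served at their folder URL.
    if rel == "index.html" or rel.endswith("/index.html"):
        rel = rel[: -len("index.html")]
    return f"{base_url.rstrip('/')}/{rel}"


def add_canonical(page: str, url: str) -> str:
    """Insert a canonical link before </head>, unless one is already present."""
    if HAS_CANONICAL.search(page):
        return page
    tag = f'<link rel="canonical" href="{url}">\n'
    # A function replacement avoids backslash handling in the URL text.
    new_page, count = HEAD_CLOSE.subn(lambda m: tag + m.group(0), page, count=1)
    return new_page if count else page


def process_site(site_root: Path, base_url: str) -> int:
    """Add canonical tags to every .html file under site_root; return files changed."""
    changed = 0
    for html_file in sorted(site_root.rglob("*.html")):
        original = html_file.read_text(encoding="utf-8")
        updated = add_canonical(original, canonical_for(html_file, site_root, base_url))
        if updated != original:
            html_file.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    if SITE_ROOT.is_dir():
        print(process_site(SITE_ROOT, BASE_URL), "files updated")
    else:
        print(f"{SITE_ROOT} not found; point SITE_ROOT at a copy of the site")
```

Running it against a backup copy first and spot-checking a few pages in each translation before uploading is the safer order; pages saved in a legacy encoding would need the `encoding=` arguments adjusted.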

I looked into Google's URL parameters setting, which I did not know about. This training video on YouTube (Google parameters https://www.youtube.com/watch?v=DiEYcBZ36po) was very helpful, and I have now set all parameters so that Google does not index them. Note, however, that this tool does not remove a URL; it only excludes it from being indexed.

Google has two removal tools, but both have problems. To remove a page permanently, I would need to edit the page, and I can't do that because the page does not exist. That leaves Google's temporary removal tool. The problem there is that I don't have a full list of the affected URLs, so I can't submit one, and even if I did, the removal is only temporary: the URLs would return in 3 months.
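One possible way to assemble the missing URL list, assuming the web server keeps Apache access logs in the standard common or combined format, is to pull every requested `.html` URL that carries a query string out of those logs, since Google's crawler has to fetch a URL before it can cache it. The Python sketch below groups such URLs by their clean page URL; the log path is supplied on the command line, and the regular expression and the default `base_url` are assumptions to adapt.

```python
import re
import sys
from collections import defaultdict
from urllib.parse import urlsplit

# Matches the quoted request line of a common/combined-format log entry, e.g.
# 66.249.66.1 - - [10/Aug/2016:12:00:00 +0000] "GET /page.html?junk HTTP/1.1" 200 ...
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def query_urls_by_page(log_lines, base_url="http://www.our-domain.com"):
    """Group requested .html URLs that carry a query string by their clean page URL."""
    found = defaultdict(set)
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue
        target = match.group(1)
        parts = urlsplit(target)
        if parts.query and parts.path.endswith(".html"):
            found[base_url + parts.path].add(base_url + target)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for page, urls in sorted(query_urls_by_page(log).items()):
            print(page)
            for url in sorted(urls):
                print("   ", url)
```

The output only covers URLs that were actually requested while the logs were being kept, so it is a starting list rather than a guaranteed complete one.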

So the real solution is rel canonical. The other thing I could do is add a mod_rewrite rule to the .htaccess file to block URLs with query strings. The problem is that I would lose the Google search functionality on my site, whose result URLs contain a "?". I could move the Google search into a dedicated folder called /gsearch/, but I would have to do that for each language version of the site, and there are 8 translations. So if there were a way to block all query strings while excluding /gsearch/, that would solve the problem until the script for adding the rel canonical tags is finished.
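The rule being asked for boils down to one decision per request: refuse a URL that has a query string unless it points into a `gsearch` folder. The Python sketch below models only that decision so it can be checked against sample URLs; on the live site it would be expressed as mod_rewrite conditions in `.htaccess` (a `RewriteCond` on `%{QUERY_STRING}` plus one on `%{REQUEST_URI}`). It assumes each translation keeps its search under a path segment named exactly `gsearch`, such as `/gsearch/` or a hypothetical `/fr/gsearch/`.

```python
from urllib.parse import urlsplit


def should_block(url: str) -> bool:
    """Return True for a URL with a query string that is not under a gsearch folder."""
    parts = urlsplit(url)
    if not parts.query:
        return False  # plain page URLs are always allowed
    segments = [s for s in parts.path.split("/") if s]
    # Allow queries on site-search pages, e.g. /gsearch/ or /fr/gsearch/ (assumed layout).
    return "gsearch" not in segments


if __name__ == "__main__":
    for url in (
        "http://www.our-domain.com/legitimate-page-URL.html?garbage-text-page3",
        "http://www.our-domain.com/legitimate-page-URL.html",
        "http://www.our-domain.com/fr/gsearch/?q=example",
    ):
        print("block" if should_block(url) else "allow", url)
```

Checking the exception by path segment rather than by a fixed prefix is what lets one rule cover all 8 language folders; answering with a 301 redirect to the query-less URL instead of blocking is a common variant of the same decision.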

Author Closing Comment

by:boltweb
ID: 41754346
Thank you, Jeffrey. Got my homepage back!