Rhythmdvl

asked on

Removing dead links from Google Site Search

We made some significant changes to a site: large sections were moved from “mysite.com/nolongerneeded” to “mysite.com/old.” There are no links to the /old directory; it’s our makeshift backup in case we want to quickly revert, so those pages aren’t a problem.

Google Site Searches are still returning hits for pages in the /nolongerneeded folder. As expected, following these links leads to a 404 error. Much of the information on the moved pages has been restructured to “mysite.com/newstructure,” so people will still be using similar search terms and looking for relevant results.

I used SOFTplus GSiteCrawler v1.23 to recrawl the site and regenerate a site map. It lists the new pages. I put the /nolongerneeded folder into its Drop Parts and Ban filters, but the resulting sitemap still has:

<url><loc>http://www.mysite.com/nolongerneeded/research.html</loc><lastmod>2010-11-18T14:34:50+00:00</lastmod><changefreq>monthly</changefreq><priority>0.50</priority></url>


The last modification was the bulk move on 4/19—the November 2010 date was the last change to the page contents.

Can I just do a global search on the sitemap and delete all references to /nolongerneeded and resubmit to Google (we have a paid account, if it makes a difference)? Am I missing something straightforward?
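
For anyone wanting to do that search-and-delete with a script rather than by hand, here is a minimal sketch in Python. It assumes the sitemap uses the standard sitemaps.org namespace; the function name `drop_urls` and the sample URLs are made up for illustration, and this is not something GSiteCrawler itself provides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumption: the sitemap declares the standard sitemaps.org namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_urls(sitemap_xml, dead_prefix):
    """Return the sitemap with every <url> whose <loc> path starts with dead_prefix removed."""
    ET.register_namespace("", SITEMAP_NS)  # keep output free of ns0: prefixes
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy, because entries are removed from root inside the loop.
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="")
        if urlparse(loc.strip()).path.startswith(dead_prefix):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-entry sitemap: one dead page, one live page.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.mysite.com/nolongerneeded/research.html</loc></url>"
    "<url><loc>http://www.mysite.com/newstructure/research.html</loc></url>"
    "</urlset>"
)
cleaned = drop_urls(sample, "/nolongerneeded/")
print(cleaned)
```

Matching on the parsed URL path rather than on the raw text avoids accidentally dropping a live page that merely mentions the old folder name elsewhere in its URL.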

I’m awfully sorry for such a basic question, but I’m a bit stuck—and since it seems to take a couple days for updates to propagate, I don’t have the time to go the trial-and-error route.

Thanks,

Rhythm

ASKER CERTIFIED SOLUTION
Dave Baldwin

Rhythmdvl

ASKER

So close ...


Thanks for the re-crawling help. I now have a Sitemap with correct information. I submitted it via Google Webmaster Tools, then went to the Site Search Control Panel and selected Index Now under the On-demand indexing using Sitemaps section. It came back with an "index refreshed" response.

But when I go to the site and search on the keyword, the old 404 links still show up. Should I just be patient and expect them to stop appearing in a few days, or am I still missing something?

How can I exclude sites and pages?

There are three ways you can exclude sites or pages from your Custom Search Engine: individually, in bulk, or using the Google Marker.

Individually: You can exclude sites individually in the Sites tab of your Custom Search Engine's control panel. Select the Add Sites button under the Excluded sites section (or the Exclude sites link if you haven't listed any) and the Exclude sites individually option will open.

In bulk: You can exclude sites in bulk by selecting the Exclude sites in bulk option under the Excluded sites section of your Sites tab. To use this option in the Sites tab, select the Add Sites button (or the Exclude sites link if you haven't listed any) and then the Exclude sites in bulk option. Enter the sites, pages, or patterns, one per line.

Using the Google Marker: Once you have created a Custom Search Engine, you can exclude sites using the Google Marker. The Google Marker allows you to save sites to your Custom Search Engine as you browse the web. For more details, visit http://google.com/coop/cse/marker.
Oops, I hit submit too soon. (And can't see an edit function for the post).

That's from the Google Help Page:  Custom Search › Help articles › Creating and Editing Your CSE › Adding or Excluding Pages › How can I exclude sites and pages?  

I think I was banging my head against a wall because I missed it in reading and didn't use the keyword "exclude" in searching. Very simple process.
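
In case it helps someone with more than one dead folder, the "one per line" entries for the bulk option can be generated rather than typed. This is only a sketch: the helper name `exclusion_patterns` is made up, and the trailing `/*` wildcard form is my assumption about the pattern syntax, so check it against the current Custom Search help before pasting.

```python
from typing import Iterable


def exclusion_patterns(host: str, dead_dirs: Iterable[str]) -> str:
    """Build one exclusion pattern per line for each dead directory on host."""
    # Assumption: a directory and everything below it is written as host/dir/*
    lines = [f"{host}/{d.strip('/')}/*" for d in dead_dirs]
    return "\n".join(lines)


# Hypothetical usage with the folder from this thread.
patterns = exclusion_patterns("www.mysite.com", ["/nolongerneeded"])
print(patterns)
```

The printed text can be pasted straight into the Exclude sites in bulk box.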

The sitemap was also vexing because in the On-Demand Indexing help page it says:

If I submit a new Sitemap, will pages from the previous Sitemap be dropped from my search engine?

If your CSE or GSS has sufficient on-demand quota to accommodate the new pages, no pages will be dropped from your results. If you exceed your limit, we may remove some of your pages from on-demand indexing. We will always try to remove the least important pages, as determined by priority and last modified date in the Sitemap.

To review your use of the on-demand quota, visit the Indexing tab of your search engine control panel. In the On-demand indexing section of this page, you will see the Sitemap URL, indexing status, and available page quota. As long as the total number of new or newly updated pages you include in your Sitemap at any one time is less than your remaining quota listed here, no pages will be removed from your search engine.


Again, thanks for the Sitemap help--it's crucial to the overall site as well.
This was a great help. The ultimate solution differs a bit (see following posts) but I asked the wrong questions--this was the right solution for that related task. Thanks!
You really should do a 301 permanent redirect from your old URL to your new URL. This way, people going to “mysite.com/nolongerneeded” will automatically be redirected to “mysite.com/old.” Likewise, requests for content under “mysite.com/nolongerneeded” will no longer return a 404 error, and Google's index will update naturally because the redirect tells it that the content now lives at “mysite.com/old.”
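
The thread doesn't say which web server the site runs on, so here is the redirect logic as a rough Python sketch rather than a server config. The names `redirect_target` and `RedirectHandler` are made up for illustration, and the target prefix is an assumption: it uses /old/ as in the post above, but it could just as well be /newstructure/ if that is where visitors should land.

```python
from http.server import BaseHTTPRequestHandler

OLD_PREFIX = "/nolongerneeded/"
NEW_PREFIX = "/old/"  # assumption: swap in "/newstructure/" if that is the real destination


def redirect_target(path, old_prefix=OLD_PREFIX, new_prefix=NEW_PREFIX):
    """Return the new path for a moved URL, or None if the path was not moved."""
    if path.startswith(old_prefix):
        # Keep everything after the old prefix (file name, query string) unchanged.
        return new_prefix + path[len(old_prefix):]
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy handler: answers moved paths with a 301 and everything else with a 404."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        self.send_response(301)  # permanent redirect
        self.send_header("Location", target)
        self.end_headers()
```

On a real site the same prefix mapping would normally live in the web server's own redirect rules; the point is that every old URL maps to exactly one new URL and is answered with status 301 instead of 404.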

If you are already beyond the point of being able to do 301 redirects, note that Google does not like having 404 errors in its index and will eventually remove those pages. If you want to speed up the process, you can ask Google to remove URLs with its URL Removal Tool. You'll have to be logged into your Webmaster Tools account to use it.

Hope that helps

Matt
Thanks for the points.  I wasn't ignoring you, I was just busy doing other things.  Note that any external links for the old pages will still be found in Google, that part is not under your control.  And Google usually takes a while to update the search results.