digisel asked:
Problems with sitemap.xml files
I have been trying to create a sitemap.xml for Google.
There have been various problems.
I tried to solve them by trying different software, and the result is a real conflict and, frankly, a mess of files I do not know what to do with.
I would like to erase everything and start again.
But I am concerned about deleting files in case I delete something that might either:
A. mess up the operation of my site, or
B. prevent ANY sitemap from being created.
I think most of these files were created by xml-sitemaps.com.
It looks like good software, but the support is lousy.
Can an expert please tell me which of the files in the directories listed below I can SAFELY delete in order to start again?
Thanks
/public_html/custom/domain_2/generator/pages/mods/sitemap_notify.txt
/public_html/custom/domain_2/generator/pages/mods/sitemap_index_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_xml_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_base_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_ror_tpl.xml
/public_html/custom/domain_2/backup02192013/tmp/min_sitemapphp_88048206426.js
/public_html/custom/domain_2/backup02192013/tmp/min_sitemapphp_22012051606.css
/public_html/custom/domain_2/backup02192013/sitemap
/public_html/functions/sitemap_funct.php
/public_html/functions/sitemapgen_funct.php
/public_html/cron/sitemap.php
/public_html/where-to-online-shop/tmp/min_sitemapphp_142012095446.css
/public_html/where-to-online-shop/tmp/min_sitemapphp_426036286338.js
/public_html/generator/data
/public_html/generator/pages
/public_html/generator/changelog.txt
/public_html/generator/default.conf
/public_html/generator/documentation.html
/public_html/generator/howto-install.pdf
/public_html/generator/index.php
/public_html/generator/license.html
/public_html/generator/runcrawl.php
/public_html/generator
/public_html/generator/pages/page-generator.inc.php
/public_html/custom/domain_2/generator
/public_html/custom/domain_2/generator/pages/page-generator.inc.php
ASKER CERTIFIED SOLUTION
SOLUTION
ASKER
Thank you.
Does clicking on Googlebot start a crawl?
What is the difference between a site being spidered and crawled?
And finally, how often is a site spidered or crawled by Google?
I know this varies a lot, but what are the parameters?
Thanks
ASKER
to fibo
you said to add the following to my robots.txt
http://www.sitemaps.org/protocol.html#submit_robots
Just wanted to check that the 'protocol' page in that link should indeed be .html.
ASKER
thank you
B-) Glad we could help; thanks for the grade and points.
robots.txt: you must place in it one line per sitemap you want to point spiders to. The precise structure of the line is given in the reference I suggested.
Basically, it is something like
Sitemap: http://www.example.com/sitemap.xml
where the address MUST be the fully qualified (http://etc.) address of the site and file.
Note that you can provide several files, and that if they contain duplicates this is not a problem.
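For reference, a complete robots.txt along those lines might look like the sketch below (the domain and the second sitemap name are placeholders, not your actual files):

# Example only: substitute the fully qualified URLs of your own sitemap files
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap2.xml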
Finally:
- the spider or crawler explores URLs. It hands the content to the indexer and also finds the links in the page, which will be spidered at some later stage.
- most spiders will visit the entire web they know of within 1-2 months.
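And if you do wipe everything and regenerate, keep in mind that the sitemap file itself is simple. A minimal sitemap.xml following the sitemaps.org protocol looks something like this (the URL and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2013-02-19</lastmod>
  </url>
</urlset>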
ASKER
Thanks for the supplementary info and for what you have contributed.
What you will lose:
- it seems that a cron job updates the sitemap regularly, so you would lose this frequent update.
BUT remember that non-mapped pages will also be spidered, crawled, and indexed.
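If you later set the generator up again, the cron job would presumably be a single crontab entry along these lines (the schedule and the path to the PHP binary are my assumptions; only the /public_html/cron/sitemap.php path appears in your listing):

# Example crontab entry: regenerate the sitemap every night at 03:00
0 3 * * * /usr/bin/php /home/youraccount/public_html/cron/sitemap.php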