Asked by digisel (United Kingdom of Great Britain and Northern Ireland)

Problems with sitemap.xml files

I have been trying to create a sitemap.xml for Google, and I have run into various problems.
I tried to solve them by trying different software, and the result is a set of conflicting files and, frankly, a mess of files I do not know what to do with.
I would like to erase everything and start again.
But I am concerned about deleting a file that might either:
A. mess up the operation of my site, or
B. prevent ANY sitemap from being created.
I think most of these files were created by xml-sitemaps.com.
It looks like good software, but the support is lousy.

Can an expert please tell me which of the files listed below, across the different directories, I can delete SAFELY so that I can start again?
Thanks


/public_html/custom/domain_2/generator/pages/mods/sitemap_notify.txt
/public_html/custom/domain_2/generator/pages/mods/sitemap_index_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_xml_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_base_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_ror_tpl.xml
/public_html/custom/domain_2/backup02192013/tmp/min_sitemapphp_88048206426.js
/public_html/custom/domain_2/backup02192013/tmp/min_sitemapphp_22012051606.css
/public_html/custom/domain_2/backup02192013/sitemap
/public_html/functions/sitemap_funct.php
/public_html/functions/sitemapgen_funct.php
/public_html/cron/sitemap.php
/public_html/where-to-online-shop/tmp/min_sitemapphp_142012095446.css
/public_html/where-to-online-shop/tmp/min_sitemapphp_426036286338.js
/public_html/generator/data
/public_html/generator/pages
/public_html/generator/changelog.txt
/public_html/generator/default.conf
/public_html/generator/documentation.html
/public_html/generator/howto-install.pdf
/public_html/generator/index.php
/public_html/generator/license.html
/public_html/generator/runcrawl.php
/public_html/generator
/public_html/generator/pages/page-generator.inc.php
/public_html/custom/domain_2/generator
/public_html/custom/domain_2/generator/pages/page-generator.inc.php
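For reference when starting over: under the sitemaps.org protocol, a sitemap.xml is just a small XML file, a `<urlset>` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace with one `<url>` entry (holding a `<loc>` with the page address) per page. The Python sketch below writes such a file from a list of page URLs without any generator software. It is a minimal illustration, not what xml-sitemaps.com produces: the function name `build_sitemap` and the example URLs are hypothetical, and optional tags such as `<lastmod>` are left out.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap.xml content (a str) listing each absolute URL once.

    build_sitemap is a hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip repeats within this one file
        seen.add(url)
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as "&" for us.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical page list; replace with the real pages of the site.
    pages = [
        "http://www.example.com/",
        "http://www.example.com/about.html",
    ]
    print(build_sitemap(pages))
```

The output would be saved as sitemap.xml in the site root and regenerated whenever pages are added.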
ASKER CERTIFIED SOLUTION by Ray Paseur (United States of America)
SOLUTION by Bernard Savonet (France)
Looking back at all your files: I would probably delete all of them (after making a backup, of course).

What you will lose:
- It seems that a cron job updates the sitemap regularly, so you would lose this frequent update.

BUT
remember that pages that are not in a sitemap will still be spidered, crawled and indexed.
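If you take the "delete everything, with a backup" route, the backup has to exist before anything is removed. Below is a minimal Python sketch of that order of operations, assuming you can run Python on the server and that paths are given relative to your web root (for example `generator` or `cron/sitemap.php`). `backup_then_delete` is a hypothetical helper, not part of any sitemap tool: it writes a `.tar.gz`, checks that every path made it into the archive, and only then deletes.

```python
import shutil
import tarfile
from pathlib import Path


def backup_then_delete(root, rel_paths, archive_path):
    """Archive paths (relative to root) into a .tar.gz, then delete them.

    Hypothetical helper. Paths that do not exist are skipped. Nothing is
    deleted unless every archived path is found again in the archive.
    Returns the relative paths that were backed up.
    """
    root = Path(root)
    present = [rel for rel in rel_paths if (root / rel).exists()]

    with tarfile.open(archive_path, "w:gz") as tar:
        for rel in present:
            # Directories are added recursively by default.
            tar.add(root / rel, arcname=rel)

    # Re-open the finished archive and confirm each entry is there.
    with tarfile.open(archive_path, "r:gz") as tar:
        archived = set(tar.getnames())
    missing = [rel for rel in present if rel not in archived]
    if missing:
        raise RuntimeError(f"not found in backup, nothing deleted: {missing}")

    for rel in present:
        target = root / rel
        if not target.exists():
            continue  # already removed together with a parent directory
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return present
```

The archive path must not lie inside any directory being deleted, or the backup would be removed along with it. Restoring is a matter of extracting the archive back into the web root.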
digisel (asker):

Thank you.
Does clicking on Googlebot start a crawl?
What is the difference between a site being spidered and being crawled?
And finally, how often is a site spidered or crawled by Google?
I know this varies a lot, but what are the parameters?
Thanks
digisel (asker):

To fibo:

You said to add the following to my robots.txt:
http://www.sitemaps.org/protocol.html#submit_robots

I just wanted to check that the protocol should be HTML.
digisel (asker):

Thank you.

B-) Glad we could help; thanks for the grade and points.

Robots.txt: you must put one line in it for each sitemap you want to point spiders to. The exact structure of the line is given in the reference I suggested.
Basically, it is something like

Sitemap: http://www.example.com/sitemap.xml
where the address MUST be the fully qualified (http://etc) address of the site and file.
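The rule above (one `Sitemap:` line per file, each a fully qualified URL) can be checked mechanically. A minimal Python sketch, where the helper name `sitemap_lines` and the second sitemap URL are hypothetical; it rejects relative paths such as `/sitemap.xml` and returns text ready to paste into robots.txt.

```python
from urllib.parse import urlparse


def sitemap_lines(sitemap_urls):
    """Return robots.txt text with one 'Sitemap:' line per sitemap file.

    Hypothetical helper for illustration only.
    """
    out = []
    for url in sitemap_urls:
        parts = urlparse(url)
        # Reject relative paths such as "/sitemap.xml": the directive
        # expects a fully qualified URL (scheme and host).
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a fully qualified URL: {url!r}")
        out.append(f"Sitemap: {url}\n")
    return "".join(out)


if __name__ == "__main__":
    print(sitemap_lines([
        "http://www.example.com/sitemap.xml",
        "http://www.example.com/sitemap2.xml",  # hypothetical second file
    ]), end="")
```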

Note that you can provide several files, and it is not a problem if they contain duplicates.
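One way to see why duplicates across several sitemap files are harmless: a reader of the sitemaps can simply merge every `<loc>` into a set, so a URL listed twice is still only one URL to fetch. The Python sketch below illustrates that idea; it is an assumption about how a consumer could behave, not a description of how Google actually processes sitemaps, and `merged_urls` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the sitemaps.org 0.9 namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def merged_urls(sitemap_documents):
    """Collect every <loc> from several sitemap.xml documents into one set.

    Hypothetical helper: duplicates across files collapse automatically.
    """
    urls = set()
    for doc in sitemap_documents:
        root = ET.fromstring(doc)
        for loc in root.findall("sm:url/sm:loc", NS):
            urls.add(loc.text.strip())
    return urls
```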

Finally:
- A spider (or crawler; the two terms mean the same thing) explores URLs. It passes the content to the indexer and also finds the links in the page, which will be spidered at some later stage.
- Most spiders will visit the entire web they know in 1-2 months.
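The visit / index / queue-the-links cycle described above is essentially a breadth-first traversal, which is also why pages missing from a sitemap are still found through ordinary links. A toy Python sketch: the in-memory `site` link graph, the page paths, and the names `crawl` and `fetch_links` are all hypothetical, and a real crawler adds politeness delays, robots.txt checks and scheduling that are omitted here.

```python
from collections import deque


def crawl(start_url, fetch_links, max_pages=100):
    """Breadth-first crawl: visit URLs and queue newly found links for later.

    fetch_links(url) returns the URLs linked from that page. It stands in
    for downloading the page, handing it to an indexer and extracting links.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # page is "indexed" in this sketch
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)  # to be spidered at a later stage
    return visited


if __name__ == "__main__":
    # Hypothetical site: "/shop/item1" is reachable only via a link.
    site = {
        "/": ["/about", "/shop"],
        "/about": ["/"],
        "/shop": ["/shop/item1", "/about"],
        "/shop/item1": [],
    }
    print(crawl("/", lambda u: site.get(u, [])))
    # ['/', '/about', '/shop', '/shop/item1']
```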
digisel (asker):

Thanks for the supplementary info and for what you have contributed.