digisel asked:
Problems with sitemap.xml files
I have been trying to create a sitemap.xml for Google.
There have been various problems.
I tried to solve them by trying different software, and the result is a real conflict and, frankly, a mess of files I do not know what to do with.
I would like to erase everything and start again.
But I am concerned about deleting files in case I delete something that might either:
A. mess up the operation of my site, or
B. prevent ANY sitemap from being created.
I think most of these files were created by xml-sitemaps.com.
It looks like good software, but the support is lousy.
Can an expert please tell me which of the files in the directories listed below I can SAFELY delete in order to start again?
Thanks
/public_html/custom/domain_2/generator/pages/mods/sitemap_notify.txt
/public_html/custom/domain_2/generator/pages/mods/sitemap_index_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_xml_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_base_tpl.xml
/public_html/custom/domain_2/generator/pages/mods/sitemap_ror_tpl.xml
/public_html/custom/domain_2/backup02192013/tmp/min_sitemapphp_88048206426.js
/public_html/custom/domain_2/backup02192013/tmp/min_sitemapphp_22012051606.css
/public_html/custom/domain_2/backup02192013/sitemap
/public_html/functions/sitemap_funct.php
/public_html/functions/sitemapgen_funct.php
/public_html/cron/sitemap.php
/public_html/where-to-online-shop/tmp/min_sitemapphp_142012095446.css
/public_html/where-to-online-shop/tmp/min_sitemapphp_426036286338.js
/public_html/generator/data
/public_html/generator/pages
/public_html/generator/changelog.txt
/public_html/generator/default.conf
/public_html/generator/documentation.html
/public_html/generator/howto-install.pdf
/public_html/generator/index.php
/public_html/generator/license.html
/public_html/generator/runcrawl.php
/public_html/generator
/public_html/generator/pages/page-generator.inc.php
/public_html/custom/domain_2/generator
/public_html/custom/domain_2/generator/pages/page-generator.inc.php
ASKER CERTIFIED SOLUTION
SOLUTION
ASKER
Thank you.
Does clicking on Googlebot start a crawl?
What is the difference between a site being spidered and crawled?
And finally, how often is a site spidered or crawled by Google?
I know this varies a lot, but what are the parameters?
Thanks
ASKER
to fibo
you said to add the following to my robots.txt
http://www.sitemaps.org/protocol.html#submit_robots
Just wanted to check that the 'protocol' page in that link should indeed be .html.
ASKER
thank you
B-) Glad we could help; thanks for the grade and points.
robots.txt: you must place in it one line per sitemap you want to point spiders to. The precise structure of the line is given in the reference I suggested.
Basically, it is something like
Sitemap: http://www.example.com/sitemap.xml
where the address MUST be the fully qualified (http://etc.) address of the site and file.
Note that you can provide several files, and that if they contain duplicates this is not a problem.
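For reference, a complete robots.txt along those lines might look like the sketch below (the domain and the second sitemap name are placeholders, not your actual files):

# Example only: substitute the fully qualified URLs of your own sitemap files
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap2.xml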
Finally:
- the spider or crawler explores URLs. It hands the content to the indexer and also finds the links in the page, which will be spidered at some later stage.
- most spiders will visit the entire web they know of within 1-2 months.
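And if you do wipe everything and regenerate, keep in mind that the sitemap file itself is simple. A minimal sitemap.xml following the sitemaps.org protocol looks something like this (the URL and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2013-02-19</lastmod>
  </url>
</urlset>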
ASKER
Thanks for the supplementary info and for what you have contributed.
What you will lose:
- it seems that a cron job updates the sitemap regularly, so you would lose this frequent update.
BUT remember that non-mapped pages will also be spidered, crawled, and indexed.
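If you later set the generator up again, the cron job would presumably be a single crontab entry along these lines (the schedule and the path to the PHP binary are my assumptions; only the /public_html/cron/sitemap.php path appears in your listing):

# Example crontab entry: regenerate the sitemap every night at 03:00
0 3 * * * /usr/bin/php /home/youraccount/public_html/cron/sitemap.php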