Julian Matz

Google Sitemaps

Hello!

I'm using Google Sitemaps and the Python Sitemaps Generator.

I've tried about every option in the config file, and the only one that's really suitable is the external urllist.txt option.

Scanning the filesystem is not suitable as it has too many files I don't want in the sitemap. Plus, I do a lot of URL rewriting, so the search-engine-friendly URLs would not be included.
Scanning the access logs is not suitable because it also lists 404 errors.

And I cannot really keep the urllist.txt file updated manually, so I wanted to ask if anyone has any ideas or knows of any scripts that will keep this file updated by following the HTTP links on the website?

Thanks in advance!
-Julian.
ASKER CERTIFIED SOLUTION
ChrisMacleod

periwinkle
There are many, many third-party solutions for Google Sitemaps; see:

http://code.google.com/sm_thirdparty.html

In particular, look at the Downloadable Tools and the Online Generators - perhaps one of these will work well for you?
Hi Julian,

I've had good success with Audit My PC's online generator:
<http://www.auditmypc.com/free-sitemap-generator.asp>

And believe it or not, CoffeeCup Software has a very decent Sitemapper program. I actually enjoy using it, as it makes very clean sitemaps with graphics, and for Google it has many options for ignoring particular files, folders, and queries. You can use their 30-day trial version. Their paid version is $29.00.

<http://www.coffeecup.com/google-sitemapper/>

Mark
Julian Matz

ASKER

Hi Periwinkle and Desertcities,
Thanks for your comments, but I was looking for something that runs on my own server, preferably with cron. Something like the Google Python script, except one that crawls HTTP links rather than the filesystem.

Hi ChrisMacleod,
Thanks for the link. It's exactly what I was looking for... well, originally I was looking for something that just creates the urllist.txt file so that the google-sitemaps-gen script can use it to create the XML sitemap, but this one does the whole lot at once, which is cool too. It can also be used with cron, so it's perfect from what I can see.

-Julian.
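
For anyone who only needs the urllist.txt step described above, here is a minimal sketch of a link-following crawler using just the Python standard library. It is not the tool ChrisMacleod linked to. The names crawl, fetch, and write_urllist, the example.com start URL, and the page limit are all made up for illustration, and it assumes the sitemap generator accepts a plain text file with one URL per line. Pages that fail to load (404s and the like) are left out, and only links on the same host are followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Return the HTML of url, or None on any error or non-HTML response."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        # urlopen raises on 404 and other HTTP errors, so broken links end up here.
        return None


def crawl(start_url, fetch_page=fetch, limit=5000):
    """Breadth-first crawl of one host; returns the URLs that loaded."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < limit:
        url = queue.popleft()
        html = fetch_page(url)
        if html is None:
            continue  # broken or non-HTML link: keep it out of the list
        found.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            same_site = parts.scheme in ("http", "https") and parts.netloc == host
            if same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


def write_urllist(start_url, path="urllist.txt"):
    """Crawl start_url and write one URL per line to path."""
    with open(path, "w") as out:
        for url in crawl(start_url):
            out.write(url + "\n")
```

To use it from cron, a one-line script that calls write_urllist("http://www.example.com/") with your own start URL should do, run just before the sitemap generator. Because it follows the links as a visitor would see them, rewritten URLs are picked up and files that are never linked are not. It does not read robots.txt or throttle its requests, so treat it as a starting point only.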
ChrisMacleod

You're welcome, Julian. I am thinking about purchasing this one also.
Well, now I can recommend it :)

It's extremely easy to install and configure, handy for reporting broken links, easy to use through its web interface, and easy to set up with cron. For US$15.00, I don't think you can go wrong.

The only thing was that I didn't receive the download link straight after payment. I only got it today, so it probably takes a couple of hours for the e-mail to come through, or it's sent manually. Not an issue really; personally, I just hate waiting :)
FWIW, some of the downloadable tools run on your server; glad that you found a solution that runs well for you.
Periwinkle, I did check out your link, but the "Downloadable Tools" all seem to be for Windows. I also checked all the PHP resources under "Code Snippets", but in the end I just decided to go with the Standalone Generator because it did look very efficient and it only cost 14 or 15 dollars...
Julian - no problem - just wanted to point out (for posterity) that not everything at that link was Windows-based. The Perl programs would have run as well on your server.