j1mlondon

search listing GOOGLE

How long does it take for Googlebot to get your site listed/indexed?

The bot has (apparently) visited 4 times but has never gone further than the index page. The site is dynamically driven by PHP and MySQL, and I have been reading a lot of information suggesting that OTHER bots do not crawl URL strings containing ? and/or &.

Can anyone help with this?

Greatly appreciated.

Jim
ASKER CERTIFIED SOLUTION
crosenblum

Azmeen

Google will index page.php?argument quite well, but it won't go into much detail for page.php?argument=something&another=whatever.

An example can be seen here (for my site):
http://www.google.com/search?q=HTNet+Scripting&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=0&sa=N&filter=0

I only get that result after clicking "view all similar pages". So this basically shows that Google does cache dynamic pages very well; however, it chokes on the more important pages with more than two arguments (e.g. file.php?arg1=something&arg2=whatever&arg3=stuff, which has three arguments, won't get indexed).
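A quick way to audit your own links against this rule of thumb is to count the query arguments in each URL. The Python sketch below is only illustrative: the `MAX_ARGS = 2` threshold comes from the observation above, not from any documented Google limit, and the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Threshold taken from the observation in this thread (more than two
# arguments tended not to get indexed); a rule of thumb, not a documented limit.
MAX_ARGS = 2


def count_query_args(url: str) -> int:
    """Return the number of arguments in the URL's query string."""
    query = urlsplit(url).query
    # keep_blank_values=True so a bare "page.php?argument" counts as one argument
    return len(parse_qsl(query, keep_blank_values=True))


def likely_skipped(url: str) -> bool:
    """True if the URL has more query arguments than MAX_ARGS."""
    return count_query_args(url) > MAX_ARGS
```

Running `likely_skipped()` over the links your pages emit gives a rough list of URLs that might be candidates for rewriting.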

A good way to still get PHP pages with more than two arguments indexed is to use mod_rewrite to present them to the client as static-looking pages. Most hosts enable mod_rewrite.
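To illustrate what such a rewrite does, here is a minimal Python sketch of the mapping, assuming a hypothetical scheme in which `/article/12/3.html` is served by `/article.php?id=12&page=3`. On an Apache server the mapping itself would be done by mod_rewrite (in an .htaccess file, something like `RewriteEngine On` followed by `RewriteRule ^article/([0-9]+)/([0-9]+)\.html$ article.php?id=$1&page=$2 [L]`), while the page templates emit links in the static form.

```python
import re
from typing import Optional

# Hypothetical URL scheme, for illustration only:
#   static-looking link:  /article/12/3.html
#   real script URL:      /article.php?id=12&page=3
STATIC_PATTERN = re.compile(r"^/article/(\d+)/(\d+)\.html$")


def rewrite_to_script(path: str) -> Optional[str]:
    """Map a static-looking path to the dynamic URL the server actually runs
    (the job a mod_rewrite RewriteRule does on the server)."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None  # no rule matched; the path would be served unchanged
    article_id, page = match.groups()
    return f"/article.php?id={article_id}&page={page}"


def static_link(article_id: int, page: int) -> str:
    """Build the argument-free link that page templates should emit."""
    return f"/article/{article_id}/{page}.html"
```

For example, `rewrite_to_script(static_link(12, 3))` returns `"/article.php?id=12&page=3"`, so crawlers only ever see the argument-free form while the script still receives its arguments.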

Documentation on mod_rewrite can be obtained here:
http://httpd.apache.org/docs/mod/mod_rewrite.html

There are three popular methods for changing your URLs to help search engines, all of which are discussed at the following site:
http://www.sitepoint.com/article/485/1

I once read that Google will not follow links from dynamically generated pages. This is quite logical, since session IDs and the like would produce an endless number of pages.
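If session IDs are the culprit, one fix is to make sure each page has exactly one URL regardless of the visitor's session. A minimal Python sketch of that normalization, assuming the session parameter is named `PHPSESSID` (PHP's default) or `sid` (used by some forum software); the helper name is hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust to whatever your scripts append.
SESSION_PARAMS = {"PHPSESSID", "sid"}


def canonical_url(url: str) -> str:
    """Drop session-ID parameters so one page maps to exactly one URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in SESSION_PARAMS
    ]
    # Rebuild the URL with only the non-session arguments
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two visits that differ only in their session ID, such as `games.php?id=7&PHPSESSID=abc123` and `games.php?PHPSESSID=zzz999&id=7`, both normalize to `games.php?id=7`.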

Because of this, I have one non-dynamic page as my start page (/index.html), which redirects normal browsers to the real start page via a meta refresh tag, JavaScript, or (if neither is available) a click by the user. This page is rather small (about 1 KB) and doesn't affect normal users much (they normally start at subpages anyway, coming from search engines, so that's okay). Normally you don't even see that page (check out http://www.4cheaters.de/index.html to see it). This page contains links to some other hard-coded pages. These are generated automatically every night and contain links to all dynamically generated subpages of my site (http://www.4cheaters.de/games.html; note that this page also contains the redirect, so don't expect anything else to show up ;).
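The nightly-generated link page described above could be produced by a small script. The Python sketch below is hypothetical (the function name, page layout, and example URLs are assumptions, not the actual 4cheaters.de code); it emits a meta refresh plus a plain fallback link and leaves out the JavaScript redirect.

```python
from html import escape


def build_link_page(redirect_to: str, links: list[tuple[str, str]]) -> str:
    """Render a small static page that sends browsers on to the dynamic
    start page while listing plain links to every subpage for crawlers."""
    items = "\n".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in links
    )
    target = escape(redirect_to)
    return f"""<html>
<head>
<meta http-equiv="refresh" content="0; url={target}">
<title>Site index</title>
</head>
<body>
<p><a href="{target}">Continue to the site</a></p>
<ul>
{items}
</ul>
</body>
</html>
"""
```

A scheduled job (for example via cron) could call `build_link_page()` with the current list of subpages and write the result to a static file such as games.html.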

These pages will not confuse any search engine (even one that accesses them with a different browser ID to find out whether I'm spamming), and they lead the crawlers to every subpage.

I'm quite successful with this.

Hello Jim,

I also have a site that is written entirely in PHP, and I have looked up a lot of information on the subject. As far as Google is concerned, the session ID is the main issue from what I have found. You will find a lot of information about rewriting the URL string, but I have never done this myself and have never had a problem with Google crawling my entire site, which is only about 1,400 pages.

For a bigger example, you might want to search Google for phpbb.com, a PHP forum whose entire site is dynamic and whose URLs are not rewritten. Search Google for allinurl:phpbb.com and it will list all the pages Google has indexed. You will notice there are over 19,000 results, which is more than most sites would ever need indexed. I would suggest looking into removing the session IDs and seeing if that fixes the problem. An excellent resource I found was also on the phpBB site. It details how to allow sessions for regular users but exclude them for search bots. I use this framework to remove the IDs from the other scripts I have running.
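The phpBB thread's own code is not reproduced here; the Python sketch below only illustrates the general idea of handing out session IDs to regular visitors but not to crawlers, by checking the User-Agent header. The marker list, the `sid` parameter name, and the helper names are assumptions; check your access logs for the crawler names that actually visit your site.

```python
# Lowercase substrings assumed to identify crawlers. "googlebot" appears in
# Google's crawler user agent; the others are illustrative examples.
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def is_search_bot(user_agent: str) -> bool:
    """Rough user-agent check for known crawlers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def append_session(url: str, session_id: str, user_agent: str, param: str = "sid") -> str:
    """Append the session ID for normal visitors only; bots get clean URLs.
    Assumes session_id is already URL-safe (e.g. alphanumeric)."""
    if is_search_bot(user_agent):
        return url
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{param}={session_id}"
```

Links built with `append_session()` stay session-free whenever Googlebot requests the page, so every crawled URL is stable.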
http://phpbb.com/phpBB/viewtopic.php?t=32328&highlight=remove+session+google

I hope this helps!