Solved

mod_rewrite and googlebot

Posted on 2007-10-04
Medium Priority
531 Views
Last Modified: 2008-01-09
I am planning to use mod_rewrite to hide ugly URLs, e.g.
http://.../places/coffee-house would point to http://.../index.php?pid=43

I have got this to work. My question is: how will Googlebot know to index the mod_rewrite URLs (i.e. http://.../places/coffee-house), since technically they don't exist?

More detail: I plan to add unique meta tags (keywords, description) to the page depending on the pid passed to index.php, so each rewritten URL has unique content. How can I get Googlebot to discover such URLs? Should I create empty directories mirroring the URL path (e.g. mkdir www/places/coffee-house)?

Thanks,
Unmesh.
Question by:unmeshm
6 Comments
 
LVL 12

Expert Comment

by:pigmentarts
ID: 20013054
Google just follows links. Whether it treats them as normal URLs or redirected URLs depends entirely on how the rules were written. Google follows URLs just as your browser does: if the undesired URLs are visible in your browser's address bar, then they are visible to Google too (with exceptions). If you write the rules server-side, with local paths and no redirect flag, you should be fine.

On another note, I have read that Google can make educated guesses, for example if file sizes were different. Whether it actually does this, or what it does with the information, I have no idea.

> How can I get Googlebot to discover such Urls?
Submit them all in an XML sitemap.
 

Author Comment

by:unmeshm
ID: 20013538
The content is dynamically generated and grows over time, hence adding the URLs to a sitemap XML is infeasible.

I guess what you are saying is that I need to expose the URLs on my site somehow or have them linked from some other sites. I will have to figure out a way to have all the URLs tracked by the robot (both old and new).
 
LVL 33

Accepted Solution

by:humeniuk
humeniuk earned 500 total points
ID: 20014087
"My question is how will the GoogleBot know to index the mod_rewrite urls (i.e. http://.../places/coffee-house) since technically they dont exist."

As noted above, Google will index them if it finds them. How can it find them? Search engine spiders move from page to page via links. If there is a link to the page, Google will find it. If the linked page redirects, make sure the redirect returns a 301 (Moved Permanently) HTTP status code and you won't have any problems getting those pages indexed.
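
To confirm what a crawler would see for a given URL, the status code can be checked without following redirects. The following is only a minimal sketch in Python using the standard library; the function names `describe_response` and `check_url` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import http.client
from urllib.parse import urlsplit


def describe_response(status, location=None):
    """Summarise a status code in terms of the advice above (hypothetical helper)."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}; consider 301 instead"
    if 200 <= status < 300:
        return "served directly (no redirect)"
    return f"unexpected status {status}"


def check_url(url, timeout=10):
    """Send one HEAD request; http.client never follows redirects itself."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return describe_response(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Calling `check_url` on one of the rewritten URLs should report "served directly" if the rule is an internal rewrite, and a redirect message if the rule carries a redirect flag.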


"I guess what you are saying is that I need to expose the URLs on my site somehow or have them linked from some other sites."

If you can't incorporate the links in your primary linking scheme (and if you don't, how do you expect users to find them?), you can also create a dynamic sitemap that generates the necessary links.

 
LVL 12

Assisted Solution

by:pigmentarts
pigmentarts earned 500 total points
ID: 20014640
> The content is dynamically generated and grows over time, hence adding the URLs to a sitemap XML is infeasible.

XML sitemaps exist precisely for large, dynamic and otherwise unmanageable content, and they should be updated dynamically by your system. After you submit your first XML sitemap, Google should fetch it again every now and then when large changes are made.
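
As a rough illustration of a sitemap that is generated by the system rather than maintained by hand, here is a minimal Python sketch. It assumes the list of slugs comes from wherever the site stores its content; `build_sitemap`, the `https://example.com` base URL and the slugs in the usage line are placeholders, not details from the original site.

```python
from urllib.parse import quote
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, slugs):
    """Return sitemap XML listing one /places/<slug> URL per slug.

    base_url and slugs are assumptions for this sketch; in practice the
    slugs would be read from the same data that index.php uses.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for slug in slugs:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = f"{base_url.rstrip('/')}/places/{quote(slug)}"
    return ET.tostring(urlset, encoding="unicode")


# Example usage with placeholder values.
xml_text = build_sitemap("https://example.com", ["coffee-house", "tea-room"])
```

Regenerating this output on each request, or whenever content is added, keeps the sitemap in step with the growing set of pages.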

As humeniuk is saying, if a user cannot browse to the page, Google will not find that page either.

Therefore, Google should find pages that are accessible on your site and that don't need sessions or scripts to reach. Don't rely on XML sitemaps alone: if the pages are only accessible to Google in a submitted XML map with no link on your site, Google still may not index it.
 
LVL 33

Expert Comment

by:humeniuk
ID: 20014826
"... if the pages are only accessible to Google in a submitted XML map with no link on your site,  Google still may not index it."

Very good point.
 
LVL 12

Expert Comment

by:pigmentarts
ID: 20021300
Thank you for the points, unmeshm. :)

Question has a verified solution.
