
Make a CGI site Google-friendly

Posted on 2003-11-19
Last Modified: 2010-05-19
Hi,

Google does not visit my CGI scripts / dynamic pages.
What is the best way to get the Google search robot to crawl them?
Question by:tilmes

Expert Comment

by:jmcg
ID: 9784156
Check your robots.txt file to make sure it is not keeping Google away from your site. Note that robots.txt can only exclude robots; it cannot invite them in.

Take a look at Experts-Exchange's robots.txt file, for example:

User-agent: *
Disallow:

Some robots, including Googlebot, also recognize "Allow:" directives, so you could list your .cgi script there explicitly.
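To illustrate how Disallow and Allow lines interact, here is a minimal Perl sketch (not a complete robots.txt parser) that decides whether a path may be crawled. It assumes only the "User-agent: *" group matters, ignores wildcards, and resolves conflicts the way Google documents for Googlebot: the longest matching path wins, and Allow wins a tie. The sample paths, including /cgi-bin/search.cgi, are hypothetical.

```perl
#!/usr/bin/perl
# Minimal sketch, not a complete robots.txt parser. Assumptions:
#  - only the "User-agent: *" group is read;
#  - wildcards ("*" and "$" inside paths) are not supported;
#  - when Allow and Disallow both match, the longer path wins and
#    Allow wins a tie (the precedence Google documents for Googlebot).
use strict;
use warnings;

# Collect [kind, path-prefix] rules from the "User-agent: *" group.
sub parse_robots {
    my ($text) = @_;
    my @rules;
    my $in_star        = 0;   # inside a group that applies to "*"?
    my $prev_was_agent = 0;   # consecutive User-agent lines form one group
    for my $line (split /\r?\n/, $text) {
        $line =~ s/#.*//;             # strip comments
        $line =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        next unless length $line;
        my ($field, $value) = $line =~ /^([^:]+?)\s*:\s*(.*)$/ or next;
        $field = lc $field;
        if ($field eq 'user-agent') {
            my $star = $value eq '*' ? 1 : 0;
            $in_star = $prev_was_agent ? ($in_star || $star) : $star;
            $prev_was_agent = 1;
            next;
        }
        $prev_was_agent = 0;
        next unless $in_star;
        next unless $field eq 'allow' || $field eq 'disallow';
        next if $value eq '';         # an empty Disallow blocks nothing
        push @rules, [ $field, $value ];
    }
    return \@rules;
}

# A path is allowed unless the longest matching rule is a Disallow.
sub is_allowed {
    my ($rules, $path) = @_;
    my ($best_len, $best_kind) = (-1, 'allow');   # no match => allowed
    for my $rule (@$rules) {
        my ($kind, $prefix) = @$rule;
        next unless index($path, $prefix) == 0;    # simple prefix match
        my $len = length $prefix;
        if ($len > $best_len || ($len == $best_len && $kind eq 'allow')) {
            ($best_len, $best_kind) = ($len, $kind);
        }
    }
    return $best_kind eq 'allow';
}

my $rules = parse_robots(<<'END');
User-agent: *
Disallow: /cgi-bin/
Allow: /cgi-bin/search.cgi
END
for my $path ('/index.html', '/cgi-bin/search.cgi', '/cgi-bin/admin.cgi') {
    printf "%-20s %s\n", $path, is_allowed($rules, $path) ? 'allowed' : 'blocked';
}
```

With these sample rules, /index.html and /cgi-bin/search.cgi come out allowed and /cgi-bin/admin.cgi blocked. An empty "Disallow:" line, as in the file above, produces no rules at all, so every path stays allowed.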

The main reason search engines hesitate to spider dynamic sites is that a dynamic page may never show the same content twice. That leaves doubt about whether a search engine should record what a page once said when the page won't say it again on the next visit.




Expert Comment

by:ext2
ID: 9784562
In Apache, simply use Action/AddHandler in your .htaccess file to translate external *.html URLs seen by users and search engines to internal *.cgi paths on the server:

  http://httpd.apache.org/docs/handler.html

Your users might never even know that your pages are implemented internally with CGIs.

I've often wondered why ASP/JSP/Perl/etc. traditionally add their own extensions to URLs.  This needlessly exposes implementation, and it makes it difficult to change the implementation (Parnas would not be pleased).  <troll>It's poor design, and apparently MS is slow to understand this.</troll>

Author Comment

by:tilmes
ID: 9785097
Hi,

How do I "simply use Action/AddHandler in your .htaccess"?
Please help me understand this.

Author Comment

by:tilmes
ID: 9785226
Hi

I read an article that said to change the query part of a dynamic URL, for example from
http://www.my-online-store.com/books.asp?id=1190
to http://www.my-online-store.com/books/A
How can I do this in my CGI script?

Accepted Solution

by:ext2 (earned 50 total points)
ID: 9792205
Create a file named ".htaccess" in either your root web directory or some subdirectory of it. Any setting you place in the .htaccess file will be applied to the directory in which it is contained as well as all subdirectories of that directory (you can even have multiple .htaccess files in your directory hierarchy, where a file in a subdirectory overrides settings inherited from a parent directory). So, for testing purposes, you might create a subdirectory named "test" (accessible from the URL http://myserver.com/test/) and place an .htaccess file in that directory to play with.

In your .htaccess file, add something like this:

    Action my-handler /cgi-bin/myfilter.pl
    AddHandler my-handler .html

This will cause all URLs with extension .html within the "/test" directory or subdirectories to be internally sent to the CGI residing at http://myserver.com/cgi-bin/myfilter.pl .  Since multiple URLs might all be sent to the same CGI, you'll likely need some way for your CGI to determine which URL invoked it.  You can obtain this info from two environment variables:

  $ENV{PATH_TRANSLATED}
  $ENV{REQUEST_URI}

The former is the absolute file system path. So, if you requested "http://myserver.com/test/ok.html" and your root web directory is "/var/htdocs", this variable will be "/var/htdocs/test/ok.html".

The latter is the path given in the URL (not including the domain name, but including any query string). In the above example, this would be "/test/ok.html".
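As a rough sketch of what /cgi-bin/myfilter.pl might look like, the Perl script below uses REQUEST_URI to decide which page was requested, generates the pages listed in its own table, and otherwise falls back to the static file named by PATH_TRANSLATED. The page table, the fallback, and the 404 text are assumptions for illustration; they are not part of Apache.

```perl
#!/usr/bin/perl
# Hypothetical /cgi-bin/myfilter.pl, run by Apache through
#   Action my-handler /cgi-bin/myfilter.pl
# It decides what to send back from REQUEST_URI and, for pages it does not
# generate itself, falls back to the static file named by PATH_TRANSLATED.
use strict;
use warnings;

# Hypothetical table of pages this script generates itself.
my %GENERATED = (
    '/test/ok.html' => '<p>This page was generated by myfilter.pl.</p>',
);

# URL path the client asked for, without any query string.
sub requested_path {
    my ($env) = @_;
    my $uri = defined $env->{REQUEST_URI} ? $env->{REQUEST_URI} : '';
    $uri =~ s/\?.*//s;
    return $uri;
}

# Build the complete CGI response (headers + body) as one string.
sub render {
    my ($env) = @_;
    my $path = requested_path($env);

    if (exists $GENERATED{$path}) {
        return "Content-type: text/html\n\n$GENERATED{$path}\n";
    }

    my $file = $env->{PATH_TRANSLATED};
    if (defined $file && -f $file && open(my $fh, '<', $file)) {
        local $/;               # slurp the whole file at once
        my $body = <$fh>;
        close $fh;
        return "Content-type: text/html\n\n$body";
    }

    return "Status: 404 Not Found\nContent-type: text/plain\n\nNo page for $path\n";
}

# Only produce output when actually running under a web server.
print render(\%ENV) if $ENV{GATEWAY_INTERFACE};
```

Keeping the logic in render() and passing the environment in as a hash reference makes it easy to try the script from the command line with a hand-built environment before wiring it into .htaccess.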

Now, what if you want only -some- HTML files within the "test" directory to be sent to your CGI?  The selection can be specified in your .htaccess file with a "Files" tag.  So, if your .htaccess instead included

  Action my-handler /cgi-bin/myfilter.pl
  <Files "ok.html">
      SetHandler my-handler
  </Files>

then only "http://myserver.com/test/ok.html" will be sent to the CGI, while "http://myserver.com/test/hello.html" will be served statically as normal (Apache actually uses "default-handler" as the name for the handler that serves static pages, so "SetHandler default-handler" would be a way to specify this explicitly on files).

Concerning your second message (11:35PM), what you do is pretend that http://www.my-online-store.com/books is a file (rather than a directory) and use Action/SetHandler to send all requests for that file to your CGI. I believe the "A" part can be retrieved via the environment variables mentioned above. If in doubt, use a small CGI such as

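  #!/usr/bin/perl
  use strict;
  use warnings;
  use Data::Dumper;
  $Data::Dumper::Sortkeys = 1;            # list variables alphabetically

  print "Content-type: text/plain\n\n";   # plain text keeps Dumper's line breaks
  print Dumper(\%ENV);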

to display what environment variables your CGI is seeing.
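Once you can see REQUEST_URI, pulling the "A" out of http://www.my-online-store.com/books/A is plain string handling. The Perl sketch below assumes Apache already routes every /books/... request to the script; the "/books" prefix and the convention of reading extra segments as name/value pairs (so /books/id/1190 gives id=1190) are illustrative assumptions.

```perl
#!/usr/bin/perl
# Sketch: recover the "A" in /books/A (or name/value pairs in
# /books/id/1190) from REQUEST_URI. Assumes Apache sends every
# /books/... request to this script; the prefix is an assumption.
use strict;
use warnings;

# Return the path segments after $prefix as an array ref,
# or undef if the URL does not start with $prefix.
sub path_args {
    my ($uri, $prefix) = @_;
    $uri =~ s/\?.*//s;                                   # ignore any query string
    return unless $uri =~ m{^\Q$prefix\E(?:/(.*))?\z}s;
    my $rest  = defined $1 ? $1 : '';
    my @parts = grep { length } split m{/}, $rest;      # drop empty segments
    s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge for @parts;     # undo %XX escapes
    return \@parts;
}

# Treat segments as name/value pairs: (id, 1190) -> { id => 1190 }.
sub args_to_params {
    my ($parts) = @_;
    my %params;
    for (my $i = 0; $i + 1 < @$parts; $i += 2) {
        $params{ $parts->[$i] } = $parts->[ $i + 1 ];
    }
    return \%params;
}

# Under a web server, echo the first segment back as a quick check.
if ($ENV{GATEWAY_INTERFACE}) {
    my $uri  = defined $ENV{REQUEST_URI} ? $ENV{REQUEST_URI} : '';
    my $args = path_args($uri, '/books') || [];
    print "Content-type: text/plain\n\n";
    print 'Requested book: ', (@$args ? $args->[0] : '(none)'), "\n";
}
```

Reading pairs this way keeps one script able to serve both /books/A and /books/id/1190; if you only ever need a single key, path_args alone is enough.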

Author Comment

by:tilmes
ID: 9795559
Thank you for the explanation.
I put the two blocks below in my .htaccess file,
but the query string has not changed at all.
Do I also need to change the httpd.conf file in the parent directory?

Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteRule cla/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/$ /tell/cgi-bin/ADcla/cla\.cgi?$1=$2&$3=$4&$5=$6&$7=$8


Action my-handler /tell/cgi-bin/ADcla/cla.cgi
AddHandler my-handler .html
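One likely reason the rule never fires: in a regular expression "\." matches a literal dot, so each "(\.*)" only matches a run of dots, never an ordinary path segment. The Perl sketch below (the regex syntax involved here behaves the same in Perl as in mod_rewrite) runs the posted pattern and an assumed corrected version, with "([^/]+)" per segment, against a made-up path. In an .htaccess file the directory prefix is stripped before matching, so the path is tested without a leading slash.

```perl
#!/usr/bin/perl
# Sketch: test the posted RewriteRule pattern against a sample path.
# The sample path and the "fixed" pattern are assumptions.
use strict;
use warnings;

# As posted: "\." is a literal dot, so each "(\.*)" matches only dots.
my $posted = qr{cla/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/$};

# Assumed intent: each "([^/]+)" captures one whole path segment.
my $fixed = qr{^cla/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$};

# Apply a pattern the way the rule's substitution
#   /tell/cgi-bin/ADcla/cla.cgi?$1=$2&$3=$4&$5=$6&$7=$8
# would; returns undef when the pattern does not match.
sub rewrite {
    my ($pattern, $path) = @_;
    my @m = $path =~ $pattern or return;
    my @pairs = map { $m[ 2 * $_ ] . '=' . $m[ 2 * $_ + 1 ] } 0 .. 3;
    return '/tell/cgi-bin/ADcla/cla.cgi?' . join('&', @pairs);
}

my $path = 'cla/cat/books/id/1190/page/2/sort/asc/';
for my $name ('posted', 'fixed') {
    my $result = rewrite($name eq 'posted' ? $posted : $fixed, $path);
    printf "%-6s => %s\n", $name, defined $result ? $result : '(no match, URL unchanged)';
}
```

Run as-is, the posted pattern reports no match, while the corrected one yields /tell/cgi-bin/ADcla/cla.cgi?cat=books&id=1190&page=2&sort=asc. Separately, mod_rewrite and Action directives in .htaccess are only honored where the server's AllowOverride setting includes FileInfo.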

Expert Comment

by:jmcg
ID: 10093432
Nothing has happened on this question in more than 7 weeks. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
accept answer by ext2 [grade B] (on the road to an answer).

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer