Solved

make google friendly

Posted on 2003-11-19
8
286 Views
Last Modified: 2010-05-19
Hi

The cgi scripts/dynamic sites are not visited from google.
What is the best way to make scan from google search robot?
0
Comment
Question by:tilmes
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
8 Comments
 
LVL 20

Expert Comment

by:jmcg
ID: 9784156
You may be able to entice Google into scanning your site by placing an appropriate entry in your robots.txt file.

Take a look at Expert-Exchange's robots.txt file, for example:

User-agent: *
Disallow:

There are also "Allow:" directives and you could list your .cgi script there.

The main hesitation behind spidering dynamic sites is that they may never have the same thing twice. That leaves doubt about whether a search engine should keep track of what the page once said, but which it won't say again if you visit it.



0
 
LVL 2

Expert Comment

by:ext2
ID: 9784562
In Apache, simply use Action/AddHandler in your .htaccess file to translate external *.html URLs seen by users and search engines to internal *.cgi paths on the server:

  http://httpd.apache.org/docs/handler.html

Your users might never even know that your page are implemented internally with CGIs.

I've often wondered why ASP/JSP/Perl/etc. traditionally add their own extensions to URLs.  This needlessly exposes implementation, and it makes it difficult to change the implementation (Parnas would not be pleased).  <troll>It's poor design, and apparently MS is slow to understand this.</troll>
0
 

Author Comment

by:tilmes
ID: 9785097
Hi

How can i use
simply use Action/AddHandler in your .htaccess?
Please make me understand this.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:tilmes
ID: 9785226
Hi

I read about the article said that change query part of the dynamic URL
Example - http://www.my-online-store.com/books.asp?id=1190
to http://www.my-online-store.com/books/A 
HOw can i do this in my cgi script?
0
 
LVL 2

Accepted Solution

by:
ext2 earned 50 total points
ID: 9792205
Create a file named ".htaccess" in either your root web directory or some subdirectory of it.  Any setting you place in the .htaccess file will be applied to the directory in which it is contained as well as all subdirectories of that directory (you can even have multiple .htaccess files in your directory hierarchy, where one file overrides settings in another).  So, for testing purposes, you might create a subdirectory named "test" (accessible from the URL http://myserver.com/test/) and place an .htaccess file in that directory to play with.

In your .htaccess file, add something like this:

    Action my-handler /cgi-bin/myfilter.pl
    AddHandler my-handler .html

This will cause all URLs with extension .html within the "/test" directory or subdirectories to be internally sent to the CGI residing at http://myserver.com/cgi-bin/myfilter.pl .  Since multiple URLs might all be sent to the same CGI, you'll likely need some way for your CGI to determine which URL invoked it.  You can obtain this info from two environment variables:

  $ENV{PATH_TRANSLATED}
  $ENV{REQUEST_URI}

The former is the absolute file system path.  So, if you requested "http://myserver.com/test/ok.html" and your root web directory is "/var/htdocs", then this variables will be "/var/htdocs/test/ok.html".

The later is the path given in the URL (not including the domain name).  In the above example, this would be "/test/ok.html".

Now, what if you want only -some- HTML files within the "test" directory to be sent to your CGI?  The selection can be specified in your .htaccess file with a "Files" tag.  So, if your .htaccess instead included

  Action my-handler /cgi-bin/myfilter.pl
  <Files "ok.html">
      SetHandler my-handler
  </Files>

then only "http://myserver.com/test/ok.html" will be sent to the CGI, while "http://myserver.com/test/hello.html" will be served statically as normal (Apache actually uses "default-handler" as the name for the handler that serves static pages, so "SetHandler default-handler" would be a way to specify this explicitly on files).

Concerning your second message (11:35PM), what you do is pretend that http://www.my-online-store.com/books is a file (rather than a directory) and use an Action/SetHandler to send all requests on that file to your CGI.  The "A" part I believe can be retrieved via the environment variables as mentioned above.  If in doubt, just use a

  use Data::Dumper;

  print "Content-type: text/html\n\n";
  print Dumper(\%ENV);

to display what environment variables your CGI is seeing.
0
 

Author Comment

by:tilmes
ID: 9795559
thank you for the explanation.
I inserted below two in .htaccess file
but in the query string has not changed at all.
Do i need to change also in httpd.conf file in parent directory?

Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteRule cla/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/$ /tell/cgi-bin/ADcla/cla\.cgi?$1=$2&$3=$4&$5=$6&$7=$8


Action my-handler /tell/cgi-bin/ADcla/cla.cgi
AddHandler my-handler .html
0
 
LVL 20

Expert Comment

by:jmcg
ID: 10093432
Nothing has happened on this question in more than 7 weeks. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
accept answer by ext2 [grade B] (on the road to an answer).

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
There are many situations when we need to display the data in sorted order. For example: Student details by name or by rank or by total marks etc. If you are working on data driven based projects then you will use sorting techniques very frequently.…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

630 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question