Solved

make google friendly

Posted on 2003-11-19
8
281 Views
Last Modified: 2010-05-19
Hi

The cgi scripts/dynamic sites are not visited from google.
What is the best way to make scan from google search robot?
0
Comment
Question by:tilmes
  • 3
  • 2
  • 2
8 Comments
 
LVL 20

Expert Comment

by:jmcg
ID: 9784156
You may be able to entice Google into scanning your site by placing an appropriate entry in your robots.txt file.

Take a look at Expert-Exchange's robots.txt file, for example:

User-agent: *
Disallow:

There are also "Allow:" directives and you could list your .cgi script there.

The main hesitation behind spidering dynamic sites is that they may never have the same thing twice. That leaves doubt about whether a search engine should keep track of what the page once said, but which it won't say again if you visit it.



0
 
LVL 2

Expert Comment

by:ext2
ID: 9784562
In Apache, simply use Action/AddHandler in your .htaccess file to translate external *.html URLs seen by users and search engines to internal *.cgi paths on the server:

  http://httpd.apache.org/docs/handler.html

Your users might never even know that your page are implemented internally with CGIs.

I've often wondered why ASP/JSP/Perl/etc. traditionally add their own extensions to URLs.  This needlessly exposes implementation, and it makes it difficult to change the implementation (Parnas would not be pleased).  <troll>It's poor design, and apparently MS is slow to understand this.</troll>
0
 

Author Comment

by:tilmes
ID: 9785097
Hi

How can i use
simply use Action/AddHandler in your .htaccess?
Please make me understand this.
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 

Author Comment

by:tilmes
ID: 9785226
Hi

I read about the article said that change query part of the dynamic URL
Example - http://www.my-online-store.com/books.asp?id=1190
to http://www.my-online-store.com/books/A 
HOw can i do this in my cgi script?
0
 
LVL 2

Accepted Solution

by:
ext2 earned 50 total points
ID: 9792205
Create a file named ".htaccess" in either your root web directory or some subdirectory of it.  Any setting you place in the .htaccess file will be applied to the directory in which it is contained as well as all subdirectories of that directory (you can even have multiple .htaccess files in your directory hierarchy, where one file overrides settings in another).  So, for testing purposes, you might create a subdirectory named "test" (accessible from the URL http://myserver.com/test/) and place an .htaccess file in that directory to play with.

In your .htaccess file, add something like this:

    Action my-handler /cgi-bin/myfilter.pl
    AddHandler my-handler .html

This will cause all URLs with extension .html within the "/test" directory or subdirectories to be internally sent to the CGI residing at http://myserver.com/cgi-bin/myfilter.pl .  Since multiple URLs might all be sent to the same CGI, you'll likely need some way for your CGI to determine which URL invoked it.  You can obtain this info from two environment variables:

  $ENV{PATH_TRANSLATED}
  $ENV{REQUEST_URI}

The former is the absolute file system path.  So, if you requested "http://myserver.com/test/ok.html" and your root web directory is "/var/htdocs", then this variables will be "/var/htdocs/test/ok.html".

The later is the path given in the URL (not including the domain name).  In the above example, this would be "/test/ok.html".

Now, what if you want only -some- HTML files within the "test" directory to be sent to your CGI?  The selection can be specified in your .htaccess file with a "Files" tag.  So, if your .htaccess instead included

  Action my-handler /cgi-bin/myfilter.pl
  <Files "ok.html">
      SetHandler my-handler
  </Files>

then only "http://myserver.com/test/ok.html" will be sent to the CGI, while "http://myserver.com/test/hello.html" will be served statically as normal (Apache actually uses "default-handler" as the name for the handler that serves static pages, so "SetHandler default-handler" would be a way to specify this explicitly on files).

Concerning your second message (11:35PM), what you do is pretend that http://www.my-online-store.com/books is a file (rather than a directory) and use an Action/SetHandler to send all requests on that file to your CGI.  The "A" part I believe can be retrieved via the environment variables as mentioned above.  If in doubt, just use a

  use Data::Dumper;

  print "Content-type: text/html\n\n";
  print Dumper(\%ENV);

to display what environment variables your CGI is seeing.
0
 

Author Comment

by:tilmes
ID: 9795559
thank you for the explanation.
I inserted below two in .htaccess file
but in the query string has not changed at all.
Do i need to change also in httpd.conf file in parent directory?

Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteRule cla/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/(\.*)/$ /tell/cgi-bin/ADcla/cla\.cgi?$1=$2&$3=$4&$5=$6&$7=$8


Action my-handler /tell/cgi-bin/ADcla/cla.cgi
AddHandler my-handler .html
0
 
LVL 20

Expert Comment

by:jmcg
ID: 10093432
Nothing has happened on this question in more than 7 weeks. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
accept answer by ext2 [grade B] (on the road to an answer).

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Perl count the hash for print 4 161
Strange perl issue 6 126
remove duplicates from the csv file 13 102
rename outfile before writing 2 71
On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Email validation in proper way is  very important validation required in any web pages. This code is self explainable except that Regular Expression which I used for pattern matching. I originally published as a thread on my website : http://www…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Both in life and business – not all partnerships are created equal. As the demand for cloud services increases, so do the number of self-proclaimed cloud partners. Asking the right questions up front in the partnership, will enable both parties …

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now