  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 192

Need SID Killer code or workaround

Does anyone know of any code that can be used to get the search engines to ignore the SID that is created at the start of browsing?  The site is written in HTML, with a couple of "shop" files written in Perl.  The shopping cart we are using is the Hassan cart, but we have modified it quite a bit.

We have a site with lots of content in HTML, but once Google gets in there, a SID is attached to everything, and it certainly isn't helping our rankings.  We are trying to create a workaround using SID-killer code, or, if that doesn't work, routing the spider through a static sitemap link where it can browse the website (and we'll tell users not to use it, because it will dump their SID and their cart).

Does anyone have any code or workaround suggestions?  If you need to see some code, let me know.

battlepigeons Asked:

3 Solutions
 
Roonaan Commented:
Personally, I think that a web crawler should not be able to spider your shopping basket, and should be limited to informational pages and product pages only. Possibly you could use robots.txt to limit the reach of the search engine spiders.
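For example, a minimal robots.txt along these lines (the /cgi-bin/ path is an assumption here and would need to match your own URL layout):

User-agent: *
Disallow: /cgi-bin/

That keeps well-behaved spiders out of the cart script entirely, though it won't by itself get the product content indexed.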

-r-
 
battlepigeons (Author) Commented:
The problem is that all product pages must be accessed through the cgi-bin.  When you enter the website, the first button you click has to go through the cgi-bin to create a session.  That way, you can put one thing in the cart and then browse around to any other page on the website (other products, about us, home) and the session isn't dropped.

So you start off at:
http://www.mystuff.com/cgi-bin/shop.pl/index.html

then you push any button on the website and the link carries the token, routed through a Perl file in the cgi-bin:

MY_URL/page=socks.html

Out comes:

http://www.mystuff.com/cgi-bin/shop.pl/SID=1110160036.3642/page=socks.html

I've robots.txt'd it so that it can't enter the cgi-bin, but that still won't get my content spidered; it will just stop the SIDs from being cached.  So unless I want to create a sitemap for ONLY the spider, linking to pages and content that are pure HTML, I need to find a way to deal with the spider/SID issue.
 
ShelfieldCollege Commented:
Would it be possible to tell the CGI script that initializes the sessions not to do so if the user agent is Googlebot, etc.?  I'm not a CGI/Perl person at all, but this is how a similar thing can be done using PHP, which is what I use. Just a thought.

Cheers

-Matt-
 
battlepigeons (Author) Commented:
Good idea... does anyone know how to do that in Perl?
 
Zyloch Commented:
I'm no expert in Perl, but after searching around a bit, it turns out (as is often the case with Perl) that someone has done the work already :)

You might think about using HTTP::BrowserDetect
http://search.cpan.org/~lhs/HTTP-BrowserDetect-0.98/BrowserDetect.pm#DETECTING_ROBOTS
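A rough sketch of how that check might look at the top of a script like shop.pl. The HTTP::BrowserDetect calls are from the module's documented interface, but everything around them (in particular the new_session routine) is just a placeholder for whatever the Hassan cart actually does:

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::BrowserDetect;

# Build a detector from the CGI environment's user-agent string.
my $browser = HTTP::BrowserDetect->new( $ENV{HTTP_USER_AGENT} || '' );

my $sid = '';
if ( $browser->robot ) {
    # Known crawler (Googlebot etc.): leave $sid empty so no session is
    # created and no SID gets written into the links we output.
}
else {
    # Normal visitor: create or look up the session as usual.
    $sid = new_session();
}

# When building links, only insert the SID segment if we have one.
my $page = 'socks.html';
my $link = $sid
    ? "/cgi-bin/shop.pl/SID=$sid/page=$page"
    : "/cgi-bin/shop.pl/page=$page";
print "Content-type: text/plain\n\n$link\n";

# Placeholder session routine; the real cart has its own.
sub new_session {
    return sprintf( '%d.%04d', time(), int( rand(10_000) ) );
}

The robot() check matches Googlebot and most other common crawlers by user-agent string, so the spider sees clean, SID-less URLs while normal visitors keep their session and cart.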
