File not found (weird string in URL request) from Googlebot

Posted on 2012-09-06
Medium Priority
Last Modified: 2013-11-19
Just went live with our new website...

We're getting a bunch of "File not found" entries in our server logs from IP: (crawl-66-249-71-133.googlebot.com).



Initially, I thought maybe IIS 7 was sticking a Session ID into the URL... but the web.config file is configured to NOT put the Session ID in the URL.

<sessionState cookieless="false" mode="InProc" timeout="30" />
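One way to double-check that setting across the whole file, rather than just the one line, is to list every element in web.config that carries a cookieless attribute. This is only a sketch: the function name and the trimmed-down sample config are hypothetical, and only the sessionState line comes from the post.

```python
import xml.etree.ElementTree as ET


def find_cookieless_settings(config_xml: str) -> list[tuple[str, str]]:
    """Return (element tag, cookieless value) for every element that sets the attribute."""
    root = ET.fromstring(config_xml)
    return [
        (elem.tag, elem.attrib["cookieless"])
        for elem in root.iter()
        if "cookieless" in elem.attrib
    ]


# Hypothetical, trimmed-down web.config; only the sessionState line is from the post.
SAMPLE = """
<configuration>
  <system.web>
    <sessionState cookieless="false" mode="InProc" timeout="30" />
  </system.web>
</configuration>
"""

print(find_cookieless_settings(SAMPLE))
```

If anything other than sessionState shows up in the result for the real file, that element would be worth a closer look.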

Coupled with this... I am unable to replicate this behavior in ANY browser (regardless of whether cookies are turned on or off).

We have Google Analytics installed.

Does Googlebot (in conjunction with Google Analytics or otherwise) inject something into the URL for tracking?

Why is Googlebot making requests to our server with a massive string injected into the URL?

Thanks in advance!
Question by:rocketTendon
LVL 58

Assisted Solution

Gary earned 600 total points
ID: 38373162
Definitely looks like a session id
Check some pages here, using googlebot

and see what you get back from your server
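For anyone who prefers to run this check from a script, a minimal sketch follows. It assumes the server varies its response on the User-Agent header alone; the function names are hypothetical, and the user-agent string is Googlebot's published desktop string.

```python
import sys
import urllib.error
import urllib.request

# Googlebot's published desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_status(url: str) -> int:
    """Return the HTTP status code the server sends for the URL."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the code is what we want.
        return err.code


if __name__ == "__main__":
    # Usage: python check_status.py <url> [<url> ...]
    for url in sys.argv[1:]:
        print(url, fetch_status(url))
```

Note that urlopen follows redirects, so the code reported is that of the final response, not of any intermediate redirect.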

Expert Comment

by:Derek Jensen
ID: 38373277
Looks like average, everyday spam, to me...

My dad used to own an ISP, and every day his server would be attacked by IPs all over the globe; he witnessed a variety of different methods, but the most common was SQL/query string injection, and it often looked very similar to that.

Author Comment

ID: 38373603
GaryC123 ...

Thanks for the link... but after using Redleg's tool... I still don't see anything even remotely close to what I'm seeing in the log files. There is no session ID being appended or injected into the link HREFs (in the Redleg report) ... the link HREFs look as they should.


In my opinion, the strings being injected are way too uniform to be SQL injection... coupled with the fact that they're being appended immediately after the domain name and not after a URL variable.


The only consistent pattern in the "long string" is that ALL the log entries in question start with:  (f(  .... followed by what looks like a session-ID-style string.
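To see how widespread the pattern is, the log can be filtered for paths that begin with that prefix and tallied by response code. The sketch below assumes a W3C extended log with a "#Fields:" directive that includes cs-uri-stem and sc-status; the function name and the sample lines (including the tokens and the closing parentheses) are hypothetical, not taken from the real log.

```python
from collections import Counter

# Prefix described in the post above, assumed to follow the leading slash of the path.
PREFIX = "/(f("


def tally_suspect_requests(log_lines):
    """Count status codes for requests whose path starts with PREFIX."""
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns used by the lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if row.get("cs-uri-stem", "").lower().startswith(PREFIX):
            counts[row.get("sc-status", "?")] += 1
    return counts


# Hypothetical sample lines for illustration only.
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2012-09-06 10:00:01 GET /(f(abc123))/products.aspx 404",
    "2012-09-06 10:00:02 GET /products.aspx 200",
    "2012-09-06 10:00:03 GET /(f(def456))/about.aspx 404",
]

print(tally_suspect_requests(SAMPLE_LOG))
```

For a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.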

Author Comment

ID: 38373611
I would post our URL ... but I would like to be able to go back and remove the URL after the issue has been resolved... but it does not appear that I would be able to do that.
LVL 16

Accepted Solution

grahamnonweiler earned 1400 total points
ID: 38373667
Googlebot is using a link that it found elsewhere (on someone else's website) to crawl your site.

That link has been corrupted, either intentionally (as the signature is similar to a common XSS attack)  or unintentionally on the "other" site.

If your server is returning a 404 or 410 (which it should be), there is no need to worry, as Googlebot will remove the URL from its spider cache in due course.

Expert Comment

by:Derek Jensen
ID: 38373717
Indeed, graham. :-)

I never claimed it *was* SQL injection, as the string contains no SQL to speak of. I only said it *appeared similar*. I'm not up to speed on all the lingo of webserver attacks, but it sounds like Graham is. :-)

...again, not that it *is* an attack; either way, nothing to sweat about. :-)

Author Comment

ID: 38373736
That makes sense, grahamnonweiler ... although the log entries in question started appearing within 10 minutes of going live with the site... so a link found elsewhere on someone else's website seems unlikely (since the brand new script-names are referenced in the request). That being said, it wouldn't surprise me if there was some bot out there that had a very "quick to market" impact.

Will Googlebot remove it from its spider cache if the response code being returned is 500?
LVL 16

Expert Comment

ID: 38373776
The 500 error (Internal Server Error) will not remove it from the spider cache - it needs to be either a 404 (best) or a 410 (permanently gone).

Do you have any automated feeds for your site - such as Twitter/FB/Amazon - as these could also cause a similar situation - however - the signature (what you are referring to as looking like a session_id) would be very different.

Author Comment

ID: 38373786
Using a couple of entries from the log-file (along with Redleg's File Viewer)... I've validated that our server is returning a 404 response code.

Concern aborted.

Thanks guys!

Question has a verified solution.
