rocketTendon

asked on

File not found (weird string in URL request) from Googlebot

Just went live with our new website...

We're getting a bunch of "File not found" entries in our server logs from IP: 66.249.71.133 (crawl-66-249-71-133.googlebot.com).

EXAMPLE:

http://www.oursite.com/(f(jjjltgyzhg09ovqh8cyo2l_ztvta_he0oemrrzjk3ny21s4gz1czijojfpcsp6jamgupdk2vcikodwsza8fwputzia4prcpez7hrx5sya8xiwth_tfolwsx3435x45pnbtiov1xmerepxrukoket9ndiwrt09hzoogpyrzpt9phyomwkngj4fj8dd5wfg9tiir9l6notm-gczyfi2m0by2ndhnq1))/nikon_m59.aspx

Initially, I thought maybe IIS 7 was sticking a Session ID into the URL... but the web.config file is configured to NOT put the Session ID in the URL.

<sessionState cookieless="false" mode="InProc" timeout="30" />

Coupled with this... I am unable to replicate this behavior in ANY browser (regardless of whether cookies are turned on or off).

We have Google Analytics installed.

Does Googlebot (in conjunction with Google Analytics or otherwise) inject something into the URL for tracking?

Why is Googlebot making requests to our server with a massive string injected into the URL?

Thanks in advance!
Mike
SOLUTION
Gary
This solution is only available to members.
Looks like average, everyday spam, to me...

My dad used to own an ISP, and every day his server would be attacked by IPs all over the globe; he witnessed a variety of different methods, but the most common was SQL/query string injection, and it often looked very similar to that.
rocketTendon

ASKER

GaryC123 ...

Thanks for the link... but after using Redleg's tool... I still don't see anything even remotely close to what I'm seeing in the log files. There is no session ID being appended or injected into the link HREFs (in the Redleg report) ... the link HREFs look as they should.

bigdogdman...

In my opinion, the strings being injected are way too uniform to be SQL injection... coupled with the fact that they're being appended immediately after the domain name and not after a URL variable.


Note:

The only consistent pattern in the "long string" is that ALL the log entries in question start with:  (f(  .... followed by what seems to be a session-ID-like string.
I would post our URL ... but I would like to be able to go back and remove the URL after the issue has been resolved... but it does not appear that I would be able to do that.
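
The pattern described in the note above can be pulled out of log entries mechanically. This is only a minimal sketch: the function name `split_token_segment` and the regular expression are illustrative assumptions, not something from the thread. It assumes the token always sits in the first path segment and is wrapped as `(x(...))` with a single-letter prefix; the thread itself only reports the letter `f`.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: a first path segment shaped like "(x(...))".
# The single-letter prefix is captured; the log entries in question use "f".
TOKEN_SEGMENT = re.compile(
    r"^/\((?P<kind>[A-Za-z])\((?P<token>[^()/]+)\)\)(?P<rest>/.*)?$"
)

def split_token_segment(url_or_path):
    """Return (kind, token, remaining_path), or None if no such segment exists."""
    path = urlsplit(url_or_path).path
    match = TOKEN_SEGMENT.match(path)
    if match is None:
        return None
    # Fall back to "/" when nothing follows the token segment.
    return match.group("kind").lower(), match.group("token"), match.group("rest") or "/"
```

Running each requested URL from the log through this helper separates the entries that carry the injected segment from normal requests, and recovers the script name that was actually being asked for.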
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Indeed, graham. :-)

I never claimed it *was* SQL injection, as the string contains no SQL to speak of. I only said it *appeared similar*. I'm not up to speed on all the lingo of webserver attacks, but it sounds like Graham is. :-)

...again, not that it *is* an attack; either way, nothing to sweat about. :-)
That makes sense, grahamnonweiler ... although the log entries in question started appearing within 10 minutes of going live with the site... so a link found elsewhere on someone else's website seems unlikely (since the brand new script-names are referenced in the request). That being said, it wouldn't surprise me if there was some bot out there that had a very "quick to market" impact.

Will Googlebot remove it from its spider cache if the response code being returned is 500?
The 500 error (Internal Server Error) will not remove it from the spider cache - it needs to be either a 404 (best) or a 410 (permanently gone).

Do you have any automated feeds for your site, such as Twitter/FB/Amazon? These could also cause a similar situation. However, the signature (what you are referring to as looking like a session_id) would be very different.
Using a couple of entries from the log-file (along with Redleg's File Viewer)... I've validated that our server is returning a 404 response code.

Concern aborted.

Thanks guys!
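
For anyone who wants to repeat the status-code check described above without a third-party viewer, here is a rough sketch using only the Python standard library. The function name `status_of` and the default User-Agent string are made up for illustration; the sketch assumes a plain GET is enough and does not try to imitate Googlebot.

```python
import urllib.error
import urllib.request

def status_of(url, user_agent="status-check-sketch"):
    """Return the HTTP status code the server sends back for a GET of `url`.

    Note: urllib follows redirects by default, so a 301/302 will report
    the status of the final destination rather than the redirect itself.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; the code is on the error.
        return error.code
```

Calling `status_of(...)` with one of the odd URLs copied from the log should return 404 if the server is handling those requests the way the post above reports.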