File not found (weird string in URL request) from Googlebot

Just went live with our new website...

We're getting a bunch of "File not found" entries in our server logs from IP: 66.249.71.133 (crawl-66-249-71-133.googlebot.com).

EXAMPLE:

http://www.oursite.com/(f(jjjltgyzhg09ovqh8cyo2l_ztvta_he0oemrrzjk3ny21s4gz1czijojfpcsp6jamgupdk2vcikodwsza8fwputzia4prcpez7hrx5sya8xiwth_tfolwsx3435x45pnbtiov1xmerepxrukoket9ndiwrt09hzoogpyrzpt9phyomwkngj4fj8dd5wfg9tiir9l6notm-gczyfi2m0by2ndhnq1))/nikon_m59.aspx

Initially, I thought maybe IIS 7 was sticking a Session ID into the URL... but the web.config file is configured to NOT put the Session ID in the URL.

<sessionState cookieless="false" mode="InProc" timeout="30" />

Coupled with this... I am unable to replicate this behavior in ANY browser (regardless of whether cookies are turned on or off).

We have Google Analytics installed.

Does Googlebot (in conjunction with Google Analytics or otherwise) inject something into the URL for tracking?

Why is Googlebot making requests to our server with a massive string injected into the URL?

Thanks in advance!
Mike
rocketTendon Asked:
grahamnonweiler Commented:
Googlebot is using a link that it found elsewhere (on someone else's website) to crawl your site.

That link has been corrupted, either intentionally (the signature is similar to a common XSS attack) or unintentionally on the "other" site.

If your server is returning a 404 or 410 response for these URLs (which it should be), there is no need to worry: Googlebot will remove them from its spider cache in due course.
 
Gary Commented:
Definitely looks like a session ID.
Check some pages here, using Googlebot as the user agent:
http://redleg-redleg.com/file-viewer/

and see what you get back from your server.
 
Derek Jensen Commented:
Looks like average, everyday spam, to me...

My dad used to own an ISP, and every day his server would be attacked by IPs all over the globe; he witnessed a variety of different methods, but the most common was SQL/query string injection, and it often looked very similar to that.
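Whether the requests really come from Googlebot or from something spoofing it can be checked from the IP alone. The sketch below is a minimal Python illustration of a reverse-then-forward DNS check; the function names and the list of accepted hostname suffixes are assumptions made for this example, and the lookups depend on whatever resolver the machine running it uses.

```python
import socket

# Assumed set of hostname suffixes for Google crawlers (illustrative only).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """Return True if the hostname ends in one of the assumed Google suffixes."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip):
    """Reverse-resolve the IP, check the hostname, then forward-resolve it again.

    The forward lookup matters because anyone who controls the reverse DNS for
    their own address range can make it claim to be a Google hostname.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_ips


# Usage with the address from the question (needs working DNS):
# verify_googlebot("66.249.71.133")
```

If the check passes, the requests are coming from the crawler itself, which shifts the question from "who is sending this" to "where did the crawler pick up that URL".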
 
rocketTendon (Author) Commented:
GaryC123 ...

Thanks for the link... but after using Redleg's tool... I still don't see anything even remotely close to what I'm seeing in the log files. There is no session ID being appended or injected into the link HREFs (in the Redleg report) ... the link HREFs look as they should.

bigdogdman...

In my opinion, the strings being injected are way too uniform to be SQL injection... coupled with the fact that they're being appended immediately after the domain name and not after a URL variable.


Note:

The only consistent pattern in the "long string" is that ALL the log entries in question start with:  (f(  .... followed by what looks like a session-ID-style string.
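The pattern described above, a parenthesised path segment directly after the domain that opens with a single letter and another parenthesis, can be picked out of log entries mechanically. The sketch below is a minimal Python illustration and not something taken from the thread: the regular expression, the function name `strip_token_segment`, and the shortened token in the usage lines are all assumptions made for the example. The shape resembles the tokens ASP.NET writes into the URL when one of its cookieless features is active, but nothing in this thread confirms that as the cause.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# One path segment of the form /(x(...)) or /(x(...)y(...)), where x and y are
# single letters. This is an assumed pattern based on the log entries above.
TOKEN_SEGMENT = re.compile(r"/\((?:[A-Za-z]\([^()/]*\))+\)(?=/)")


def strip_token_segment(url):
    """Return (url_without_token_segment, token_was_found)."""
    parts = urlsplit(url)
    cleaned_path, count = TOKEN_SEGMENT.subn("", parts.path, count=1)
    return urlunsplit(parts._replace(path=cleaned_path)), count == 1


# Usage with a shortened, made-up token in place of the real one:
# strip_token_segment("http://www.oursite.com/(f(abc123_x-y))/nikon_m59.aspx")
# returns ("http://www.oursite.com/nikon_m59.aspx", True)
```

Running this over the logged URLs gives the list of real pages being requested, which makes it easier to see whether the crawler is simply walking the site's own links with the extra segment attached.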
 
rocketTendon (Author) Commented:
I would post our URL ... but I would like to be able to go back and remove the URL after the issue has been resolved... but it does not appear that I would be able to do that.
 
Derek Jensen Commented:
Indeed, graham. :-)

I never claimed it *was* SQL injection, as the string contains no SQL to speak of. I only said it *appeared similar*. I'm not up to speed on all the lingo of webserver attacks, but it sounds like Graham is. :-)

...again, not that it *is* an attack; either way, nothing to sweat about. :-)
 
rocketTendon (Author) Commented:
That makes sense grahamnonweiler ... although the log entries in question started appearing within 10 minutes of going live with the site... so a link found elsewhere on someone else's website seems unlikely (since the brand new script-names are referenced in the request). That being said, it wouldn't surprise me if there was some bot out there that had a very "quick to market" impact.

Will Googlebot remove it from its spider cache if the response code being returned is 500?
 
grahamnonweiler Commented:
The 500 error (Internal Server Error) will not remove it from the spider cache. It needs to be either a 404 (best) or a 410 (permanently gone).

Do you have any automated feeds for your site - such as Twitter/FB/Amazon - as these could also cause a similar situation - however - the signature (what you are referring to as looking like a session_id) would be very different.
 
rocketTendon (Author) Commented:
Using a couple of entries from the log-file (along with Redleg's File Viewer)... I've validated that our server is returning a 404 response code.

Concern aborted.

Thanks guys!
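The check described in this post, requesting a logged URL the way the crawler would and reading the status code, can also be scripted. The following is a rough Python sketch using only the standard library; the function names are made up for this example, and the User-Agent value is the commonly published Googlebot string rather than anything taken from the logs in this thread.

```python
import urllib.error
import urllib.request

# Commonly published Googlebot user-agent string (an assumption; check your
# own logs for the exact value the crawler sends).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return the HTTP status code the server sends for this URL.

    urlopen follows redirects, so a 301/302 shows up as the status of the
    final page, and 4xx/5xx responses arrive as HTTPError exceptions.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Usage: paste a URL from the log file and look for 404 rather than 500.
# fetch_status("http://www.oursite.com/nikon_m59.aspx")
```

Comparing the result for the Googlebot user agent with the result for a normal browser string shows whether the server treats the two differently.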
Question has a verified solution.