Bot Uses Our Web Form to Send us Spam

Julie Kurpa
Dear Experts,

We have several forms on our website that send emails to various users in our organization.  I didn't create these but am only the sys admin for the server.

One user is being pestered by spam that seems to be generated by one particular form.  
The form has a few radio buttons, Name, address, phone number and comment fields.

The setup of our forms is this:
myForm.html gets filled out by users.
When submit is clicked, it executes myForm.cgi.
myForm.cgi calls a compiled C++ program, myForm_comp.cgi, which parses the information from the form and sends it to the user using a mail package. The user's email address is hardcoded in the compiled code.

This setup is the same for all our forms; however, this user is the only one who gets the spam. We have several forms (separate CGI compiles) that get sent to different users depending on topic. All are simple forms with similar fields.

In reviewing the Apache logs, we see, for example, what appears to be a bot (same IP, hundreds of lines in a 30 second window) hitting all our webpages and especially hitting two of our forms.  One being myForm.cgi the other being otherForm.cgi.

The Apache log shows 11 GETs of myForm.html and then hundreds of GET/POST requests for myForm.cgi.
The user reported she got 10 spam emails.  

The same pattern shows for the otherForm.html but no spam emails arrived.

We think it must be a bot that reads the HTML, fills it in (it's always the same information) and sends it off. The "Return Path" in the message source is our web server, as is the "Received From" reported by our mail server. But this can be spoofed, right?

Many Questions.  Let's start with three:

1. If it is spoofed, how did they get the user's email which is only in the compiled program?

2. Why doesn't the otherForm.cgi user get spam? The otherForm.html has mostly radio buttons and only two text fields...email and comments. Easy fill!

3. Why does the user actually get the spam several hours after the Apache logs show myForm.cgi getting hit? The emails show they were created at the time Apache logged the hits, but they don't come through our mail server until several hours later. The latest batch arrived in her inbox 12 hours later.


Thank you for educating me and hopefully pointing me to some answers.  

BTW...I have followed the technique in this link of adding a hidden field to the HTML form so that the CGI rejects the submission if the field is filled out. I am waiting for approval by the developer to put it into production.   https://www.lifewire.com/solutions-to-protect-web-forms-from-spam-3467469
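
A minimal sketch of the hidden-field check, written in Python for brevity rather than in the C++ of the actual CGI. The field name `website` and the function name are hypothetical; the real form and CGI may use different names and parsing.

```python
from urllib.parse import parse_qs

HONEYPOT_FIELD = "website"  # hypothetical name of the hidden form field


def is_probable_bot(post_body: str) -> bool:
    """Return True when the hidden honeypot field arrived with a value.

    A human never sees the hidden field, so it should come back empty
    or not at all. A bot that fills in every field will populate it.
    """
    fields = parse_qs(post_body, keep_blank_values=True)
    values = fields.get(HONEYPOT_FIELD, [""])
    return any(v.strip() for v in values)
```

The CGI would call something like `is_probable_bot` on the raw POST body and exit without sending mail when it returns True.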
Developer & EE Moderator
Fellow 2018
Most Valuable Expert 2013
Commented:
Julie KurpaSr. Systems Programmer

Author

Commented:
The developer does not want to use Captcha.  :(  He feels it's an inconvenience for the customer.
Using an HTTPS certificate can help a lot to prevent this.

Make sure your form has server-side validation.

Bots usually follow the same pattern; for example, they may fill a form field with 'google'.
So if they use some sort of pattern, you can add extra server-side validation that rejects the submission if the company field contains 'google'.
Sometimes bots do not use the form itself but post directly to the script that sends the mail.

I have not yet tried reCAPTCHA v3, but with v2 bots were able to bypass it even with server-side validation, as they fill dummy data into the required fields.
And there is not much you can do if the form is filled in manually.

v3 uses an invisible technique that is less intrusive for the user.

As for the delay, maybe you have reached your daily limit for sending email, or maybe it is related to the time zone.
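
As a rough illustration of the pattern check suggested above, here is a minimal Python sketch. The field name `company`, the blocked value, and the function name are only examples taken from this comment; the actual form may have different fields, and the real check would live in the CGI.

```python
from typing import Dict, List

# Hypothetical rules: field name -> substrings that mark a submission as spam.
BLOCKED_SUBSTRINGS: Dict[str, List[str]] = {"company": ["google"]}


def passes_pattern_check(fields: Dict[str, str]) -> bool:
    """Return False when any field contains one of its blocked substrings."""
    for field, patterns in BLOCKED_SUBSTRINGS.items():
        value = fields.get(field, "").lower()  # compare case-insensitively
        if any(p in value for p in patterns):
            return False
    return True
```

Since the bot in question always sends the same information, a rule keyed on that repeated content could be added to the table in the same way.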

David FavorFractional CTO
Distinguished Expert 2018

Commented:
As Scott suggested, your 2x choices are...

1) Use CAPTCHA code.

2) Continue to get Spammed.

Be sure to use ReCAPTCHA V3, rather than V2.
Scott FellDeveloper & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
The newest reCAPTCHA runs in the background, and the user has nothing to do. The previous version just requires clicking "I am not a robot" and is very common.

Sometimes, if the developer is well versed in back-end code and not front end, they may shy away because they are not used to it. I would make sure that is the real reason.

If it is spoofed, how did they get the user's email which is only in the compiled program?
They only hit the form which posts to your back end code.

Why doesn't the otherForm.cgi user get spam
Is the user based on a drop down? Perhaps the bot only uses what is the first choice.
Scott FellDeveloper & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
Back when a captcha meant displaying hard-to-read images, they were a PITA for users.  For small sites, I created a simple captcha where my back-end code asked a random question.  "Red means stop,  green means ___", "Five plus one equals ____".  I had about 50 of these that I made up and saved to the db. One was picked at random, and I saved the id of the question in a session.  Then, when the form was posted, I looked up the random question id in the session and matched the posted answer against the stored one.  In addition, I saved each attempt to a session variable and, after several tries, locked out that IP.   This is not the best route, but for a small site it worked great.
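
A minimal Python sketch of that question-and-answer scheme. Everything here is illustrative: the two questions stand in for the roughly 50 kept in the database, a plain dict stands in for the server session, and the lockout is tracked per session rather than per IP. The threshold of 3 attempts is an assumption.

```python
import random

# Hypothetical question bank: id -> (question, expected answer).
QUESTIONS = {
    1: ("Red means stop, green means ___", "go"),
    2: ("Five plus one equals ___", "six"),
}
MAX_ATTEMPTS = 3  # assumed lockout threshold


def pick_question(session: dict) -> str:
    """Choose a random question and remember its id in the session."""
    qid = random.choice(list(QUESTIONS))
    session["question_id"] = qid
    return QUESTIONS[qid][0]


def check_answer(session: dict, answer: str) -> bool:
    """Compare the posted answer with the stored question; count failures."""
    if session.get("attempts", 0) >= MAX_ATTEMPTS:
        return False  # locked out after too many wrong answers
    qid = session.get("question_id")
    expected = QUESTIONS.get(qid, (None, None))[1]
    if expected is not None and answer.strip().lower() == expected:
        return True
    session["attempts"] = session.get("attempts", 0) + 1
    return False
```

The form page would call `pick_question` when it is rendered, and the posting script would call `check_answer` before sending any mail.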
Julie KurpaSr. Systems Programmer

Author

Commented:
Thanks everyone.

Can someone help me understand the difference between these two lines in Apache?

* this line appeared 11 times
196.52.2.17 - - [13/Feb/2019:20:58:14 -0500] "POST /cgi-bin/myForm.cgi HTTP/1.1" 200 426 "http://www.myweb.org/myForm.html" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"

* This line appeared hundreds of times but does not say it comes from the myForm.html
196.52.2.17 - - [13/Feb/2019:20:58:32 -0500] "POST /cgi-bin/myform.cgi HTTP/1.1" 200 427 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
Scott FellDeveloper & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
Can you surf directly to /cgi-bin/myform.cgi from your domain?   What is the action= on your form? My bet is that it is just that. So the bot picks that up and posts directly to the action page, which in your case is /cgi-bin/myform.cgi. This is common.
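
To see how many requests skipped the form page, the access log can be filtered on the referer field. In Apache's combined log format, the second-to-last quoted string is the Referer header, and "-" means none was sent. A minimal Python sketch follows; the regular expression and the function name are illustrative and assume the log layout shown in this thread.

```python
import re

# Matches one line of the combined log format used in the excerpts above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def is_direct_post(line: str) -> bool:
    """True for a POST whose Referer header was absent (logged as "-")."""
    m = LOG_PATTERN.match(line)
    return bool(m) and m["method"] == "POST" and m["referer"] == "-"
```

Counting the lines for which `is_direct_post` returns True gives the number of POSTs that went straight to the CGI without a referring page.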
Scott FellDeveloper & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
I am not an Apache expert, but you want to make sure the referring page is your myForm.html before a request is allowed to hit /cgi-bin/myform.cgi.
Julie KurpaSr. Systems Programmer

Author

Commented:
When I type /cgi-bin/myForm.cgi,  it appears to execute the compiled cgi (a quick flash on the screen so fast I can't see what it is) and sends me to the main page which is what it's supposed to do if they submit the form through the html.    If no values are passed to the CGI, the CGI exits.

The html form uses:   action="cgi-bin/myForm.cgi"
Scott FellDeveloper & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
The html form uses:   action="cgi-bin/myForm.cgi"

That means if I know the form fields, I can post directly to that page from anywhere. One thing you can do is make sure the referrer to that page is your own server.
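
A minimal sketch of such a referrer check, in Python rather than the C++ of the real CGI. A CGI program receives the Referer header in the `HTTP_REFERER` environment variable. The host name comes from the log excerpt in this thread; the function name is hypothetical. Note that the client controls this header, so the check only filters bots that do not bother to set it.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"www.myweb.org"}  # host taken from the log excerpt above


def referer_is_own_site(environ: dict) -> bool:
    """Return True only when the Referer header points at our own host.

    `environ` is the CGI environment (for example os.environ). A missing
    or malformed Referer yields no hostname and is rejected.
    """
    referer = environ.get("HTTP_REFERER", "")
    host = urlparse(referer).hostname
    return host in ALLOWED_HOSTS
```

In a Python CGI this would be called as `referer_is_own_site(os.environ)`; the C++ program could do the equivalent with `getenv("HTTP_REFERER")`.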
Scott FellDeveloper & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
Maybe somebody more familiar with apache can assist with the details for updating your htaccess file  http://www.htaccess-guide.com/deny-visitors-by-referrer/
Julie KurpaSr. Systems Programmer

Author

Commented:
For the referrer, is that something I test for in the compiled cgi?

Where do I go to learn how to do the captcha?

About the htaccess file, I tried to read about it and got thoroughly confused.    There is a reference to it in the httpd.conf, but it looks like it's there to block any attempt to read it.
Julie KurpaSr. Systems Programmer

Author

Commented:
Oops, sorry Scott. I see the captcha link.  Checking it out now.
Julie KurpaSr. Systems Programmer

Author

Commented:
Thanks everyone.   I am reading about the captcha but will not be allowed to implement it without the developer's permission.

I will submit a separate question on the htaccess thing since I think that's kind of a biggie.
Scott FellDeveloper & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
Yes, I think working with apache and htaccess will be a great step!
David FavorFractional CTO
Distinguished Expert 2018

Commented:
Scott's approach of blocking by referrer will work in some cases + in other cases has unforeseen side effects... especially for USA IPs.

Consider mobile providers + VPN providers, where a single IP may represent 1000s+ visitors.

Blocking by IP will block all traffic from an IP, so imagine blocking an IP which is currently servicing 1000s of phone customers through some cell tower in Dallas, Texas for AT&T.

You get the idea.

Your only 2x real options to block this type of traffic are to use CAPTCHA code or to attempt device fingerprinting, which is expensive + error prone, which is why CAPTCHA is the usual approach.
nociSoftware Engineer
Distinguished Expert 2018

Commented:
Blocking by referer can be defeated as well; one can specify the referer in a call using curl, for example.
A better approach would be to include a hidden field on first presentation, based on some random data that is also stored in the session description from a previous page, and then check that value when the form is returned. Because the random value must first be obtained from the website, a submission needs multiple visits.
Then again, you have to present the page first, so that would only require an extra query. These signatures should be relatively one-off.

Other factors to consider: posting too many forms with such a previous signature in too short a timeframe. Those could be ignored. [People need some time to fill out forms; shorter times mean prefilled data somehow.]
Nothing definitive, just some extra hoops and hurdles.
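
A minimal Python sketch of the hidden-token idea. It departs from the description above in one respect: instead of storing the random value in a server session, it signs the value and its issue time with HMAC so the server can verify both on return. The secret key, the time limits, and all names are assumptions, and the one-off tracking is an in-memory set that a real site would need to persist.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = b"replace-with-a-private-server-key"  # placeholder value
MIN_FILL_SECONDS = 3    # assumed: anything faster looks prefilled
MAX_AGE_SECONDS = 3600  # assumed: tokens expire after an hour
_used_nonces = set()    # one-off tracking, in memory only


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None) -> str:
    """Create the value for the hidden field when the form is served."""
    now = int(time.time() if now is None else now)
    payload = f"{secrets.token_hex(8)}:{now}"
    return f"{payload}:{_sign(payload)}"


def token_is_valid(token: str, now=None) -> bool:
    """Check signature, age window, and single use of a returned token."""
    now = int(time.time() if now is None else now)
    try:
        nonce, issued, sig = token.split(":")
        issued = int(issued)
    except ValueError:
        return False
    expected = _sign(f"{nonce}:{issued}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    age = now - issued
    if age < MIN_FILL_SECONDS or age > MAX_AGE_SECONDS:
        return False  # submitted too quickly, or token too old
    if nonce in _used_nonces:
        return False  # each token is accepted only once
    _used_nonces.add(nonce)
    return True
```

With a static myForm.html the token cannot be embedded directly, so the page would have to be generated by a script (or fetch the token with an extra request), which is the extra query mentioned above.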
