mlafitte asked:

Apache rejecting connection without Host in HTTP Header

I have designed a website to receive a file from a "bot".  The site works fine when I test it using Postman.  The bot does not, and cannot, send the Host or User-Agent headers in its HTTP request.  My site apparently can't handle that: it sends back a 400 response and does not process the message/file.

If there is an obvious answer to this, I have not found it.  I would be glad to have someone step in and finish this project.

Thanks for any help.
David Favor

1) Could be your Bot is sending with a method (GET, POST, or PUT) that your Apache server has disabled.

2) Could be a protocol mismatch: the Bot can only talk HTTP + your Apache server redirects all HTTP traffic -> HTTPS.

3) As a starting point, check your Apache logs + you'll likely see the exact problem.
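
A minimal sketch of what to look for in the logs, assuming you can edit the server or vhost config (often not possible on shared hosting). The nickname "hostdebug" and the log path are placeholders, not part of the original setup. The format records the method, the protocol, and the Host and User-Agent headers as received, so a missing header shows up as "-":

```apache
# Hypothetical nickname "hostdebug"; adjust the log path for your system.
# %m = request method, %H = request protocol,
# %{Host}i and %{User-Agent}i = request headers as sent by the client
LogFormat "%h %t %m %H \"%r\" %>s \"%{Host}i\" \"%{User-Agent}i\"" hostdebug
CustomLog /var/log/apache/access_hostdebug hostdebug
```

Comparing a Postman request with a bot request in this log should show whether the method, the protocol version, or the missing Host header is the difference.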
Dr. Klahn

On a system with named vhosts and no "default" server, this behavior would be expected.

If the bot is not capable of providing the HOST field, then it can only talk to an Apache server on the "default", i.e., where requests that do not match any named vhost end up.  "Default" servers must be configured; an example vhost include for a default server is below.  This one is used to mousetrap bots and scrapers probing the IP address without an FQDN hostname.

Note that this still probably doesn't solve your problem, because you can have one or the other.  You can set a vhost up to have an FQDN response, but it requires a HOST field and so won't default, or set up a vhost to accept defaults, but only when there is no HOST field.  In other words, a default host responds to requests sent to http(s)://x.y.z.c, where x.y.z.c is the IP address of the server.

# ====================== VIRTUAL HOST  ======================
#
#                      localhost (default)
#
# ====================== VIRTUAL HOST  ======================

#
# Apache server named virtual host configuration file
# Responds to requests without FQDN HOST fields
#
# File names:
# If beginning with "/", use that explicit path.
# If *not* beginning with "/", prepend ServerRoot.
#

#
# VirtualHost begin: Define a new virtual host
#
<VirtualHost *:80>

#
# ServerAdmin:  We do not provide this
#
ServerAdmin root@127.0.0.1

#
# ServerName: The primary name for this virtual host
# ServerAlias: Other acceptable names for this virtual host
# UseCanonicalName:  Use ServerName to build URLs referring to itself
#
ServerName default:80
UseCanonicalName on

#
# DocumentRoot: This vhost's base directory.
#
DocumentRoot "/www/default"

#
# Vhost's base directory: Inherit no access from httpd.conf
#


# ========================================================
#
#              GLOBAL/DEFAULT LOGGING CONTROL
#
# ========================================================


# LogLevel: Controls messages logged to error_log.
#           debug, info, notice, warn, error, crit, alert, emerg.
#
LogLevel notice

# RewriteLogLevel: No longer exists, now an option for LogLevel

#
# Referrer logging control
# Prevent logging of uninformative requests
#

#
# Access logging control
# Prevent logging of uninformative requests
#

#
# Define format nicknames for CustomLog directive
#
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h  %{%b %d %H:%M:%S}t  %s  %b bytes\\n  \"%r\"  %{User-agent}i" locallog
LogFormat "%{Referer}i -> %U" referer
LogFormat "%t %{Referer}i -> %U" localreferer

#
# ErrorLog: Location of the error log file
#
ErrorLog /var/log/apache/error_default

#
# Location and format of the access logfile
#
# CustomLog /var/log/apache/access_default locallog env=!nonlog-request

#
# Location and format of the default agent and referrer logfile
#
# CustomLog log/referer_default localreferer env=!nonlog-refer
# CustomLog /var/log/apache/referer_default localreferer env=!nonlog-refer


# ========================================================
#
#                    REWRITE RULES
#
# ========================================================

# There are no rewrite rules.  All accesses intentionally fail.


# ========================================================
#
#                    SSL CONFIGURATION
#
# ========================================================

# This is a non-SSL server.  No SSL directives here.

#
# VirtualHost end: End of definitions for this virtual host
#
</VirtualHost>


mlafitte (ASKER)

David -

With our servers at Network Solutions, finding the proper logs seems to be an issue.
I also loaded the page on GoDaddy servers.  The only log I can find shows the following:

Log from a successful post.  The Host name was in the header record on this attempt:

 - - [22/Jun/2021:23:15:56 -0700] "POST /index.php HTTP/1.1" 200 72 "-" "PostmanRuntime/7.28.0" 525 **0/525212**


Log from a failed post.  The Host name was excluded from the header record on this attempt:

50.209.116.21 - - [22/Jun/2021:23:17:22 -0700] "POST /index.php HTTP/1.1" 200 72 "-" "-" 23 **0/23596**

To debug this, you need full access to your entire LAMP Stack setup, via root shell.

You'll enable I/O Payload logging.
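
One way to get this kind of payload logging in Apache 2.4 is mod_dumpio, sketched below. This assumes root access and that the module ships with your build; the module path is the conventional one and may differ on your system. It writes the data received and sent to the error log, so treat it as a short-lived debugging aid rather than a permanent setting.

```apache
# Load the module (path is an assumption; distributions differ).
LoadModule dumpio_module modules/mod_dumpio.so

# mod_dumpio logs at trace7, so raise the level for this module only.
LogLevel notice dumpio:trace7

# Dump data read from the client and data sent back to it.
DumpIOInput On
DumpIOOutput On
```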

Big Tip: If you think you'll ever be able to write an HTTP/1.1 Bot that will work... this is a logic error...

The easy way to block Bots, which most people do is this...

1) Any HTTP protocol < 2.0 is a Bot as all Browsers have been running HTTP/2 for years, so block 100% of all HTTP/1.1 traffic... because... well... they're Bots...

2) Implementing #1 takes around 60 seconds to set up + test in Fail2Ban, so is likely the most common Fail2Ban block recipe in existence, next to ssh brute-force login attacks.

3) To implement a useful Bot, code must be HTTP/2.0 + provide a set UA, which clearly describes the Bot, so people can block/allow Bots based on the "Real UA" of the Bot service you'll be offering.

4) In Fail2Ban recipes to block Bots, be sure to add exceptions for IP Ranges (not forged UAs) whitelisting/allowing Good Bots, because Good Bots sometimes incorrectly rotate between HTTP/1.1 + HTTP/2 so with no exceptions you'll likely break all manner of Bot access you'd like to allow.
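
For illustration only, here is a rough Apache-side analogue of points 1 and 4 using mod_rewrite. This is not the Fail2Ban recipe described above: it refuses the request with 403 instead of banning the IP. The address range 192.0.2.0/24 is a documentation placeholder standing in for a "Good Bot" range. Browsers only negotiate HTTP/2 over TLS, so a rule like this only makes sense on an HTTPS vhost, and it will also turn away any legitimate client that still speaks HTTP/1.1.

```apache
RewriteEngine On

# Match any HTTP/1.x request (HTTP/2 requests report "HTTP/2.0").
RewriteCond %{SERVER_PROTOCOL} ^HTTP/1\.

# Exception: placeholder range for bots you want to allow.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.

# Refuse everything else with 403 Forbidden.
RewriteRule ^ - [F]
```
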
Random Aside.

[ ... soap box on ... ]

Key problem, "With our servers at Network Solutions".

If you're working with complex code, like Bots, best to run your own dedicated servers, as you will have to set up your own logging configs.

All hosting companies I've ever reviewed set up all daemons with default configs, including default logging configs, so no I/O Payload configs are ever enabled.

If you're writing/debugging Bots, you'll almost certainly require I/O Payload logging.

So the first step will likely be moving to hosting where you can configure your own logging.

Or... you can always ask Network Solutions for assistance... er... I hear a lot of laughter from those old dogs (been doing tech for decades) who have experience asking Network Solutions for anything... which is about as useful as asking Google or GoDaddy for anything.

[ ... soap box off ... ]
"Network Solutions" is one of the Web.com companies.  No, that is not a good thing.
Also, if the target is on shared hosting, then the "Host name" is a requirement for the server to deliver it to the correct site.  It is not unusual for shared hosting servers to have 100 sites on them.  How would they know which one you're trying to connect to without the Host name?
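
To make that concrete, here is a minimal sketch of name-based virtual hosting with two hypothetical sites (site-a.example and site-b.example, with made-up document roots) on one IP address and port. Apache compares the request's Host value with each ServerName/ServerAlias; when nothing matches, the request goes to the first vhost listed for that address and port.

```apache
# Two hypothetical sites sharing one IP address and port.
# The Host header is the only thing that tells them apart.
<VirtualHost *:80>
    # Listed first, so it also receives requests whose Host matches nothing.
    ServerName site-a.example
    DocumentRoot "/www/site-a"
</VirtualHost>

<VirtualHost *:80>
    ServerName site-b.example
    DocumentRoot "/www/site-b"
</VirtualHost>
```
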
<opinion>
Sorry to hear you're hosting with Network Solutions.  But in the "until I met a man who had no shoes" vein ... per David supra, it could be godaddy.
</opinion>

On a multi-vhosted server there's no way this is going to work with a bot incapable of defining the target host field ... no matter what you do.

ASKER CERTIFIED SOLUTION
David Favor

[This solution is available only to Experts Exchange members.]

So one of the things I do have control over with the Netsol server is my htaccess file.  I keep reading about the ability to redirect ALL traffic using the mod_proxy module in Apache.

If I did that, would the proxy server add its own RequestHeader information into the forwarded request that might solve the problem?  Granted, I would have to set up another server with my php code, but would this work?
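
For reference, a hedged sketch of what such a front proxy could look like. It assumes a separate server under your control, since ProxyPass cannot be used in .htaccess, and it uses placeholder names (proxy.example for the proxy, www.example.com for the real site). With ProxyPreserveHost left Off, mod_proxy sends the backend a Host header taken from the ProxyPass URL. One caveat: as far as I know, Apache itself answers an HTTP/1.1 request that has no Host header with 400, as the HTTP/1.1 spec requires, so an Apache proxy would likely reject the bot the same way unless the bot speaks HTTP/1.0 or the front end is something more lenient.

```apache
# Runs on a separate server you control (not in .htaccess).
# Requires mod_proxy and mod_proxy_http to be loaded.
<VirtualHost *:80>
    # Only vhost on this address:port, so it also acts as the default.
    ServerName proxy.example

    # Off is the default: the Host header sent to the backend is taken
    # from the ProxyPass URL, not from the client's request.
    ProxyPreserveHost Off

    # Placeholder backend; replace with the real site's hostname.
    ProxyPass        "/" "http://www.example.com/"
    ProxyPassReverse "/" "http://www.example.com/"
</VirtualHost>
```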