Port 80 requests

We have 50+ hosting servers. We mirrored one of the router's ports to a CentOS server and want to trace port 80 traffic to see the requested pages.
Is there a Perl script that can show this output, or any other software that can show it in real time?
Asked by: FireBallIT

Dan Craciun (IT Consultant) commented:
tcpdump -i eth1 'port 80'

Replace eth1 with the name of the interface where you receive the mirrored traffic.
Oh, and install tcpdump if it's not already installed.

HTH,
Dan
FireBallIT (author) commented:
That does not show the full page address; it just captures the requests.
Maybe tshark solves it, but I do not know how to use it.
Dan Craciun (IT Consultant) commented:
tshark does the same packet capturing as tcpdump.

The command I posted should capture all traffic to port 80. What can't you find?
Dan Craciun (IT Consultant) commented:
I think I understand what you want to do. But with 50+ servers, conservatively each hosting 50 sites, each site having 1 hit per minute, that's 2500+ requests per minute.
It's not pretty to look at a blur of white lines in a terminal.

I think you should consider using syslog for Apache logs and consolidate all logs in one server.
Then use something like GoAccess to see the status of your servers in real time.
FireBallIT (author) commented:
GoAccess and Apache logs are good, but they cannot be processed in real time.
We plan to build real-time scripts to detect GET / ref attacks, so the first step is tracing requests.
giltjr commented:
I doubt that tshark will solve it.  What do you mean by "full page address?"

HTTP GET/POST commands don't have the host name or address in them.  All they have is path and file name.

If it is HTTP 1.1, there will be a http header that has the host name.
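
As an editorial illustration of the point above, here is a minimal Python sketch. It assumes the raw bytes of one complete HTTP/1.x request are already available (for example, from a reassembled capture) and that the request line uses the usual path-only form. The function name `request_url` is hypothetical. It joins the path from the request line with the Host header and returns None when there is no Host header, as can happen with HTTP 1.0.

```python
from typing import Optional


def request_url(raw: bytes) -> Optional[str]:
    """Rebuild the requested URL from one raw HTTP/1.x request.

    Returns None if the request line is malformed or no Host header exists.
    Assumes a path-only request target (not the absolute form sent to proxies).
    """
    # Keep only the header section; the body (if any) follows the blank line.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")

    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None
    _method, target, _version = parts

    host = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive.
        if sep and name.strip().lower() == "host":
            host = value.strip()
            break
    if host is None:
        return None
    return "http://" + host + target


sample = (
    b"GET /questions/28706010/Port-80-requests.html HTTP/1.1\r\n"
    b"Host: www.experts-exchange.com\r\n"
    b"\r\n"
)
print(request_url(sample))
```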
FireBallIT (author) commented:
giltjr commented:
You will never see that in any HTTP request. What you will see is:

      /questions/28706010/Port-80-requests.html

The URL is made of multiple parts: "http://" defines the protocol, "www.experts-exchange.com" is the host name, ":80" is the port and is assumed when the protocol is "http://", "/questions/28706010/Port-80-requests.html" is the file location, and "#a40932176" is a pointer to a specific place within the page, which the browser handles itself and does not send to the server. A URL can also have parameters, which are added to the end of the file location after a "?".
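
To make the breakdown concrete, here is a small Python sketch using the standard library's `urllib.parse.urlsplit`. The URL is assembled from the parts named above, and the helper name `url_parts` is hypothetical.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into the components described in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,          # None when the URL omits the port
        "path": parts.path,
        "parameters": parts.query,   # the part after "?", if any
        "fragment": parts.fragment,  # the part after "#", if any
    }


url = (
    "http://www.experts-exchange.com:80"
    "/questions/28706010/Port-80-requests.html#a40932176"
)
for name, value in url_parts(url).items():
    print(f"{name}: {value}")
```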

No packet capture will show you what you want, because that is not how http works.
FireBallIT (author) commented:
So how can the server determine the host address when it is answering requests?
giltjr commented:
I'm not sure I understand your question.

With HTTP there is a client and a server.  The client sends a request to the server, and it knows which IP address to send the request to because it does a DNS lookup based on the host name in the URL.

The server sends a response back to the client by using the remote IP address in the IP header of the TCP connection.

If you are using name-based virtual hosts (Apache term) or host headers (IIS term), then, as I stated before, the HTTP request carries a Host header with the host name.

What I would suggest you do is run a packet capture for a few simple HTTP requests and look at the raw data and at the HTTP fields in HTTP request.  You can use Wireshark to do this.
FireBallIT (author) commented:
"The client sends a request to the server, and it knows which IP address to send the request to because it does a DNS lookup based on the host name in the URL."

Our hosting servers have more than 2000 web sites. A client may well visit more than one site on the same server. In my opinion, that causes a problem with your explanation.
Dan Craciun (IT Consultant) commented:
No problem.

Client types
http://sitea.com/page1.htm

DNS says sitea.com is hosted at 1.2.3.4, on your server.
The request is sent to 1.2.3.4.

Apache checks the host name and responds according to the virtual-hosts directive, with the file page1.htm from /home/www/sitea/public_html

Client types
http://siteb.org/page1.htm, which is hosted on the same server.

Apache checks the host name and responds according to the virtual-hosts directive, with the file page1.htm from /home/www/siteb/public_html
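
The lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not how Apache is implemented: the table `DOCROOTS` and the function `resolve` are hypothetical names, and the sketch ignores default virtual hosts, aliases, and path sanitising.

```python
import posixpath
from typing import Optional

# Host name -> document root, using the example sites from the text.
DOCROOTS = {
    "sitea.com": "/home/www/sitea/public_html",
    "siteb.org": "/home/www/siteb/public_html",
}


def resolve(host_header: str, path: str) -> Optional[str]:
    """Map a Host header value and request path to a file on disk.

    Returns None for an unknown host. No protection against "../" in the
    path is attempted here; a real server must handle that.
    """
    # Drop an optional ":port" suffix and compare case-insensitively.
    host = host_header.split(":", 1)[0].strip().lower()
    root = DOCROOTS.get(host)
    if root is None:
        return None
    return posixpath.join(root, path.lstrip("/"))


# Same IP address, same path, different Host header: different files.
print(resolve("sitea.com", "/page1.htm"))
print(resolve("siteb.org", "/page1.htm"))
```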
Dan Craciun (IT Consultant) commented:
The problem with what you want is that you cannot use TCP packets directly, as the request-response mechanism is a TCP stream that is in no way guaranteed to be contiguous or even in order.

My proposed solution will let you look at traffic from at most 30 seconds ago. Not real-time, but close enough.
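
To illustrate the remark about TCP segments arriving out of order, here is a rough Python sketch of the reassembly step a packet-based script would need before it could parse requests. The function `reassemble` and its input format (pairs of sequence number and payload) are assumptions made for the example. It ignores sequence-number wraparound and simply stops at the first gap.

```python
from typing import Iterable, Tuple


def reassemble(segments: Iterable[Tuple[int, bytes]], first_seq: int) -> bytes:
    """Join (sequence_number, payload) pairs into one contiguous byte stream.

    Segments may arrive in any order and may be retransmitted. Output stops
    at the first missing byte, since nothing after a gap can be trusted yet.
    """
    stream = bytearray()
    next_seq = first_seq
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        end = seq + len(payload)
        if end <= next_seq:
            continue  # retransmission of data we already have
        if seq > next_seq:
            break  # gap: a segment in between has not been seen
        # Append only the bytes past what we already hold (handles overlap).
        stream += payload[next_seq - seq:]
        next_seq = end
    return bytes(stream)


first = b"GET /page1.htm HTTP/1.1\r\n"
second = b"Host: sitea.com\r\n\r\n"
# The second segment arrives first, and the first one is seen twice.
captured = [(1000 + len(first), second), (1000, first), (1000, first)]
print(reassemble(captured, first_seq=1000))
```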
FireBallIT (author) commented:
Dear Dan, thank you for the response. For your solution I need to connect to the servers and read from the log files.
We need an automated system, and we are looking for a solution that reads directly from the packets.
giltjr commented:
Why don't you use something that can process the web server's logs?

There are other products, but you could use Splunk.  You can install Splunk agents on the web server to forward everything the webserver writes to its logs.  If you are using Apache the standard Apache log should include the host name and the full URI.

You can then use Splunk to do analysis, either real time or after the fact.

As I stated before, HTTP 1.1 requests will have a header showing what the user typed in the host portion of the URL.  If they typed a name, it will have the name; if they typed an address, it will have the address.

HTTP 1.0 requests will NOT have the hostname header in it.

Again, I would suggest you do a small capture from your computer of you visiting a web site, so you can see what is actually in a typical HTTP exchange.
PaulOfford commented:
I'm a bit late in on this but I thought I'd add an alternative option.

You mentioned in your first post that you wanted to see the detail real-time.  You can do it with Tshark with a command something like this:

 tshark -i int_ref -ta -f "port 80" -T fields -E separator=, -E quote=d \
   -e _ws.col.Time -e frame.number -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport \
   -e http.request.method -e http.request.uri

The trailing backslashes continue the command across lines; I just split it up to make it more readable.

This will output the information in CSV format onto your screen.  Quite honestly it will probably whip by so quickly I doubt you'll be able to use it, so you might want to direct the output to a file by suffixing the above command with > http_requests.csv
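
Since the original question asked for a script, here is a hedged Python sketch of how that CSV output might be consumed. It assumes the eight columns appear in the order given in the command above. The names `Request` and `http_requests` and the sample rows are invented for illustration. Rows with an empty method (packets that are not HTTP requests) are skipped.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Request(NamedTuple):
    time: str
    src: str
    dst: str
    method: str
    uri: str


def http_requests(lines: Iterable[str]) -> Iterator[Request]:
    """Yield one Request per CSV row that carries an HTTP request method."""
    for row in csv.reader(lines):
        if len(row) != 8:
            continue  # not the column layout this sketch expects
        time, _frame, src, dst, _sport, _dport, method, uri = row
        if method:
            yield Request(time, src, dst, method, uri)


# Made-up sample rows in the assumed column order.
SAMPLE = [
    '"12:00:01.000000","7","10.0.0.5","1.2.3.4","51000","80","GET","/page1.htm"',
    '"12:00:01.000100","8","1.2.3.4","10.0.0.5","80","51000","",""',
]
for req in http_requests(SAMPLE):
    print(req.time, req.src, req.method, req.uri)
```

To process live output, the same function could be given `sys.stdin` with the tshark command piped into the script.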

A better solution would be to save the captured packets as raw trace files (binary format) and then post-process them in batches with Tshark.  So going all the way back to Dan's post you would use:

 tcpdump -i eth1 'port 80' -w myTraceFile.pcap

You can then open this with Wireshark, or process it with Tshark.  If you want a full breakdown of response times you can use the Wireshark plugin called TRANSUM, which you can find at http://www.tribelabzero.com/transum .  In fact, the TRANSUM User Guide has a pretty detailed description of the process to do exactly what you want to do, in the section Batch Processing with Tshark.

Best regards...Paul
