Solved

Intermittently HTTP connections fail whilst HTTPS succeeds on select workstations

Posted on 2016-07-20
  • HTTP Protocol
  • SSL / HTTPS
  • Networking
  • Network Architecture
  • Windows Networking
Last Modified: 2016-10-01
In summary:

At seemingly random times, certain PCs are unable to access HTTP (port 80) sites: pages take a long time to load or fail to load outright. It only seems to affect outgoing (through the firewall) HTTP traffic and nothing else, e.g. HTTPS (port 443) still works. Internal HTTP is unaffected.
The scope of the issue is over 2 different VLANs/subnets but does not affect all PCs on those VLANs/subnets.

At this time the only fix is for the user to restart the local PC.

Tried a variety of fixes locally on the affected machines.
Restarted any and all networking gear.
Ran Wireshark packet traces.

Although we have run Wireshark traces, we don't currently have the experience to identify issues in them, so any pointers would be greatly appreciated.
Question by:celnet
40 Comments
 
LVL 1

Author Comment

by:celnet
Here is some additional information on this issue and our setup:
From the internet we have a Cisco ASA firewall, which connects into dual redundant Cisco 3960 Distribution Layer switches (DLS). The workstations all feed through 2 Cisco 2960 Access Layer Switches (ALS), configured in a stack, into the DLS. Servers connect through top-of-rack ALS that again connect into the DLS.

From the ALS stack we patch into a patch panel and run to floor ports. We are low on floor ports, so these connect to 5-port Netgear hubs, which the workstations connect to. There are several VLANs throughout the office. The affected machines are connected as below:

Patch Panel -> Floor Port 1 -> hub 1 -> VLAN 1
Workstation 1 - affected
Workstation 2 - affected
Workstation 3 - not affected

Patch Panel -> Floor Port 2 -> hub 2 -> VLAN 2
Workstation 4 - affected
Workstation 5 - affected
Workstation 6 - not affected

Patch Panel -> Floor Port 2 -> hub 3 -> VLAN 2
Workstation 7 - affected
Workstation 8 - not affected
Workstation 9 - not affected

Workstations are running Windows – all on 8.1 bar one on 7 – connected to a 2012 R2 Domain.

The Workstations affected are experiencing the following symptoms intermittently:
External to network: HTTP browsing fails. HTTPS works fine. Once a workstation is rebooted HTTP browsing works fine.
Internal to network: HTTP browsing is sometimes fine, but is usually affected. HTTPS works fine.

Attached is a Wireshark trace, in case anyone can analyse it better than we can. If any more info is required to assist us here, just let us know and we will happily provide it.

Thanks all, any and all help is much appreciated!
0
 
LVL 1

Author Comment

by:celnet
Oh, forgot to add: we have tried different ports for the workstations. We have also performed the following (some just to try, some from Overlord Google):
IPv6 has been disabled
Winsock reset
TCP/IP reset
Kaspersky configuration checked - doesn't appear to be interfering
All network switches have been restarted.

None of these actions had any impact. The only fix is a restart.
0
 
LVL 20

Expert Comment

by:carlmd
Maybe a routing issue? I say this since it happens after a while and is cleared by rebooting. The next time it happens, try clearing the routing table on that PC instead of rebooting, and see if that makes it work again. Just in case, it might be a good idea to print/save a copy of the routing table before clearing it. Then, if it works, look for the difference between a good one and a bad one.
0
 
LVL 1

Author Comment

by:celnet
Worth a try. Would a routing problem not affect all traffic, though, rather than just HTTP? We shall see... I'll reply as soon as I have an update.
0
 
LVL 20

Expert Comment

by:carlmd
One other thought. I would run a traceroute to a site (google.com for example) when it is working, and then again to the same site when it is not. Then compare and see where it is hanging.
0
 
LVL 1

Author Comment

by:celnet
Cool thanks will do.

The guys have run tracert to www.bbc.co.uk and www.google.com as well as route print. I'll get them to perform the same actions and note any differences.
0
 
LVL 38

Expert Comment

by:Aaron Tomosky
Are you actually using Netgear hubs, or are they switches? What model number(s)?
0
 
LVL 57

Expert Comment

by:giltjr
First, I find it hard to believe that routing has anything to do with this.  Routing decisions have no clue about http vs. https traffic.  Routing decisions are made based on an IP address, not a TCP port number, which at a network layer is the difference between http and https.

To make sure I understand the problem: if you do http://www.google.com on workstation #1 it fails, but if you do https://www.google.com it works? If so, then that implies a firewall/filtering device somewhere in the path.

You said that re-booting fixes the problem. Does the workstation get the same IP address? If not, then the problem may be a filtering device within your network. I am assuming that your firewall does a many-to-one NAT for all internal devices, so "Google" would see the same IP address from your site for all workstations and could not be filtering you.

In order to track down the problem you may need to take Wireshark traces in multiple places. Typically I would suggest trying to get a trace from the workstation and from in front of the firewall. That way you can see the request leave the workstation and make sure it leaves the firewall.
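
To illustrate the routing point above: a route lookup takes only the destination IP address as input, so it has no way to treat port 80 and port 443 differently. Here is a minimal Python sketch, assuming a hypothetical two-entry routing table (the networks and the next hop 10.0.20.1 are invented for illustration and are not taken from this network):

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("10.0.10.0/24"), "on-link"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.20.1"),  # default route
]


def next_hop(dest_ip):
    """Longest-prefix match on the destination address only.

    Note there is no port parameter: HTTP and HTTPS traffic to the
    same address always gets the same answer.
    """
    dest = ipaddress.ip_address(dest_ip)
    matches = [(net, hop) for net, hop in ROUTES if dest in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

Whatever a bad routing table does to traffic for a given address, it does to both protocols, which is why a port-specific failure points elsewhere.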
0
 
LVL 38

Expert Comment

by:Aaron Tomosky
Do you have dual wan connections?
0
 
LVL 1

Author Comment

by:celnet
@Aaron, they are switches, not hubs. Apologies, I copied and pasted a draft of this question from a Sys Admin who didn't realise the difference.

Also, we do not have dual wan.

@giltjr, I am inclined to agree that it can't be routing, but at this stage I am open to any and all testing to rule things out.

To answer your other points: yes, you understand correctly. When the issue occurs, HTTP does not load but HTTPS does. Rebooting always fixes the issue. The machines get IPs from DHCP scopes, but they will invariably get the same IP address, due to the lease timings. We have multiple traces from all affected machines. I can supply these if needed.

However, I have just seen a post on another forum that may well contain the answer to this issue. I am investigating and if it does indeed solve I shall post the solution here.
0
 
LVL 57

Expert Comment

by:giltjr
If you can post one of the Wireshark captures, I would be more than happy to look at it.
0
 
LVL 1

Author Comment

by:celnet
Apologies, thought I had but it looks like it didn't work.

Ah, I see what's happened. Link to file here: http://tinyurl.com/hb3erpz
0
 
LVL 20

Expert Comment

by:carlmd
Did you ever try either ID: 41724439 or ID: 41724448 above?
0
 
LVL 57

Expert Comment

by:giltjr
Which specific host/IP address does the trace show you attempting to reach that does not work?
0
 
LVL 1

Author Comment

by:celnet
@carlmd, yes have now tried both these - routing is fine.

@giltjr, the following connections were attempted:

http://wkstn42/ - Internal testing site on IP 10.0.10.2, took 3 minutes and partially loaded.

http://arstechnica.co.uk/ - external, took 3 minutes and partially loaded.

http://arstechnica.co.uk/?from-us - external, took 3 minutes and partially loaded.

https://security.googleblog.com/ - loaded straight away
0
 
LVL 57

Expert Comment

by:giltjr
It looks like there is no problem from a network point of view. The problem appears to be with the browser or however you are doing your timing.

Example. Open the Wireshark trace and in the filter box enter "tcp.stream eq 43" without the quotes. This is the stream for http://arstechnica.co.uk/?from-us. If you look at the 1st four packets you will see the 3-way TCP handshake (SYN, SYN-ACK, ACK), which took a total of less than 1/10th of a second. Then 7 seconds later you see the HTTP GET for "/?from-us". That 7 second delay is on the computer that you were running the trace on. Then about 0.3 seconds later you see the data start coming in, the 1st packet being 1441 bytes in size. The server is done sending data when you see the HTTP/1.1 200. Overall, from the GET to the OK it took 0.23 seconds.

So I have no idea where you are seeing 3 minutes. I will look more at the trace for the other sites.

The 60 second delay you see after the HTTP/1.1 200 is because your browser requested to keep the HTTP connection open and it just timed out after 60 or so seconds. This is normal.
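
The timing breakdown above can be sketched in a few lines of Python. The timestamps below are hypothetical values chosen only to approximate the figures quoted (handshake under 0.1 s, GET about 7 s later, first data about 0.3 s after that); they are not exported from the actual capture.

```python
# Hypothetical event times in seconds since the SYN; illustrative only.
EVENTS = [
    ("SYN", 0.00),
    ("SYN-ACK", 0.04),
    ("ACK", 0.08),
    ("HTTP GET", 7.08),
    ("first data packet", 7.38),
]


def gaps(events):
    """Return (label, seconds since the previous event) for each event after the first."""
    return [
        (label, round(t - prev_t, 3))
        for (_, prev_t), (label, t) in zip(events, events[1:])
    ]


def largest_gap(events):
    """Return the event that followed the longest silence on the wire."""
    return max(gaps(events), key=lambda g: g[1])
```

With these numbers the largest gap sits just before the GET, i.e. between the client completing the handshake and the client sending its request, which is why the delay is attributed to the workstation rather than the network.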
0
 
LVL 57

Expert Comment

by:giltjr
Looked at the internal host you used for testing.  

If you enter "ip.addr == 10.0.10.2 and tcp.port == 80 and (http.request or http.response)" without the quotes you will see all of the http requests and responses.  Again, response time looks good.

From the 1st packet for the 1st GET to the last packet is less than 1 second.

So, from the trace, whatever the problem is, it appears to be a performance issue processing the web pages on the client.
0
 
LVL 38

Assisted Solution

by:Aaron Tomosky
Aaron Tomosky earned 167 total points
Look again at Kaspersky; try disabling it or any other security-type app on the workstation. Any proxy settings?
0
 
LVL 1

Author Comment

by:celnet
Just to update, we are still seeing this issue. We now have this affecting another machine, that was previously unaffected.

We have tried disabling Kaspersky. No other security apps, nor proxy settings present.
0
 
LVL 20

Expert Comment

by:carlmd
What browser are you using? Have you tried using different browsers?

When it is not working, have you tried telnetting to the IP address on port 80 to see what happens?

For example:

 http://www.anta.net/misc/telnet-troubleshooting/http.shtml
0
 
LVL 1

Author Comment

by:celnet
Chrome is used mainly, but the guys also use most other browsers (Web developers) and they try during the issues - the same problem persists cross browser.

I have not tried Telnet, I shall do so at next occurrence.

One other thing the guys have just pointed out to me is that it usually occurs not long after they have first logged on/Unlocked in the morning.
0
 
LVL 57

Expert Comment

by:giltjr
Again, I would suggest you do a packet capture from their desktops.

If possible I would also look at the firewall logs and do a packet capture in front of the firewall.

Do you have a proxy server?
0
 
LVL 1

Author Comment

by:celnet
Problem just occurred on one machine:

We ran Telnet and we just receive a 408 response (timeout) or a failed connection, regardless of whether the issue is occurring or not.

No we do not have a Proxy.

The wireshark traces are run on the desktops when the problem occurs. I am reviewing Firewall logs now.
0
 
LVL 57

Expert Comment

by:giltjr
What do you see in the packet capture?
0
 
LVL 20

Expert Comment

by:carlmd
The telnet 408 response makes no sense. Even behind a firewall telnet has to work, since the firewall would normally pass traffic on port 80 and allow the response.

telnet www.google.com 80

has to provide a response as I did here

telnet www.google.com 80
Trying...
Connected to www.google.com.
Escape character is '^]'.
HEAD /google/google.shtml HTTP/1.1
Host: www.google.com
Connection: close


HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 1580
Date: Tue, 09 Aug 2016 13:16:13 GMT
Connection: close

Connection closed.

Did you maybe not type all the stuff in bold?
0
 
LVL 57

Expert Comment

by:giltjr
Well, an HTTP 408 from a telnet request makes total sense if there is something between the client and the web server that is blocking/dropping traffic.

Think about it, if the traffic was flowing normally, then the browser would work and thus celnet would not be here asking for help.  : )
0
 
LVL 1

Author Comment

by:celnet
Indeed! :)

So it's odd. This is exactly what we get:

telnet www.example.com 80
(Almost instantly goes to blank prompt, with flashing cursor - see image attached)
(if I leave it, after a while it goes back to usual CMD prompt.)
(I then try again, but this time typing:)
HEAD /example/example.shtml HTTP/1.1
(this is also blank as it types, so I can't see it. After a while the response is:)
HTTP/1.0 408 Request Timeout
Content-Type: text/html
Content-Length: 431
Connection: close
Date: Wed, 10 Aug 2016 08:55:41 GMT
Server: ECSF (ewr/15AD)

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
    <title>408 - Request Timeout</title>
</head>
<body>
    <h1>408 - Request Timeout</h1>

    <div>Server timeout waiting for the HTTP request from the client.</div>
</body>
</html>


Connection to host lost.

Capture.PNG
0
 
LVL 20

Expert Comment

by:carlmd
OK, I am guessing the problem is that you are using some version of Microsoft Telnet to do this. By default it does not echo anything. So, try this.

Open a command prompt
telnet
Microsoft Telnet> set localecho
Local Echo On
Microsoft Telnet> open www.google.com 80
Connecting To www.google.com

Now, without waiting for a response, just type the following and then press Enter twice (the blank line is what tells the server the request is complete). Don't worry about what you see on the screen. Please use capital letters as shown.

GET / HTTP/1.0

You should see a lot of stuff in response and then "Connection to host lost".

This signifies that you have made a connection to the web server at that URL.

Do this when it is working to see what you get, and then repeat when it is not working in a browser.
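
For repeated testing, the same manual check can be scripted. This is a rough Python sketch rather than a replacement for the telnet steps: the helper names build_request, status_line and probe are made up for illustration, and it sends a HEAD request with a Host header instead of the bare GET above. The detail it makes explicit is the blank line that ends the request; a server that never receives it keeps waiting and may eventually answer with a 408.

```python
import socket


def build_request(host, path="/", method="HEAD"):
    """Build a complete HTTP/1.1 request; the trailing blank line ends the headers."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line: end of headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_line(raw_response):
    """Return the first line of a raw HTTP response, e.g. 'HTTP/1.1 200 OK'."""
    return raw_response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")


def probe(host, port=80, timeout=10.0):
    """Connect to host:port, send a HEAD request, and return the status line.

    Raises OSError (including socket.timeout) if the connection or the
    read fails, which is itself useful information.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return status_line(b"".join(chunks))
```

Calling probe("www.google.com") once while browsing works and again while it does not gives two results (a status line or an exception) to compare.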
0
 
LVL 57

Expert Comment

by:giltjr
Is it all sites you go to, some sites, or a few very specific sites?

A 408 means that the server has determined the connection has been active longer than it should be without "appropriate" action, meaning the server is still waiting for a complete request from the client.

You mentioned Web developers.  Are they the only ones experiencing the problem?
0
 
LVL 38

Expert Comment

by:Aaron Tomosky
Web developers, you say? I used to support some of those and they were fans of Fiddler and OWASP, which both set up a local proxy. Fiddler in particular would crash and leave the proxy setting in place with nothing listening...
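
One way to check for that "proxy configured but nothing listening" state is sketched below in Python. It is only a sketch under assumptions: it relies on urllib.request.getproxies() to report whatever proxies the OS or environment advertises, the helper names are invented for illustration, and the fallback to port 80 when no port is given is an arbitrary choice.

```python
import socket
import urllib.request
from urllib.parse import urlsplit


def proxy_endpoint(proxy_url):
    """Split a proxy setting such as 'http://127.0.0.1:8888' into (host, port)."""
    if "://" not in proxy_url:
        proxy_url = "http://" + proxy_url
    parts = urlsplit(proxy_url)
    return parts.hostname, parts.port or 80  # port 80 fallback is an assumption


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_system_proxies():
    """Map each advertised proxy scheme to (host, port, is something listening?)."""
    results = {}
    for scheme, url in urllib.request.getproxies().items():
        if scheme == "no":  # the no_proxy exclusion list, not a proxy
            continue
        host, port = proxy_endpoint(url)
        results[scheme] = (host, port, is_listening(host, port))
    return results
```

An entry pointing at a local address with the last value False would match the stale-proxy scenario described above; an empty result means no proxy is configured for the current user.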
0
 
LVL 1

Author Comment

by:celnet
@carlmd, yet to test when down, will report back.

@giltjr, it's any and all external HTTP sites. I think the Telnet issue is just a combination of using Microsoft Telnet and my lack of experience with it. If I choose a page that doesn't exist I get a correct 404 response.

@Aaron Tomosky, yes, it was brought to my attention by one of the devs here yesterday that they use 2 products that set up a local proxy (Fiddler and Charles), and they mentioned that issue. However, one of those reporting the issue said he hasn't used either product for quite some time. That doesn't of course mean that it isn't running in the background...

Watch this space for more feedback!
0
 
LVL 57

Expert Comment

by:giltjr
Well using telnet, IMHO, really will show you nothing.  Although telnet is a different program from a web browser it is still making a HTTP request.  It is just instead of the program making the request you are making it manually.

Getting a packet trace from the user's computer will show if the full request is leaving the computer. Getting a packet trace from the firewall will show if the full request is leaving your network.

Since it is any/all external sites, that implies it is your firewall or beyond. Example: maybe your firewall has a limit on the number of simultaneous connections.
0
 
LVL 1

Author Comment

by:celnet
So, telnet in the end didn't give me any new info.

Having looked now at a few Netstat -ab commands during the issue, there are a couple things I am going to look at:

  1. Port Exhaustion
  2. Revisiting Kaspersky (again)

This time, I have completely uninstalled KAV from one of the regularly affected machines. I await to see if the issue reoccurs.

I will concurrently look more into port exhaustion and see if this is potentially the cause. I will report back, thanks again for all the assistance so far from everyone.
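
For anyone wanting to do the same kind of netstat review, here is a rough Python sketch that tallies TCP connections by remote port and state. It assumes the usual Windows column layout (Proto, Local Address, Foreign Address, State); the sample text is invented for illustration and is not output from our machines.

```python
from collections import Counter

# Invented sample in the style of `netstat -ab` output; illustrative only.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.20.15:49701       203.0.113.10:80        TIME_WAIT
  TCP    10.0.20.15:49702       203.0.113.10:80        SYN_SENT
  TCP    10.0.20.15:49703       203.0.113.10:80        SYN_SENT
  TCP    10.0.20.15:49704       203.0.113.20:443       ESTABLISHED
 [chrome.exe]
"""


def summarize_netstat(output):
    """Count TCP connections by (remote port, state)."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # TCP rows: TCP <local addr:port> <remote addr:port> <STATE>
        if len(fields) < 4 or fields[0] != "TCP":
            continue  # skips the header, UDP rows and [process.exe] lines
        remote_port = fields[2].rsplit(":", 1)[-1]
        counts[(remote_port, fields[3])] += 1
    return counts
```

A very large total would be a hint towards port exhaustion, while many port 80 connections stuck in SYN_SENT next to healthy port 443 connections would point at something interfering with port 80 specifically.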
0
 
LVL 57

Assisted Solution

by:giltjr
giltjr earned 167 total points
I doubt very much it is port exhaustion.   Why?  Because if you are the client and your client is running out of ports to use for client side initiated connections, it would run out no matter what the target port (80 or 443) is.

Let's go backwards a little bit.

When somebody is trying to use http, what exactly happens?

Do they get NO response?
Do they get slow response?

In one of your posts you stated you received an HTTP 408. That is actually issued by the server when an HTTP request takes too long.

On your ASA are you throttling bandwidth on port 80?    If you are throttling port 80 and not 443, that would explain why 80 has issues and 443 does not.

Are you able to sanitize and post your ASA configuration?
0
 
LVL 20

Expert Comment

by:carlmd
From your comment I assume you mean that you could not telnet to the site either when http was down.

Have you tried the following in a command prompt when it is not working?
ipconfig /flushdns
ipconfig /release
ipconfig /renew

Also restarting TCP/IP
netsh int ip reset

Also, searching around I see that many people appear to have the same problem when using AVAST. Do you by any chance?
0
 
LVL 1

Author Comment

by:celnet
Apologies for not replying sooner. The issue appeared to go away for a while and I have had other projects come to a head, so this took a back burner.

@carlmd, I had previously tried everything you suggested in your last post, to no avail.

@giltjr, they usually receive a slow response initially and then none. No throttling of any kind is currently enabled.

What I have done/what has happened since I last updated:

I have updated the IOS on all our Cisco switches. One reported bug in the software we used sounded similar to this issue. However this did not solve (though it did solve an unrelated, random slow network speed issue).

I had the issue myself this morning. I noticed that Outlook also stopped responding at the same time - we are using RPC over HTTP... I have asked the guys to check Outlook when the issue next occurs for them.

I have disabled Kaspersky on one machine and its user is no longer having the issue. I disabled it on mine and it seemed to solve the issue immediately without a restart.

I currently suspect that Kaspersky is the culprit here. I don't quite know how it is only affecting some and how it was ruled out before yet now seems to be the cause. I am in contact with Kaspersky support for ideas.

I will report back with findings.
0
 
LVL 20

Assisted Solution

by:carlmd
carlmd earned 166 total points
Sounds like Kaspersky is using port 80 for something.

In the past, bad/incomplete antivirus updates have caused Kaspersky to make port 80 stop working.

You could try turning off automatic updates to see if it continues to happen or not.
0
 
LVL 38

Expert Comment

by:Aaron Tomosky
Not sure if Kaspersky specifically has this setting, but most AV apps do: block traffic while scanning or updating definition files, or something to that effect. Perhaps that is getting stuck.
0
 
LVL 1

Accepted Solution

by:celnet
celnet earned 0 total points
Ok, so there has been an update from Kaspersky that we have rolled out, about 2-3 weeks ago. Since rolling this out we have seen no further occurrences of this issue.

Really appreciate all the help here. Consider this closed!
0
 
LVL 1

Author Closing Comment

by:celnet
Kaspersky Update appears to have fixed the issue.
0
