matt carter

Slow network for the past week

Hi all, I am hoping for advice on what to look for regarding our slow network. I understand there are a lot of issues that can cause slow connections, so I will describe our full setup and what I have tried so far.

2 physical servers, 2 Hyper-V VMs on each server
1st server
   Windows Server 2008, not running any software except Hyper-V, not on the domain.
   Hyper-V VM 1: primary domain controller, Server 2003, DHCP. DNS settings are:
        Reverse lookup zone: 2 name servers (same as parent). The first one points to the second domain controller and has its IP address
        listed.
        The second one in the list points to the primary domain controller, but shows the IP address as unknown.
        Forward lookup zone: 2 "same as parent" Host (A) records, one pointing to the IP address of each domain controller. Underneath those are
        all the connected workstations, with computer name and IP address.
        Under DHCP "Scope Options", 006 DNS Servers, we have 3 listed: the IP address of the main domain controller, a public Telstra IP address, and
        a third public Bigpond IP address.

   Hyper-V VM 2 is our file server, running Server 2012. Accessing files on the server seems to be okay, and browsing through directories is not slow.
        No other software or settings are on this server, just files. It used to do a shadow copy of all files twice a day, one at 7am and one at
        12pm. Since the slow speeds began, the 12pm one drops all access to mapped drives on workstations for 5 minutes, so I have since turned that one off.

General performance of the first server seems fine. When accessing it remotely or working locally, everything seems to be good.

2nd server
   Server 2012, not running any software except Hyper-V, not on the domain.
   Hyper-V VM 1: second domain controller, IP address matches the DNS settings mentioned above.
   Hyper-V VM 2: Server 2012 running SQL Server.

This physical server is the one that has issues. Remotely accessing the server or working on it locally is very slow compared to what it was, and the VMs are also slow to operate when opened.
I have gone through all the Event Viewer logs on all 3 Windows installs and nothing is showing up as a concern. Memory and CPU usage are both under 50%. When all staff leave for the day, performance goes back to normal.
Yesterday our dedicated 20/20 internet connection was really slow, under 1 Mbps, which is the worst it has ever been. I have contacted the ISP and they report no issues.

Also, workstations take longer to log in now.
One last thing, if it helps: we use an external email hosting company. When everything was working at normal speed, it took about 1-2 minutes for the workstations to connect to the Exchange server every time they opened Outlook. Now it is instant. As good as this sounds, I don't know why it suddenly connects immediately, so I thought I would mention it in case it is relevant.

If you need any more information please let me know, happy to provide any screen prints if necessary too.

Thanks
Matt
Alan

Hi Matt,

First things:

1) Do you have a good backup (verified to work), from last night, or whatever schedule it is on?  If not, this would be a high priority to ensure and make safe (offline).  Is this backup data, system state, or both?

If you aren't sure, would it be viable to take down the servers this evening, and run a rock-solid backup of each one, perhaps using a Linux boot DVD / USB, and doing a dd image, or any other image method you are familiar with?

2) You have turned off your shadow copies at midday - I understand that, but it does mean that even if you have good backups running overnight, you are still exposed to a full day of work being lost.  Could you perhaps reschedule that midday one to run at 5:30pm (or whatever time most, even if not all, staff have left)?

3) Do you have anti-malware software running on your machines?  If so, have you checked the logs, and ensured it is completely up to date?  Slowdowns can be due to malware running on one or more machines, so this is something to check as a high priority.  It is also possible that the anti-malware software itself is the culprit - I have seen it get stuck going mad scanning things and slowing machines down.  That is not good, but not as bad as having something nasty crawling around your network, especially if it is ransomware.

4) What are the specific slowdowns that you (or your users) are seeing?  Is it file access to network shares, email, web browsing, something else?

Thanks,

Alan.
matt carter

ASKER

Hi Alan, thanks for the quick reply.

Yes, we have bare metal backups run each day on both servers, and both backups were successful last night.
Yes, I am intending to reinstate the midday shadow copy. I just turned it off for the last 2 days in case it was causing an issue somewhere.

We have Trend Micro Maximum Security on all machines, and all are up to date. We have had this on for several years now. I have not run a scan on each individual workstation as yet (we have 60 workstations in the office), but will start this today as people go to lunch.

Specific slowdowns: the staff use an in-house SQL program, and this is very slow. When I log in to the physical server that hosts the VM, it too is very slow and laggy. No hardware has changed on this server and it has always performed fine. It has only had issues in the last 2 weeks.
I mention this because, from all the research I have done, a lot of people say SQL software is slow when the server is underpowered. In this case I don't think that is the issue.

Logging into workstations is taking a good 2-3 minutes now. It used to take only 20-30 seconds, so this has slowed right down too.
Hi Matt,

When was the last time you rebooted the two physical servers?  If not recently, then I would try that first (you never know!)

I would also power off and reboot all the networking equipment - all your switches and your router(s).

You probably won't be able to do this during the day :-(

I usually do it in this order:

1) Shut down all user machines (including my own) - I have a simple script that does it, then I check on the server to make sure nothing other than server(s) have active sessions / connections (not quite as 100% as physically checking each PC, but if you have 60 then that might take a while)

2) Shut down the VMs (I leave the DC to last)

3) Shut down the physical server(s)

4) Power off all the networking gear, including the perimeter router / modem.

5) Bring up the network gear

6) Bring up the Physical Server(s)

7) Bring up the VMs (if they don't come up automatically) starting with the DC

8) Try a user workstation (usually my own) to make sure I can login
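
For step 1, here is a minimal sketch of what such a shutdown script could look like in Python. It is not the actual script mentioned above, just an illustration that wraps the standard Windows `shutdown /s /m \\computer /t seconds` command. The workstation names and the 60 second delay are hypothetical, and it assumes it is run from an account with admin rights on the target machines. It defaults to a dry run that only prints the commands.

```python
import subprocess


def build_shutdown_command(host, delay_seconds=60):
    """Build the Windows `shutdown` command line for one remote machine."""
    # /s = shut down, /m \\host = remote target, /t = delay in seconds
    return ["shutdown", "/s", "/m", rf"\\{host}", "/t", str(delay_seconds)]


def shutdown_all(hosts, dry_run=True):
    """Print (dry run) or execute the shutdown command for every host."""
    commands = [build_shutdown_command(host) for host in hosts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: one unreachable PC should not stop the whole run
            subprocess.run(cmd, check=False)
    return commands


# Hypothetical workstation names; replace with your own list.
WORKSTATIONS = ["WS-01", "WS-02", "WS-03"]

if __name__ == "__main__":
    shutdown_all(WORKSTATIONS, dry_run=True)
```

Once the printed commands look right, calling `shutdown_all(WORKSTATIONS, dry_run=False)` would actually send them. You would still want to check for remaining sessions on the server afterwards, as described in step 1.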


Hope that helps,

Alan.
Hi Alan,

Yes, I have tried restarting all VMs and physical servers, and have rebooted the router / modem.
I have not restarted the switches. I will do this tonight, along with all the servers again, and give you an update tomorrow.

Matt
Our internet speed is also being affected: the 20/20 copper connection is currently downloading at under 1 Mbps.

Matt
Hi Matt,

Quote by matt carter (ID: 42236740):
    "our internet speed is also being affected, 20/20 copper connection is currently downloading speeds at under 1mbps."

From your symptoms, I would guess that the issue is within your LAN, not due to your connection to the outside world (else you would not see slow responses when connecting to internal machines).

However, if you want to be sure, then maybe disconnect your perimeter modem / router from the LAN, and connect it to a machine, and run speed tests like that.  If it is normal (fast) then your issue must be internal.

If doing that, I normally take an old PC / laptop (I have one with no HDD), boot to a live Linux distro DVD (Ubuntu 16.04LTS say), and run the tests from there.  That way, if the machine is exposed to the internet and someone tries to hack it (which happens quite quickly), when you power it down, nothing that happened will remain.  If the machine is behind a NAT router, chances are that nothing would get in anyway, but just noting.

Alan.
Quote: "Under DHCP "Scope Options" 006 DNS servers we have 3 listed, IP address of main domain controller, Public Telstra IP Address, and 3rd Public Bigpond IP address"
You need to fix this. Take out the last 2 addresses. Those should NOT be in the DNS settings of any machine on the domain, as it will cause a number of issues, including slowness. They should not be in the DNS settings for any system on the network with a static address either.
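
As a rough illustration of that check, the Python sketch below compares the DNS servers handed out by a DHCP scope against the list of internal (domain) DNS servers and reports anything that should not be there. All the addresses are made-up placeholders (the public ones come from the reserved documentation ranges), not the ones from this network.

```python
import ipaddress


def unexpected_dns_servers(scope_dns, internal_dns):
    """Return scope DNS entries that are not known internal DNS servers."""
    allowed = {ipaddress.ip_address(addr) for addr in internal_dns}
    return [addr for addr in scope_dns
            if ipaddress.ip_address(addr) not in allowed]


# Hypothetical values: two domain controllers, and a scope that also
# hands out two public ISP resolvers.
DOMAIN_DNS = ["192.168.1.10", "192.168.1.11"]
SCOPE_OPTION_006 = ["192.168.1.10", "203.0.113.1", "198.51.100.1"]

if __name__ == "__main__":
    for addr in unexpected_dns_servers(SCOPE_OPTION_006, DOMAIN_DNS):
        print(f"Remove from scope option 006: {addr}")
```

With these placeholder values, the two public addresses are flagged and only the domain controller remains.
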
Further update: I rebooted all servers, switches and routers last night, came in this morning, and we still have issues.
Today I have found the following.

If I ping our domain controller / file server, we get a 1ms response time.
If I ping the host server that is running the SQL VM, the ping times are drastically different, jumping from 1ms to 1500ms+.
The same occurs if I ping the IP address of the SQL VM. I shut down the SQL VM and ping times went back to 1ms for the host machine.
So something is definitely happening on the SQL VM. I am installing Wireshark now. I will let you know the results, and whether I need help sifting through what it produces.

Thanks Masnrock as well for your input. Would you know whether having these extra DNS settings could cause the issue I just mentioned?
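
For anyone wanting to put numbers on this kind of comparison, here is a small Python sketch that summarises reply times from captured ping output. It assumes Windows-style reply lines (`time=1ms` or `time<1ms`), and the 100ms spike threshold and the sample text are illustrative assumptions, not measurements from this network.

```python
import re
import statistics

# Matches the reply time in Windows ping output, e.g. "time=12ms" or "time<1ms"
REPLY_TIME = re.compile(r"time[=<](\d+)\s*ms", re.IGNORECASE)


def parse_ping_times(output):
    """Extract reply times in milliseconds from ping output text."""
    return [int(match.group(1)) for match in REPLY_TIME.finditer(output)]


def summarize(times_ms, spike_threshold_ms=100):
    """Summarise reply times and count replies at or above the threshold."""
    if not times_ms:
        raise ValueError("no ping replies found")
    return {
        "count": len(times_ms),
        "median_ms": statistics.median(times_ms),
        "max_ms": max(times_ms),
        "spikes": sum(1 for t in times_ms if t >= spike_threshold_ms),
    }


# Made-up sample in the shape of Windows ping replies.
SAMPLE = """\
Reply from 192.168.1.20: bytes=32 time<1ms TTL=128
Reply from 192.168.1.20: bytes=32 time=1ms TTL=128
Reply from 192.168.1.20: bytes=32 time=1500ms TTL=128
Reply from 192.168.1.20: bytes=32 time=1ms TTL=128
Reply from 192.168.1.20: bytes=32 time=1720ms TTL=128
"""

if __name__ == "__main__":
    print(summarize(parse_ping_times(SAMPLE)))
```

Running the same summary with the SQL VM up and then shut down gives a like-for-like comparison instead of eyeballing the ping window.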
ASKER CERTIFIED SOLUTION
Alan
Another update.
Stayed back in the office until midnight scanning each computer for infections. Nothing obvious appeared on any machine.
A couple of computers were set to a static IP, so I changed them back to DHCP.
As a test, I followed your advice, Masnrock, and deleted the additional DNS settings on the DHCP scope, just in case this was the issue.
It did not resolve the slow network issue.

Another thing I did, to rule out a workstation as the cause, was manually turn off all the computers myself, then clean them a few at a time.
Even with only 2-3 computers logged in (not connecting to the SQL database), the response time was still fluctuating when I pinged the server.
I turned those computers off, went to the next row, and saw the same thing. So I'm pretty confident it is not workstation related.

Alan, unfortunately due to physical hardware limitations I cannot transfer the Hyper-V VM.
Also, as another test to make sure it is not internet related, I turned off the gateway router for an hour. Still the same issue.

Thanks
Matt

2 offload settings on the NICs are enabled: Large Send Offload version 2 for IPv4 and for IPv6.
Hi Matt,

Midnight finishes sound all too familiar :-(

I am fairly convinced that the issue is with the SQL Server itself or an SQL application running thereon.

To confirm this, perhaps you could shutdown or isolate the SQL Server, then see if the rest of your network still exhibits the same symptoms - either or both of:

  • your internet connection under 1 Mbps; or
  • slow logon from a workstation

Knowing for certain the source of the problem will definitely be a step towards solving it.

Alan.
Yes Alan, no doubt there will be more midnight finishes in the future, needs to be done in some cases.

I just quickly booted everyone off for 10 minutes to test this. Not ideal doing it during the day, but if it helps get this resolved quicker then I'm all for it. Once I turned off the SQL VM, ping times to the host server were 1ms using both DNS name and IP address. I booted the VM back up, and as soon as it logged into Windows the ping times were slow again.

Hope this helps. After the above test, the source of the issue seems to me to be the SQL VM. I will keep looking through the VM to see if anything catches my attention and let you know.
Hi Matt,

Okay - great that we have narrowed it down to the SQL Server.

Have you looked at the event logs?  I often export the entire event log into Excel (or whatever) so that I can do some analysis on them - can you do that?

How far back do they go?  If they start prior to the issues, then I would take a period - I normally start with a week, but it could be a day, or an hour, whatever works - and compare the period from prior to the issues against now.

If working with a week, I try to avoid any public holidays.  If a day, then I choose the same day of the week, and if an hour, I use the same hour on the same day of the week - probably just me being picky :-)

I then analyse to see if there are any obvious errors or warnings that are coming up a lot now compared to the period before things started having problems.
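
If Excel gets unwieldy, the same before / after comparison can be sketched in a few lines of Python. This is only an illustration: it assumes each period has been exported to CSV with `Level`, `Source` and `Event ID` columns, and the file names in the usage comment, the sources and the event IDs below are all hypothetical.

```python
import csv
import io
from collections import Counter


def count_events(csv_text, levels=("Error", "Warning")):
    """Count (source, event ID) pairs for the chosen levels in one export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (row["Source"], row["Event ID"])
        for row in reader
        if row["Level"] in levels
    )


def increased_events(before, after, min_increase=1):
    """List events that occur more often now than before, biggest jump first."""
    diffs = {key: after[key] - before.get(key, 0) for key in after}
    return sorted(
        ((key, diff) for key, diff in diffs.items() if diff >= min_increase),
        key=lambda item: item[1],
        reverse=True,
    )


# Usage with two exported files (hypothetical names):
#   before = count_events(open("week_before.csv", encoding="utf-8").read())
#   after = count_events(open("week_now.csv", encoding="utf-8").read())
#   for (source, event_id), diff in increased_events(before, after):
#       print(source, event_id, f"+{diff}")
```

Comparing equal-length periods, as described above, keeps the counts meaningful.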

You *could* post them here, but I am always a bit leery of posting server logs publicly, unless they are very short, and I have manually scanned for anything that might be confidential.  If you want to email them to me, I have a Gmail address that is just a bouncer, and I kill periodically (so if anyone is reading this later, it is only guaranteed to work until the end of the month following this post):

Alan49228@gmail.com

Hope that helps,

Alan.
Further update.

I have no idea what changed overnight. I left work last night and planned on arriving before staff this morning to do further diagnostics.
I arrived this morning and speed was back to 100%, and it has been good all day so far. Ping times are at 1ms. A couple are jumping up to 30-40ms, which I'm fine with, as someone might be saving data to the SQL database.

All workstations are on and operational, so who knows what happened.
My remaining problem is that I don't know what the cause was, and I know it is going to be almost impossible to figure out. So if anyone else reads this who has network speed issues: GO HOME, come in early the next day. Haha, if only that was the case all the time.

To recap for anyone who reads this wanting more answers, this is what I have done so far.
Rebooted all servers, in the sequence described by Alan.
Turned off all modems, switches, routers etc, including patch panels.
Tested each individual machine for spyware / viruses. Nothing major found.
Checked the network IP settings on each workstation. A couple had static addresses without being reserved in DHCP. Although no conflict messages were appearing, this could have been the issue. Also made sure IPv6 was unticked on each workstation.
Downloaded and installed all updates and drivers, including ethernet drivers, and updated Integration Services on the VMs.
Removed the additional DNS settings in the DHCP scope, as suggested by Masnrock.
I did all of the above on Wednesday night. It was still slow on Thursday, no changes or tests were done during Thursday, and I arrived Friday to normal operation.
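
On the static address point, a quick way to spot risky machines is to check which static IPs fall inside the DHCP scope without a matching reservation. A minimal Python sketch of that check follows. The scope range, reservations and workstation addresses are placeholders, not our real ones.

```python
import ipaddress


def unreserved_static_addresses(static_ips, scope_start, scope_end, reserved_ips=()):
    """Return static IPs that sit inside the DHCP scope without a reservation."""
    start = ipaddress.ip_address(scope_start)
    end = ipaddress.ip_address(scope_end)
    reserved = {ipaddress.ip_address(addr) for addr in reserved_ips}
    risky = []
    for addr in static_ips:
        ip = ipaddress.ip_address(addr)
        # Inside the pool and not reserved: DHCP may lease it to another PC
        if start <= ip <= end and ip not in reserved:
            risky.append(addr)
    return risky


if __name__ == "__main__":
    # Placeholder values for illustration only.
    statics = ["192.168.1.50", "192.168.1.120", "192.168.1.130"]
    print(unreserved_static_addresses(
        statics, "192.168.1.100", "192.168.1.200",
        reserved_ips=["192.168.1.130"],
    ))
```

Anything it lists should either go back to DHCP (which is what I did) or get a reservation or an address outside the scope.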

I will send another update in a week's time if all is good. If not, it will be sooner.

Thanks
Matt
Hi Matt,

That's great to hear, but a bit scary not to be able to pin down the cause.

For what it's worth, I would still suggest analysing the event logs on the SQL Server.  If nothing else, it might be interesting :-)

Look forward to hearing what happens....

Alan.
The speed issue has not returned. I was unable to determine the exact cause, but I appreciate the support received nonetheless.

Matt