alankinane

Server NIC stops working intermittently

I have a strange problem with my Windows 2003 Server. Since last week the server has intermittently become inaccessible. It will be working away fine and all of a sudden nobody can access it. You can't ping the server, nor can you ping any hosts from the server itself. Usually if you wait a while it comes back of its own accord, but this can sometimes take hours. Rebooting the server once or twice usually fixed it for a while, but today I rebooted about 6 or 7 times with no luck. Then I just left it logged in, and after about 20 minutes it came back "online" and has been fine for the past 2 hours or so.

It doesn't seem to be software related, as I disabled all firewalls and anti-virus software. I also restarted in safe mode with networking and it was still down under this setup. I am thinking there is a hardware issue with the NIC, but unfortunately it's an old single-NIC server, so I need to purchase a second card to prove this theory.

When it goes down, the odd thing is that the ethernet lights always stay on and it always says connected.  It seems to be sending packets ok but receiving none or very few.  There is nothing showing up in the event logs and I have also tried replacing the CAT5 cable which made no difference.  

Is there anything else I can do? Any suggestions for narrowing the problem down further?
mrbrain646

I would also try a different switch port. They can sometimes go bad. Not sure what brand of server you have, but HP has diagnostics you can run from the SmartStart CD.
I would also update the drivers and firmware for the NIC and run Windows Update.
You can try to ping loopback during a failure. If it fails, then I suspect the card. If it doesn't fail, then check the hub/switch it is connected to and try a different port. If it's a managed product, there may be something there to check.

-dave
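The ping checks suggested above can be scripted so they are quick to run the next time the server drops off. This is only a rough sketch: the function names `ping`, `diagnose`, and `run_checks` are made up for illustration, the ping flags assume Windows or Linux, and the fault descriptions are a rule of thumb rather than a firm diagnosis. Keep in mind that pinging 127.0.0.1 mainly exercises the local TCP/IP stack, so a successful loopback ping does not by itself prove the card and cable are healthy.

```python
import platform
import subprocess


def ping(host, timeout_s=2):
    """Send one echo request with the system ping command; True if it exits 0.

    Assumption: Windows ("-n"/"-w" in ms) or Linux ("-c"/"-W" in s) flags.
    Some Windows versions exit 0 on "Destination host unreachable", so treat
    a True result as a hint only.
    """
    if platform.system().lower() == "windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def diagnose(loopback_ok, own_ip_ok, gateway_ok):
    """Map the three ping results to a rough area to look at next."""
    if not loopback_ok:
        return "suspect local TCP/IP stack or NIC driver"
    if not own_ip_ok:
        return "suspect NIC configuration or driver"
    if not gateway_ok:
        return "suspect cable, switch port, or switch"
    return "no fault visible from this host"


def run_checks(own_ip, gateway):
    """Ping loopback, the server's own address, and a neighbour, in that order."""
    return diagnose(ping("127.0.0.1"), ping(own_ip), ping(gateway))
```

During an outage you would call something like `run_checks("192.168.1.10", "192.168.1.1")`, where both addresses are placeholders for the server's static IP and a nearby host or gateway.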
alankinane

ASKER

It's an IBM xSeries server. I already updated the driver for the NIC but will try the firmware also. I was able to ping localhost and also the static IP address of the server. I will try connecting to a different port in the switch though. Thanks for the suggestions.
Agreed, I have experienced a similar problem that was caused by the switch. If you are able to log on to the switch via a web interface, checking the logs would be a good place to start.

Also, if the server has multiple network cards, maybe you can just try using a different NIC?

If you can't ping the loopback address (127.0.0.1) from the server, as Dave mentioned above, then something is wrong with the NIC itself.
Steve
You've already updated the NIC drivers and say you can ping loopback and the IP.
I'd try rolling the driver back to an older one, since it worked before and has only suddenly stopped.

I'd definitely try another switch port, but here are some other things worth trying to help diagnose it next time it happens:

Try disconnecting the network cable and reconnecting it after 30 seconds. Similar tests to try are rebooting the switch, or disabling the NIC on the server and re-enabling it in Network Connections.

What does ipconfig /all show during the issue?

On the switch that the server is connected to, check for errors and traffic statistics.
You can also try port spanning and packet sniffing on another port to track the traffic on the port connected to your server.
ASKER CERTIFIED SOLUTION
celazkon

This solution is only available to members.
I actually have two switches. One is a 24-port switch that the server was connected to, and the other is a 5-port firewall which I have a few PCs connected to also. I had the server connected to the 24-port but I am trying it with the firewall now. I have also now updated the firmware on the NIC. We'll see how it goes. Thanks for all the suggestions, people.
Update: It stayed up for the rest of yesterday and was working first thing this morning. Then it went down again. I think now that it is the 24-port switch that is causing the problem (or a device connected to it perhaps). Initially when we went down this morning, the server and PC connected to the firewall appeared unaffected, but then they went down also, so I think the 24-port is bringing everything down.

Upon restarting the 24-port switch everything came back up again, although it took about 15 minutes or so after restarting the device.

I have ordered a replacement 24-port switch and will see how that goes.
@alankinane

Great stuff! Although rebooting the server has been your previous fix, we needed to establish whether the server was actually at fault.
If you've already ordered the switch you may as well see how it goes, but take care with what is connected, as another device on the network could still be the cause.
It turns out that one of the CAT5 cables in the 24-port switch was connected to another port, thus creating a loop. I should really have checked for this before buying a new switch. Not sure who connected it like this, or why it suddenly started to become an issue now, as it appears it has been connected like this for some time.
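For anyone who wants to sanity-check their patching before buying hardware, here is a small sketch of the idea. It assumes you have written down each patch cable as a pair of device names (the names below are made up to resemble this setup, and the cable is assumed to have run between two ports of the same 24-port switch). The hypothetical `find_loops` helper uses a union-find structure and reports any cable that joins two things which were already connected, which is exactly what a loop is at the switching level.

```python
def find_loops(links):
    """Return the cables that close a loop in a switched (layer-2) network.

    links: list of (device_a, device_b) pairs, one entry per patch cable.
    A cable is reported when both ends were already connected to each other,
    including a cable plugged into two ports of the same switch.
    """
    parent = {}

    def root(node):
        # Find the representative of the group this device belongs to.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    looping = []
    for a, b in links:
        root_a, root_b = root(a), root(b)
        if root_a == root_b:
            looping.append((a, b))  # both ends already reachable: loop
        else:
            parent[root_a] = root_b  # merge the two groups
    return looping


# Hypothetical cabling list loosely modelled on the setup in this thread.
cables = [
    ("server", "firewall5"),
    ("firewall5", "switch24"),
    ("pc1", "switch24"),
    ("switch24", "switch24"),  # cable from one switch port back to another
]
print(find_loops(cables))
```

The last entry is the kind of cable described above: both ends land on the same switch, so it is the only one reported.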