My server keeps going offline. Please help!

I have a Windows 2003 Standard Edition server that has recently started becoming unreachable. It had been fine for almost a year, but in the past month there have been times when I can't reach it by remote desktop or access any of the shared files on it. The strange thing is that I can still ping it when it's in this state. I always have to reboot the computer to get everything working again. The only thing installed on this server is Exchange 2003, and it's been on there for about a year. I haven't installed anything new on this server, and it has been sitting in the corner untouched.

I looked in the event logs and unfortunately there are no errors in there to help solve this issue. The only errors I get are logged when Exchange can't reach the domain controller anymore because the server is in this state. But that is just a result of the server getting into this state (not a cause). Does anybody know what I should do to help diagnose or fix this problem? I don't know what it could be, since it's pretty much a vanilla install of Windows and Exchange and doesn't get used at the console. Any ideas?
bemara57Asked:
 
redcelltechCommented:
I have seen this many times, and it is generally related to an underlying hardware issue. I say this because of the lack of any errors or warnings.

The first place I would start is with perfmon. Looking specifically at disk I/O, check bytes written, bytes read, and disk queue length. I have seen array controllers or drives going bad cause this issue. Basically, disk I/O goes so high that the box can no longer respond before your client times out.

Also use perfmon to check memory usage. I know you say that you have not installed any new software, but perhaps a patch or update to existing software has a memory leak. Or you have memory going bad. Use a memory test tool. The best ones come on a floppy disk and you boot to them. Run the tests in loop mode for more than 10 loops. I have had machines that pass the first few loops only to fall over on later ones.

I will tell you, these are the most difficult problems to resolve. Carefully cataloging everything that is going on is key to finding these issues. I have had one situation where a scheduled task was running the machine over during the day, and another where an array controller's onboard write cache memory was going bad, etc.
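
Since the box has to be rebooted before anyone can look at it, it helps to have the counters written to a file rather than watched live. Here is a rough sketch of one way to do that, assuming the typeperf command-line tool that ships with Windows is available on the server. The Python wrapper, the counter selection, the file name and the 15 second interval are all my own illustrative choices, not something from this thread. It only builds the command line; you can generate it on any machine and paste it onto the server.

```python
# Counter paths use standard perfmon naming; "_Total" aggregates all physical disks.
# The selection below is an illustrative assumption, adjust it to taste.
COUNTERS = [
    r"\PhysicalDisk(_Total)\Current Disk Queue Length",
    r"\PhysicalDisk(_Total)\Disk Read Bytes/sec",
    r"\PhysicalDisk(_Total)\Disk Write Bytes/sec",
    r"\Memory\Available MBytes",
]


def build_typeperf_command(counters, output_csv, interval_seconds=15):
    """Return the argument list for a typeperf run that logs counters to a CSV file."""
    if interval_seconds < 1:
        raise ValueError("interval_seconds must be at least 1")
    # -si = sample interval in seconds, -f = output format, -o = output file
    return (["typeperf"] + list(counters)
            + ["-si", str(interval_seconds), "-f", "CSV", "-o", output_csv])


def as_command_line(args):
    """Quote arguments containing spaces so the result can be pasted into cmd.exe."""
    return " ".join('"%s"' % a if " " in a else a for a in args)
```

Calling as_command_line(build_typeperf_command(COUNTERS, "server_health.csv")) gives a single line to run in a command prompt on the server. Leave it running, and after the next hang and reboot the last rows of the CSV show what disk and memory were doing just before the box stopped responding.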
 
cshepfamCommented:
You can start by running the netdiag and dcdiag commands. These will report any errors with networking, etc.

If you see any errors and know how to fix them, go for it. If not, post the errors here and we'll go from there.

To be honest, it sounds like a network issue.

One question: is the IP address on the server static? I hope so, because if not, your IP address can change when you reboot. If you have WSUS installed, it will reboot the server after updates are completed. Then when your server comes back online, it may pick up a different IP address.

Go to Network Connections and check your configuration in TCP/IP Properties. If "Obtain an IP address automatically" is selected, choose "Use the following IP address" instead and enter a static (permanent) IP address.

What I would do is first run ipconfig to find your current IP address, then use that IP address as the static one.
 
ChiefITCommented:
This may be a network issue:

You may have a dumb switch not set for Spanning Tree Port Fast.

You may also have a bad network cable or jack.

My first guess would be that the router lost its DNS settings. You said you can ping it.
Can you ping it by both IP address and computer name? If you can ping by IP address and not by computer name, you have a DNS problem. Since no errors show in Event Viewer, my first guess would be that you lost the DNS settings in the router. The router's DNS, on the LAN side, should point towards your servers.
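
If you would rather script the name-versus-IP check than do it by hand, here is a minimal sketch, assuming Python is available on a client machine. The function name and the messages are made up for illustration. It only tests whether the computer name resolves to the address you expect, which is the DNS half of the ping test above.

```python
import socket


def diagnose_name_resolution(hostname, expected_ip, resolver=socket.gethostbyname):
    """Report whether hostname resolves, and whether it matches the known static IP.

    resolver is injectable so the logic can be exercised without a real network.
    """
    try:
        resolved = resolver(hostname)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError
        return "name does not resolve: suspect DNS"
    if resolved != expected_ip:
        return "name resolves to %s, expected %s: suspect a stale DNS record" % (
            resolved, expected_ip)
    return "name resolves correctly: DNS is probably not the cause"
```

For example, diagnose_name_resolution("mail01", "192.168.1.10"), where both the name and the address are placeholders for your own server. Run it while the server is in the bad state. If the name still resolves to the right address, the problem is more likely on the server itself than in DNS.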
 
redcelltechCommented:
Did you ever figure this out? Any more help?
 
bemara57Author Commented:
I did a few things but they didn't work. It's still a mystery and is happening about every other day. Here's what I did:

1.) Ran netdiag and dcdiag: everything was successful with no errors

2.) The IP address and DNS server (the DNS IP points to the domain controller) have always been static on the machine, especially since it's an Exchange 2003 server. Also, the domain controller/DHCP server has a reservation for that IP address for the machine's MAC address

3.) I have a good Linksys SD2008 switch. I used it out of the box and don't think there are any settings I can change (http://www.linksys.com/servlet/Satellite?c=L_Product_C2&childpagename=US%2FLayout&cid=1123638180691&pagename=Linksys%2FCommon%2FVisitorWrapper&lid=8069122279B01).

4.) The biggest clue I came up with: I installed a second network card to work with a different IP address. When the server goes offline, I can still ping either network card but can't connect to either card via remote desktop or even OpenSSH (an SSH server for Windows: http://sshwindows.sourceforge.net/)

5.) I looked at perfmon, but I only get live stats. Is there a way to get history, or do I have to turn something on so it records history? I know this step is important, and sorry I couldn't get this one done, but I need help with it. The options were a bit overwhelming because there were so many. Should I just select these counters:

Performance Object: PhysicalDisk
Counters: Current Disk Queue Length, Disk Read Bytes/sec, Disk Write Bytes/sec, Split IO/sec

Performance Object: Memory
Counters: % Committed Bytes In Use

Also is there anything else I should do in the meantime? Thanks a mil for all the help so far.
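
For the clue in item 4 (ping answers but remote desktop and SSH do not), a small probe run from another machine could at least timestamp when each service stops answering. This is only a sketch, assuming Python is available on a second computer. The port list is the usual default for each service and the function names are made up. A successful connect only shows that the server's TCP stack accepted the connection, so a hung service could still show as open.

```python
import socket

# Default ports for the services mentioned in this thread (assumed, not verified here).
SERVICE_PORTS = {22: "SSH", 25: "SMTP", 445: "SMB", 3389: "Remote Desktop"}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    conn.close()
    return True


def probe_services(host, ports=SERVICE_PORTS):
    """Map each port number to True/False depending on whether it accepted a connection."""
    return dict((port, tcp_port_open(host, port)) for port in ports)


def format_report(host, results, timestamp):
    """Build one log line such as '<time> <host> SSH=open, Remote Desktop=NO ANSWER'."""
    parts = ["%s=%s" % (SERVICE_PORTS.get(port, port), "open" if ok else "NO ANSWER")
             for port, ok in sorted(results.items())]
    return "%s %s %s" % (timestamp, host, ", ".join(parts))
```

Running probe_services against both network cards' addresses every minute or so from a scheduled task, and appending format_report lines to a file, would show whether all services drop at the same moment and at what time of day.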
 
redcelltechCommented:
What hardware platform is this? I would suggest running any diagnostic tools available. Like I stated already, I generally see this with low-level hardware problems, such as a bad PCI bridge, bad memory, etc. If this machine is under warranty, make the vendor do the dance. If it is not, check how much it costs to put it under warranty, and then make them do the dance.
 
ChiefITCommented:
To add to redcelltech's inquiry:

I was working on a Linksys RVS4000 router the other day, trying to make a simple connection. It is advertised on the page the author provided. This router is a 10/100/1000 router. For some reason, it didn't like the 10/100 NICs we had. It only communicated intermittently. Maybe you are having the same problem with the switch. Try a 10/100 switch.

John
 
bemara57Author Commented:
The computer is definitely out of warranty. After some research I found that a lot of people are experiencing problems with store.exe eating up memory. This process belongs to Exchange. It eats up so much memory that it looks like a memory leak, but Microsoft says it's normal. So I'm going to add another gig of RAM and see how that goes. Beyond that, I'm hoping I don't have to junk it. Thanks for all the help.
Question has a verified solution.
