CMD processes crash on Friday afternoon every other week

One of our webservers crashes every other week on Friday evening between 18:00 and 19:00, mostly around 18:35. The server runs Tomcat 4.1.24 in a cmd window. At the time of the crash, three cmd windows were open: one idle, one running Tomcat, and one running another process. All three windows were gone after the crash. The third process was running only the last time, so it can't have caused the other crashes.

Dates: 27 Aug, 10 Sept, 24 Sept, 8 Oct, 22 Oct. I've checked the Tomcat logs, but couldn't find any activity around that time. One time Tomcat had been up for two weeks, one time for less than a day, the other times for several days. After I start Tomcat again, it works as normal. We have Tomcat running on two other servers, with no problems there.

It doesn't seem to be a Windows crash or a memory problem. The Task Manager doesn't show anything strange regarding memory. The server runs Windows 2000 with all updates installed, and I've scanned for viruses with the latest antivirus definitions and found none. This didn't happen before August 27, and the logs go back to May 2003.

The firewall logs (ISA Server) didn't show anything strange, although I don't know exactly what everything means in the three types of logs.

I really have no idea where to look at the moment, so any suggestion is welcome.
grexxAsked:
 
elbereth21Commented:
To do that, right-click on My Computer, choose Properties, then the Advanced tab, then the Startup and Recovery button, and uncheck "Automatically reboot".
0
 
SKULLS_HawkCommented:
What about the event logs? Is there any info in them from around the same time? Application crashes normally register something.

Also, have you checked the Task Scheduler to see what runs on a Friday between those times?
0
 
elbereth21Commented:
Take a look at the scheduling of backup jobs; you might have one or more running on Friday evening.
0
 
grexxAuthor Commented:
Thanks, SKULLS_Hawk, that was a useful tip! The event log has several records per occurrence. It turns out that Windows crashes and restarts, which answers the question of why the processes disappeared. The new question is why the system crashes, and why so regularly at that specific time.

I copied the following eventlog entries:

Event Type:      Error
Event Source:      EventLog
Event Category:      None
Event ID:      6008
Date:            27/08/2004
Time:            19:35:57
Description: The previous system shutdown at 6:28:53 PM on 8/27/2004 was unexpected.

[event type etc is the same as above]
Date:            10/09/2004
Time:            20:02:48
Description: The previous system shutdown at 6:26:46 PM on 9/10/2004 was unexpected.

Date:            24/09/2004
Time:            18:32:21
Description: The previous system shutdown at 6:28:39 PM on 9/24/2004 was unexpected.

Date:            08/10/2004
Time:            18:32:41
Description: The previous system shutdown at 6:29:34 PM on 10/8/2004 was unexpected.

Date:            22/10/2004
Time:            18:32:22
Description: The previous system shutdown at 6:29:47 PM on 10/22/2004 was unexpected.
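
The regularity in the entries above can be checked mechanically. Here is a minimal sketch in Python, assuming the five shutdown timestamps are copied by hand from the Description lines; the names SHUTDOWNS, parse and summarize are only illustrative.

```python
from datetime import datetime

# Shutdown times copied from the Event ID 6008 descriptions above.
SHUTDOWNS = [
    "8/27/2004 6:28:53 PM",
    "9/10/2004 6:26:46 PM",
    "9/24/2004 6:28:39 PM",
    "10/8/2004 6:29:34 PM",
    "10/22/2004 6:29:47 PM",
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def parse(stamp):
    # US-style month/day/year with a 12-hour clock, as written in the log.
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")


def summarize(stamps):
    times = [parse(s) for s in stamps]
    # Set of weekday names on which the shutdowns happened.
    weekdays = {DAY_NAMES[t.weekday()] for t in times}
    # Number of days between consecutive shutdowns.
    gaps_days = [(b.date() - a.date()).days for a, b in zip(times, times[1:])]
    # Spread of the time of day across all shutdowns, in seconds.
    secs = [t.hour * 3600 + t.minute * 60 + t.second for t in times]
    return weekdays, gaps_days, max(secs) - min(secs)


if __name__ == "__main__":
    weekdays, gaps, spread = summarize(SHUTDOWNS)
    print("weekdays:", weekdays)
    print("gaps in days:", gaps)
    print("time-of-day spread in seconds:", spread)
```

If the weekday set has a single member and all gaps are equal, the crashes follow a fixed schedule rather than depending on load or uptime.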

A friend suggested that the Windows update service (SUS) might be the cause, but looking at those logs, I can't find anything around that time, and as far as I know it's not set up to install and restart automatically. Furthermore, that shouldn't cause a system crash.

I cannot find any scheduled tasks, and backup scripts from other servers run at midnight.

My manager suggests reinstalling the system, which seems the easy solution :-( but I still hope to find the reason.
0
 
grexxAuthor Commented:
I added some points...
0
 
elbereth21Commented:
I think it is better if you uncheck the options for automatic restart in case of error so that you can see the blue screen of death and the error code.
0
 
SKULLS_HawkCommented:
Are there no other critical or even informational entries in the Event log, shortly before the restart logs?

If this happens at a set time on a set day every time, then something somewhere is activating at that time.

Possibly Exchange mailbox maintenance, virus updates, Windows updates, etc. Any kind of application that may fire off at a particular time and day.

This is going to be a difficult one to trace without log entries, depending on the actual crash. As elbereth21 mentioned, disabling the automatic reboot may help pinpoint the problem.

Looking at your logs above, there certainly is a sequence to the timing.  Interesting.
0
 
grexxAuthor Commented:
There are no entries in the event log shortly before the restart; the nearest ones are hours or days earlier. As I mentioned above, updates were run days before. The mail server doesn't have anything to do with this server, and backups run at night.

What is interesting though, is that Norton Antivirus shows 22 october for the last virus update. I just found out that it has its own event log, which shows interesting but somewhat confusing parallels to the restarts. It turns out that on each of those days, NA has updated its virus definitions. But... the update time is 8:01 PM, so after the restart, not before. This might be a coincidence, but there is more.

The last updates were installed by user SYSTEM. Before 27 august, the automatic updates were installed by Administrator. Just after the restart, both eventlogs record the startup of NA, which is as expected.

I'm confused.... to say the least! :-S
0
 
SKULLS_HawkCommented:
What time is Norton set to download updates?  Possibly it is updating them at the time of the crash and then installing after server is rebooted.

I know that by default Symantec/Norton's virus definition update runs once a week on a Friday; I forget the time. Possibly that is the cause? It is worth looking at. If you know the server is due to crash tomorrow, I would disable updates in the morning and see what happens.

I don't mean to harp on about it, but given the regularity of the crashes and their timings, it is almost a certainty that there is a trigger causing the reboot.  Norton is as good a candidate as any. Do you have any other applications on your server that possibly check for updates, or run routine maintenance, possibly something small that isn't obvious.

It's like a ghost in your machine. :-)
0
 
grexxAuthor Commented:
Time for an update. The problem hasn't been solved, it only got more complicated.

1) We discovered last Friday (19nov) that in fact three servers are crashing. One is the firewall/proxyserver (ISA Server), one has the network drives and is the second AD/Domain of our network. The third is the one mentioned above, the Tomcat server. Our other servers have no problems, and show nothing in the event logs.
2) All three servers started crashing at the same date, and have crashed since then every other Friday evening between 17:26 and 17:30. There is no pattern which server crashes first. They all have crashed at least once first.
3) Wintertime: from August till October (summertime), the crashes occurred around 18:30. In November (wintertime) the crashes occur one hour earlier.
4) On the Tomcat server we removed Norton Antivirus Corporate Edition and reinstalled it before 5 november. Still it crashed on 5 nov. On Friday 19 nov we deactivated NAV in the Windows services. It crashed that evening.
5) After re-examining the ISA logs, I found that there doesn't seem to be a relationship with Norton LiveUpdate downloads. Comparing 5 and 19 November: the firewall downloaded the update at 00:00, the Tomcat server didn't download any updates on those dates, and the network drives server updated on 5 November at 18:00 and on 19 November at 19:00. (To be complete, some of the servers that didn't crash did download the update around 18:00.) So I suppose NAV is not the cause, but I will still uninstall it on at least one server before the next crash.
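
Point 3 can be put in numbers. A crash time that moves by exactly one hour when the clocks change stays fixed in UTC, which would point away from anything scheduled in local time. Here is a minimal sketch in Python, assuming the servers run on Central European time (UTC+2 in summer, UTC+1 in winter; the time zone is an assumption) and taking the round figures 18:30 and 17:30 as representative crash times.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets: the time zone of the servers is not stated.
CEST = timezone(timedelta(hours=2), "CEST")  # summertime
CET = timezone(timedelta(hours=1), "CET")    # wintertime


def to_utc(year, month, day, hour, minute, tz):
    # Build a local, offset-aware time and convert it to UTC.
    local = datetime(year, month, day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


def same_utc_time_of_day(a, b):
    # Compare only the clock time, ignoring the date.
    return a.time() == b.time()


# Representative crash times: "around 18:30" in summertime,
# one hour earlier in wintertime.
summer_crash = to_utc(2004, 10, 22, 18, 30, CEST)
winter_crash = to_utc(2004, 11, 19, 17, 30, CET)

if __name__ == "__main__":
    print("summer crash in UTC:", summer_crash.time())
    print("winter crash in UTC:", winter_crash.time())
    print("same UTC time of day:", same_utc_time_of_day(summer_crash, winter_crash))
```

Under these assumptions both crashes fall at the same UTC clock time, so the trigger would be something whose clock does not follow the summertime/wintertime change.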

That is the report of what has happened so far. We plan to reinstall the Tomcat server after 1 December. The next crash is due on 3 December.
0
 
grexxAuthor Commented:
Elbereth21 said: "I think it is better if you uncheck the options for automatic restart in case of error so that you can see the blue screen of death and the error code"

Thanks for the reply. I'm not so fond of these error codes. The strange thing is that on all servers that I've checked (crashing and non-crashing), this option is UNchecked! And still they all restart automatically...
0
 
elbereth21Commented:
Are these servers HP machines? If so, they might have a BIOS option (ASR, if I remember correctly) which overrides the Windows one and makes the server reboot, even if you changed the flag.
Have you considered the possibility of a memory leak? That kind of issue can cause regular reboots (especially during the weekend, I know so well....:-(    )
Hope it helps, Elbereth21
0
 
grexxAuthor Commented:
Tomcat: Dell
Share and Firewall: unknown brand
But still that could mean they have this option as well!

Memory leaks: I've looked at the uptime of the Tomcat server for all crashes. At least once it was up for the full two weeks. Once it was rebooted the same day. I'm not sure what that tells me. For the rest this seems unlikely, as the three servers do completely different things, run different applications, except for Windows of course...
0
 
grexxAuthor Commented:
I'm going to close this thread. The problem isn't solved at the moment, but it's time to clean up. If anything new comes up, then I'll post it here. Thanks for your input. It did help me somehow.

*****************************
New replies that lead to a solution will be rewarded by opening another question plus appropriate points. So they are still welcome!!!
*****************************
0
 
grexxAuthor Commented:
Okay, I think I've found the solution. All three servers were connected to an APC power supply, all other servers were not. I've removed those servers from this power supply and put another server back on it to see what happens. For further developments, I've opened a new question about this in the hardware section.

http://www.experts-exchange.com/Hardware/Q_21248276.html
0
 
grexxAuthor Commented:
Last Friday the servers didn't crash, and that's because we removed the UPS. It had a dead battery and tested itself every two weeks by cutting off normal power.

Furthermore, there was one more small thing I didn't understand: the time of the crash in the Windows event log. These times differed by up to 4 minutes between servers, and this was the time of the crash, not the time of the restart. It turns out that Windows makes a log entry every five minutes. See:

http://www.experts-exchange.com/Operating_Systems/Win2000/Q_21251118.html
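
The effect of that five-minute log entry can be sketched as follows. This is an illustration only, assuming the entry is written at a fixed five-minute interval counted from some per-server starting moment; the starting moments, the server names and the crash time below are invented for the example. The time reported after the restart is then the last entry written before the power was cut, so servers that lost power at the same instant can report times several minutes apart.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=5)  # interval mentioned in the post above


def last_entry_before(crash, first_entry, interval=INTERVAL):
    """Return the time of the last periodic entry written at or before the crash."""
    elapsed = crash - first_entry
    whole_intervals = elapsed // interval  # timedelta // timedelta gives an int
    return first_entry + whole_intervals * interval


# Hypothetical values: one power cut, three servers with different phases.
crash = datetime(2004, 11, 19, 17, 30, 0)
first_entries = {
    "server_a": datetime(2004, 11, 19, 8, 1, 10),
    "server_b": datetime(2004, 11, 19, 8, 3, 40),
    "server_c": datetime(2004, 11, 19, 8, 4, 55),
}
reported = {
    name: last_entry_before(crash, start)
    for name, start in first_entries.items()
}

if __name__ == "__main__":
    for name, stamp in reported.items():
        print(name, "reports", stamp.time(), "for a power cut at", crash.time())
```

In this model the reported time is never later than the real power cut and never more than one interval earlier, which matches a spread of up to about four minutes between servers.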
0
Question has a verified solution.
