Solved

UnixWare 7.1.1 server powered on but cannot connect, happening on a weekly basis

Posted on 2008-06-23
Last Modified: 2013-12-05
Hi

We have a very old server which is running UnixWare 7.1.1. For the last 3 weeks we have been having a problem with it.

The problem is that we come into work in the morning and no one can connect to the server; connections time out. I looked at the server and it was powered on, but there was nothing on screen. On the server itself there is a light with a symbol which I think is memory and an exclamation mark. That was flashing! So I had to power it down from the switch.

I brought the system back up and all is fine. But it will happen again, and it happens overnight. The Unix box is in a secure server room so no one has access.

I checked the log files and there were no errors recorded, so I am a bit stuck. I checked both syslog and osmlog.

hope someone can assist

Thanks

Tj
Question by:tjhack
64 Comments
 
LVL 14

Expert Comment

by:mikelfritz
ID: 21851225
Any way to pinpoint the time?  Maybe write a script to echo the date out to a file and then let it die to see the time.

while true
do
date > date.log
sleep 60
done

Maybe something environmental at night.  I had a customer's system that would die at night, and it turned out that it coincided with the cardboard box crusher in the warehouse being run.
0
 
LVL 14

Expert Comment

by:mikelfritz
ID: 21852577
Maybe post the output of "crontab -l" to see if there is something kicking off that is crashing the server.
0
 

Author Comment

by:tjhack
ID: 21853166
Hi

I am quite new to unix so still learning. How do i post the output of the crontab?

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21863395
run the command
   crontab -l
on your system to see your crontab.
To see the crontab of another user (as root), run
  crontab -l username
(on Linux and BSD systems the equivalent is "crontab -u username -l")

To see which crontabs are there, list the crontab files
  ls /var/spool/crontabs
The directory name may be different on your system. Check with
"man cron" or "man crontabs" to verify.
You may also use "find / -type d -name crontabs" to find the dir.
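
To read every crontab in one go, a small loop over that directory can help. This is only a sketch: show_crontabs is a made-up helper name, and the directory has to be whatever the "find" command above reports on your system.

```shell
# show_crontabs DIR
# Prints every regular file in DIR, each preceded by a header naming the file.
# (Hypothetical helper; DIR is the crontabs directory found on your system.)
show_crontabs() {
    if [ -z "$1" ]; then
        echo "usage: show_crontabs DIR" >&2
        return 2
    fi
    for f in "$1"/*
    do
        # Skip anything that is not a plain file (or an unmatched pattern).
        [ -f "$f" ] || continue
        echo "==== $f ===="
        cat "$f"
    done
}
```

Run it as root, since other users' crontab files are usually not readable otherwise.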
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21863417
If you install a script that runs all night, you may get rid of the problem
by actually hiding the root cause as the error may only occur when the
system is really idle over night ;-)

BTW: To edit a crontab, use the command
  EDITOR=vi ; export EDITOR        # make sure not to use 'ed' as editor
  crontab -e
0
 

Author Comment

by:tjhack
ID: 21863437
Hi Just did a search and found a few crontabs

./etc/inst/save/var/spool/cron/crontabs
./var/sadm/pkg/cmds/save/build/var/spool/cron/crontabs
./var/spool/cron/crontabs

which one is the correct one?

we have the backup running at night which uses the script cron.backup.dat

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21863438
To run a command once every minute, you can put this line into your crontab:
* * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'` > /path/to/check.log

a) Verify that the command works fine:
      /usr/bin/date '+%d.%m.%Y %H:%M'
b) Change the filename /path/to/check.log to your liking.
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21863492
The crontabs are in
  /var/spool/cron/crontabs
To see all files, simply use
  ls /var/spool/cron/crontabs
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21863499
Does your backup complete successfully or does the system die before it finished?
0
 

Author Comment

by:tjhack
ID: 21863915
when it has happened the system dies before it finishes

0
 

Author Comment

by:tjhack
ID: 21863927
i did an ls

adm        
_cron16161  lp        
 root.new    
time
_cron13781  
_cron22417  root      
 sys        
 uucp

and i got the folders above. which folder or file do i need to look at?
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21864023
You may want to add your periodic check to root's crontab:
a) make sure you are root
b) use "crontab -e" as stated before to edit the file

The other files (adm, lp, sys, uucp) are standard cron files.
You can see the contents of the sys crontab like this:
   crontab -l sys

The crontabs with names like _cron* and root.new will not
be used, as there is no corresponding user with that name.
0
 

Author Comment

by:tjhack
ID: 21864076
How can I make it run every 10 mins?

I added this line, which produced errors:

0,10,20,30,40,50,60 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'
` > /home/tp/check.log

0
 

Author Comment

by:tjhack
ID: 21864116
error is

UX:crontab: ERROR: 0-59/10 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y
 %H:%M'` > /home/tp/check.log
: error on previous line;Unexpected character found in line.
UX:crontab: ERROR: Errors detected in input, no crontab file generated.
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21864182
What are you trying to tell cron?

To have the command running every minute from 10:00 to 10:59 you
should write
* 10 * * *   echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'` > /home/tp/check.log

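The step syntax n/m (as in 0-59/10) is invalid here. A plain range x-y is normally accepted by cron, but it cannot express "every 10 minutes", so list the minutes individually.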

The following (one single) line would be fine, too. Note the closing " near the end!
0,10,20,30,40,50,60 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log


0
 

Author Comment

by:tjhack
ID: 21864233
i want it to run every 10 mins

i still get the error below

UX:crontab: ERROR: 0,10,20,30,40,50,60 * * * * echo "SYSTEM check: `/usr/bin/dat
e '+%d.%m.%Y %H:%M'`" > /home/tp/check.log
: error on previous line;Number out of bounds.
UX:crontab: ERROR: Errors detected in input, no crontab file generated.

Can't see anything wrong; the quotes are OK. It states "error on previous line; Number out of bounds"??

0
 

Author Comment

by:tjhack
ID: 21864257
OK, I have got it into the crontab; it did not need the 60.

Will wait and see what happens.

This script will let me know when the system crashes. From there, how can I diagnose it further in regards to checking other things?
0
 

Author Comment

by:tjhack
ID: 21864336
when i go to edit the crontab it gives me the error

crontab > /tmp/crontej.txt

UX:crontab: ERROR:
: error on previous line;Unexpected character found in line.

so not sorted
0
 

Author Comment

by:tjhack
ID: 21864374
i cannot edit the crontab file

typing crontab -e
0
 

Author Comment

by:tjhack
ID: 21864431
Edited the temp crontab, but when I remove the line the error is still there.

I cannot even copy the crontab to a file so I can see what's in there.

HELP!!!!
0
 

Author Comment

by:tjhack
ID: 21864462
OK, got it sorted, back to how it was. Not sure why the script did not run!

Copied the line from above with the minutes, but no file was created after 10 mins! Left it for half an hour!
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21864594
The minutes field can only be in the range from 0 to 59

Which crontab did you change? root's crontab?
Check if that user got email with some error information.

If nothing helps, try creating the file manually:
   touch /home/tp/check.log
and (maybe) change permissions, making sure everybody can write it
   chmod 666 /home/tp/check.log

Maybe, you should use
   /var/syscheck.log
or something like that.
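
The "Number out of bounds" error above came from the 60 in the minutes list. As a rough sketch, a list can be checked against the 0 to 59 range before it goes into the crontab. valid_minutes is a made-up helper name, and it deliberately accepts only plain comma-separated numbers, which is stricter than cron itself.

```shell
# valid_minutes LIST
# Succeeds (exit status 0) only if LIST is a comma-separated list of
# whole numbers, each between 0 and 59. (Hypothetical helper.)
valid_minutes() {
    # Reject anything other than digits and commas up front.
    case $1 in
        ''|*[!0-9,]*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the list on commas
    IFS=$old_ifs
    [ $# -gt 0 ] || return 1
    for m in "$@"
    do
        [ -n "$m" ] || return 1      # empty item, e.g. "1,,2"
        [ "$m" -le 59 ] || return 1  # minutes stop at 59
    done
    return 0
}
```

For example, valid_minutes "0,10,20,30,40,50" succeeds, while the same list with 60 appended fails.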
0
 

Author Comment

by:tjhack
ID: 21865109
i have checked root email as its the roots crontab.

How can i view the email? i do not have pine or anything

but i do have emails which are getting send every 10 mins

cheers
0
 

Author Comment

by:tjhack
ID: 21865149
ok got to mail it shows the following:

From root Wed Jun 25 13:40:00 2008
Return-Path: root
Message-Id: <200806251240.NAA24061@aqua-jpir.aqua-jpir>
From: root@aqua-jpir.ccllabel.local
To: root
Date: Wed, 25 Jun 2008 13:40 BST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Status: R
Content-Length: 178

To: root
Subject: Output from "cron" command

Your "cron" job

echo "SYSTEM check: `/usr/bin/date '+

 produced the following output:

SYSTEM check: Wed Jun 25 13:40:00 BST 2008

I don't want it emailing. Can this be stopped, as the mailbox will get pretty full?

I've created the file now, so will wait and see what happens. Did the chmod as well.

0
 

Author Comment

by:tjhack
ID: 21865448
I created the file but no joy! Nothing was added to the file. Permissions were set with chmod 666.

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21865683
Looks like your "date" command is somehow wrong ...

Check your line in crontab again, compare to the one below -- and make sure everything is in one single line.

Also, try the command directly (you can cut-and-paste from what you get from a "crontab -l" output):
a) The date command alone:
    /usr/bin/date '+%d.%m.%Y %H:%M'
b) The whole echo statement:
    echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`"
c) The echo with redirection into the log file:
    echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log

Only when all three commands run OK, put the line in your crontab exactly (!) like this:
0,10,20,30,40,50 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log 2>&1


0
 

Author Comment

by:tjhack
ID: 21865864
hi

When I ran all the commands they worked fine. With the last one it wrote the text into the file.

but from the crontab it is not writing to the text file

it should work though as the line worked fine when putting it in the shell
0
 
LVL 14

Expert Comment

by:mikelfritz
ID: 21865942
OK.

Maybe this line, so it redirects standard error to standard output?




0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log   2>&1


0
 

Author Comment

by:tjhack
ID: 21866031
have put that to the test will check in 10 mins

how can i stop the emails going to root as  the mail box will get big overnight

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21866038
are you getting mail, still?

did you append the 2>&1 at the end (as from my crontab sample)?
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21866165
The 2>&1 should make sure you don't get emails anymore, as error output will then also go to the file.
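
Cron mails whatever a job writes to standard output or standard error, so a job with both streams redirected has nothing left to mail. The following is a minimal sketch of that redirection; run_logged is a made-up name used only for illustration.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND with standard output sent to LOGFILE, then points
# standard error (descriptor 2) at the same place as descriptor 1.
run_logged() {
    logfile=$1
    shift
    "$@" >"$logfile" 2>&1
}

# The order matters: with "2>&1 >file" the error stream is duplicated
# from the ORIGINAL standard output first, so errors would not reach the file.
```

Calling run_logged /tmp/test.log /usr/bin/date by hand should leave the terminal silent and put the date into the file.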
0
 

Author Comment

by:tjhack
ID: 21866191
got email through again

0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`"
 >/home/tp/check.log   2>&1

That's the line I put in, but it still did not append to the file.

Manually it is OK. Is it better to put the file elsewhere?

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21866299
did you try to write to the file as the same user?

Maybe start with file like  /var/test.log or /tmp/test.log ?

And make sure to append 2>&1 at the end to avoid getting error messages as emails.
0
 

Author Comment

by:tjhack
ID: 21866391
yup as root.

Well, I changed the line to point to a check.log in the top directory, so will see what it does, but I still got the email sent previously.

Is it the correct syntax for SCO UnixWare 7.1.1?
0
 

Author Comment

by:tjhack
ID: 21866428
no joy still! still sent email and did not append to file

0
 
LVL 16

Accepted Solution

by:
Hanno Schröder earned 250 total points
ID: 21866510
... but the email you posted earlier ends with ".... date +"
So something is missing in your crontab's line?

Try this line instead and post the email if you still get one:
0,10,20,30,40,50 * * * *  /usr/bin/date '+%d.%m.%Y %H:%M' >/home/tp/check.log 2>&1

#

# If it still fails, use this minimal one instead:

0,10,20,30,40,50 * * * *  /usr/bin/date  >/tmp/syscheck.log 2>&1
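
One possible explanation for the command being cut off at '+ in that mail: in the POSIX description of crontab, an unescaped % in the command field is turned into a newline, only the text before the first % is handed to the shell, and the rest is fed to the command as standard input. That would remove both the date format and the redirection that follows it, which fits the mail showing the default date format. Whether cron on UnixWare 7.1.1 behaves exactly this way is an assumption, not something confirmed in this thread. The sketch below uses a made-up helper, escape_cron_percent, to show the escaped form.

```shell
# escape_cron_percent COMMAND
# Prints COMMAND with every % replaced by \% so that cron would pass
# the percent signs through to the shell literally.
# (Hypothetical helper name, for illustration only.)
escape_cron_percent() {
    printf '%s\n' "$1" | sed 's/%/\\%/g'
}
```

Applied to the date command from the first line above, the crontab entry would read:
0,10,20,30,40,50 * * * *  /usr/bin/date '+\%d.\%m.\%Y \%H:\%M' >/home/tp/check.log 2>&1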


0
 

Author Comment

by:tjhack
ID: 21866576
Testing that line, but here is the email I have been receiving. It seems it is producing output.

From root Wed Jun 25 16:00:00 2008
Return-Path: root
Date: Wed, 25 Jun 2008 16:00 BST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Status: R
Content-Length: 179

To: root
Subject: Output from "cron" command

Your "cron" job

 echo "SYSTEM check: `/usr/bin/date '+

 produced the following output:

SYSTEM check: Wed Jun 25 16:00:00 BST 2008
0
 

Author Comment

by:tjhack
ID: 21866642
Tried with the first line and got an email, and nothing was written to the file. The email states:

To: root
Subject: Output from "cron" command

Your "cron" job

 /usr/bin/date '+

 produced the following output:

Wed Jun 25 16:20:00 BST 2008
0
 

Author Comment

by:tjhack
ID: 21866649
I will try the other line in the morning, but I don't think it will make a difference.

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21866737
Maybe you are having a problem with the date command :-(

Use
0,10,20,30,40,50 * * * *  /usr/bin/date  >/tmp/syscheck.log 2>&1


0
 
LVL 14

Expert Comment

by:mikelfritz
ID: 21866813
Try this
0,10,20,30,40,50 * * * *  /usr/bin/date  >/tmp/syscheck.log 2>/dev/null


0
 
LVL 14

Expert Comment

by:mikelfritz
ID: 21866825
Is the date command under /usr/bin?

try manually:

/usr/bin/date

The mail seems to indicate that it is working, but...
0
 

Author Comment

by:tjhack
ID: 21870120
Hi, the date command is under /usr/bin; I tried it and it displayed the date.

i will try again in the morning

cheers guys
0
 

Author Comment

by:tjhack
ID: 21872857
hi guys sorted!

i added the following line 0,10,20,30,40,50 * * * *  echo "SYSTEM check:`/usr/bin/date`">/syscheck.log 2>&1

space was also removed between /usr/bin/date`">/syscheck.log

OK, will now have to wait and see when the system falls over.

Is there anything else I can do? I mentioned the light on the server which has a picture of memory; that was flashing when it last failed.

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21872958
What server (hardware) do you have?
Don't know of a machine with a lamp like that ...

What does a "picture of memory" look like (can you attach a photo, or do you have a link to a website)?

If you copy-and-paste the
   echo "SYSTEM check: `/usr/bin/date`"
it should output like this:
  SYSTEM check: Thu Jun 26 11:08:04 MEST 2008
0
 

Author Comment

by:tjhack
ID: 21873193
hi the file has the output

SYSTEM check:Thu Jun 26 10:10:00 BST 2008

i have attached picture. as you can see its an old server.

but i just found out that the light always flashes after a backup ??
DSCN0148.JPG
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21873303
OK, now you know that your cron entry will produce a timestamp in your file.

a) To get new timestamps appended, change the crontab entry to read like
    the line below.

b) Maybe the backup process utilizes memory that is not used otherwise?
    And you may have ECC memory detecting hardware errors that can
    get corrected (?), and sometimes the system produces too many
    errors and gets halted (??)

c) Do you have a chance to take the system down and run some intensive
    memory check? This may as well lead to the final decision to get the
    system replaced by some new hardware and software (do you still want to
    use old UnixWare, or might you migrate to some other Unix variant?)
0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date`">/syscheck.log 2>&1


0
 

Author Comment

by:tjhack
ID: 21873389
do i need to append to the file ? do i need to use >>?

also how do i do some intensive memory checks?
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21873710
Yes, I did mistype it :-(

To get a log of all the time stamps, use append mode. If you are fine with only having the last entry, you can leave it as it is.
But remember: with the overwriting version, as soon as you reboot the system after a halt, it will start writing to the log file again and the last timestamp from before the hang is lost.

0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date`">>/syscheck.log 2>&1
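
In append mode the file grows by one line every 10 minutes, which is 144 lines a day. If that ever needs trimming, something along these lines might do. It is a sketch only: trim_log is a made-up name, and it assumes your tail accepts "-n N" (some older systems want "tail -N" instead).

```shell
# trim_log LOGFILE KEEP
# Keeps only the last KEEP lines of LOGFILE. The result is copied back
# with cat so the original file (owner, permissions) stays in place.
trim_log() {
    logfile=$1
    keep=$2
    tmp=$logfile.tmp.$$
    tail -n "$keep" "$logfile" >"$tmp" && cat "$tmp" >"$logfile"
    rm -f "$tmp"
}
```

For example, trim_log /syscheck.log 1000 would keep roughly the last week of entries.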


0
 

Author Comment

by:tjhack
ID: 21874089
cheers have done that

you mentioned memory testing? is there tools to do this?
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21874180
There is "memtest86+" available (at http://www.memtest.org)

You put it on a boot floppy and start from there:
http://www.memtest.org/download/2.01/memtest86+-2.01.floppy.zip
0
 

Author Comment

by:tjhack
ID: 21881450
Cheers, need to find the right time to do this.

Is it best to truncate the output manually? How can I do this?
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21882407
But be warned: You might end up to get a new server in place;-)
0
 

Author Comment

by:tjhack
ID: 21897307
Looks that way; it was dead again this morning.

Hard drive lights were not flashing. Nothing on screen.

Timed out connecting from shell.

It went down on Saturday at 9:40pm.

But nothing runs at that time!
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21897570
When was the last entry in your log file (time stamp)?
This should help to find the point in time when the server stopped working.
0
 

Author Comment

by:tjhack
ID: 21897587
SYSTEM check:Sat Jun 28 21:40:00 BST 2008

0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21897602
Now you should try to find out what was active at that time, whether you have any other
messages around this time, or what stopped working at that time.
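
A quick way to pull out messages from a given hour is to grep for the date and hour at the start of each line. This is a sketch that assumes a syslog-style file whose lines begin with "Mon DD HH:MM:SS"; entries_near is a made-up name, and the log path is passed in because the thread does not say where syslog and osmlog live on this machine.

```shell
# entries_near LOGFILE "Mon DD" HH
# Prints the lines of LOGFILE stamped with the given date and hour.
# (Hypothetical helper; LOGFILE must be a syslog-style text file.)
entries_near() {
    grep "^$2 $3:" "$1"
}
```

With the last heartbeat at 21:40 on Jun 28, entries_near LOGFILE "Jun 28" 21 would show anything else logged in that hour.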
0
 

Author Comment

by:tjhack
ID: 21897619
i looked in the osmlog.old and i found this but the time is different

WARNING: Tape Driver: HA 2 TC 5 LU 0 - CHECK CONDITION:

WARNING: TaA "MEDIUM ERROR" condition has been detected.
A "MEDIUM ERAdditional data = "EXCESSIVE WRITE ERRORS".
Additional dLogical block address = 0x0A000000
Logical bloc
WARNING: Tape Driver: HA 2 TC 5 LU 0 - CHECK CONDITION:

WARNING: TaA "MEDIUM ERROR" condition has been detected.
A "MEDIUM ERAdditional data = "EXCESSIVE WRITE ERRORS".

Additional dLogical block address = 0x02000000
Logical blocJun 28 07:56:30 sendmail[13634]: HAA13634: from=root, size=486, clas
s=0, pri=30486, nrcpts=1, msgid=<200806280656.HAA13634@aqua-jpir.aqua-jpir>, rel
ay=root@localhost
Jun 28 07:56:30 sendmail[13636]: HAA13634: to=root, ctladdr=root (0/3), delay=00
:00:00, xdelay=00:00:00, mailer=local, stat=Sent
Jun 28 08:01:35 in.telnetd[13655]: connect from 10.110.128.12
Jun 28 08:48:32 in.telnetd[13814]: connect from 10.110.128.12
Jun 28 09:06:46 in.telnetd[13866]: connect from 10.110.128.12
Jun 28 09:25:47 in.telnetd[13912]: connect from 10.110.128.13
Jun 28 09:41:48 in.telnetd[14015]: connect from 10.110.128.12

That's the last entry.

It seems like it stopped writing in the morning.
0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21897665
The only things you know:
a) The time your system stopped working (got hung)

All the other info may not be related to your problem with that "freeze" of your box.

1) From your info, it looks like the system stops when it's mostly idle -- is that right?
2) Does the system stop working frequently; would you say that it's reproducible?
3) If these two assumptions are true, you may want to create a script to keep your system busy during those times it's likely to stop working.
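
A keep-busy script for point 3 could be as simple as the sketch below. keep_busy is a made-up name, and the interval and the directory to walk are arbitrary choices. Bear in mind the caveat raised earlier in this thread: keeping the box busy may hide the root cause rather than fix it.

```shell
# keep_busy INTERVAL [ROUNDS] [DIR]
# Every INTERVAL seconds, lists DIR recursively and discards the output,
# which generates a little CPU and disk activity.
# ROUNDS=0 (the default) means run until killed. (Hypothetical helper.)
keep_busy() {
    interval=$1
    rounds=${2:-0}
    dir=${3:-/usr}
    n=0
    while :
    do
        # Walk the tree and throw the listing away; only the activity matters.
        ls -lR "$dir" >/dev/null 2>&1
        n=`expr $n + 1`
        if [ "$rounds" -gt 0 ] && [ "$n" -ge "$rounds" ]; then
            break
        fi
        sleep "$interval"
    done
    return 0
}
```

Started in the evening with something like keep_busy 300 &, it would touch the disk every five minutes until it is killed or the system stops.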
0
 

Author Comment

by:tjhack
ID: 21897692
Hi

The system freezes randomly. But it has always been at some point in the night. But the system is idle every night as no one is at work. People may be working till about 10pm.

I think I need to see if it happens again, because then I can check the log file I created to see when it stopped. Not a very good way of checking, but it seems like the only way!

It's not reproducible, as it has happened on random days!

But it happened on the last 2 Saturdays in the evening. So I might even do a reboot on Friday and see if it happens on Saturday. Then hopefully a pattern will form.





0
 
LVL 16

Expert Comment

by:Hanno Schröder
ID: 21897826
That's the problem with situations like this:
You usually have very little information and cannot reproduce the problem reliably ...
0
 
LVL 14

Expert Comment

by:mikelfritz
ID: 21905212
I still say it sounds environmental - external power problem or the like.  
0
 

Author Comment

by:tjhack
ID: 21905531
Hi Mikelfritz,

the server is in the server room which has controlled temperature. We have a pc next to the machine and ups in the shelf above

what can i do to check environmental issues?
0
 

Author Comment

by:tjhack
ID: 21923680
Just an update: did a reboot and got a kernel module error. Is this linked to the memory?

It said something like "warning adsk...."
0
