  • Status: Solved
  • Priority: Medium
  • Security: Public

UnixWare 7.1.1 server powered on but cannot connect, happening on a weekly basis

Hi

We have a very old server running UnixWare 7.1.1. For the last 3 weeks we have been having a problem with it.

The problem is that we come into work in the morning and no one can connect to the server; connections time out. I looked at the server: it was powered on, but there was nothing on screen. On the server itself there is a light with a symbol which I think shows memory and an exclamation mark, and it was flashing! So I had to power it down from the switch.

I brought the system back up and all is fine, but it will happen again, and it happens overnight. The Unix box is in a secure server room, so no one has access.

I checked the log files and there were no errors recorded, so I am a bit stuck. I checked both syslog and osmlog.

Hope someone can assist.

Thanks

Tj

Asked by: tjhack
 
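mikelfritz commented:
Any way to pinpoint the time? Maybe write a script that writes the date to a file every minute and let it run until the box dies; the last timestamp in the file shows roughly when it happened.

#!/bin/sh
# Overwrite date.log (in the current directory) with the current date and
# time once a minute; after a crash it holds the last time the loop ran.
while true
do
    date > date.log
    sleep 60
done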

Maybe something environmental at night. I had a customer's system that would die at night, and it turned out that it coincided with the cardboard box crusher in the warehouse being run.
 
mikelfritz commented:
Maybe post the output of "crontab -l" to see if something scheduled is kicking off and crashing the server.
 
tjhack (Author) commented:
Hi

I am quite new to Unix, so still learning. How do I post the output of the crontab?

 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Run the command
   crontab -l
on your system to see your crontab.
To see the crontab of another user, run (as root)
  crontab -l username
(On Linux with Vixie cron, the equivalent is "crontab -u username -l".)

To see which crontabs exist, list the crontab files
  ls /var/spool/crontabs
The directory name may be different on your system. Check with
"man cron" or "man crontab" to verify.
You may also use "find / -type d -name crontabs" to find the directory.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
If you install a script that runs all night, you may get rid of the problem
by actually hiding the root cause, as the error may only occur when the
system is really idle overnight ;-)

BTW: To edit a crontab, use the commands
  EDITOR=vi ; export EDITOR        # make sure not to use 'ed' as editor
  crontab -e
 
tjhack (Author) commented:
Hi, I just did a search and found a few crontabs:

./etc/inst/save/var/spool/cron/crontabs
./var/sadm/pkg/cmds/save/build/var/spool/cron/crontabs
./var/spool/cron/crontabs

Which one is the correct one?

We have the backup running at night, which uses the script cron.backup.dat.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
To run a command once every minute, you can put this line into your crontab:
* * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" > /path/to/check.log

a) Verify that the command works fine:
      /usr/bin/date '+%d.%m.%Y %H:%M'
b) Change the filename /path/to/check.log to your liking.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
The crontabs are in
  /var/spool/cron/crontabs
To see all files, simply use
  ls /var/spool/cron/crontabs
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Does your backup complete successfully, or does the system die before it finishes?
 
tjhack (Author) commented:
When it has happened, the system has died before the backup finished.
 
tjhack (Author) commented:
I did an ls and got these entries:

adm  _cron13781  _cron16161  _cron22417  lp  root  root.new  sys  time  uucp

Which folder or file do I need to look at?
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
You may want to add your periodic check to root's crontab:
a) make sure you are root
b) use "crontab -e" as stated before to edit the file

The other files (adm, lp, sys, uucp) are standard cron files.
You can see the contents of the sys crontab like this:
   crontab -l sys

The crontabs with names like _cron* and root.new will not
be used, as there is no corresponding user with that name.
 
tjhack (Author) commented:
How can I make it run every 10 minutes?

I added this line, which produced errors:

0,10,20,30,40,50,60 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'` > /home/tp/check.log
 
tjhack (Author) commented:
The error is:

UX:crontab: ERROR: 0-59/10 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'` > /home/tp/check.log
: error on previous line;Unexpected character found in line.
UX:crontab: ERROR: Errors detected in input, no crontab file generated.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
What are you trying to tell cron?

To have the command run every minute from 10:00 to 10:59, you
should write
* 10 * * *   echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" > /home/tp/check.log

Ranges like x-y are fine, but the step syntax n/m (as in 0-59/10 or */10) is not valid here.

The following (one single) line would be fine, too. Note the closing " near the end!
0,10,20,30,40,50,60 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log
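The crontab errors in this thread follow from the minute-field rules of this SVR4-style cron: the field is either * or a comma-separated list of numbers and ranges such as 10-20, every value must lie in 0-59, and step syntax such as 0-59/10 or */10 (a Vixie cron extension) is not understood, hence "Unexpected character found in line". For the same reason, a 60 in the list, as in the line above, is rejected as "Number out of bounds". As a rough illustration, the hypothetical shell function below (check_minute_field is a made-up name, not a UnixWare command) applies those rules to a minute field before you edit the real crontab. It is a sketch in plain POSIX sh; the real crontab parser may differ in edge cases such as trailing commas.

```shell
#!/bin/sh
# Hypothetical helper: check a crontab minute field against the rules
# described above. Accepts "*" or a comma-separated list of numbers and
# ranges (a-b) with every value in 0-59; rejects step syntax like */10.
# The body runs in a subshell "( ... )" so changing IFS does not leak out.
check_minute_field() (
    field=$1
    [ "$field" = "*" ] && exit 0
    case $field in
        ''|*[!0-9,-]*) exit 1 ;;      # only digits, commas and '-' allowed
    esac
    IFS=,
    for elem in $field; do
        case $elem in
            ''|-*|*-|*-*-*) exit 1 ;; # empty element or malformed range
        esac
        lo=${elem%-*}                 # "10-20" -> "10", "30" -> "30"
        hi=${elem#*-}                 # "10-20" -> "20", "30" -> "30"
        [ "$lo" -le 59 ] && [ "$hi" -le 59 ] || exit 1
        [ "$lo" -le "$hi" ] || exit 1
    done
    exit 0
)

for f in "0,10,20,30,40,50" "0,10,20,30,40,50,60" "0-59/10" "*/10"; do
    if check_minute_field "$f"; then
        echo "OK:       $f"
    else
        echo "REJECTED: $f"
    fi
done
```

Only the first of the four sample fields passes; the explicit list 0,10,20,30,40,50 is the portable way to say "every 10 minutes" on this kind of cron.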

 
tjhack (Author) commented:
I want it to run every 10 minutes.

I still get the error below:

UX:crontab: ERROR: 0,10,20,30,40,50,60 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" > /home/tp/check.log
: error on previous line;Number out of bounds.
UX:crontab: ERROR: Errors detected in input, no crontab file generated.

I can't see anything wrong; the quotes are OK. It says "error on previous line; Number out of bounds"??
 
tjhack (Author) commented:
OK, I have got it into the crontab; I did not need the 60.

Will wait and see what happens.

This script will let me know when the system crashes. From there, how can I diagnose it further in terms of checking other things?
 
tjhack (Author) commented:
When I go to edit the crontab, it gives me the error

crontab > /tmp/crontej.txt

UX:crontab: ERROR:
: error on previous line;Unexpected character found in line.

So not sorted.
 
tjhack (Author) commented:
I cannot edit the crontab file by typing crontab -e.
 
tjhack (Author) commented:
I edited the temp crontab, but when I remove the line the error is still there.

I cannot even copy the crontab to a file so I can see what's in there.

HELP!!!!
 
tjhack (Author) commented:
OK, got it sorted back to how it was, but I am not sure why the script did not run!

I copied the line from above with the minutes, but no file was created after 10 minutes, even after leaving it for half an hour!
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
The minutes field can only be in the range from 0 to 59.

Which crontab did you change? root's crontab?
Check if that user got email with some error information.

If nothing helps, try creating the file manually:
   touch /home/tp/check.log
and (maybe) change its permissions, making sure everybody can write to it:
   chmod 666 /home/tp/check.log

Maybe you should use
   /var/syscheck.log
or something like that.
 
tjhack (Author) commented:
I have checked root's email, as it's root's crontab.

How can I view the email? I do not have pine or anything.

But I do have emails which are getting sent every 10 minutes.

Cheers
 
tjhack (Author) commented:
OK, I got into mail; it shows the following:

From root Wed Jun 25 13:40:00 2008
Return-Path: root
Message-Id: <200806251240.NAA24061@aqua-jpir.aqua-jpir>
From: root@aqua-jpir.ccllabel.local
To: root
Date: Wed, 25 Jun 2008 13:40 BST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Status: R
Content-Length: 178

To: root
Subject: Output from "cron" command

Your "cron" job

echo "SYSTEM check: `/usr/bin/date '+

 produced the following output:

SYSTEM check: Wed Jun 25 13:40:00 BST 2008

I don't want it emailing; can this be stopped? The mailbox will get pretty full!!!

I've created the file now, so will wait and see what happens. Did the chmod as well.
 
tjhack (Author) commented:
I created the file but no joy! Nothing was added to the file; permissions were set with chmod 666.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Looks like your "date" command is somehow wrong ...

Check your line in crontab again, compare it to the one below, and make sure everything is on one single line.

Also, try the command directly (you can cut-and-paste from what you get from a "crontab -l" output):
a) The date command alone:
    /usr/bin/date '+%d.%m.%Y %H:%M'
b) The whole echo statement:
    echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`"
c) The echo with redirection into the log file:
    echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log

Only when all three commands run OK, put it in your crontab exactly (!) like this:
0,10,20,30,40,50 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log 2>&1

 
tjhack (Author) commented:
Hi

When I ran all the commands, they worked fine. With the last one, it wrote the text into the file.

But from the crontab it is not writing to the text file.

It should work, though, as the line worked fine when I put it in the shell.
 
mikelfritz commented:
OK.

Maybe this line, so it redirects standard error to standard output?

0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log   2>&1

 
tjhack (Author) commented:
I have put that to the test; will check in 10 minutes.

How can I stop the emails going to root? The mailbox will get big overnight.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Are you still getting mail?

Did you append the 2>&1 at the end (as in my crontab sample)?
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
The 2>&1 will make sure you don't get emails anymore, as error output will also go to the file.
 
tjhack (Author) commented:
Got an email through again.

0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log   2>&1

That's the line I put in, but it still did not write to the file.

Manually it is OK, though. Is it better to put the file elsewhere?
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Did you try to write to the file as the same user?

Maybe start with a file like /var/test.log or /tmp/test.log?

And make sure to append 2>&1 at the end to avoid getting error messages as emails.
 
tjhack (Author) commented:
Yup, as root.

Well, I changed the line to point to a check.log in the top-level directory, so will see what it does, but I still got the email sent previously.

Is it the correct syntax for SCO UnixWare 7.1.1?
 
tjhack (Author) commented:
Still no joy! It still sent an email and did not write to the file.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
... but the email you posted earlier ends with ".... date '+".
So is something missing from your crontab line?

Try this line instead and post the email if you still get one:
0,10,20,30,40,50 * * * *  /usr/bin/date '+%d.%m.%Y %H:%M' >/home/tp/check.log 2>&1
#
# If it still fails, use this minimal one instead:
0,10,20,30,40,50 * * * *  /usr/bin/date  >/tmp/syscheck.log 2>&1

 
tjhack (Author) commented:
Testing that line, but here is the email I have been receiving. It seems to be producing output:

From root Wed Jun 25 16:00:00 2008
Return-Path: root
Date: Wed, 25 Jun 2008 16:00 BST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Status: R
Content-Length: 179

To: root
Subject: Output from "cron" command

Your "cron" job

 echo "SYSTEM check: `/usr/bin/date '+

 produced the following output:

SYSTEM check: Wed Jun 25 16:00:00 BST 2008
 
tjhack (Author) commented:
Tried with the first line: got an email, and nothing was written to the file. The email states:

To: root
Subject: Output from "cron" command

Your "cron" job

 /usr/bin/date '+

 produced the following output:

Wed Jun 25 16:20:00 BST 2008
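The emails above show what went wrong: cron reports the job as "echo "SYSTEM check: `/usr/bin/date '+" and as "/usr/bin/date '+", i.e. the command stops exactly where the first % appeared. That matches the documented crontab(1) behaviour on SVR4-derived systems such as UnixWare (Vixie cron does the same): in the command field, an unescaped % is turned into a newline, only the text before the first % is run by the shell, and the rest is passed to the command as standard input. It would explain both symptoms in this thread: the ">/home/tp/check.log 2>&1" part was never executed, so nothing reached the file and cron mailed the output instead. Writing each % as \% in the crontab line (or dropping the format string altogether) avoids the problem. The function below is only a simulation of that splitting rule, written for illustration in POSIX sh and awk (cron_command_part is a made-up name); it does not call cron itself.

```shell
#!/bin/sh
# Simulation only (does not call cron): print the part of a crontab line
# that cron would hand to the shell. Per crontab(1), an unescaped % ends
# the command (the rest becomes the command's standard input), while \%
# stands for a literal %. Reads crontab lines on standard input.
cron_command_part() {
    awk '{
        # drop the five time fields (minute hour day month weekday)
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\\" && substr($0, i + 1, 1) == "%") {
                out = out "%"     # escaped percent sign: keep it
                i++
            } else if (c == "%") {
                break             # first bare percent sign ends the command
            } else {
                out = out c
            }
        }
        print out
    }'
}

# The line from this thread: the command is cut off right after "date '+"
cron_command_part <<'EOF'
0,10,20,30,40,50 * * * * echo "SYSTEM check: `/usr/bin/date '+%d.%m.%Y %H:%M'`" >/home/tp/check.log 2>&1
EOF
```

Run on the thread's line, it prints echo "SYSTEM check: `/usr/bin/date '+ which is the same truncated command text cron put in its email. With the format written as '+\%d.\%m.\%Y \%H:\%M', the whole line, including the redirection, would be kept.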
 
tjhack (Author) commented:
I will try the other line in the morning, but I don't think it will make a difference.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Maybe you are having a problem with the date command :-(

Use
0,10,20,30,40,50 * * * *  /usr/bin/date  >/tmp/syscheck.log 2>&1

 
mikelfritz commented:
Try this
0,10,20,30,40,50 * * * *  /usr/bin/date  >/tmp/syscheck.log  >/dev/null 2>&1
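A note on this line: the shell processes redirections from left to right. When stdout is redirected twice, both files are opened (and truncated), but the output only goes to the last one, so here /tmp/syscheck.log would be created but stay empty while the date disappears into /dev/null. A trailing 2>&1 sends stderr to wherever stdout points at that moment. The small demonstration below shows both effects using scratch files under /tmp (the file names are arbitrary).

```shell
#!/bin/sh
# Demonstration of left-to-right redirection in the Bourne/POSIX shell.
# Scratch files under /tmp; the names are arbitrary.
first=/tmp/redir_first.$$
second=/tmp/redir_second.$$
both=/tmp/redir_both.$$

# Two stdout redirections: "first" is created and truncated, but stdout
# ends up on "second", so only "second" receives the date.
date >"$first" >"$second"

# ">file 2>&1": stdout goes to the file, then stderr is pointed at the
# same place, so both lines land in the file and nothing is left for cron
# to mail.
{ echo "normal output"; echo "error output" >&2; } >"$both" 2>&1

ls -l "$first" "$second" "$both"
```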

 
mikelfritz commented:
Is the date command under /usr/bin?

Try it manually:

/usr/bin/date

The mail seems to indicate that it is working, but...
 
tjhack (Author) commented:
Hi, the date command is under /usr/bin; I tried it and it displayed the date.

I will try again in the morning.

Cheers guys
 
tjhack (Author) commented:
Hi guys, sorted!

I added the following line:
0,10,20,30,40,50 * * * *  echo "SYSTEM check:`/usr/bin/date`">/syscheck.log 2>&1

The space between /usr/bin/date`" and >/syscheck.log was also removed.

OK, will now have to wait and see when the system falls over.

Is there anything else I can do? I mentioned the light on the server with a picture of memory that was flashing when it last failed.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
What server (hardware) do you have?
I don't know of a machine with a lamp like that ...

What does the "picture of memory" look like (attach a photo, or do you have a link to a website)?

If you copy-and-paste the
   echo "SYSTEM check: `/usr/bin/date`"
it should output something like this:
  SYSTEM check: Thu Jun 26 11:08:04 MEST 2008
 
tjhack (Author) commented:
Hi, the file has the output

SYSTEM check:Thu Jun 26 10:10:00 BST 2008

I have attached a picture; as you can see, it's an old server.

But I just found out that the light always flashes after a backup??
Attachment: DSCN0148.JPG
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
OK, now you know that your cron entry will produce a timestamp in your file.

a) To get each new timestamp appended, change the crontab entry to read like
    below

b) Maybe the backup process uses memory that is not used otherwise?
    And you may have some ECC memory that detects hardware errors that can
    get corrected (?), and sometimes the system produces too many
    errors and gets halted (??)

c) Do you have a chance to take the system down and run some intensive
    memory checks? This may as well lead to the final decision to get the
    system replaced by some new hardware and software (do you still want to
    use old UnixWare, or might you migrate to some other Unix variant?)
0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date`">/syscheck.log 2>&1

 
tjhack (Author) commented:
Do I need to append to the file? Do I need to use >>?

Also, how do I do some intensive memory checks?
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Yes, I did mistype it :-(

To get a log of all the time stamps, use append mode. If you are fine with only having the last entry, you can leave it as it is.
But remember: as soon as you reboot the system after it has halted, it will start writing to the log file again.

0,10,20,30,40,50 * * * *  echo "SYSTEM check: `/usr/bin/date`">>/syscheck.log 2>&1
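The difference between the two forms can be seen in this small sketch, which simulates three cron runs against a scratch file (a /tmp file with an arbitrary name stands in for /syscheck.log). With > each run replaces the file, so only the newest timestamp is kept, which is enough to see when the box stopped; with >> every run adds a line, so you keep the whole history across reboots, at the cost of a file that grows by 144 lines a day at one entry every 10 minutes.

```shell
#!/bin/sh
# Sketch: ">" truncates the log on every run, ">>" appends to it.
# /tmp/syscheck_demo.$$ stands in for /syscheck.log here.
log=/tmp/syscheck_demo.$$

# Simulate three cron runs with ">": only the last line survives.
for run in 1 2 3; do
    echo "SYSTEM check: run $run" >"$log" 2>&1
done
overwrite_lines=$(grep -c . "$log")

# Simulate three cron runs with ">>": every run adds a line.
rm -f "$log"
for run in 1 2 3; do
    echo "SYSTEM check: run $run" >>"$log" 2>&1
done
append_lines=$(grep -c . "$log")

echo "with >:  $overwrite_lines line(s)"
echo "with >>: $append_lines line(s)"
rm -f "$log"
```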

 
tjhack (Author) commented:
Cheers, have done that.

You mentioned memory testing? Are there tools to do this?
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
There is "memtest86+" available (at http://www.memtest.org).

You put it on a boot floppy and start from there:
http://www.memtest.org/download/2.01/memtest86+-2.01.floppy.zip
 
tjhack (Author) commented:
Cheers, need to find the right time to do this.

Is it best to truncate the output manually? How can I do this?
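The simplest manual way to empty the file is a bare redirection such as ": >/syscheck.log". If you would rather keep the most recent entries, a small helper along these lines would do. trim_log is a hypothetical name, and the sketch assumes your tail accepts "-n count" (older systems also understand the form "tail -1000"). With one entry every 10 minutes, keeping 1000 lines is roughly a week of history.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest N lines of a log file.
# Usage: trim_log FILE N      e.g.  trim_log /syscheck.log 1000
trim_log() {
    file=$1
    keep=$2
    tmp=$file.trim.$$
    # Save the last $keep lines, then copy them back over the original with
    # cat so the file keeps its owner and permissions (e.g. the chmod 666).
    tail -n "$keep" "$file" >"$tmp" && cat "$tmp" >"$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

To run it regularly, you could put it in a script (for example a hypothetical /usr/local/bin/trim_syscheck.sh) and give it its own crontab entry at a minute that is not one of the 10-minute marks, such as "5 0 * * 0" (00:05 every Sunday), so it never runs at the same moment as the timestamp job.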
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
But be warned: you might end up having to put a new server in place ;-)
 
tjhack (Author) commented:
Looks that way: it was dead again this morning.

Hard drive lights were not flashing, and there was nothing on screen.

Timed out when connecting from a shell.

It went down on Saturday at 9:40 pm,

but nothing runs at that time!
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
When was the last entry in your log file (time stamp)?
This should help to find the point in time when the server stopped working.
 
tjhack (Author) commented:
SYSTEM check:Sat Jun 28 21:40:00 BST 2008
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
Now you should try to find out what was active at that time: whether you have any other
messages around this time, or what stopped working at that time.
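One way to look for other activity around the last timestamp (21:40 on Jun 28) is to pull every syslog-style line from that hour out of the logs. The helper below is a hypothetical sketch: log_hour is a made-up name, the log files you pass to it are up to you (check where your system actually keeps syslog and osmlog), and it assumes lines start with a "Mon DD HH:MM:SS" timestamp. In that format, days 1-9 are padded with a space (e.g. "Jun  8"), and messages written without a timestamp will not be matched at all.

```shell
#!/bin/sh
# Hypothetical helper: print log lines stamped within one hour.
# Usage: log_hour "Jun 28 21" FILE...
# Assumes syslog-style lines such as "Jun 28 21:40:00 host prog: text".
log_hour() {
    hour=$1
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "log_hour: cannot read $f" >&2
            continue
        fi
        # Adding /dev/null as a second file makes grep prefix each match
        # with the file name; -n adds the line number.
        grep -n "^$hour:" "$f" /dev/null
    done
}
```

For example, log_hour "Jun 28 21" followed by the paths of your syslog and osmlog files would list everything logged between 21:00 and 21:59 on that day.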
 
tjhack (Author) commented:
I looked in osmlog.old and found this, but the time is different:

WARNING: Tape Driver: HA 2 TC 5 LU 0 - CHECK CONDITION:
A "MEDIUM ERROR" condition has been detected.
Additional data = "EXCESSIVE WRITE ERRORS".
Logical block address = 0x0A000000

WARNING: Tape Driver: HA 2 TC 5 LU 0 - CHECK CONDITION:
A "MEDIUM ERROR" condition has been detected.
Additional data = "EXCESSIVE WRITE ERRORS".
Logical block address = 0x02000000

Jun 28 07:56:30 sendmail[13634]: HAA13634: from=root, size=486, class=0, pri=30486, nrcpts=1, msgid=<200806280656.HAA13634@aqua-jpir.aqua-jpir>, relay=root@localhost
Jun 28 07:56:30 sendmail[13636]: HAA13634: to=root, ctladdr=root (0/3), delay=00:00:00, xdelay=00:00:00, mailer=local, stat=Sent
Jun 28 08:01:35 in.telnetd[13655]: connect from 10.110.128.12
Jun 28 08:48:32 in.telnetd[13814]: connect from 10.110.128.12
Jun 28 09:06:46 in.telnetd[13866]: connect from 10.110.128.12
Jun 28 09:25:47 in.telnetd[13912]: connect from 10.110.128.13
Jun 28 09:41:48 in.telnetd[14015]: connect from 10.110.128.12

That's the last entry.

It seems like it stopped writing in the morning.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
The only thing you know for sure:
a) The time your system stopped working (hung)

All the other info may not be related to your problem with that "freeze" of your box.

1) From your info, it looks like the system stops when it's mostly idle. Is that right?
2) Does the system stop working frequently; would you say that it's reproducible?
3) If these two assumptions are true, you may want to create a script to keep your system busy during the times it's likely to stop working.
 
tjhack (Author) commented:
Hi

The system freezes randomly, but it has always been at some point in the night, when the system is idle because no one is at work. People may be working until about 10pm.

I think I need to see if it happens again, because then I can check the log file I created to see when it stopped. Not a very good way of checking, but it seems like the only way!

It's not reproducible, as it has happened on random days!

But it happened on the last 2 Saturdays in the evening, so I need to check. I might even do a reboot on Friday and see if it happens on Saturday; then hopefully a pattern will form.
 
Hanno P.S. (IT Consultant and Infrastructure Architect) commented:
That's the problem with situations like this:
you usually have very little information and cannot reproduce the problem reliably ...
 
mikelfritz commented:
I still say it sounds environmental: an external power problem or the like.
 
tjhack (Author) commented:
Hi Mikelfritz,

The server is in the server room, which is temperature controlled. We have a PC next to the machine and a UPS on the shelf above.

What can I do to check for environmental issues?
 
tjhack (Author) commented:
Just an update: I did a reboot and got a kernel module error. Is this linked to the memory?

It said something like "warning adsk....".
