OS X 10.6 Server: help deciphering the mystery - network becomes unavailable

School with about 60 Macs & 20 PCs.
6 buildings, connected by underground fiber to a main switch (8 fiber ports, 16 gigabit ports); each building has a 24-port gigabit switch with 2 fiber ports. All are protected by UPSs - good infrastructure.
Xserve (early Intel, June 2008): 2 x 2.8 GHz Xeon, 8 GB RAM
2 x 1 TB HD in RAID 1 (software RAID using Disk Utility)
Server is in a server room with aircon 24/7

HD partitioned into 2: OS (100 GB, volume "SERVER") & user data (900 GB, volume "DATA", holding all user shares including Users, homes, profiles & groups).
Services are located on the DATA partition.

Clean install of 10.6 (January 2010), updated to 10.6.2 before any services were configured.
(An upgrade install from 10.5 didn't go too well....)
Server is Open Directory Master & Primary Domain Controller (Windows), and runs DNS, DHCP, AFP, SMB and Software Update Server.

2 backup routines:
Time Machine: DATA partition (excluding services) to a Time Capsule over the network
SuperDuper: daily Smart Update of both partitions to an external FireWire RAID 1 drive (bootable)

The issue is that intermittently the server & its network resources become unavailable. When it occurs, we can click on the Dock but applications won't open, can click on Spotlight but can't type, and can choose Restart from the Apple menu but the server won't act on it.
At the same time, I can still connect to the server remotely from outside the site using VNC (though it's not much use).
iStat for iPhone reports nothing unusual during this (i.e. it can connect, and all resources, fan speeds, temperatures etc. are normal).
Connected users report shared locations as unavailable; users who aren't connected can't log in.
The only way out is a hard reboot (press the button); the server restarts happily & works just fine!
This started immediately after an upgrade install from 10.5 to 10.6, which prompted us to do a clean install.

We installed a second server at another site 2 weeks after this one, from the same CD with the same set-up: no issues at all.

Initially this was happening once or twice a week, but now it's once or twice a day.
Additionally, the RAID broke... We repaired it & it broke again; we replaced the HD, imaged the data off, recreated the RAID and restored the data, and it broke again. This led us to think it must be a hardware issue, possibly the logic board.
It was in the local Apple repair centre for 10 days; they did extensive testing & reckon 2 sticks of RAM are faulty, which are currently out for warranty replacement.
The Apple centre rebuilt the RAID without the faulty RAM. So far so good with the RAID, but the mystery freezes are still a huge issue.
While it was out for warranty repair we were running off a Mac mini booted from the FireWire drive: same issue.

Actions taken:
Done all the obvious things: verify permissions, verify disk (from the OS & from the CD), etc.
Called Apple Enterprise Support (level 3) (many times...); they were very helpful & verified that all our settings are correct (DNS, DHCP, Open Directory), but no resolution to date.
Replaced some of the old switches in the classrooms
Turned off all non-essential services (leaving only DNS, DHCP, AFP, SMB & Software Update)
Replaced the main fiber/gigabit switch the server is connected to
Replaced the Cat 6 cable from server to switch
Changed network location
Changed from using en0 to en1
Updated to 10.6.3
Removed 3rd-party software (iStat, SuperDuper, Vine); only MS Office 2004 is left
Turned off Time Machine
Disconnected the FireWire backup drive
Disconnected the USB keyboard & mouse

Any ideas to decipher the mystery?
crystaltec asked:
strung commented:
Any clues in the server logs?
crystaltec (author) commented:
Possibly, but I still haven't found a common thread. Which ones would you like me to post?
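One way to hunt for that common thread is to check which processes logged something in the minutes before each freeze. The Python 3 sketch below is a hypothetical helper, not something from this thread, and rests on assumptions: that /var/log/system.log uses the classic syslog layout (month, day, time, host, process[pid]: message), which carries no year, so the year is passed in; that the approximate time of each freeze has been noted; and that a 15-minute window is a reasonable default. The names `parse_line` and `processes_before_freezes` are illustrative. Run it on a copy of the log with a current Python 3, not the Python 2 that ships with 10.6.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed classic syslog layout, e.g.
#   "Mar  9 17:55:00 myserver someproc[123]: message"
#   "Mar  9 17:55:00 myserver com.apple.launchd[1] (some.job): message"
# Lines that do not match this layout are skipped.
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^\[\s:]+)(?:\[\d+\])?(?: \([^)]*\))?: (?P<msg>.*)$"
)


def parse_line(line, year):
    """Return (timestamp, process, message), or None for non-matching lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # syslog timestamps omit the year, so the caller supplies it.
    when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    return when, m["proc"], m["msg"]


def processes_before_freezes(lines, freeze_times, year, window_minutes=15):
    """Count, for each process, before how many freezes it logged something.

    freeze_times: approximate datetimes at which the server hung.
    """
    window = timedelta(minutes=window_minutes)
    parsed = [p for p in (parse_line(line, year) for line in lines) if p]
    counts = Counter()
    for freeze in freeze_times:
        # A set, so each process is counted at most once per freeze.
        procs = {proc for when, proc, _ in parsed if freeze - window <= when <= freeze}
        counts.update(procs)
    return counts
```

For example, `processes_before_freezes(open("system.log"), freeze_times, 2010).most_common(10)` lists the processes seen before the most freezes. A process that appears before nearly every hang (such as the Time Machine daemon, backupd) is a reasonable first suspect, though appearing in the window does not prove it caused the freeze.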
gmbaxter commented:
This seems very strange. Do you notice a bottleneck with system and data on the same disks?

Since you mention freezes, I'd be tempted to use RAID 1 across the 2 x 1 TB drives for data only, and a separate 80 or 160 GB disk in bay 1 for the system. Use SuperDuper to clone the drives to the external drive as before. Also back up the OD to the RAIDed data drives.

I dislike software RAID, especially on a system drive with OD.
crystaltec (author) commented:
We have settled it down now.

It seems it was caused by Time Machine backing up OD. Apple confirmed that TM is not recommended for use on OD servers.
But we still have some other software to reinstall & switches to turn back on in the classrooms before we can be sure it was just TM.

No disk bottlenecks; it's been working fine in this disk set-up (as have other servers we've set up this way).
I'll think over your HD setup suggestion, though.

crystaltec (author) commented:
Seems to have worked.