crystaltec asked:

OS X 10.6 Server: help decipher the mystery - network becomes unavailable

School with about 60 Macs & 20 PCs.
6 buildings, connected by underground fiber to a main switch (8 fiber ports, 16 gigabit ports); each building has a switch with 2 fiber ports and 24 gigabit ports. All protected by UPSs - good infrastructure.
Xserve (early Intel - June 2008), 2 x 2.8 GHz Xeon, 8 GB RAM
2 x 1 TB HD in RAID 1 (software RAID using Disk Utility)
Server is in a server room with aircon 24/7

HD partitioned into 2: OS - 100 GB - SERVER, and user data - 900 GB - DATA (all user shares inc. Users, homes, profiles & groups)
Services are located on the DATA partition

Clean install of 10.6 (January 2010), updated to 10.6.2 before any services were configured.
(An upgrade install from 10.5 didn't go too well...)
Server is Open Directory Master & Primary Domain Controller (Windows), DNS, DHCP, AFP, SMB, Software Update server

2 backup routines:
Time Machine - DATA partition (excluding services) to a Time Capsule over the network
SuperDuper - daily Smart Update of both partitions to an external FireWire RAID 1 drive (bootable)

The issue is that intermittently the server & its network resources become unavailable. When it occurs, we can click on the Dock but applications won't open, can click on Spotlight but can't type, can click on the Apple menu & on Restart but the server won't action it.
At the same time, I can remote in from outside the site using VNC on to the server (though it's not much use).
iStat for iPhone reports nothing unusual during this (i.e. it can connect & all resources, fan speeds, temperatures etc. are normal).
Connected users report shared locations not available; unconnected users are not able to log in.
The only way out is a hard reboot (press the button); the server restarts happily & works just fine!
This started immediately after an upgrade install from 10.5 to 10.6, which prompted us to do a clean install.

Have installed a second server at another site 2 weeks after this one, same CD, same set-up - no issues at all.

Initially this was happening once or twice a week but now it's once or twice a day.
Additionally, the RAID broke... we repaired it & it broke again; replaced the HD, imaged the data off, recreated the RAID, restored the data, and it broke again. This led us to think it must be a hardware issue, possibly the board.
Had it in the local Apple repair centre for 10 days - they did extensive testing & reckon 2 sticks of RAM are faulty - currently they are out for warranty.
The Apple centre rebuilt the RAID without the faulty RAM; so far so good with the RAID, but the mystery freezes are still a huge issue.
While it was out on warranty we were running off a Mac mini booted off a FireWire drive - same issue.

Actions taken:
Done all the obvious things - verify permissions, verify disk (from OS & from CD) etc.
Called Apple Enterprise support (level 3) {many times...} - they were very helpful & verified all our settings are correct (DNS, DHCP, Open Directory) but no resolution to date
Replaced some of the old switches in the classes
Turned off all non-essential services (leaving only DNS, DHCP, AFP, SMB & Software Update service)
Replaced the main fiber / giga switch the server is connected to
Replaced the CAT6 cable from server to switch
Changed network location
Changed from using en0 to en1
Updated to 10.6.3
Removed 3rd-party software (iStat, SuperDuper, Vine) - only MS Office 2004 left
Turned off Time Machine
Disconnected the FireWire backup drive
Disconnected USB KB & mouse

any ideas to decipher the mystery?
strung

Any clues in the server logs?
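
If it helps narrow down which logs are worth posting, one rough approach is to pull only the lines logged in the few minutes before each freeze and see which processes show up every time. The Python sketch below assumes classic syslog-style lines (`Apr 12 13:55:01 host process[pid]: message`), as typically found in `/var/log/system.log`. The function names, the 10-minute window and the freeze times are all hypothetical placeholders, not anything taken from this server.

```python
from collections import Counter
from datetime import datetime, timedelta


def parse_line(line, year):
    """Parse 'Apr 12 13:55:01 host proc[pid]: msg'; return None if it doesn't fit."""
    try:
        # Syslog timestamps carry no year, so it has to be supplied.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None
    rest = line[16:].split(None, 2)  # host, process tag, message
    if len(rest) < 2:
        return None
    process = rest[1].split("[")[0].rstrip(":")
    return stamp, process, line.rstrip("\n")


def lines_before_freezes(lines, freezes, minutes=10, year=2010):
    """Group log lines that fall within `minutes` before each freeze time."""
    window = timedelta(minutes=minutes)
    hits = {freeze: [] for freeze in freezes}
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        stamp, process, text = parsed
        for freeze in freezes:
            if freeze - window <= stamp <= freeze:
                hits[freeze].append((process, text))
    return hits


def common_processes(hits):
    """Return process names that logged something before every freeze."""
    counts = Counter()
    for entries in hits.values():
        counts.update({process for process, _ in entries})
    return sorted(p for p, n in counts.items() if n == len(hits))
```

To use it, note the approximate time of each hard reboot, pass those as `datetime` values in `freezes`, and feed in the lines of the log file (for example `open("/var/log/system.log")`). A process that appears before every freeze is not proof of a cause, only a hint about where to read more closely.
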
crystaltec (ASKER)

Possibly, but I still haven't found a common thread. Which ones would you like me to post?
This seems very strange. Do you notice a bottleneck with system and data on the same disks?

With you mentioning freezes, I'd be tempted to use the 2 x 1 TB disks in RAID 1 for data and a separate 80 or 160 GB disk in bay 1 for the system. Use SuperDuper to clone the drives to the external as before. Also back up the OD to the RAIDed data drives.

I dislike software RAID, especially on a system drive with OD.
ASKER CERTIFIED SOLUTION
crystaltec

seemed to have worked