2003 R2 Terminal Server Instability

I have two 2003 R2 servers with different hardware configurations but the same software configuration.  Both accept RDC connections from PXE-booted thin clients (an Ubuntu 8.10 server running LTSP provides the boot images).  Load balancing places approximately 25 users on each server.  This solution ran perfectly until about a month ago, when the servers began rebooting seemingly at will.  Things were stable after I had the servers reboot themselves nightly (not a great solution, but it worked) until today, when both servers rebooted themselves.  I can't find anything in the logs to clue me in, but I am also not well versed in Windows servers.  I've tried the normal stuff: defrag, HD repair, antivirus, and spyware scans.  I even rebuilt the RAID arrays.  Nothing seems to be wrong.  Can anyone help point me in the right direction?
bmbaerAsked:
ryansotoCommented:
First, test the memory with Memtest.
Next, I would check the antivirus: disable all of its services.
Then check the drivers and see if any updates are available.
exx1976Commented:
Rebooting every night is a GOOD thing on multi-user systems.  I've been doing it with TS/Citrix boxes for about 7 years now, every night, like clockwork.  You need to clear those memory leaks.  Until someone knows of a better way to do that, then...

Also, are you running the normal TS stuff (UPHClean, etc.)?
Michael WorshamStaff Infrastructure ArchitectCommented:
Define what you mean by 'instability'...

Also, what are the specs of each Terminal Server (CPU, RAM, swap space, etc.)?



bmbaerAuthor Commented:
Antivirus was run and gave a clean bill of health.
Drivers have all been updated to the most recent versions available via Windows Update.
As for Memtest, is there a program in Windows to do this that I'm unaware of, or is this something I should be doing from the BIOS?
bmbaerAuthor Commented:
Not sure what UPHClean is.  I run a weekly antivirus and antispyware scan using eTrust products.  I defrag every other day using Diskeeper and reboot nightly.  Both machines have dual dual-core Xeons (3.0 on one box and 3.2 on the other) with 4 GB of RAM, and VM is set to 2046MB.  Both have MegaRAID cards set up with a 3-drive RAID 5 and a hot spare.  Drive space is approx. 300 GB on one server and about 500 GB on the other.
Michael WorshamStaff Infrastructure ArchitectCommented:
Memtest is a downloadable ISO and is run from the server directly, not under the Windows operating system environment.

Reference:
http://www.memtest.org/
Michael WorshamStaff Infrastructure ArchitectCommented:
bmbaerAuthor Commented:
Thanks mwecomputers, I'll run that tonight after hours.
exx1976Commented:
Yes, as MWE referenced, UPHClean is the User Profile Cleanup service.  It's a must on multi-user boxes.


Also, IMHO, you should scrap eTrust.  It's not a multi-user-aware AV product, and it will throw all notifications to the console, instead of into the user's sessions.

Plus it doesn't scale very well.  An independent study found Trend's AV product to be the best for TS/Citrix boxes.

Here's a link to a video of a thorough discussion of it at BriForum 2005..

http://www.brianmadden.com/blogs/guestbloggers/archive/2006/01/08/briforum-2005-video-fabrice-cornet-discusses-antivirus-software-in-citrix-and-terminal-server-environments.aspx

bmbaerAuthor Commented:
I have tried spyware and antivirus scans with Trend's AV products, with no results.  The Memtest utility was run on both servers and gave a clean bill of health.  UPHClean was installed on both servers yesterday, and today the servers rebooted themselves around 1pm EST.  The average load at the time was about 20 users per server.  Going back, I saw that I was asked to define "instability".  I would define that as servers that reboot on their own with nothing to be seen in the logs and a seemingly low load.  Anyone have any other suggestions for me?
HerrmannatorCommented:
Are these crash-type reboots?  Are these servers fully patched (OS)?  You could try disabling or uninstalling the antivirus temporarily to see if the rebooting stops.  It might also turn out to be a driver issue.  Any recent changes in drivers?  What about auto-created printers from user sessions?  Can you start watching whether a particular user (or users) happens to be logging on when these crashes occur?  If so, it may turn out to be a printer driver or some other device on that user's machine causing the crash as it is brought into the terminal session.  If it is always the same person(s) logging in when the crash occurs, look at their machine.  First try deleting all their printers and see if the problem goes away.
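One way to do that watching without sitting at the console: a minimal Python sketch that counts which users logged on shortly before each crash. It assumes you have already pulled logon times and crash times out of the event logs by hand; the user names, timestamps, and the `users_near_crashes` helper are all hypothetical, and the 5-minute window is just a starting guess.

```python
from collections import Counter
from datetime import datetime, timedelta


def users_near_crashes(logons, crashes, window_minutes=5):
    """Count, per user, how many crashes followed one of their logons
    within `window_minutes`."""
    window = timedelta(minutes=window_minutes)
    counts = Counter()
    for crash in crashes:
        # A set, so a user counts once per crash even with several logons.
        seen = {
            user
            for user, when in logons
            if timedelta(0) <= crash - when <= window
        }
        counts.update(seen)
    return counts


# Hypothetical sample data, not taken from the servers in this thread.
logons = [
    ("alice", datetime(2008, 11, 3, 12, 58)),
    ("bob", datetime(2008, 11, 3, 9, 15)),
    ("alice", datetime(2008, 11, 4, 13, 2)),
    ("carol", datetime(2008, 11, 4, 13, 3)),
]
crashes = [datetime(2008, 11, 3, 13, 0), datetime(2008, 11, 4, 13, 5)]

suspects = users_near_crashes(logons, crashes)
for user, hits in suspects.most_common():
    print(f"{user}: logged on shortly before {hits} of {len(crashes)} crashes")
```

A user who shows up near most of the crashes is only a lead, not proof; on a busy server somebody is almost always logging on, so widen or narrow the window and see whether the same names keep appearing.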
bmbaerAuthor Commented:
Herrmannator,

These are crash-type reboots, or the systems go unresponsive and then have to be hard reset.  I have noticed a few things in the logs that I didn't pay much attention to, but they may be a factor.  First, though: all Windows patches have been applied, including hardware drivers.  I did change my antivirus application last night from CA eTrust to Trend Micro Worry-Free Business Security (eTrust was EOL on the first of November anyway), so perhaps the problem was the antivirus.  We'll find out in the next day or two if the servers reboot.

Back to the errors.  I get three.  The first is in regard to a Lexmark T612 printer, but it appears to be print jobs that were active when the server went down, so I didn't think much about it.  It always seems to be one of two people, but that same driver is used for 6 other identical printers on the network, so my guess is that it isn't the driver, as the other 4 printers never seem to show up in the logs.  Would that be an accurate assumption?

The second error is "Windows has detected that Offline Caching is enabled on the Roaming Profile share - to avoid potential profile corruption, Offline Caching must be disabled on shares where roaming user profiles are stored."  I checked in the Control Panel, and under files, offline caching is not available there.  Where else should I be turning offline caching off?


The third:  "The server {0002DF01-0000-0000-C000-000000000046} did not register with DCOM within the required timeout."  ????

HerrmannatorCommented:
As for the issue of offline caching on the profile share, yes, you definitely need to disable it to help reduce profile corruption.  Here is an article on that:
http://support.microsoft.com/default.aspx?scid=kb;en-us;287566
As for the printer drivers being the cause of the crashes: if the printer drivers are installed on the client PC and then get brought into the Terminal Services session, then the printer drivers on the local PC are a likely suspect, and the issue might happen immediately when the user logs on.  If that is what happens, try deleting the person's printers from their PC, then have them log into the TS and see if the error is gone.  If it is, then reinstall their printers one at a time with updated drivers that are hopefully Windows and Citrix certified.
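Back on the offline caching point: the setting lives on the share itself, on whichever file server hosts the roaming profiles, which is why it doesn't show up in the Control Panel on the terminal server. Besides the Caching button in the share's properties, `net share` accepts a `/cache:none` switch. Below is a minimal Python sketch that builds that command and only prints it by default; the share name `Profiles$` and both helper functions are hypothetical, so substitute your real share name and run it on the file server.

```python
import subprocess


def build_disable_caching_command(share_name):
    """Return the `net share` command that turns off offline caching
    for the named share."""
    if not share_name or any(ch in share_name for ch in '\\/"'):
        raise ValueError("expected a bare share name, e.g. Profiles$")
    return ["net", "share", share_name, "/cache:none"]


def disable_offline_caching(share_name, dry_run=True):
    """Print the command (dry run) or execute it on the local machine."""
    cmd = build_disable_caching_command(share_name)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Only meaningful on the Windows server that hosts the share.
        subprocess.run(cmd, check=True)
    return cmd


# "Profiles$" is a placeholder share name, not one from this thread.
command = disable_offline_caching("Profiles$")
```

The dry run is the default on purpose, so you can check the command before it touches a production share; pass `dry_run=False` once it looks right.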
bmbaerAuthor Commented:
Thanks, everyone, for your input.  I found out some great information about what I was doing wrong with Terminal Services.  The actual answer, I believe, lies with the antivirus solution.  I was running ClamAV on one of the servers because I hadn't yet upgraded to a new solution after my CA license expired.  After removing ClamAV and replacing it with Trend Micro Small Business, the servers seem to have stabilized.  It looks like the combination of disabling offline caching as suggested and running UPHClean also made the profile file corruption I was seeing go away.
Michael WorshamStaff Infrastructure ArchitectCommented:
Glad you found a solution.

Just as a reference, if you are running an Exchange server anywhere on your network, make sure that the Anti-Virus software installed on the server is _not_ scanning your Exchange server databases or you will most likely encounter database corruption.
