Solved

Exchange is crashing (requiring reboot) frequently

Posted on 2008-09-30
Last Modified: 2013-12-24
Hello,

I'm running Exchange Server 2003 Enterprise (SP2) on WS2K3 (Enterprise SP2).  It is installed on our only domain controller, which is also our antivirus server (TrendMicro OfficeScan, ScanMail).  Within the past few months, we've had Exchange crash pretty consistently.  It usually starts with the people with phones/ActiveSync complaining that their phones stopped syncing sometime the night before or early that morning.  After that, Outlook starts messing up on its connection to the server (unable to retrieve data, even if it says Connected, or it just says it's offline and can't connect).  I've checked that services are running, and everything looks fine, as far as I can tell.  I can't find a fix, and we end up rebooting.  This is obviously not very convenient, as our DC takes down the whole Internet connection, halting productivity, and Exchange takes forever to shut down/boot back up.

I'm going through the event logs, trying to find the root of the problem, and I thought I'd post everything here to see if someone smarter than me has any ideas. :)  I've just been banging my head against the wall.

The errors I found are listed below (I'm not sure which of these is a cause, and which an effect).  The last one listed is one I noticed just today (since it's at Exchange startup, apparently, and therefore not listed with the rest of the errors/warnings at the time of the problem); I looked at the KB article, and our settings definitely don't match what's listed, but I wanted to confirm what needs to be done (I'm a little nervous messing with the registry on the DC) and/or see if this fits into the problem, or if it's a completely different issue, and really has no bearing.

Please let me know if you need additional information.  Thanks for any help!

Event Type:       Error
Event Source:    MSExchangeDSAccess
Event Category: Topology
Event ID:           2102
Date:                9/29/2008
Time:                8:03:08 AM
User:                N/A
Computer:         <domaincontroller>
Description:
Process MAD.EXE (PID=6336). All Domain Controller Servers in use are not responding:
domaincontroller.domain.local

For more information, click http://www.microsoft.com/contentredirect.asp
------------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeDSAccess
Event Category: Topology
Event ID:           2104
Date:                9/29/2008
Time:                8:03:08 AM
User:                N/A
Computer:         <domaincontroller>
Description:
Process STORE.EXE (PID=7124). All the DS Servers in domain are not responding.

For more information, click http://www.microsoft.com/contentredirect.asp.
--------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeDSAccess
Event Category: Topology
Event ID:           2103
Date:                9/29/2008
Time:                8:03:18 AM
User:                N/A
Computer:         <domaincontroller>
Description:
Process MAD.EXE (PID=6336). All Global Catalog Servers in use are not responding:
domaincontroller.domain.local

For more information, click http://www.microsoft.com/contentredirect.asp.
------------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeAL
Event Category: LDAP Operations
Event ID:           8026
Date:                9/29/2008
Time:                8:03:15 AM
User:                N/A
Computer:         <domaincontroller>
Description:
LDAP Bind was unsuccessful on directory domaincontroller.domain.local for distinguished name ''. Directory returned error:[0x51] Server Down.    

For more information, click http://www.microsoft.com/contentredirect.asp.
------------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeAL
Event Category: Service Control
Event ID:           8250
Date:                9/29/2008
Time:                8:03:15 AM
User:                N/A
Computer:         <domaincontroller>
Description:
The Win32 API call 'DsGetDCNameW' returned error code [0x862] The specified component could not be found in the configuration information.  The service could not be initialized.  Make sure that the operating system was installed properly.

For more information, click http://www.microsoft.com/contentredirect.asp.
----------------------------------------------------------
Event Type:       Warning
Event Source:    Server ActiveSync
Event Category: None
Event ID:           3007
Date:                9/29/2008
Time:                8:02:20 AM
User:                EPITECGROUP\user
Computer:         <domaincontroller>
Description:
Exchange mailbox Server response timeout: Server: [domaincontroller.domain.local] User: [user@epitecgroup.com]. Exchange ActiveSync Server failed to communicate with the Exchange mailbox server in a timely manner. Verify that the Exchange mailbox Server is working correctly and is not overloaded.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
-----------------------------------------------------------
Event Type:       Error
Event Source:    Server ActiveSync
Event Category: None
Event ID:           3014
Date:                9/29/2008
Time:                8:02:04 AM
User:                EPITECGROUP\user2
Computer:         <domaincontroller>
Description:
The Exchange mailbox Server: [domaincontroller.domain.local] has reached its timeout threshold. The mailbox server will be protected from new requests for [60] seconds.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
--------------------------------------------------------------
Event Type: Warning
Event Source: MSExchangeIS
Event Category: General
Event ID: 9665
Date: 9/29/2008
Time: 5:57:32 PM
User: N/A
Computer: <domaincontroller>
Description:
The memory settings for this server are not optimal for Exchange.
For more information, click http://support.microsoft.com?kbid=815372
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
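
One way to start separating cause from effect is to put the entries in time order. Below is a minimal Python sketch, assuming the events have been pasted as plain text in the same layout as above; `parse_events` and the `sample` text are illustrative names, not part of any Microsoft tool.

```python
import re
from datetime import datetime

# Only the header fields needed to order the entries are captured.
FIELD = re.compile(r"^(Event Type|Event Source|Event ID|Date|Time):\s*(.+?)\s*$")

def parse_events(text):
    """Split pasted Event Viewer text into dicts and sort them by timestamp."""
    events, current = [], {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # description text, separators, blank lines
        key, value = match.groups()
        if key == "Event Type" and current:
            # "Event Type" is the first field of every entry, so it starts a new one.
            events.append(current)
            current = {}
        current[key] = value
    if current:
        events.append(current)
    for event in events:
        stamp = event["Date"] + " " + event["Time"]
        event["When"] = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return sorted(events, key=lambda event: event["When"])

# Two of the entries from this post, trimmed to their header fields.
sample = """Event Type:       Error
Event Source:    MSExchangeDSAccess
Event ID:           2102
Date:                9/29/2008
Time:                8:03:08 AM
------
Event Type:       Error
Event Source:    Server ActiveSync
Event ID:           3014
Date:                9/29/2008
Time:                8:02:04 AM
"""

for event in parse_events(sample):
    print(event["When"].time(), event["Event Source"], event["Event ID"])
```

Ordered this way, the ActiveSync timeout entries (8:02) come before the DSAccess errors (8:03) in the logs posted here. That ordering alone does not prove which one is the cause.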
Question by:epitec
70 Comments
 
LVL 9

Assisted Solution

by:abdulzis
abdulzis earned 25 total points
Run Exchange Best Practices analyser from www.exbpa.com when the problem occurs and check for any critical errors in the report.

Did you follow http://support.microsoft.com/?kbid=815372 as per the event warning?

Make sure the NIC speed is not set to Auto detect

Disable TCP Chimney, Checksum Offload, etc from the properties of the NIC

Also make sure the binding order of NIC is correct in Network Connections if you have multiple NICs.

Make sure only internal DNS servers are listed on the NIC and all external DNS servers should be listed in DNS forwarders.
 
LVL 38

Assisted Solution

by:Hypercat (Deb)
Hypercat (Deb) earned 125 total points
It's entirely possible, actually likely, that the root of your problem is the 9665 warning.  Since this is a domain controller, it's not recommended that you use the /3GB switch in your boot.ini file mentioned in the referenced KB article (815372).  Instead, use the following article that describes editing the HeapDecommitFreeBlockThreshold registry key; this alone will probably fix your issue:

  http://support.microsoft.com/kb/315407/en-us

This has worked for me every time when faced with memory issues on a combined DC/Exchange server with 3GB or more of memory.

Also, another hint on optimizing this server: MS recommends that when you have Exchange installed on a DC, you should make that DC a global catalog server and point the Exchange RUS to that server (i.e., itself).
 
LVL 5

Assisted Solution

by:rikke_vp
rikke_vp earned 50 total points
Hi

I completely support the opinion that this is memory related. How much memory do you have? What's the size of your page file? Also think about disk access: try using separate controllers and drives for databases, logs and user/system/company data.

grts
 

Author Comment

by:epitec
Thank you for all the suggestions.  I had class last night, so I had to jet shortly after posting my question.

abdulzis
"Also make sure the binding order of NIC is correct in Network Connections if you have multiple NICs.

Make sure only internal DNS servers are listed on the NIC and all external DNS servers should be listed in DNS forwarders."

I have verified these.  The other settings you mentioned would be found by accessing the NIC through Device Manager?  I have not followed the KB815372 instructions yet.

I would like to try hypercat's suggestion first (which is also part of the first KB article), although I might not be able to do so until Friday, because I have two more classes, and it says it requires a reboot after the change.  
As to the Exchange RUS, it looks like both RUS (Enterprise Configuration) and RUS (DOMAINNAME) are set to our DC (and only our DC), but I'm not sure if more specific configuration needs to be done than what is here.

Looking into the global catalog settings, I also found this error

Event Type:      Error
Event Source:      DNS
Event Category:      None
Event ID:      4010
Date:            9/29/2008
Time:            8:38:06 AM
User:            N/A
Computer:      <domaincontroller>
Description:
The DNS server was unable to create a resource record for  511ff76f-18e0-4d07-bd86-129bb86106b8._msdcs.local.epitecgroup.com. in zone domain.local. The Active Directory definition of this resource record is corrupt or contains an invalid DNS name. The event data contains the error.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 7b 00 00 00               {...    

The resource record is the DNS Alias (as listed in AD sites & services > NTDS Settings).  This error coincides with at least the last couple crashes.

rikke_vp, we have 16GB RAM, but I cannot locate my page file.  Is it named something different in WS2K3?
 
LVL 38

Assisted Solution

by:Hypercat (Deb)
Hypercat (Deb) earned 125 total points
As to the event log error, look at this posting at the EventID.net page:

http://www.eventid.net/display.asp?eventid=4010&eventno=791&source=DNS&phase=1

I would advise trying the second suggestion of deleting and recreating the netlogon.dns and netlogon.dnb files.

It sounds like your RUS is pointing to the correct location.  If this is the only DC in your domain, then it will automatically have been made a global catalog server, so no further configuration would be needed.  However, if you want to double-check that it is in fact a global catalog, you can do this in the AD Sites and Services mgmt. console.  Open the console, expand down to the [SiteName]/Servers/[servername]/NTDS Settings.  Right-click the NTDS Settings object and click Properties.  On the General tab, you'll see a check box for Global Catalog - make sure it's checked.

The page file name is the same as it always was.  It's a hidden/system file, so unless you have the setting turned on to display hidden and system files in Explorer/My Computer, you won't see it. Depending on the size of your %systemroot% partition, and whether you or someone else has manually moved the page file or created more than one, you should have a pagefile.sys file on your C: partition and/or possibly on other partitions.  You can check the configuration by going to Control Panel/System/Advanced tab, click the Performance settings button and go to the Advanced tab there - same as always.  
 

Author Comment

by:epitec
Thanks - I was searching hidden/system files (and I do have show hidden files selected), but it still wasn't coming up with anything.  I don't think I've ever checked page file size through Control Panel before.  It says 2046MB for all drives... just under 2GB, but then when I look in Task Mgr > Performance, it says PF Usage 3.86GB.  How's it pulling that off?  Maybe I just need to read up on page files again. That was way back in my Intro classes. :-S

Double- and triple-checked, and the DC is a GC.

Would the procedure from that post be better to do during off-hours, or does it make a difference?
 
LVL 38

Expert Comment

by:Hypercat (Deb)
On the PF setting, there's usually a minimum and maximum size set.  Unless the minimum and maximum are set to the same value, it would start out with the minimum and could grow up to the maximum size.

You could do the procedure during regular hours, but since you have to stop the netlogon service while you're doing it, it's possible someone could get an error message if they were logging on or using a resource that required authentication while you were doing it.  So, to be safe, if you can  you should probably wait until off hours or at least some period of time when there's the least amount of traffic on that server.
 

Author Comment

by:epitec
I will try to do it tomorrow, either at lunch or in the evening.  Hopefully it will go smoothly. :)  Thank you for your help thus far.
 

Author Comment

by:epitec
I didn't get to do it, because there wasn't enough advance notice.  I think I can do it either Friday or Sunday, and I'll update after that.
 

Author Comment

by:epitec
Update: it happened again, so I was forced to reboot, but before I did, I thought to change the HeapDecommitFreeBlockThreshold (according to the procedure in hypercat's first post).  When the server booted up, it still gave me the 9665 warning (memory settings not optimal for Exchange).  Hmm...
 
LVL 38

Expert Comment

by:Hypercat (Deb)
I think you will still get that warning, because it is looking for a different parameter than the  HeapDecommit registry entry.  However, it still fixes the basic problem, which is related to the use and management of virtual memory.
 

Author Comment

by:epitec
If that was the solution, I'll be forever grateful!
I'll wait to see if it happens again in the next week or two (I'm hoping and praying it doesn't :)... boy, this place makes me feel guilty about leaving a question open. :P
 
LVL 38

Expert Comment

by:Hypercat (Deb)
That's OK - don't worry about leaving the question open for a week or two.  Anyone who looks at the question will realize why you're leaving it open and not interfere.  If it goes for more than 21 days, then someone will likely make an administrative post to close the question and you will have a chance to object if there's a need to leave it open.
 

Author Comment

by:epitec
This wasn't it (at least not all of it).  I just had it happen again at lunchtime. :(
 

Author Comment

by:epitec
Ok, to keep everyone up to speed, I called Microsoft (actually one of my less painful experiences, especially if this helps).  I'm blind, and I didn't even see that SystemPages was part of the memory optimization article (i.e. I probably could have done this myself). :P

(I haven't rebooted yet, so these changes haven't taken effect.)
We changed SystemPages to 0.
He also asked me to add those two switches (/3GB and /USERVA=3030) - of course, while I was on the phone, I couldn't find the article I was thinking of, which said some part of that wasn't recommended for Exchange on DC/GC setups; I asked him if it made a difference if this was Exchange running on the DC.  He said no.  After I hung up, I found the right link and so I e-mailed that section of the article to him, to see what he says. [I got the response as I was typing this.  He said the recommendation given in that article is under normal conditions, where there are no Exchange performance issues - why did they write the article, if there wasn't a reason? - he said these switches "should only help resolve" the system pages/virtual address space problem.]

I also had the bright idea to power up our old Exchange server and see what these settings are on that one (we do still have the old server, and it was also a DC, although not the primary - we had two before our upgrade in June... I really am doubting this decision to condense everything to two servers right now.  Maybe I just need to promote our other server :)

HeapDeCommit... = 0
SystemPages = 0
boot.ini (neither of the switches)

I'm not sure how relevant that is, since it wasn't the only DC, but I thought it might be useful to compare.
 
LVL 38

Expert Comment

by:Hypercat (Deb)
That's very odd, to say the least.  I have always avoided the /3GB and /USERVA switches on DCs, simply because of that statement in the article.  I certainly agree with you that it makes no sense to put that statement in an article about Exchange performance issues and then say it doesn't apply if you have Exchange performance issues.  Can you spell "circular reasoning"? Oh, okay, that's a hard one so...probably not.

Please keep us up to date on how you do and whether using those switches seems to help, whether you see any ill effects DC-wise, etc.  I'd be really interested to know.
 

Author Comment

by:epitec
I'm back. :)  I rebooted the server on Friday.  On first startup, I ran ExBPA and MPS Reporting Tool (requested by the MS tech).  I'm not sure if I didn't wait long enough for everything to load properly before running MPSRT, or if it was the boot.ini switches, but I ran into some problems.

I wasn't able to connect to Exchange from Outlook on my PC.  I thought at first it was just offline because I had left it open when I rebooted the server, but work offline wasn't selected, and it still wouldn't connect after I restarted Outlook.  (After several minutes, it did connect, but I could only receive from the external account I was using to test; I couldn't send to it.  I looked in ESM, and all outgoing messages were just sitting in the queue.)

On the first run of ExBPA, it said "No Domain Naming master could be found" or something along those lines.  I waited a while, ran it again and got "Exchange server does not exist" or "...not detected" - obviously something wasn't right.  I ran the first one while MPSRT was running, so I thought they might have conflicted, but I believe I ran the second after it had finished.

I ruled out the SystemPages change, since that was initially a critical error in ExBPA, so it was either the boot.ini switches or the MPSRT causing this.  (I wasn't sure if, during diagnostics, MPSRT caused any disruption, such as stopping services for testing - I've since been told that it does not.)  I thought about rebooting with no changes, just to see if the server would boot up okay on the second try, but I decided against it in the interest of time.  I removed the switches, and Exchange worked fine.

The MS tech wants me to deal with the rest of the non-critical issues from ExBPA; he said he feels sure this will solve our problems.  I guess I'm more or less back to the monitoring stage.
 

Author Comment

by:epitec
I'm just going to throw everything in here, in case anyone else runs into this.  Hopefully it'll help someone somewhere (if I ever get through it).  I now have two Microsoft techs (well, I've more or less switched from Exchange specialist to ActiveSync specialist, actually).  That last post didn't fix the problem.  My Exchange tech closed the case, and the next day, his manager called (perfect timing - hehe!) to see how satisfied I was... right after I had had the problem recur.

We went through this http://support.microsoft.com/kb/817379/en-us
- ran this (ActiveSync Test) https://www.testexchangeconnectivity.com/ - no errors
- checked this HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MasSync\Parameters\Exchange\VDir (it was correct "/exchange-oma")
- ran this procedure (under Default Web Site in IIS)
  1. Delete Microsoft-Server-Activesync from IIS
  2. Go to command prompt.
  3. type cd\inetpub\adminscripts
  4. type cscript adsutil.vbs delete ds2mb
  5. restart system attendant service
  6. make sure Microsoft-Server-Activesync reappears in IIS
I also rebooted the server Monday night for the changes to take effect

Today I spoke with the ActiveSync tech, and we followed this http://support.microsoft.com/kb/943612/en-us
We also excluded C:\Inetpub, C:\WINDOWS\system32\inetsrv, and C:\Program Files\Exchsrvr from our virus-scan program (Trend Micro ServerProtect - we also have other Trend products, but SP was the only one with a place to specify exclusions).  The registry changes require a restart of the MS Exchange Information Store (which I haven't done yet).
 

Author Comment

by:epitec
I'm heading down a new lead now... We have our Web monitoring/filtering software on this box, too.  It uses SQL, and I thought we just had MSDE or Express or something, which it installs automatically, but apparently we have the full version SQL Server 2005 (I'm not sure why we did that).  I didn't even remember that, but the MS engineer I just spoke with pointed it out, and the memory issues it can cause, since both Exchange and SQL use a ton of memory.  I have to see what we're going to do about this (probably migrate the software *groan* :(... but hey, if it fixes Exchange, that'll be awesome.
 

Expert Comment

by:lshriver
epitec:  
Have you had to restart Exchange since Oct 29th, when you recreated the active sync virtual dir?  I have been having a similar problem and until this latest post (sql) we seemed to have very similar setups...I too am having problems with Exchange 2003 Sp2 periodically not allowing connections and it is driving me crazy.
 

Author Comment

by:epitec
Yes, unfortunately, I've had the issue several times since Oct 29. :(  The latest was last Friday (11/7).

Yesterday, I spoke with another person from MS, and he had me send him more logs/diagnostics.  We ran DCDIAG (from Windows Support Tools) - I did that when I first set up the domain/DC, of course, until it came through clear, but now it's failing the SystemLog step.  I'm not sure if that has any impact on this issue, or if it's just an effect... or if it's unrelated.

On the Exchange server, if you go to Start > Run > logfiles, it should bring up a folder with several other log folders.  He wanted to see the HTTPERR folder and the W3SVC1 (1 for default Web site) folder, with any logs since this issue started.  We also did a Find in the W3SVC1 log file (the log from the time of the most recent recurrence) for "refused" (to no avail, in my case, but if this helps start you off in the right direction, I'm more than happy to share).
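
For anyone who would rather script the search than use Find on each file, here is a rough Python sketch. It assumes the IIS logs are plain-text `*.log` files in a single folder; `find_in_logs` is an illustrative helper name, and the folder path is whatever Start > Run > logfiles opens on your server.

```python
from pathlib import Path

def find_in_logs(log_dir, needle):
    """Yield (file name, line number, line) for each case-insensitive match."""
    needle = needle.lower()
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps odd bytes in a log from aborting the scan.
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.rstrip()
```

Calling `list(find_in_logs(folder, "refused"))` on the W3SVC1 folder returns every matching line with its file name and line number. An empty list corresponds to the "to no avail" result described above.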
 

Author Comment

by:epitec
Turned off AV on the server to run Exchange Troubleshooting Assistant (ExTRA).  That turned up several bottlenecks that we're still working on.

Also, I set up a performance monitor for 8 hours (15 second intervals, 500MB log) using PerfWiz (http://www.microsoft.com/downloads/details.aspx?familyid=31fccd98-c3a1-4644-9622-faa046d69214&displaylang=en)

lshriver, if you see this again... if you haven't already, you may want to try the ExTRA (it reported more specifically on some issues brought up by ExBPA).
 

Expert Comment

by:lshriver
epitec, Thanks for the update and I will run the ExTRA and we'll see what else we need to deal with.
 

Author Comment

by:epitec
We removed our Web filtering software (using SQL) from the Exchange box.  We should see if this helps within a week or two.

MS also had me set msExchESEParamMaxOpenTables, as follows:
Using ADSIEdit, browse to Configuration [server.domain.com]/CN=Configuration, DC=domain, DC=com/Services/Microsoft Exchange/<Organization Name>/Administrative Groups/First Administrative Group/Servers/HANNIBAL/Information Store/Storage Group and find the attribute named msExchESEParamMaxOpenTables.  Set this value to 27600 for each storage group that you find under the Information Store object.  Reboot.
 

Author Comment

by:epitec
Removed Spiceworks monitoring software from Exchange box (I know this isn't the root cause, since I installed it after the issue, attempting to get it to alert when Exchange was frozen; it might be part of the build-up, though).  The issue recurred yesterday.  Now I'm working on removing Backup Exec (ha, it didn't seem like I had this much junk installed with Exchange), which also uses a SQL instance.
 
LVL 5

Expert Comment

by:rikke_vp
Hi

I had several issues like this on one of my clients' servers where Trend was installed.

My solution was,
install exchange on another server
setup as front-end server for email
setup trend for scanning email
setup anti spam for email scanning

On the old server we installed Norman AV for file scanning and excluded all Exchange-related directories.

After these steps the issue was gone... So for whatever reason (and I don't even want to know what it was), Trend was screwing around with the Exchange, SQL and AD databases, and also with our DNS.

I would give this a try as well, since I had issues with that server for months and neither HP nor Microsoft nor Trend could help me with this.

One more thing: try setting your page file to double the available memory in the server. So if you have 2GB installed, set it to at least 4GB of paging. (This comes from a Microsoft tech.)

kind regards
R
 

Author Comment

by:epitec
Hi rikke_vp,

Thanks for the post.  I was trying to find out more about the page file size recently, because we have 16GB installed on this server, and a 24GB page file seemed a little excessive to me (since 1.5 is the usual recommendation).  I did find one place that said that recommendation only holds up to a certain point (at the moment, I don't remember what size).
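
The arithmetic behind the two rules of thumb mentioned in this thread, as a small Python sketch (`page_file_gb` is just an illustrative helper; neither multiplier is presented here as the right setting for this server):

```python
def page_file_gb(ram_gb, multiplier):
    """Page file size implied by a 'multiple of RAM' rule of thumb."""
    return ram_gb * multiplier

ram_gb = 16  # installed RAM on this server, per the thread
for label, multiplier in [("1.5x rule", 1.5), ("2x rule", 2)]:
    print(f"{label}: {page_file_gb(ram_gb, multiplier):g} GB")
```

For comparison, the page file configured on this server was reported earlier as 2046MB.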

Moving Exchange is probably on the agenda, but I'm dreading it, because I feel like it was just such a headache last time... but I guess we need to get rid of this.  I was wondering if it would be easier to just move Trend.  Did your client move from one Exchange server to a front-end/back-end?  If I may ask, why did they choose to move Exchange rather than Trend?
 
LVL 5

Expert Comment

by:rikke_vp
Well, I did also move Trend away from our backend... It's really a heavy AV-AS solution if you ask me.

The main reason for the migration was that they were growing and needed more data storage, and we wanted to kill the load on that server since users were complaining about slow access, etc...

Another reason is IT budget, which is split into support cost and hardware cost. We work on a per-hour rate, so the longer we need to search, the bigger their IT overhead cost is. New hardware is easier to fit into the budget (all the money was just sitting there), so we just took that approach.

Yes, front-end/back-end solution now.
So the old server is still in place; this is the backend. That's also the DC and file/print server. We downsized the load to 10% average on the CPU and 2 GB in memory usage.

We moved all the databases to the front-end and installed Exchange and Trend there, so that load is off the backend. If mail comes in a bit slower, no problem; if Exchange gets stuck, we can simply reboot without any user complaining (the SQL databases are used for mail archiving, Spysweeper Enterprise, etc).

For us, this was actually the cheapest adequate solution, with the highest return on the investment.
 

Author Comment

by:epitec
The MS tech was reviewing the performance log from the last recurrence, and it looks like it is being caused by a specific issue with Trend.  One Control Manager agent (not sure which, since the processes are named the same (EntityMain.exe), but it's either ServerProtect or Damage Cleanup Services) had a handle leak.  The handle count was increasing by one each minute (so it took about 8 days for it to get too high and shut down Exchange functionality)... I've uninstalled Control Manager and both agents, and I'm monitoring to make sure this was the cause.  8 days since the last (planned) reboot, and counting! *fingers crossed* :)
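
To put numbers on a leak like this, here is a short Python sketch. The one-per-minute rate and the 8 days come from the performance log described above; the counter readings in `readings` are made-up values for illustration, and the function names are hypothetical.

```python
MINUTES_PER_DAY = 60 * 24

def handles_leaked(rate_per_minute, days):
    """Total handles accumulated at a steady leak rate."""
    return rate_per_minute * MINUTES_PER_DAY * days

def leak_rate_per_minute(samples):
    """Average growth per minute from (minutes_elapsed, handle_count) samples."""
    (t0, h0), (t1, h1) = samples[0], samples[-1]
    return (h1 - h0) / (t1 - t0)

# Figures from the thread: one handle per minute for about 8 days.
print(handles_leaked(1, 8))

# Hypothetical counter readings taken 15 minutes apart (made-up numbers).
readings = [(0, 500), (15, 515), (30, 530)]
print(leak_rate_per_minute(readings))
```

At one handle per minute, 8 days works out to 11,520 extra handles held by the process. A steady slope like the one in `readings` is what sets a leak apart from normal up-and-down handle usage.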
 

Author Comment

by:epitec
Argh... I'm officially ripping my hair out.  Next step is to remove anything I possibly can from the server... after that, I think we're starting fresh.

Does anyone know if migration from Exchange 2003 to 2007 will work (preferably smoothly)?  My bosses want to upgrade if we have to move it, anyway.
 

Expert Comment

by:lshriver
We rebuilt our Exchange 2003 server and have not reinstalled Trend.  Something hasn't been right since we installed Trend in Aug 2008.  Within 10 days of the initial install, we had our first issue with not being able to connect to the Information Store.

We too are now waiting.  Our new build was last restarted yesterday morning (1/19/09).

If this holds up until 1/29/09, I will celebrate by purchasing a new AV/AS solution for Exchange.
 

Expert Comment

by:icepack
Hi, I have a server with an almost identical scenario.
It's SBS2003 SP2 (std edition) with bugger all installed from defaults except Trend WFBSadv.
Has any of this been escalated to Trend ?
 

Author Comment

by:epitec
Yes, I have been in contact with Trend, but the support I've received is crap.  I believe (now) the problem is OfficeScan Server... I did find a patch for it that deals with a memory leak.  I'm moving it to another server (and making sure I keep up with patches this time!)... the only thing Trend that will remain with Exchange is ScanMail (obviously).
 

Author Comment

by:epitec
Ha - removed OfficeScan server and the issue persists.  It seems to be ScanMail, and I'm back in contact with Trend.  My SharedResPool (C:\Program Files\Trend Micro\Smex\SharedResPool) folder was much larger than normal, and the Trend tech thought that might cause the hanging... I changed the folder (so it's at a more reasonable size), and if that doesn't help, I'm going to turn off the scheduled virus scan and see if that fixes it (my MS tech said every time Exchange hangs, it's waiting on a virus scan).  
The last several times, I've caught the problem before it's disrupted business (either late at night or early in the morning)... I think I'm getting the hang of this. :P  I really appreciate everyone's patience (if anyone's still monitoring this) in my keeping this open forever.
 

Expert Comment

by:lshriver
I, for one, am watching this issue very closely.

Our rebuilt Exchange 2003 server ran b-e-a-utifully, from its build date of 1/19/09, until 2/7/09.  We installed OfficeScan the morning of 2/7/09 and had the system hang on 2/10/09 and again this morning (2/11/09).  OfficeScan is coming off tonight.

I truly hope we can get back to smooth operations again soon.

Please, epitec, keep us informed of your progress.
 

Expert Comment

by:icepack
I am still watching this.
I have run a Trend patch, but it didn't help. There is still a newer version I can go to and will try upgrading to it in the next week or so.
 

Author Comment

by:epitec
I'll post this info, then, if you guys want to take a look.  The SharedResPool, according to Trend, should be around 300MB, not usually greater.  Mine was 2.14GB... not sure if that really has anything to do with it, but I did change the folder (didn't cause any disruption by restarting SMEX).  From my Trend tech:

"This link below contained the instruction on how to move your Shared Respool Directory to another location:
http://esupport.trendmicro.com/support/search.do?cmd=displayKC&docType=kc&externalId=PUB-en-127549

Also,the link below have an information if what's the normal size of that folder:
http://esupport.trendmicro.com/support/search.do?cmd=displayKC&docType=kc&externalId=PUB-en-126898"
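
For anyone who wants to check their own SharedResPool folder against the roughly 300MB figure quoted above, here is a minimal sketch in Python. The folder path is the one mentioned earlier in this thread and the 300MB limit is Trend's ballpark as relayed here; the function names are made up for illustration, and your install path may differ.

```python
import os

def folder_size_bytes(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file may disappear or be locked while we walk; skip it.
                pass
    return total

def is_oversized(path, limit_mb=300):
    """True if the folder is larger than limit_mb megabytes."""
    return folder_size_bytes(path) > limit_mb * 1024 * 1024

if __name__ == "__main__":
    # Path as given in this thread; adjust for your own install.
    pool = r"C:\Program Files\Trend Micro\Smex\SharedResPool"
    size_mb = folder_size_bytes(pool) / (1024 * 1024)
    print(f"{pool}: {size_mb:.0f} MB, oversized: {is_oversized(pool)}")
```

A folder that does not exist simply reports 0 MB, so double-check the path if the number looks too good.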
0
 

Author Comment

by:epitec
Comment Utility
The SharedResPool did not make a difference.  I had to reboot this morning.  I contacted Trend to let the tech on the case know that this folder change didn't fix the issue, so I'll see what he says.  Since my MS tech said it seems to hang waiting for a virus scan each time, I've disabled the scheduled virus scan in ScanMail (to see if that fixes it).  ScanMail is currently the only virus product left on this server.  

My Trend technician's first suggestion was to reinstall ScanMail (since we're on the latest build and having this problem).  I may end up trying that if disabling this virus scan doesn't help, although it doesn't seem likely to me that that will help, either.
0
 

Expert Comment

by:twoj
Comment Utility
I was just at a client that seems to be experiencing the same thing. Their setup is SBS 2003 SP2, running quite a few other things on the same box (SQL, et al.). They are also running Trend Worry-Free, probably Standard, but I'm not positive. Since I was looking in Exchange for the cause, I didn't look at the Trend Micro products, so I'm not sure whether any other product, such as ScanMail, is installed.

Epitec, do you believe the problem comes from ScanMail, and have you tried removing it to see if it resolves the problem?
I'm watching this one too, so please keep us up to date. Thanks.

0
 

Author Comment

by:epitec
Comment Utility
I do believe it is ScanMail causing the issue.  Per my Microsoft engineer (who seems pretty knowledgeable... now that it's been passed around ten times, or so), Exchange normal operations are hanging while it waits for a virus scan to complete.  SMEX is the only virus software I have installed.

I have not tried removing it.  Before we installed it, we had ridiculous amounts of spam; I've been struggling with which is the lesser of two evils - at least this way our users can get some work done (less 15-20 min for reboot every 8 days, if it hits during business hours).

Another suggestion from Trend:
"1. Look for the "VirusScanStampMode" key [he didn't include any sort of path... it's under HKLM\Software\TrendMicro\ScanMail for Exchange\CurrentVersion]
2. Modify the value of the key to "2".
3. Set Reloadnow to 1 (HKLM\System\CCS\services\MSExchangeIS\VirusScan)
4. Restart SMEX services
It's a workaround for crash or hang-up of Exchange server."
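
The steps above could be captured as a .reg file so the change can be reviewed before it is imported. The Python sketch below only builds the file text; it does not touch the registry. Several things here are assumptions and not something Trend confirmed in this thread: that both values are DWORDs (the tech wrote the value as "2", which could also be a string), that "CCS" means CurrentControlSet, and that the value is spelled ReloadNow (registry value names are not case sensitive). The function names are made up for illustration.

```python
def reg_dword(name, value):
    """Format one DWORD value line in .reg syntax (8 hex digits)."""
    return f'"{name}"=dword:{value:08x}'

def build_reg_text():
    """Build .reg file text for the two changes described in the steps above."""
    # Key path as given in this thread for ScanMail.
    smex_key = r"HKEY_LOCAL_MACHINE\Software\TrendMicro\ScanMail for Exchange\CurrentVersion"
    # Assumption: "CCS" in the tech's note is shorthand for CurrentControlSet.
    vsapi_key = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\VirusScan"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{smex_key}]",
        reg_dword("VirusScanStampMode", 2),  # assumption: DWORD type
        "",
        f"[{vsapi_key}]",
        reg_dword("ReloadNow", 1),  # assumption: DWORD type
        "",
    ]
    return "\r\n".join(lines)

if __name__ == "__main__":
    print(build_reg_text())
```

Export the existing keys first so the change can be rolled back, and remember that step 4 (restarting the SMEX services) still has to be done by hand.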

It remains to be seen how this will affect anything... shouldn't be long now (recurrence is scheduled before tomorrow :P).
0
 

Assisted Solution

by:twoj
twoj earned 200 total points
Comment Utility
I took a look at the customer's server again, and they have Trend Micro Worry-Free Business Security Advanced, which includes the Trend Micro Messaging Security Agent. This is run by SMEX_Master.exe, which, as far as I can tell, seems to be the same engine as ScanMail?
Also, SMEX_Master seems to be taking up a fair bit of memory (150MB), second only to the store at 610MB.

I checked in Program Files\Trend Micro\Messaging Security Agent and it has about 5GB of files in it, so I'm assuming that something is not normal there.
Another issue is that the Print Spooler service crashes as well. I set up some monitoring, and at the same time as the print spooler and Exchange crash, there is an Allocated Memory alert.
Do you think that stopping the SMEX_Master service 1) would cause no problems for email flow and 2) could give a pretty good indication of whether this is the culprit?

There are quite a few Trend Micro services, four of which seem to deal with messaging. Since you seem to have narrowed it down to the Trend Micro messaging component, would disabling those four give a real test of the problem?
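
To keep an eye on how much memory SMEX_Master.exe and the store are using over time, one option is to capture the output of `tasklist /FO CSV /NH` on the server at intervals and pull out the "Mem Usage" column. The Python sketch below parses that output; the sample rows and PIDs are invented to roughly match the figures above (150MB and 610MB), and the function name is hypothetical.

```python
import csv
import io

def mem_usage_kb(tasklist_csv, image_name):
    """Sum the Mem Usage column (in KB) for every process named image_name.

    Expects the text produced by `tasklist /FO CSV /NH`, where each row is:
    image name, PID, session name, session number, mem usage.
    """
    total = 0
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        if row[0].lower() == image_name.lower():
            # Mem Usage looks like "153,600 K"; keep only the digits.
            digits = "".join(ch for ch in row[4] if ch.isdigit())
            total += int(digits) if digits else 0
    return total

# Invented sample output, for illustration only.
sample = (
    '"store.exe","2140","Console","0","624,640 K"\r\n'
    '"SMEX_Master.exe","3312","Console","0","153,600 K"\r\n'
)

if __name__ == "__main__":
    for name in ("store.exe", "SMEX_Master.exe"):
        print(name, mem_usage_kb(sample, name) // 1024, "MB")
```

Logging these numbers every few minutes would show whether memory climbs steadily in the days before a hang or whether the hang comes without warning.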
0
 

Author Comment

by:epitec
Comment Utility
Yes, SMEX_Master.exe is part of ScanMail, and I've noticed that it does take a lot of memory.  I wrote this off to the fact that it's constantly scanning for spam, etc.

I don't know if stopping SMEX_Master before it becomes an issue would help.  I thought it might even help (prevent me having to reboot) if I stopped it when the problem recurred (it seemed to me that stopping it would free up the resources and stop whatever scan Exchange is waiting for, so maybe Exchange would start functioning again)... I guess I was wrong.  I tried stopping all SMEX services last time it recurred, and I saw no difference (I still had to reboot). :(

My estimation was wrong; the issue did not recur last night, which means it will probably recur sometime during business hours (I would have preferred it to happen last night :)... either that, or the registry modifications fixed it... but I won't truly believe that until several more days pass.
0
 

Accepted Solution

by:
epitec earned 0 total points
Comment Utility
11.5 days and counting... I'm still wary, expecting it to recur any second now, but it's definitely way overdue.  My latest changes are the registry modification three posts above and disabling the scheduled virus scan (real-time scan is still running).  I'll post if/when it happens again.
0
 

Expert Comment

by:twoj
Comment Utility
Well, I hope it is good news in your next report.
I haven't looked, but I hope the registry items are the same for their Worry-Free product (a bit of an ironic name for the product :-).
Do you think uninstalling ScanMail would fix the problem?
0
 

Author Comment

by:epitec
Comment Utility
Yes, I think it would, because that would completely eliminate whatever virus scan is causing Exchange to hang (it seems to me).
0
 

Expert Comment

by:icepack
Comment Utility
Epitec,
With the registry change you gained from Trend, did they say what it actually does, apart from - "it's a workaround for crash or hang-up of Exchange server" ?
It seems to be a reasonable option rather than disabling the scheduled scan, although setting the scheduled scan to weekly might be a way of limiting the impact.
0
 

Author Comment

by:epitec
Comment Utility
No, I put down exactly what he told me (after that, he got back on his "it must be that you have backups running during the time of the crash" soapbox... never mind the fact that it happens at various times of day and night).

I've considered re-enabling the scheduled scan, but I want to make it another few days.  If it doesn't recur by Sunday night, I will probably consider this resolved, and from that point, I may re-enable the virus scan (as you say, weekly would probably be a better option) so I can more accurately determine whether it was really the registry fix (not that I really want it to happen again, if it wasn't, but I do prefer virus scans to no virus scans - although I have to admit, I am not terribly confident in Trend's products' effectiveness after this ordeal :P).
0
 

Expert Comment

by:twoj
Comment Utility
Epitec, could you clarify?
You stopped the scheduled scan of ScanMail, and then you applied the registry changes (and then restarted the server)?
I'm just wondering whether it hasn't crashed because it's not scanning, because of the registry changes, or both.
Any thoughts?
0
 

Author Comment

by:epitec
Comment Utility
After the last recurrence, I disabled the daily scheduled scan (real-time virus scan is still running).  I believe it was a few days after that when Trend got me the registry solution, so I implemented that (it said it only requires a restart of ScanMail services - I haven't rebooted since 2/18).

That last is the only part about this that I'm not happy about: I don't know if it's the scheduled scan or the registry change that fixed it, since I implemented two changes before verifying the results of one, but I guess that was my own fault... I was just so anxious to get it fixed.  That is why I said I'm thinking about re-enabling the scan after Sunday night, to see if it was the registry changes... but I am a little paranoid that if I re-enable it, that will bring the problem back.  :)  If I had a way to ensure it would not return until after business hours, it wouldn't be so bad... this is when I wish I had a duplicate testing system. :P
0
 

Author Comment

by:epitec
Comment Utility
icepack, I'm requesting more information on those changes (if the tech can explain what that did, or if there is an article in the knowledgebase or somewhere that explains it)... I'll let you know if I hear anything.
0
 

Author Comment

by:epitec
Comment Utility
Our scheduled reboot was yesterday evening, and no recurrence of the issue since 2/18, so this does appear to be resolved.  I sent an update to Microsoft and Trend... still haven't heard back from Trend when I asked about the purpose of the registry changes, so I did ask again.  I'll keep this thread open until I (hopefully) figure out whether it was those changes, or the virus scan (or both).
0
 

Expert Comment

by:icepack
Comment Utility
Epitec,
Thanks. We had been re-booting the SBS server each morning (early before user activity) to clear connections because of a license argument with MS, but as this is now resolved and we don't need to re-boot, I expect we'll see the recurrence of the Exchange failure we were originally experiencing - over the next few days.
When this occurs, I'll do the reg hack and advise of the result.
0
 

Author Comment

by:epitec
Comment Utility
Sounds good.

Per my Trend engineer (I love off-shoring, don't you? I still don't know what the registry changes were supposed to have done :P):

"Just to have a brief information about Stamp mode you can refer to this link below:

http://esupport.trendmicro.com/support/search.do?cmd=displayKC&docType=kc&externalId=PUB-en-1034998

Actually, the purpose of those registry changes will prevent those performance issue into the exchange server. It refers to the scanning of your files wherein it will triggered the capability of scan engine to not being able to hang."

He did say that reducing the scan to weekly should work fine.
0
 

Assisted Solution

by:twoj
twoj earned 200 total points
Comment Utility
Hi again
Well, the client had to reboot the server a few times, so it took some time for the error to come back, but sure enough, this weekend it locked up. So, epitec, I'm going to try to answer your question by first stopping the scheduled scans and rebooting the server, to see whether the registry changes are necessary.
I'm assuming that since you haven't posted, you haven't had any crashes? Have there been any other developments?
0
 

Author Comment

by:epitec
Comment Utility
We have not had any crashes.  No further developments to report, but I would like to ask you, because I think the *default* scheduled scan was Daily - at least, I do not remember configuring the daily scan after install - is your current scheduled scan daily, or less frequent?

If icepack is implementing the reg hack, and you are testing the stopped scan, we should get this question answered in short order. :)
0
 

Assisted Solution

by:icepack
icepack earned 100 total points
Comment Utility
Hi,
We will implement the reg hack today.
Exchange locked up again this morning as expected (it lasted about a week since the last reboot).
It may be another week before we see a result, but I will update you then.
0
 

Expert Comment

by:twoj
Comment Utility
Well, I hope I turned it off. The Worry-Free product is all web-based, so there is nowhere that says ScanMail. However, in the Scheduled settings, which I believe handle both file and email scanning, I've turned it off for the server. For good measure, in the General Options in the program I found settings for not scanning the Exchange folders(?) on the Exchange server and on clients. Whatever that means, I took them off. I've rebooted the server - now it's time to wait and see!
I looked in the registry and found that VirusScanStampMode value, so I assume ScanMail is integrated into the Worry-Free product, just hidden under the interface.
0
 

Author Comment

by:epitec
Comment Utility
Any news?
0
 

Expert Comment

by:twoj
Comment Utility
Hi Epitec

No news to report so far. Some work was done on the SBS that resulted in problems for a few days, during which the server was being restarted, so I believe it has only been a few days since that was resolved and the server has been allowed to chug along. It's been about a week since what I believe was the last restart, and I haven't had any reports, so I think I'll need to give it another 1-2 weeks.

Thanks for everyone's help with this.
0
 

Author Comment

by:epitec
Comment Utility
twoj, icepack... any results?
0
 

Expert Comment

by:icepack
Comment Utility
We have still had a few problems, with symptoms and results similar but not identical to those we saw previously. Unfortunately, the server has been rebooted for other reasons (changing things around in racks), but the last power cycle was on the 18th, so by early next week (if no other events occur) we should have a good indication.
0
 

Expert Comment

by:twoj
Comment Utility
Things are looking good on my side. I talked to the ops director at the company, and so far so good.
We decided to give it one more week, until next Monday/Tuesday, to be certain. No registry changes were made; I just stopped the scans on the server.
I'll report back next week with the final verdict.

I wasn't a big fan of Trend before this; amazing how jaded I am about some software!
0
 

Assisted Solution

by:twoj
twoj earned 200 total points
Comment Utility
Well, things look good for my client. I think this is over 3 weeks without a shutdown, so I am calling the case closed.
Just to reconfirm: I did not change any registry keys, but only stopped all scanning on the server, including the Exchange scanning. It may be that I was overzealous in shutting down all the scanning on the server in order to solve the problem, and that perhaps some of the scanning can be restarted. However, the client is happy, so I'd rather leave it the way it is.
Epitec - thanks a lot for your help - I was knee deep in Exchange logs before I saw your post. You certainly pointed me in the right direction with this problem!
Icepack - it would be interesting to hear how your situation is. I'm curious to see if the registry changes also do the job.

Thanks
0
 

Author Comment

by:epitec
Comment Utility
You're welcome!  I'm glad to hear your problem is also solved. :)  It's such a good feeling not to have to worry about Exchange going down in the middle of the day sometime each week...

I'd also be interested to hear Icepack's results.
0
 

Author Comment

by:epitec
Comment Utility
Ok, I guess it's time to close this question... thanks to everyone for their help!

Icepack, (if you see this) whenever you determine whether your issue's fixed, feel free to add it as a comment... :)
0
 

Expert Comment

by:abmcsltd
Comment Utility
Hi, I have a customer who is experiencing the very same issue, and I now know it is Trend causing the problem. We have to down the store and restart it on Wednesday morning at 11:00 hrs, and normally we have a week's grace until the store needs restarting. I found this thread and have been going through it. I updated Trend to the latest patches and SPs, and all was well. I made the registry change last night, and within an hour the store had gone down. I rebooted the server this morning, and again within an hour the store had stopped responding. However, when I stopped the SMEX Master service, the store started responding straight away and could send and receive email.

I will be raising a ticket with Trend, so I will keep you posted. With regard to your comments, twoj: are you not concerned that you have no virus scanning in place for Exchange?

will keep you posted.
0
 

Expert Comment

by:twoj
Comment Utility
Hi abmcsltd
I work as a consultant, so I haven't seen that customer for some time. Funnily enough, I ran into a similar problem after installing ESET Exchange antivirus on another company's server: it would cause the server to randomly reboot after 2-14 days. Having seen this issue with Trend, I removed the ESET Exchange program, and there have been no more problems.

To me, having a server running 24/7 is the highest priority. I would much rather deal with a virus on a client's computer than have Exchange go down - dealing with one irritated user is much better than a whole company. My second point is that server software should not cause servers or services to stop, which is why I keep a list of software that I don't use for exactly that reason. On top of that, Trend has been aware of this problem for some time and has not fixed it, so why should I use it?

Because Exchange is usually one of the critical services running in an organization, I need to know that whatever AV program I use isn't going to screw it up. Trend, ESET, and McAfee are all programs that, by choice, I won't install. ESET NOD32, however, is a client program that I have no problem installing on client machines; it includes an email scanner, so that keeps the clients virus free and removes the issues with Exchange. If your company is sufficiently equipped, I would recommend an all-in-one box: most of the higher-end firewalls have options for integrated spam/virus filtering, or there are dedicated boxes that just do filtering, like Barracuda.

Hopefully Trend will fix it this time, but this just reaffirms my decision to avoid them.
0
 

Expert Comment

by:abmcsltd
Comment Utility
Hi twoj

Thanks for your comments. I too am a consultant and have many customers with Trend Worry Free Business Server Advanced installed. I get the occasional problem with IIS settings, but I've never had it cause problems with Exchange. That said, I have noticed more problems with Worry Free Business than with the old CSMS products.

I will keep you posted on this situation, as we are a Trend supplier and hopefully may have a little more success with this issue.
 
0
 
LVL 1

Expert Comment

by:ITnavigators
Comment Utility
Thanks to everyone for the incredible history and detail in the posts.  This type of sharing is how Experts-Exchange works so well.

I'm troubleshooting a similar problem.  Also running Trend WFBS-A.  Overall, we have been very happy with Trend.  They have stopped more spam and virus attacks than anything else we have used, with significantly fewer collateral problems (like this one) than we experienced with Symantec, McAfee, or Sophos.  Will post back what we find on this issue.
0
 

Expert Comment

by:mudgie
Comment Utility
Anyone using Trend Worry-Free Advanced should consider deploying the included free version of the hosted email security. It works great and frankly makes it possible to remove the messaging security agent from the servers. It's a bit of work to set it up but it really works fantastic. You may as well offload that job to them and give your Exchange server some rest!
0
