Avatar of epitec
epitec

Exchange is crashing (requiring reboot) frequently

Hello,

I'm running Exchange Server 2003 Enterprise (SP2) on WS2K3 (Enterprise SP2).  It is installed on our only domain controller, which is also our antivirus server (TrendMicro OfficeScan, ScanMail).  Within the past few months, we've had Exchange crash pretty consistently.  It usually starts with the people with phones/ActiveSync complaining that their phones stopped syncing sometime the night before or early that morning.  After that, Outlook starts messing up on its connection to the server (unable to retrieve data, even if it says Connected, or it just says it's offline and can't connect).  I've checked that services are running, and everything looks fine, as far as I can tell.  I can't find a fix, and we end up rebooting.  This is obviously not very convenient, as our DC takes down the whole Internet connection, halting productivity, and Exchange takes forever to shut down/boot back up.

I'm going through the event logs, trying to find the root of the problem, and I thought I'd post everything here to see if someone smarter than me has any ideas. :)  I've just been banging my head against the wall.

The errors I found are listed below (I'm not sure which of these is a cause, and which an effect).  The last one listed is one I noticed just today (since it's at Exchange startup, apparently, and therefore not listed with the rest of the errors/warnings at the time of the problem); I looked at the KB article, and our settings definitely don't match what's listed, but I wanted to confirm what needs to be done (I'm a little nervous messing with the registry on the DC) and/or see if this fits into the problem, or if it's a completely different issue, and really has no bearing.

Please let me know if you need additional information.  Thanks for any help!

Event Type:       Error
Event Source:    MSExchangeDSAccess
Event Category: Topology
Event ID:           2102
Date:                9/29/2008
Time:                8:03:08 AM
User:                N/A
Computer:         <domaincontroller>
Description:
Process MAD.EXE (PID=6336). All Domain Controller Servers in use are not responding:
domaincontroller.domain.local

For more information, click http://www.microsoft.com/contentredirect.asp
------------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeDSAccess
Event Category: Topology
Event ID:           2104
Date:                9/29/2008
Time:                8:03:08 AM
User:                N/A
Computer:         <domaincontroller>
Description:
Process STORE.EXE (PID=7124). All the DS Servers in domain are not responding.

For more information, click http://www.microsoft.com/contentredirect.asp.
--------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeDSAccess
Event Category: Topology
Event ID:           2103
Date:                9/29/2008
Time:                8:03:18 AM
User:                N/A
Computer:         <domaincontroller>
Description:
Process MAD.EXE (PID=6336). All Global Catalog Servers in use are not responding:
domaincontroller.domain.local

For more information, click http://www.microsoft.com/contentredirect.asp.
------------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeAL
Event Category: LDAP Operations
Event ID:           8026
Date:                9/29/2008
Time:                8:03:15 AM
User:                N/A
Computer:         <domaincontroller>
Description:
LDAP Bind was unsuccessful on directory domaincontroller.domain.local for distinguished name ''. Directory returned error:[0x51] Server Down.    

For more information, click http://www.microsoft.com/contentredirect.asp.
------------------------------------------------------------
Event Type:       Error
Event Source:    MSExchangeAL
Event Category: Service Control
Event ID:           8250
Date:                9/29/2008
Time:                8:03:15 AM
User:                N/A
Computer:         <domaincontroller>
Description:
The Win32 API call 'DsGetDCNameW' returned error code [0x862] The specified component could not be found in the configuration information.  The service could not be initialized.  Make sure that the operating system was installed properly.

For more information, click http://www.microsoft.com/contentredirect.asp.
----------------------------------------------------------
Event Type:       Warning
Event Source:    Server ActiveSync
Event Category: None
Event ID:           3007
Date:                9/29/2008
Time:                8:02:20 AM
User:                EPITECGROUP\user
Computer:         <domaincontroller>
Description:
Exchange mailbox Server response timeout: Server: [domaincontroller.domain.local] User: [user@epitecgroup.com]. Exchange ActiveSync Server failed to communicate with the Exchange mailbox server in a timely manner. Verify that the Exchange mailbox Server is working correctly and is not overloaded.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
-----------------------------------------------------------
Event Type:       Error
Event Source:    Server ActiveSync
Event Category: None
Event ID:           3014
Date:                9/29/2008
Time:                8:02:04 AM
User:                EPITECGROUP\user2
Computer:         <domaincontroller>
Description:
The Exchange mailbox Server: [domaincontroller.domain.local] has reached its timeout threshold. The mailbox server will be protected from new requests for [60] seconds.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
--------------------------------------------------------------
Event Type: Warning
Event Source: MSExchangeIS
Event Category: General
Event ID: 9665
Date: 9/29/2008
Time: 5:57:32 PM
User: N/A
Computer: <domaincontroller>
Description:
The memory settings for this server are not optimal for Exchange.
For more information, click http://support.microsoft.com?kbid=815372 
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
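
The events above are pasted out of time order, which makes it harder to judge cause and effect. A minimal Python sketch follows, assuming the events are saved as plain text in the same layout as above (dashed separator lines and "Key: value" fields). The names parse_events and SAMPLE are illustrative only, and SAMPLE holds just three of the events from this post:

```python
import re
from datetime import datetime

# Three of the events from the post, trimmed to the fields the parser uses.
SAMPLE = """\
Event Source:    MSExchangeDSAccess
Event ID:           2102
Date:                9/29/2008
Time:                8:03:08 AM
------------------------------------------------------------
Event Source:    MSExchangeAL
Event ID:           8026
Date:                9/29/2008
Time:                8:03:15 AM
------------------------------------------------------------
Event Source:    Server ActiveSync
Event ID:           3014
Date:                9/29/2008
Time:                8:02:04 AM
"""

WANTED = ("Event Source", "Event ID", "Date", "Time")


def parse_events(text):
    """Split pasted Event Viewer text on dashed lines and return events sorted by time."""
    events = []
    for block in re.split(r"^-{5,}[ \t]*$", text, flags=re.MULTILINE):
        fields = {}
        for line in block.splitlines():
            # Split on the first colon only, so "8:03:08 AM" stays intact.
            key, sep, value = line.partition(":")
            if sep and key.strip() in WANTED:
                fields[key.strip()] = value.strip()
        if all(name in fields for name in ("Event ID", "Date", "Time")):
            fields["when"] = datetime.strptime(
                fields["Date"] + " " + fields["Time"], "%m/%d/%Y %I:%M:%S %p"
            )
            events.append(fields)
    return sorted(events, key=lambda event: event["when"])


for event in parse_events(SAMPLE):
    print(event["when"], event["Event Source"], event["Event ID"])
```

Sorted this way, the Server ActiveSync timeouts (8:02) land ahead of the MSExchangeDSAccess errors (8:03). Ordering alone does not show which one is the cause, but it narrows down where to look first.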
SOLUTION
Avatar of abdulzis
abdulzis
This solution is only available to members.
SOLUTION
Avatar of Hypercat (Deb)
Hypercat (Deb)
This solution is only available to members.
SOLUTION
This solution is only available to members.
Avatar of epitec

ASKER

Thank you for all the suggestions.  I had class last night, so I had to jet shortly after posting my question.

abdulzis
"Also make sure the binding order of NIC is correct in Network Connections if you have multiple NICs.

Make sure only internal DNS servers are listed on the NIC and all external DNS servers should be listed in DNS forwarders."

I have verified these.  The other settings you mentioned would be found by accessing the NIC through Device Manager?  I have not followed the KB815372 instructions yet.

I would like to try hypercat's suggestion first (which is also part of the first KB article), although I might not be able to do so until Friday, because I have two more classes, and it says it requires a reboot after the change.  
As to the Exchange RUS, it looks like both RUS (Enterprise Configuration) and RUS (DOMAINNAME) are set to our DC (and only our DC), but I'm not sure if more specific configuration needs to be done than what is here.

Looking into the global catalog settings, I also found this error

Event Type:      Error
Event Source:      DNS
Event Category:      None
Event ID:      4010
Date:            9/29/2008
Time:            8:38:06 AM
User:            N/A
Computer:      <domaincontroller>
Description:
The DNS server was unable to create a resource record for  511ff76f-18e0-4d07-bd86-129bb86106b8._msdcs.local.epitecgroup.com. in zone domain.local. The Active Directory definition of this resource record is corrupt or contains an invalid DNS name. The event data contains the error.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 7b 00 00 00               {...    

The resource record is the DNS Alias (as listed in AD sites & services > NTDS Settings).  This error coincides with at least the last couple crashes.

rikke_vp, we have 16GB RAM, but I cannot locate my page file.  Is it named something different in WS2K3?
SOLUTION
This solution is only available to members.
Avatar of epitec

ASKER

Thanks - I was searching hidden/system files (and I do have show hidden files selected), but it still wasn't coming up with anything.  I don't think I've ever checked page file size through Control Panel before.  It says 2046MB for all drives... just under 2GB, but then when I look in Task Mgr > Performance, it says PF Usage 3.86GB.  How's it pulling that off?  Maybe I just need to read up on page files again. That was way back in my Intro classes. :-S

Double- and triple-checked, and the DC is a GC.

Would the procedure from that post be better to do during off-hours, or does it make a difference?
On the PF setting, there's usually a minimum and maximum size set.  Unless the minimum and maximum are set to the same value, it would start out with the minimum and could grow up to the maximum size.

You could do the procedure during regular hours, but since you have to stop the netlogon service while you're doing it, it's possible someone could get an error message if they were logging on or using a resource that required authentication while you were doing it.  So, to be safe, if you can, you should probably wait until off hours or at least some period of time when there's the least amount of traffic on that server.
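
On the question of a 2046MB page file showing 3.86GB "PF Usage": in Task Manager on Windows Server 2003, "PF Usage" reports the commit charge (committed virtual memory, backed by RAM plus the page file), not the amount actually written to the page file, so it can exceed the page file size. A rough Python sketch of the arithmetic, using the figures from this thread and treating the commit limit as approximately RAM plus page file (an approximation, not an exact formula):

```python
RAM_MB = 16 * 1024          # 16GB installed, per the thread
PAGEFILE_MB = 2046          # total page file size shown in Control Panel
PF_USAGE_MB = 3.86 * 1024   # "PF Usage" shown in Task Manager


def commit_limit_mb(ram_mb, pagefile_mb):
    """Approximate commit limit: physical RAM plus total page file size."""
    return ram_mb + pagefile_mb


limit = commit_limit_mb(RAM_MB, PAGEFILE_MB)
print(f"Approximate commit limit: {limit} MB")
print(f"Commit charge in use:     {PF_USAGE_MB:.0f} MB ({PF_USAGE_MB / limit:.0%} of limit)")
```

With these numbers the commit charge is well under the limit, so the 3.86GB figure by itself does not indicate a page file problem.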
Avatar of epitec

ASKER

I will try to do it tomorrow, either at lunch or in the evening.  Hopefully it will go smoothly. :)  Thank you for your help thus far.
Avatar of epitec

ASKER

I didn't get to do it, because there wasn't enough advance notice.  I think I can do it either Friday or Sunday, and I'll update after that.
Avatar of epitec

ASKER

Update: it happened again, so I was forced to reboot, but before I did, I thought to change the HeapDecommitFreeBlockThreshold (according to the procedure in hypercat's first post).  When the server booted up, it still gave me the 9665 warning (memory settings not optimal for Exchange).  Hmm...
I think you will still get that warning, because it is looking for a different parameter than the  HeapDecommit registry entry.  However, it still fixes the basic problem, which is related to the use and management of virtual memory.
Avatar of epitec

ASKER

If that was the solution, I'll be forever grateful!
I'll wait to see if it happens again in the next week or two (I'm hoping and praying it doesn't :)... boy, this place makes me feel guilty about leaving a question open. :P
That's OK - don't worry about leaving the question open for a week or two.  Anyone who looks at the question will realize why you're leaving it open and not interfere.  If it goes for more than 21 days, then someone will likely make an administrative post to close the question and you will have a chance to object if there's a need to leave it open.
Avatar of epitec

ASKER

This wasn't it (at least not all of it).  I just had it happen again at lunchtime. :(
Avatar of epitec

ASKER

Ok, to keep everyone up to speed, I called Microsoft (actually one of my less painful experiences, especially if this helps).  I'm blind, and I didn't even see that SystemPages was part of the memory optimization article (i.e. I probably could have done this myself). :P

(I haven't rebooted yet, so these changes haven't taken effect.)
We changed SystemPages to 0.
He also asked me to add those two switches (/3GB and /USERVA=3030) - of course, while I was on the phone, I couldn't find the article I was thinking of, which said some part of that wasn't recommended for Exchange on DC/GC setups; I asked him if it made a difference if this was Exchange running on the DC.  He said no.  After I hung up, I found the right link and so I e-mailed that section of the article to him, to see what he says. [I got the response as I was typing this.  He said the recommendation given in that article is under normal conditions, where there are no Exchange performance issues - why did they write the article, if there wasn't a reason? - he said these switches "should only help resolve" the system pages/virtual address space problem.]

I also had the bright idea to power up our old Exchange server and see what these settings are on that one (we do still have the old server, and it was also a DC, although not the primary - we had two before our upgrade in June... I really am doubting this decision to condense everything to two servers right now.  Maybe I just need to promote our other server :)

HeapDeCommit... = 0
SystemPages = 0
boot.ini (neither of the switches)

I'm not sure how relevant that is, since it wasn't the only DC, but I thought it might be useful to compare.
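
For comparing the two servers without eyeballing regedit, here is a small Python sketch. It is only an illustration: the registry paths are the ones I recall from the memory-optimization KB (HeapDeCommitFreeBlockThreshold under Session Manager, SystemPages under Session Manager\Memory Management) and should be checked against the article, the names OLD_SERVER, diff_settings and read_settings are made up for this example, and the read is Windows-only and read-only:

```python
import sys

SESSION_MANAGER = r"SYSTEM\CurrentControlSet\Control\Session Manager"
MEMORY_MANAGEMENT = SESSION_MANAGER + r"\Memory Management"

# (registry key path under HKLM, value name); paths are assumptions to verify.
SETTINGS = [
    (SESSION_MANAGER, "HeapDeCommitFreeBlockThreshold"),
    (MEMORY_MANAGEMENT, "SystemPages"),
]

# Values reported for the old Exchange server in the post above.
OLD_SERVER = {"HeapDeCommitFreeBlockThreshold": 0, "SystemPages": 0}


def diff_settings(old, new):
    """Return {name: (old_value, new_value)} for every value that differs."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


def read_settings():
    """Read the values from the local registry; a missing value becomes None."""
    import winreg  # Windows-only module, imported here so the rest runs anywhere

    result = {}
    for path, name in SETTINGS:
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
                result[name], _value_type = winreg.QueryValueEx(key, name)
        except FileNotFoundError:
            result[name] = None
    return result


if sys.platform == "win32":
    print(diff_settings(OLD_SERVER, read_settings()))
```

Run on the current server, this prints only the values that differ from the old one. The boot.ini switches are not covered and still need to be compared by hand.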
That's very odd, to say the least.  I have always avoided the /3GB and /USERVA switches on DCs, simply because of that statement in the article.  I certainly agree with you that it makes no sense to put that statement in an article about Exchange performance issues and then say it doesn't apply if you have Exchange performance issues.  Can you spell "circular reasoning"? Oh, okay, that's a hard one, so... probably not.

Please keep us up to date on how you do and whether using those switches seems to help, whether you see any ill effects DC-wise, etc.  I'd be really interested to know.
Avatar of epitec

ASKER

I'm back. :)  I rebooted the server on Friday.  On first startup, I ran ExBPA and MPS Reporting Tool (requested by the MS tech).  I'm not sure if I didn't wait long enough for everything to load properly before running MPSRT, or if it was the boot.ini switches, but I ran into some problems.

I wasn't able to connect to Exchange from Outlook on my PC.  I thought at first it was just offline because I had left it open when I rebooted the server, but work offline wasn't selected, and it still wouldn't connect after I restarted Outlook.  (After several minutes, it did connect, but I could only receive from the external account I was using to test; I couldn't send to it.  I looked in ESM, and all outgoing messages were just sitting in the queue.)

On the first run of ExBPA, it said "No Domain Naming master could be found" or something along those lines.  I waited a while, ran it again and got "Exchange server does not exist" or "...not detected" - obviously something wasn't right.  I ran the first one while MPSRT was running, so I thought they might have conflicted, but I believe I ran the second after it had finished.

I ruled out the SystemPages change, since that was initially a critical error in ExBPA, so it was either the boot.ini switches or the MPSRT causing this.  (I wasn't sure if, during diagnostics, MPSRT caused any disruption, such as stopping services for testing - I've since been told that it does not.)  I thought about rebooting with no changes, just to see if the server would boot up okay on the second try, but I decided against it in the interest of time.  I removed the switches, and Exchange worked fine.

The MS tech wants me to deal with the rest of the non-critical issues from ExBPA; he said he feels sure this will solve our problems.  I guess I'm more or less back to the monitoring stage.
Avatar of epitec

ASKER

I'm just going to throw everything in here, in case anyone else runs into this.  Hopefully it'll help someone somewhere (if I ever get through it).  I now have two Microsoft techs (well, I've more or less switched from Exchange specialist to ActiveSync specialist, actually).  That last post didn't fix the problem.  My Exchange tech closed the case, and the next day, his manager called (perfect timing - hehe!) to see how satisfied I was... right after I had had the problem recur.

We went through this http://support.microsoft.com/kb/817379/en-us
- ran this (ActiveSync Test) https://www.testexchangeconnectivity.com/ - no errors
- checked this HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MasSync\Parameters\Exchange\VDir (it was correct "/exchange-oma")
- ran this procedure (under Default Web Site in IIS)
  1. Delete Microsoft-Server-Activesync from IIS
  2. Go to command prompt.
  3. type cd\inetpub\adminscripts
  4. type cscript adsutil.vbs delete ds2mb
  5. restart system attendant service
  6. make sure Microsoft-Server-Activesync reappears in IIS
I also rebooted the server Monday night for the changes to take effect

Today I spoke with the ActiveSync tech, and we followed this http://support.microsoft.com/kb/943612/en-us
We also excluded C:\Inetpub, C:\WINDOWS\system32\inetsrv, and C:\Program Files\Exchsrvr from our virus-scan program (Trend Micro ServerProtect - we also have other Trend products, but SP was the only one with a place to specify exclusions).  The registry changes require a restart of the MS Exchange Information Store (which I haven't done yet).
Avatar of epitec

ASKER

I'm heading down a new lead now... We have our Web monitoring/filtering software on this box, too.  It uses SQL, and I thought we just had MSDE or Express or something, which it installs automatically, but apparently we have the full version SQL Server 2005 (I'm not sure why we did that).  I didn't even remember that, but the MS engineer I just spoke with pointed it out, and the memory issues it can cause, since both Exchange and SQL use a ton of memory.  I have to see what we're going to do about this (probably migrate the software *groan* :(... but hey, if it fixes Exchange, that'll be awesome.
epitec:  
Have you had to restart Exchange since Oct 29th, when you recreated the active sync virtual dir?  I have been having a similar problem and until this latest post (sql) we seemed to have very similar setups...I too am having problems with Exchange 2003 Sp2 periodically not allowing connections and it is driving me crazy.
Avatar of epitec

ASKER

Yes, unfortunately, I've had the issue several times since Oct 29. :(  The latest was last Friday (11/7).

Yesterday, I spoke with another person from MS, and he had me send him more logs/diagnostics.  We ran DCDIAG (from Windows Support Tools) - I did that when I first set up the domain/DC, of course, until it came through clear, but now it's failing the SystemLog step.  I'm not sure if that has any impact on this issue, or if it's just an effect... or if it's unrelated.

On the Exchange server, if you go to Start > Run > logfiles, it should bring up a folder with several other log folders.  He wanted to see the HTTPERR folder and the W3SVC1 (1 for default Web site) folder, with any logs since this issue started.  We also did a Find in the W3SVC1 log file (the log from the time of the most recent recurrence) for "refused" (to no avail, in my case, but if this helps start you off in the right direction, I'm more than happy to share).
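
The "Find" step above can be repeated across every log in a folder rather than one file at a time. A minimal Python sketch, assuming the logs are plain-text .log files; the function name find_in_logs and the example path are illustrative, not anything Microsoft supplied:

```python
from pathlib import Path


def find_in_logs(log_dir, needle, pattern="*.log"):
    """Return (file name, line number, line) for each line containing needle, ignoring case."""
    needle = needle.lower()
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Example path only; point this at the W3SVC1 or HTTPERR folder on the server.
    for name, number, line in find_in_logs(r"C:\WINDOWS\system32\LogFiles\W3SVC1", "refused"):
        print(f"{name}:{number}: {line}")
```

An empty result matches what I saw (no "refused" entries), but running it over both W3SVC1 and HTTPERR covers more ground than checking a single day's file.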
Avatar of epitec

ASKER

Turned off AV on the server to run Exchange Troubleshooting Assistant (ExTRA).  That turned up several bottlenecks that we're still working on.

Also, I set up a performance monitor for 8 hours (15 second intervals, 500MB log) using PerfWiz (http://www.microsoft.com/downloads/details.aspx?familyid=31fccd98-c3a1-4644-9622-faa046d69214&displaylang=en)

lshriver, if you see this again... if you haven't already, you may want to try the ExTRA (it reported more specifically on some issues brought up by ExBPA).
epitec, Thanks for the update and I will run the ExTRA and we'll see what else we need to deal with.
Avatar of epitec

ASKER

We removed our Web filtering software (using SQL) from the Exchange box.  We should see if this helps within a week or two.

MS also had me set msExchESEParamMaxOpenTables, as follows:
Using ADSIEdit, browse to Configuration [server.domain.com]/CN=Configuration, DC=domain, DC=com/Services/Microsoft Exchange/<Organization Name>/Administrative Groups/First Administrative Group/Servers/HANNIBAL/Information Store/Storage Group and find the attribute named msExchESEParamMaxOpenTables.  Set this value to 27600 for each storage group that you find under the Information Store object.  Reboot.
Avatar of epitec

ASKER

Removed Spiceworks monitoring software from Exchange box (I know this isn't the root cause, since I installed it after the issue, attempting to get it to alert when Exchange was frozen; it might be part of the build-up, though).  The issue recurred yesterday.  Now I'm working on removing Backup Exec (ha, it didn't seem like I had this much junk installed with Exchange), which also uses a SQL instance.
Hi

I had several issues like this on one of my clients' servers where Trend was installed.

My solution was,
install exchange on another server
setup as front-end server for email
setup trend for scanning email
setup anti spam for email scanning

on the old server we installed Norman AV for file scanning and excluded all Exchange-related directories

after these steps the issue was gone... So for whatever reason, and I don't even want to know what it was, Trend was screwing around with the databases from Exchange, SQL, AD and also with our DNS.

I would give this a try too, since I had issues with that server for months and neither HP nor Microsoft nor Trend could help me with this.

One more thing: try setting your page file to double the available memory in the server. So if you have 2GB installed, set it to at least 4GB of paging. (This comes from a Microsoft tech.)

kind regards
R
Avatar of epitec

ASKER

Hi rikke_vp,

Thanks for the post.  I was trying to find out more about the page file size recently, because we have 16GB installed on this server, and a 24GB page file seemed a little excessive to me (since 1.5 is the usual recommendation).  I did find one place that said that recommendation only holds up to a certain point (at the moment, I don't remember what size).

Moving Exchange is probably on the agenda, but I'm dreading it, because I feel like it was just such a headache last time... but I guess we need to get rid of this.  I was wondering if it would be easier to just move Trend.  Did your client move from one Exchange server to a front-end/back-end?  If I may ask, why did they choose to move Exchange rather than Trend?
Well, I did also move Trend away from our backend... It's really a heavy AV-AS solution if you ask me

the main reason for the migration was that they were growing and needed more data storage, and we wanted to kill the load on that server since users were complaining about slow access, etc...

another reason is IT budget, which is split up into support cost and hardware cost. We work on a per-hour rate, so the longer we need to search, the bigger their IT overhead cost is. New hardware is easier to fit into the budget (all the money was just sitting there), so we just took that approach.

yes, front-end/back-end solution now
so the old server is still in place, this is the backend. That's also the DC and file/print server. We downsized the load to 10% average on the CPU and 2 GB in memory usage.

we moved all the databases to the front-end and installed Exchange and Trend there so that load is off the backend. If mail comes in a bit slower, no problem; if Exchange gets stuck we can simply reboot without any user complaining (the SQL databases are used for Mailarchiving, Spysweeper Enterprise, etc)

For us, this was actually the cheapest adequate solution, with the highest return on the investment.
Avatar of epitec

ASKER

The MS tech was reviewing the performance log from the last recurrence, and it looks like it is being caused by a specific issue with Trend.  One Control Manager agent (not sure which, since the processes are named the same (EntityMain.exe), but it's either ServerProtect or Damage Cleanup Services) had a handle leak.  The handle count was increasing by one each minute (so it took about 8 days for it to get too high and shut down Exchange functionality)... I've uninstalled Control Manager and both agents, and I'm monitoring to make sure this was the cause.  8 days since the last (planned) reboot, and counting! *fingers crossed* :)
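
To show how a leak like this can be spotted from performance-log samples, here is a minimal Python sketch. It fits a straight line to handle-count samples and projects how long until a given count is reached. The sample data is synthetic (a starting count of 500 and one sample every 15 minutes are made up), and the "too high" threshold is simply one handle per minute for 8 days, which is 11,520 handles above the starting point:

```python
def leak_rate(samples):
    """Least-squares slope of handle count over time, in handles per minute.

    samples is a list of (minutes_elapsed, handle_count) pairs.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    numerator = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    return numerator / denominator


def days_until(threshold, current, rate_per_minute):
    """Days until the handle count reaches threshold at a constant leak rate."""
    return (threshold - current) / rate_per_minute / (60 * 24)


# Synthetic 8-hour capture: one sample every 15 minutes, leaking 1 handle per minute.
samples = [(minute, 500 + minute) for minute in range(0, 480, 15)]
rate = leak_rate(samples)
print(f"Leak rate: {rate:.2f} handles/minute")
print(f"Days to grow by 11,520 handles: {days_until(500 + 11520, 500, rate):.1f}")
```

A process whose handle count has a steady positive slope over a multi-hour capture is a candidate for this kind of leak; a count that rises and falls with load is not.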
Avatar of epitec

ASKER

Argh... I'm officially ripping my hair out.  Next step is to remove anything I possibly can from the server... after that, I think we're starting fresh.

Does anyone know if migration from Exchange 2003 to 2007 will work (preferably smoothly)?  My bosses want to upgrade if we have to move it, anyway.
We rebuilt our Exchange 2003 server and have not reinstalled Trend.  Something hasn't been right since we installed Trend in Aug 2008.  Within 10 days of the initial install, we had our first issue with not being able to connect to the Information Store.

We too are now waiting.  Our new build was last restarted yesterday morning (1/19/09).

If this holds up until 1/29/09, I will celebrate by purchasing a new AV/AS solution for Exchange.
Avatar of icepack
icepack

Hi, I have a server with almost identical scenario.
It's SBS2003 SP2 (std edition) with bugger all installed from defaults except Trend WFBSadv.
Has any of this been escalated to Trend ?
Avatar of epitec

ASKER

Yes, I have been in contact with Trend, but the support I've received is crap.  I believe (now) the problem is OfficeScan Server... I did find a patch for it that deals with a memory leak.  I'm moving it to another server (and making sure I keep up with patches this time!)... the only thing Trend that will remain with Exchange is ScanMail (obviously).
Avatar of epitec

ASKER

Ha - removed OfficeScan server and the issue persists.  It seems to be ScanMail, and I'm back in contact with Trend.  My SharedResPool (C:\Program Files\Trend Micro\Smex\SharedResPool) folder was much larger than normal, and the Trend tech thought that might cause the hanging... I changed the folder (so it's at a more reasonable size), and if that doesn't help, I'm going to turn off the scheduled virus scan and see if that fixes it (my MS tech said every time Exchange hangs, it's waiting on a virus scan).  
The last several times, I've caught the problem before it's disrupted business (either late at night or early in the morning)... I think I'm getting the hang of this. :P  I really appreciate everyone's patience (if anyone's still monitoring this) in my keeping this open forever.
I, for one, am watching this issue very closely.

Our rebuilt Exchange 2003 server ran b-e-a-utifully, from its build date of 1/19/09, until 2/7/09.  We installed OfficeScan the morning of 2/7/09 and had the system hang on 2/10/09 and again this morning (2/11/09).  OfficeScan is coming off tonight.

I truly hope we can get back to smooth operations again soon.

Please, epitec, keep us informed of your progress.
I am still watching this.
I have run a Trend patch, but it didn't help. There is still a newer version I can go to and will try upgrading to it in the next week or so.
Avatar of epitec

ASKER

I'll post this info, then, if you guys want to take a look.  The SharedResPool, according to Trend, should be around 300MB, not usually greater.  Mine was 2.14GB... not sure if that really has anything to do with it, but I did change the folder (didn't cause any disruption by restarting SMEX).  From my Trend tech:

"This link below contained the instruction on how to move your Shared Respool Directory to another location:
http://esupport.trendmicro.com/support/search.do?cmd=displayKC&docType=kc&externalId=PUB-en-127549

Also,the link below have an information if what's the normal size of that folder:
http://esupport.trendmicro.com/support/search.do?cmd=displayKC&docType=kc&externalId=PUB-en-126898"
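
For anyone who wants to check their own SharedResPool folder against the roughly 300MB figure Trend gave me, a minimal Python sketch follows. The function names are made up for this example, and the 300MB default is just the number quoted above, not an official limit:

```python
import os


def folder_size_bytes(root):
    """Total size in bytes of all files under root, including subfolders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file was removed or is inaccessible; skip it
    return total


def is_oversized(size_bytes, expected_mb=300):
    """True if the folder is larger than the expected size in megabytes."""
    return size_bytes > expected_mb * 1024 * 1024


if __name__ == "__main__":
    path = r"C:\Program Files\Trend Micro\Smex\SharedResPool"
    size = folder_size_bytes(path)
    print(f"{path}: {size / 1024 ** 2:.0f} MB, oversized: {is_oversized(size)}")
```

My 2.14GB folder would be flagged by this check, although as noted it is not clear that the folder size has anything to do with the hang.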
Avatar of epitec

ASKER

The SharedResPool did not make a difference.  I had to reboot this morning.  I contacted Trend to let the tech on the case know that this folder change didn't fix the issue, so I'll see what he says.  Since my MS tech said it seems to hang waiting for a virus scan each time, I've disabled the scheduled virus scan in ScanMail (to see if that fixes it).  ScanMail is currently the only virus product left on this server.  

My Trend technician's first suggestion was to reinstall ScanMail (since we're on the latest build and having this problem).  I may end up trying that if disabling this virus scan doesn't help, although it doesn't seem likely to me that that will help, either.
I was just at a client that seems to be experiencing the same thing. Their setup is SBS 2003 SP2, running quite a few other things on the same box (SQL, et al.). Plus they are running Trend Worry-Free, probably Standard, but I'm not positive. Since I was looking in Exchange for the cause, I didn't look at the Trend Micro products, so I'm not sure if it has any other product, such as ScanMail, installed.

Epitec, do you believe the problem comes from ScanMail, and have you tried removing it to see if it resolves the problem?
I'm watching this one too, so please keep us up to date, thanks

Avatar of epitec

ASKER

I do believe it is ScanMail causing the issue.  Per my Microsoft engineer (who seems pretty knowledgeable... now that it's been passed around ten times, or so), Exchange normal operations are hanging while it waits for a virus scan to complete.  SMEX is the only virus software I have installed.

I have not tried removing it.  Before we installed it, we had ridiculous amounts of spam; I've been struggling with which is the lesser of two evils - at least this way our users can get some work done (less 15-20 min for reboot every 8 days, if it hits during business hours).

Another suggestion from Trend:
"1. Look for the "VirusScanStampMode" key [he didn't include any sort of path... it's under HKLM\Software\TrendMicro\ScanMail for Exchange\CurrentVersion]
2. Modify the value of the key to "2".
3. Set Reloadnow to 1 (HKLM\System\CCS\services\MSExchangeIS\VirusScan)
4. Restart SMEX services
It's a workaround for crash or hang-up of Exchange server."

It remains to be seen how this will affect anything... shouldn't be long now (recurrence is scheduled before tomorrow :P).
SOLUTION
This solution is only available to members.
Avatar of epitec

ASKER

Yes, SMEX_Master.exe is part of ScanMail, and I've noticed that it does take a lot of memory.  I wrote this off to the fact that it's constantly scanning for spam, etc.

I don't know if stopping SMEX_Master before it becomes an issue would help.  I thought it might even help (prevent me having to reboot) if I stopped it when the problem recurred (it seemed to me that stopping it would free up the resources and stop whatever scan Exchange is waiting for, so maybe Exchange would start functioning again)... I guess I was wrong.  I tried stopping all SMEX services last time it recurred, and I saw no difference (I still had to reboot). :(

My estimation was wrong; the issue did not recur last night, which means it will probably recur sometime during business hours (I would have preferred it to happen last night :)... either that, or the registry modifications fixed it... but I won't truly believe that until several more days pass.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Well, I hope it is good news in your next report.
I haven't looked, but I hope the registry items are the same for their Worry-Free product, a bit of an ironic name for the product :-).
Do you think uninstalling ScanMail would fix the problem?
Avatar of epitec

ASKER

Yes, I think it would, because that would completely eliminate whatever virus scan is causing Exchange to hang (it seems to me).
Epitec,
With the registry change you gained from Trend, did they say what it actually does, apart from - "it's a workaround for crash or hang-up of Exchange server" ?
It seems to be a more reasonable option than disabling the scheduled scan, although setting the scheduled scan to weekly might be a way of limiting the impact.
Avatar of epitec

ASKER

No, I put down exactly what he told me (after that, he got back on his "it must be that you have backups running during the time of the crash" soapbox... never mind the fact that it happens at various times of day and night).

I've considered re-enabling the scheduled scan, but I want to make it another few days.  If it doesn't recur by Sunday night, I will probably consider this resolved, and from that point, I may re-enable the virus scan (as you say, weekly would probably be a better option) so I can more accurately determine whether it was really the registry fix (not that I really want it to happen again, if it wasn't, but I do prefer virus scans to no virus scans - although I have to admit, I am not terribly confident in Trend's products' effectiveness after this ordeal :P).
Epitec, could you clarify?
You stopped the scheduled scan of ScanMail, and then you applied the registry changes (and then restarted the server)?
I'm just wondering whether it hasn't crashed because it's not scanning, because of the registry changes, or both.
Any thoughts?
Avatar of epitec

ASKER

After the last recurrence, I disabled the daily scheduled scan (real-time virus scan is still running).  I believe it was a few days after that when Trend got me the registry solution, so I implemented that (it said it only requires a restart of ScanMail services - I haven't rebooted since 2/18).

That last is the only part about this that I'm not happy about: I don't know if it's the scheduled scan or the registry change that fixed it, since I implemented two changes before verifying the results of one, but I guess that was my own fault... I was just so anxious to get it fixed.  That is why I said I'm thinking about re-enabling the scan after Sunday night, to see if it was the registry changes... but I am a little paranoid that if I re-enable it, that will bring the problem back.  :)  If I had a way to ensure it would not return until after business hours, it wouldn't be so bad... this is when I wish I had a duplicate testing system. :P
Avatar of epitec

ASKER

icepack, I'm requesting more information on those changes (if the tech can explain what that did, or if there is an article in the knowledgebase or somewhere that explains it)... I'll let you know if I hear anything.
Avatar of epitec

ASKER

Our scheduled reboot was yesterday evening, and no recurrence of the issue since 2/18, so this does appear to be resolved.  I sent an update to Microsoft and Trend... still haven't heard back from Trend when I asked about the purpose of the registry changes, so I did ask again.  I'll keep this thread open until I (hopefully) figure out whether it was those changes, or the virus scan (or both).
Epitec,
Thanks. We had been re-booting the SBS server each morning (early before user activity) to clear connections because of a license argument with MS, but as this is now resolved and we don't need to re-boot, I expect we'll see the recurrence of the Exchange failure we were originally experiencing - over the next few days.
When this occurs, I'll do the reg hack and advise of the result.
Avatar of epitec

ASKER

Sounds good.

Per my Trend engineer (I love off-shoring, don't you? I still don't know what the registry changes were supposed to have done :P):

"Just to have a brief information about Stamp mode you can refer to this link below:

http://esupport.trendmicro.com/support/search.do?cmd=displayKC&docType=kc&externalId=PUB-en-1034998

Actually, the purpose of those registry changes will prevent those performance issue into the exchange server. It refers to the scanning of your files wherein it will triggered the capability of scan engine to not being able to hang."

He did say that reducing the scan to weekly should work fine.
SOLUTION
This solution is only available to members.
Avatar of epitec

ASKER

We have not had any crashes.  No further developments to report, but I would like to ask you, because I think the *default* scheduled scan was Daily - at least, I do not remember configuring the daily scan after install - is your current scheduled scan daily, or less frequent?

If icepack is implementing the reg hack, and you are testing the stopped scan, we should get this question answered in short order. :)
SOLUTION
This solution is only available to members.
Well, I hope I turned it off. The Worry-Free product is all web-based, so there is nowhere that says ScanMail. However, in the Scheduled settings, which I believe handle both file and email scanning, I've turned it off for the server. For good measure, in the General Options in the program I found settings for not scanning the Exchange folders???? on the Exchange server and on clients. Whatever that means, and whatever it is, I took them off. I've rebooted the server; now it's time to wait and see!
I looked in the registry and found that VirusScanStampMode value, so I assume ScanMail is integrated into the Worry-Free product, just hidden under the interface.
Avatar of epitec

ASKER

Any news?
Hi Epitec

No news to report so far. There was some work done on the SBS that resulted in some problems for a few days, during which the server was being restarted, so I believe it has only been a few days since that was resolved and the servers were allowed to chug along while we wait and see. I believe it's been about one week since the last restart, and I haven't had any reports of problems, so I think I'll need to give it another 1-2 weeks.

Thanks for everyone's help with this.
Avatar of epitec

ASKER

twoj, icepack... any results?
We have still had a few problems, with symptoms and results similar but not identical to those seen previously. Unfortunately, the server has been rebooted for other reasons (changing things around in racks), but the last power cycle was on the 18th, so by early next week (if no other events occur) we should have a good indication.
Things are looking good on my side. I talked to the ops director at the company, and so far so good.
We decided to give it one more week, until next Monday/Tuesday, to be certain. But no registry changes were made; we just stopped the scans on the server.
I'll report back next week with the final verdict.

I wasn't a big fan of Trend before this; amazing how jaded I am about some software!
SOLUTION
This solution is only available to members.
Avatar of epitec

ASKER

You're welcome!  I'm glad to hear your problem is also solved. :)  It's such a good feeling not to have to worry about Exchange going down in the middle of the day sometime each week...

I'd also be interested to hear Icepack's results.
Avatar of epitec

ASKER

Ok, I guess it's time to close this question... thanks to everyone for their help!

Icepack, (if you see this) whenever you determine whether your issue's fixed, feel free to add it as a comment... :)
Hi, I have a customer who is experiencing the very same issue, and I now know it is Trend causing the problem. We have to take the store down and restart it on Wednesday morning at 11:00, and normally we have a week's grace until the store needs restarting again. I found this thread and have been going through it. I updated Trend to the latest patches and SPs and all was well. I made the registry change last night, and within an hour the store had gone down. I rebooted the server this morning, and again within an hour the store had stopped responding. However, when I stopped the SMEX Master service, the store started responding straight away and could send and receive email.

I will be raising a ticket with Trend. With regard to your comments, twoj, are you not concerned that you have no virus scanning in place for Exchange?

will keep you posted.
Hi abmcsltd
I work as a consultant, so I haven't seen that customer for some time. Funnily enough, I ran into a similar problem after installing ESET's Exchange antivirus on another company's server: it would cause the server to randomly reboot after 2-14 days. Having seen this issue with Trend, I removed the ESET Exchange program, and the problem went away.

To me, having a server running 24/7 is the highest priority. I would much rather deal with a virus on a client's computer than with Exchange going down; dealing with one irritated user is much better than dealing with a whole company. The second point is that server software should not cause servers or services to stop, which is exactly why I keep a list of software that I don't use. On top of that, Trend has been aware of this problem for some time and has not fixed it, so why should I use it?

Because Exchange is usually one of the critical services running in an organization, I need to know that whatever AV program I use isn't going to screw it up. Trend, ESET, and McAfee are all programs that, by choice, I won't install. ESET NOD32, however, is a client program that includes an email scanner, and I have no problem installing it on client machines; that keeps the clients virus-free and removes the issues with Exchange. If your company is sufficiently equipped, I would recommend an all-in-one box: most of the higher-end firewalls have options for integrated spam/virus filtering, or there are dedicated boxes that just do filtering, like Barracuda.

Hopefully Trend will fix it this time, but this just reaffirms my decision to avoid them.
Hi twoj

Thanks for your comments. I too am a consultant and have many customers with Trend Worry-Free Business Security Advanced installed. I get the occasional problem with IIS settings, but I have never had it cause problems with Exchange. That said, I have noticed more problems with Worry-Free Business than with the old CSMS products.

I will keep you posted on this situation, as we are a Trend supplier and may hopefully have a little more success with this issue.
 
Thanks to everyone for the incredible history and detail in the posts.  This type of sharing is how Experts-Exchange works so well.

I'm troubleshooting a similar problem.  Also running Trend WFBS-A.  Overall, we have been very happy with Trend.  They have stopped more spam and virus attacks than anything else we have used.  Significantly fewer collateral problems (like this one) than we experienced with Symantec, McAfee, or Sophos.  Will post back what we find on this issue.
Anyone using Trend Worry-Free Advanced should consider deploying the included free version of the hosted email security. It works great and frankly makes it possible to remove the messaging security agent from the servers. It's a bit of work to set it up but it really works fantastic. You may as well offload that job to them and give your Exchange server some rest!