Solved

Incoming Internet Mail, Daily Outages.

Posted on 2003-10-28
Last Modified: 2007-12-19
OK. Here's my issue.  I will start at the beginning just in case these events are related.  About 3 weeks ago, I was remotely connected to the PDC(NT 4.0)/Exchange 5.5 Server via VNC from my workstation in the office.  I was in the middle of creating a user mailbox in Exchange Administrator, when I lost my connection.  Others in the office were suddenly locked up at the same time.  Upon examining the server, I noticed that it was powered off.  It appeared to have shut itself down suddenly.  After rebooting, & checking the services, I noticed that quite a few of the Exchange related services had not started on their own.  Further examination showed that "Startup" on these services was now set to "Manual" instead of "Automatic".  I've never seen settings such as these change on their own.

OK, so I get Exchange & its services back up & running & everything seems to be going well.  However, the last two Fridays, incoming internet Email has stopped working.  Outgoing still works & internet connectivity is still there (DSL connection).  The server needs to be rebooted in order to put things right.  I have yet to see anything in the event log that can clue me in to what is going on.  

From the senders' end, here is an excerpt of the non-deliverable report:
"xxx.xxx.xxx.xxx does not like recipient.  Remote host said:  550 Unable to relay for user@domain.com.  Giving up on xxx.xxx.xxx.xxx."  NAT is used on the Netopia router with ports 110 & 25 open.  Mail had been working just fine up until these last couple of weeks.

I'm not even certain that the first event is related to the intermittent incoming mail outages.  Just looking for some ideas on how to troubleshoot this one.  It now seems to be happening daily.  

Thanks in advance for your assistance.
Question by:2Geeks
47 Comments
 
LVL 35

Expert Comment

by:Bembi
ID: 9637923
One hint may be your DNS server, if you have one and Exchange uses it. Maybe its cache was corrupted by the unexpected shutdown. But since it seems that service settings have changed (which means the registry must have changed), I would say that parts of the registry may be damaged and may have been recreated during the reboot, or that the swap file holds corrupted data. Otherwise it is rather odd that registry settings would change on their own.

I think the reason for the shutdown may have something to do with that. Damaged RAM or damaged disk sectors are common causes of cold shutdowns. Check your hard drives for errors. A sector failure within one of the important system areas (registry, swap file) could explain symptoms like these. Note that a sector may not be completely dead but may lose its information after a while.

So, first check your hard drives for failures and watch the report on the screen for any messages during the disk check (it usually runs at the next boot). Also clear the swap file (by temporarily moving it to another disk or by setting the registry values to clear the swap file during reboot).

Run eseutil with the diagnostic switch to check the integrity of your Exchange databases.
0
 

Author Comment

by:2Geeks
ID: 9650507
Cleared the Swap File yesterday.  Mail was out again this morning.  Ran CHKDSK this morning, Everything appeared OK.  Mail came back after a reboot.
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9651268
There may be a simple reason: not enough free disk space on your server. Especially if Windows and Exchange reside on the same drive, the swap file may grow until the free space on the drive drops below a critical limit. You should have at least 100 MB of free disk space at all times. Since you said that the server runs for a while after a reboot, this is a possibility. It may also be a reason for a cold reset.
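The free-space check described above can be automated. This is only a minimal sketch in Python, assuming a machine that has Python available and can see the drive; the 100 MB threshold comes from the advice above, while the function names are made up for illustration.

```python
import shutil

MIN_FREE_MB = 100  # threshold taken from the advice in this thread


def free_mb(free_bytes: int) -> float:
    """Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return free_bytes / (1024 * 1024)


def has_enough_free_space(free_bytes: int, min_mb: float = MIN_FREE_MB) -> bool:
    """Return True when the free space is at or above the threshold."""
    return free_mb(free_bytes) >= min_mb


def check_drive(path: str) -> bool:
    """Check a real drive or mount point, for example "C:\\" on Windows."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return has_enough_free_space(usage.free)
```

Calling check_drive once for the OS drive and once for the Exchange drive would cover both volumes mentioned in this thread.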

Exchange Server uses as much RAM as is available, because it tries to hold as much data as possible in memory. Look in Task Manager at the amount of available and used memory. If the limit of your virtual memory is reached, some services may stop working correctly. It is also possible that a RAM module is damaged and the effect only shows up once EXCH increases its memory usage.

Since you said that a cold reset is needed, something seems to be wrong within your system.

Try to connect to your server (after the IMC stops responding) using

telnet servername 25

and see if you get a connection response.
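The manual telnet test can also be scripted so that it goes one step further and shows how the server answers HELO, MAIL FROM and RCPT TO. The following is a rough sketch, not a finished tool; the sender address and HELO name are placeholders, and the function names are invented for this example.

```python
import socket


def parse_reply_line(line: str):
    """Split one SMTP reply line into (code, is_final, text).

    "250-..." marks a continuation line; "250 ..." ends the reply.
    """
    code = int(line[:3])
    is_final = len(line) < 4 or line[3] != "-"
    return code, is_final, line[4:].strip()


def read_reply(lines):
    """Consume lines from an iterator until one complete reply is read."""
    texts = []
    for line in lines:
        code, is_final, text = parse_reply_line(line.rstrip("\r\n"))
        texts.append(text)
        if is_final:
            return code, texts
    raise ConnectionError("server closed the connection mid-reply")


def probe(host, recipient, sender="probe@example.com",
          helo="probe.example.com", timeout=15.0):
    """Connect to host:25 and record the reply to each SMTP command."""
    results = []
    with socket.create_connection((host, 25), timeout=timeout) as sock:
        reader = sock.makefile("rb")
        lines = (raw.decode("ascii", "replace") for raw in reader)
        results.append(("banner", read_reply(lines)))
        for command in (f"HELO {helo}",
                        f"MAIL FROM:<{sender}>",
                        f"RCPT TO:<{recipient}>"):
            sock.sendall(command.encode("ascii") + b"\r\n")
            results.append((command, read_reply(lines)))
        sock.sendall(b"QUIT\r\n")
    return results
```

A 250 reply to RCPT TO means the server accepts mail for that recipient; a 550 reply there is the same rejection that senders report in their bounce messages.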

Try
eseutil /g /ispriv
eseutil /g /ispub
eseutil /g /ds

This checks the database integrity of MS Exchange Server without correcting anything. Note: Never run this tool with other switches against the Directory (/ds)!!!
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9651309
Another option: have a look at the total size of priv.edb and pub.edb. Depending on your Exchange version, the total amount may be limited to 16 GB. If this limit is reached, have your users delete unneeded mails and attachments and then compact the database using

eseutil /d /ispriv
eseutil /d /ispub

Note that eseutil is not a maintenance tool but an emergency repair tool. That means you should run it only when necessary.
0
 

Author Comment

by:2Geeks
ID: 9652004
Disk space shouldn't be an issue.  I have 1.7Gb available on the OS drive & 17Gb on the Exchange drive. The IS is limited to 16Gb, however, the priv is only 522mb & pub is only 20mb.  How long should a ESEUTIL take to complete? Any idea?  Does it need to be offline to run this?
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9652738
For eseutil, you have to stop the information store. For your size, I would say it is a work of about 10 - 20 minutes.
0
 

Author Comment

by:2Geeks
ID: 9652916
I will probably schedule that for early next week (Mon. or Tues.).  I will keep you posted as to how it goes.  Thanks.
0
 

Author Comment

by:2Geeks
ID: 9670779
OK, Here's what I get.


C:\>eseutil /g g:\exchsrvr/mdbdata/priv.edb /x

Microsoft(R) Windows NT(TM) Server Database Utilities
Version 5.5
Copyright (C) Microsoft Corporation 1991-1999.  All Rights Reserved.

Initiating INTEGRITY mode...
        Database: g:\exchsrvr/mdbdata/priv.edb
  Temp. Database: INTEG.EDB

checking database integrity

                    Scanning Status  ( % complete )

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ..............................ERROR: orphaned LV (lid 98099, size 17289, refcount 0). Non-corrupting error
ERROR: node [107815:0]: leaf node check failed
ERROR: orphaned LV (lid 98100, size 9216, refcount 0). Non-corrupting error
ERROR: node [107820:0]: leaf node check failed
..................ERROR: orphaned LV (lid 95564, size 1242, refcount 0). Non-corrupting error
ERROR: node [94509:4]: leaf node check failed
ERROR: orphaned LV (lid 95568, size 1479, refcount 0). Non-corrupting error
ERROR: node [95537:0]: leaf node check failed
ERROR: orphaned LV (lid 95604, size 1455, refcount 0). Non-corrupting error
ERROR: node [95537:2]: leaf node check failed
ERROR: orphaned LV (lid 95676, size 6839, refcount 0). Non-corrupting error
ERROR: node [96218:0]: leaf node check failed
ERROR: orphaned LV (lid 95677, size 1460, refcount 0). Non-corrupting error
ERROR: node [120750:0]: leaf node check failed
...

integrity check completed.
Operation completed successfully in 216.937 seconds.
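
Output like the above gets long quickly, so a short script can tally it. This is a sketch that assumes the message wording shown in this run ("orphaned LV ..." and "leaf node check failed"); other eseutil versions may word things differently, and the function name is made up.

```python
import re

# Patterns copied from the wording of the integrity output in this thread.
ORPHAN_RE = re.compile(r"orphaned LV \(lid (\d+), size (\d+), refcount (\d+)\)")
NODE_RE = re.compile(r"node \[(\d+):(\d+)\]: leaf node check failed")


def summarize(output: str) -> dict:
    """Count the two error kinds and total the size of orphaned values."""
    orphans = [tuple(int(g) for g in m.groups())
               for m in ORPHAN_RE.finditer(output)]
    nodes = [tuple(int(g) for g in m.groups())
             for m in NODE_RE.finditer(output)]
    return {
        "orphaned_lv_count": len(orphans),
        "orphaned_lv_bytes": sum(size for _lid, size, _ref in orphans),
        "leaf_node_failures": len(nodes),
        "completed": "integrity check completed" in output,
    }
```

Redirecting the eseutil output to a text file and passing the file contents to summarize gives one line of numbers to compare between runs.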
0
 

Author Comment

by:2Geeks
ID: 9671008
I'm not too sure what these errors mean.  Whether they're critical or not.  Or if Non-corrupting means that this isn't the cause of my problems.  Any ideas????
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9674306
See
http://support.microsoft.com/default.aspx?scid=kb;en-us;185271&Product=ech

for the errors; MS states that they are harmless. If you want to repair or compact your database, make sure you always have a backup of the information stores and the directory. Never run the repair mode against the directory!!!

Another point: since your server responds with a "relay" message and is still working for outgoing messages, all services are running, but there is an issue with routing.

Have a look at the following article:
http://support.microsoft.com/default.aspx?scid=kb;en-us;148284&Product=ech

Be careful using such tools and always make backups (in this case of the MTAData directory).
0
 

Author Comment

by:2Geeks
ID: 9714876
In the NDR from the mail sender's end, in the following message:

<<xxx.xxx.xxx.xxx does not like recipient.
Remote host said: 550 Unable to relay for rshrader@grahamei.com
Giving up on xxx.xxx.xxx.xxx.>>

xxx.xxx.xxx.xxx is the ip address of my router.   Does this give any clues?  I can telnet to port 25 on the server & receive a response even though mail is not getting in.
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9716580
First of all, here is a link you should look at. It replaces the telnet test, and you should run it when your server stops accepting deliveries. But run it now anyway, because there is another issue you should look at.
http://www.checkdns.net/

The relay message says that your server does not accept the mail (any mail). The IP is your router's IP because it is the public address; your server cannot be seen directly from the internet (because of NAT).

Some questions:
1.) You said that neither the system log nor the application log shows any error messages. Have you checked the security log? If you enable the SMTP log within the IMC, you may get an idea of when the IMC stops accepting mail. You may also be able to determine whether particular messages or domains are causing the trouble.

2.) MTA or IMC queues:
Have you checked the queues of the MTA and the IMC? Are there messages in these queues that should not be there, especially after the server stops accepting mail?

3.) Service Pack Level:
Which service packs are installed on WinNT4 and Exchange? Have you applied the latest fixes after EXCH SP4, if it is installed?

4.) Can you find out whether the IMC also stops responding when nothing is happening on the server, such as on weekends? It is also possible that a damaged or wrongly configured rule (an out-of-office rule, for example) produces this issue.

5.) Within your Exchange server, there are a lot of logging options for the event log. It may be helpful to enable event logging for MTA and IMC transport issues.

6.) Do you have the dump from the crash you describe? If the server went through a cold reset, there may be a memory dump from this crash somewhere on your hard disk. This dump may contain blue screen information, if the server was able to write it.

7.) Have you checked your router, or tried resetting it after the server stops accepting mail? Does your router have an error log that can give some information?

8.) Is there a firewall (or MS Proxy) or something else in front of the EXCH server?

9.) How many mailboxes do you host? (This gives an idea of how much work a solution may be.)
0
 

Author Comment

by:2Geeks
ID: 9717306
Went to www.CheckDNS.net and this is what I got:

Asking root servers about authoritative NS for domain
  Got DNS list for 'mycompany.com' from a.gtld-servers.net
  Found NS record: auth40.ns.uu.net[198.6.1.18], was resolved to IP address by a.gtld-servers.net
  Found NS record: auth62.ns.uu.net[198.6.1.19], was resolved to IP address by a.gtld-servers.net
  Domain has 2 DNS server(s)

Verifying if NS are alive
  DNS server auth40.ns.uu.net[198.6.1.18] is alive and authoritative for domain mycompany.com
  DNS server auth62.ns.uu.net[198.6.1.19] is alive and authoritative for domain mycompany.com
  2 server(s) are alive

Check if all NS have the same version
  All 2 your servers have the same zone version 4



Check mail-servers
  Domain mycompany.com has 2 mail-servers.
  Checking mail server (PRI=10) mail.mycompany.com [xxx.xxx.xxx.xxx]
  Mail server mail.mycompany.com[xxx.xxx.xxx.xxx] answers on port 25
  <<< 220 exchange.mycompany.com ESMTP Server (Microsoft Exchange Internet Mail Service 5.5.2656.59) ready
  >>> HELO www.checkdns.net
  <<< 250 OK
  >>> MAIL FROM: <dnscheck@uniplace.com>
  <<< 250 OK - mail from <dnscheck@uniplace.com>
  >>> RCPT TO: <postmaster@mycompany.com>
  <<< 250 OK - Recipient <POSTMASTER@mycompany.COM>
  >>> QUIT
  Mail server mail.mycompany.com [xxx.xxx.xxx.xxx] accepts mail for mycompany.com
  Checking mail server (PRI=100) mail.uu.net [199.171.54.98]
  Error connecting to mail server mail.uu.net [199.171.54.98] port 25 : timed out waiting for connection
  Some of your MX do not work properly


This report states that I have 2 mail servers.  I only have one internal Exchange server.  Is this a standard response?  Does the MX record typically include a server from your ISP as well as your mail server???  For backup purposes or something?  I don't understand that portion of the report.  Could this be an underlying cause of my problems??  If this other uunet mail server cannot be reached, yet is apparently supposed to handle my mail, that is not a good thing, no?
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9717960
I ran that test before I wrote my post; now it seems to be OK. As you can see, your server works properly for the moment. The second mail server is a secondary MX record, which is hosted by your provider. As this server was dead on my first try, I asked you to test it. I cannot see whether it is an SMTP mail forwarder or a POP3 box, but mail that cannot be delivered to your server will be forwarded to this uu.net server. This is a usual configuration, but you should clarify it with your provider.

It also means that if this server is dead, it cannot accept mail when your server is dead as well. You also have to clarify what happens to these mails. Note that the second server is only used if your server is not reachable. If your server rejects the mail, the mail is not forwarded to uu.net.
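
The difference between "not reachable" and "rejects" can be shown with a small simulation. This is a simplified model of what a sending server does with MX records, not the behaviour of any particular mail product; the outcome labels and the attempt callback are invented for the example.

```python
# Possible outcomes of one delivery attempt against a single MX host.
UNREACHABLE = "unreachable"  # no connection or timeout
ACCEPTED = "accepted"        # 250 reply to RCPT TO
REJECTED = "rejected"        # 5xx reply, e.g. "550 Unable to relay"


def deliver(mx_records, attempt):
    """Try MX hosts in ascending preference order.

    mx_records is a list of (preference, host) pairs; attempt(host)
    returns one of the outcome constants above.
    Returns (status, host) with status "delivered", "bounced" or "deferred".
    """
    for _preference, host in sorted(mx_records):
        outcome = attempt(host)
        if outcome == ACCEPTED:
            return "delivered", host
        if outcome == REJECTED:
            # A permanent rejection ends delivery; the backup MX is not tried.
            return "bounced", host
        # UNREACHABLE: fall through to the next, less preferred host.
    return "deferred", None
```

With the records from the report (preference 10 for the Exchange server, 100 for mail.uu.net), a 550 from the Exchange server produces a bounce even though the uu.net server is up, which matches the non-delivery reports the senders are getting.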

The overall result says that your real backbone provider is uu.net. That need not be your direct ISP, since your ISP may be a reseller of uu.net.

But this will not solve our problem, so it would be great if you could answer my questions, so that I can come up with a few new ideas. As the last test worked fine, your external configuration seems to be OK. Nevertheless, repeat the test a few times to see whether it is stable, as we have seen a few errors that should not occur. Otherwise, you should talk to your provider.

0
 

Author Comment

by:2Geeks
ID: 9722092
If there was a dump, on the server when it crashed, I didn't see it.  Would this typically be stored somewhere in particular if there was a dump??  How would I go about finding it?  Or reading it?
0
 

Author Comment

by:2Geeks
ID: 9724579
<<1.) You said that neither the system log nor the application log shows any error messages. Have you checked the security log? If you enable the SMTP log within the IMC, you may get an idea of when the IMC stops accepting mail. You may also be able to determine whether particular messages or domains are causing the trouble.>>

Nothing looks out of the ordinary in the Security log.  In the Application log this morning I did notice an event not seen before:

Event ID 290
Source MSExchange MTA
Type Warning
Category X.400 Service

Description  
A non--delivery report (reason code unable-to-transfer and diagnostic code unrecognised-OR-name) is being generated for message C=US;A= ;P=Graham Enterpris;L=Graham_NT2-031111130523Z-24.
It was originally destined for DN:/o=GRAHAM ENTERPRISES,Inc./ou=G§ (recipient number 1), and was to be redirected to . [MTA DISP:RESULT 14 136] (12)

And 3 minutes later:

Event ID 290
Source MSExchange MTA
Type Warning
Category X.400 Service

Description  
A non--delivery report (reason code unable-to-transfer and diagnostic code unrecognised-OR-name) is being generated for message C=US;A= ;P=Graham Enterpris;L=Graham_NT2-031111130519Z-23.
It was originally destined for DN:/o=GRAHAM ENTERPRISES,Inc./ou=G§ (recipient number 1), and was to be redirected to . [MTA DISP:RESULT 14 136] (12)

<<2.) MTA or IMC queues:
Have you checked the queues of the MTA and the IMC? Are there messages in these queues that should not be there, especially after the server stops accepting mail?>>

I see nothing in the IMS queues.

<<3.) Service Pack Level:
Which service packs are installed on WinNT4 and Exchange? Have you applied the latest fixes after EXCH SP4, if it is installed?>>

NT SP6 & Exchange 5.5 SP4

<<4.) Can you find out whether the IMC also stops responding when nothing is happening on the server, such as on weekends? It is also possible that a damaged or wrongly configured rule (an out-of-office rule, for example) produces this issue.>>

No one is in the office for the most part on weekends.  Plus, it happens during the week also.  Where can I tell in Exchange Admin, what rules, if any, users have applied?

<<5.) Within your Exchange server, there are a lot of logging options for the event log. It may be helpful to enable event logging for MTA and IMC transport issues.>>

I recently set it to "Medium"

<<6.) Do you have the dump from the crash you describe? If the server went through a cold reset, there may be a memory dump from this crash somewhere on your hard disk. This dump may contain blue screen information, if the server was able to write it.>>

Where can I look for this??

<<7.) Have you checked your router, or tried resetting it after the server stops accepting mail? Does your router have an error log that can give some information?>>

Reseting the router does nothing.  Nothing in it's logs.

<<8.) Is there a firewall (or MS Proxy) or something else in front of the EXCH server?>>

No
 
<<9.) How many mailboxes do you host? (This gives an idea of how much work a solution may be.)>>

65 mailboxes, users & hidden.
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9726833
The dump usually has the extension *.dmp. Search your C: drive; it may be in the system directory or in the Dr. Watson directory (if enabled). The path is set up in the System settings of your computer.
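
If a manual search is tedious, the lookup for *.dmp files can be scripted. A small sketch, assuming Python is available on a machine that can read the drive; the helper names are made up, and the search root is whatever you pass in.

```python
import os


def is_dump_file(name: str) -> bool:
    """Match the .dmp extension regardless of upper or lower case."""
    return name.lower().endswith(".dmp")


def find_dump_files(root: str):
    """Walk the tree under root and return all dump files, sorted by path."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_dump_file(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Checking the modification time of each hit against the time of the crash tells you whether a dump belongs to that incident or to an older one.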
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9726975
1.) I assume that these messages came after the delivery had stopped?

4.) Not really on your Exchange server, only on the clients. As users may have defined server-side rules (e.g. out-of-office rules), these rules run even if the client is shut down.

You can run ISINTEG -PRI -TEST ALLTESTS from the exchange\bin directory to see if anything comes back.

ISINTEG -FIX -PRI -TEST ALLTESTS fixes problems, if any are found, but always run these tools only if needed and always make a copy/backup of your databases before you run them.

If you have enabled the transaction log, you may be able to see when the server stops accepting mail. Maybe you can find a recurring pattern, like a time, a recipient or a sender, before the service stops. You can also enable SMTP logging for the same reason.
0
 

Author Comment

by:2Geeks
ID: 9731313
Ran a checkdns again this morning as mail was not coming in and received the same 550 error that senders get when their messages bounce back:

Check mail-servers
  Domain grahamei.com has 2 mail-servers.
  Checking mail server (PRI=10) mail.grahamei.com [206.114.240.32]
  Mail server mail.grahamei.com[206.114.240.32] answers on port 25
  <<< 220-graham_nt2.grahamei.com Microsoft SMTP MAIL ready at Wed, 12 Nov 2003 07:37:29 -0600 Version: 5.5.1877.197.19
  >>> HELO www.checkdns.net
  <<< 220 ESMTP spoken here
  <<< 250 graham_nt2.grahamei.com Hello [212.117.192.42]
  >>> MAIL FROM: <dnscheck@uniplace.com>
  <<< 250 dnscheck@uniplace.com....Sender OK
  >>> RCPT TO: <postmaster@grahamei.com>
  <<< 550 Unable to relay for postmaster@grahamei.com
  Probably mail server does not accept mail for grahamei.com and recognizes this as relay attempt.
  Checking mail server (PRI=100) mail.uu.net [199.171.54.245]
  Mail server mail.uu.net[199.171.54.245] answers on port 25
  <<< 220 mr0.ash.ops.us.uu.net ESMTP Please see http://www.worldcom.com/global/terms/a_u_p/ for Acceptable Use Policy
  >>> HELO www.checkdns.net
  <<< 250 mr0.ash.ops.us.uu.net Hello www.checkdns.net [212.117.192.42], pleased to meet you
  >>> MAIL FROM: <dnscheck@uniplace.com>
  <<< 250 <dnscheck@uniplace.com>... Sender ok
  >>> RCPT TO: <postmaster@grahamei.com>
  <<< 250 <postmaster@grahamei.com>... Recipient ok
  >>> QUIT
  Mail server mail.uu.net [199.171.54.245] accepts mail for grahamei.com
  Some of your MX do not work properly
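
One visible difference between this transcript and the earlier successful one is the 220 greeting: the working test shows "Microsoft Exchange Internet Mail Service 5.5.2656.59", while this one shows "Microsoft SMTP MAIL ... Version: 5.5.1877.197.19". Whether that explains the outage is not established here, but it is easy to watch for. A sketch that pulls the relevant parts out of a banner line; the marker string is copied from the working transcript and the function name is invented.

```python
import re

# Text seen in the greeting of the working test earlier in this thread.
EXPECTED_MARKER = "Microsoft Exchange Internet Mail Service"

# A dotted version number with at least three components, e.g. 5.5.2656.59.
VERSION_RE = re.compile(r"\b(\d+(?:\.\d+){2,})\b")


def describe_banner(banner: str) -> dict:
    """Report whether the greeting names the expected service, plus its version."""
    text = banner[4:].strip() if len(banner) > 4 else ""  # drop "220 " or "220-"
    match = VERSION_RE.search(text)
    return {
        "is_expected_service": EXPECTED_MARKER in text,
        "version": match.group(1) if match else None,
    }
```

Logging the result of describe_banner each morning, before and after a reboot, would show whether the greeting changes on the days when incoming mail is refused.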
0
 

Author Comment

by:2Geeks
ID: 9731394
<<If you have enabled the transaction log, you may be able to see when the server stops accepting mail. Maybe you can find a recurring pattern, like a time, a recipient or a sender, before the service stops. You can also enable SMTP logging for the same reason.>>

Where exactly is the Transaction Log you refer to?  I want to make sure it is enabled.
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9732815
The transaction log can be enabled within the properties of your server (Ex-Manager - navigate to your server name - Properties).

There is a checkbox and a retention time in days that sets how long the transaction logs stay on your hard drive. The transaction logs are stored in a *.log subfolder of your Exchange server.

The transaction log is needed to see the message state and routing, via a menu item that can be found in the "Tools" menu.

What you may not see is which mails are rejected, but you can see when the last mail came in.
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9732983
DNSCheck: If all the other tests worked fine, the DNSCheck has discovered the same thing your senders see. As the test ran successfully before, this does not really help.

One more question, as one of the errors may point to this.
You said you have one Exchange server. I assume that there are no additional connectors installed?
0

 

Author Comment

by:2Geeks
ID: 9739365
I see the diagnostic logging settings, and the Database Circular logging settings, but I'm not seeing what you describe for a Transaction log. Maybe I'm not looking in the right place.
0
 

Author Comment

by:2Geeks
ID: 9739395
Data Consistency Adjuster??
Filtering Inconsistencies???? #of days??
Is this it?
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9742666
Sorry. Click on
Configuration - MTA Site-Configuration - first tab; there is a checkbox for tracking
Configuration - IS Site-Configuration - first tab; there is another checkbox for tracking

The retention time can be set here:
Click on your server name (left box) and System Attendant (SA) - Properties in the right box.
0
 

Author Comment

by:2Geeks
ID: 9763430
Thanks.  Got it now.
Server just shut down again, without warning, on Friday.  Of course, because I wasn't in the office on Friday.  Sounds like things may be getting slightly worse. Looks like an MTA database recovery operation ran successfully shortly after the server was back online.  
I also found this from a few hours later than that on Friday eve. as well as again early this morning:

Event ID: 4122
Source: MSExchangeIMC
Type: Error
Category: Internal Processing
Description:
An error occurred while retrieving the originating address of a message to be delivered. Since the originating address is needed for mail delivery, the mail cannot be delivered. The message that was being processed has been moved to the "BAD" folder.  Use the appropriate  utilities found in the SUPPORT directory of your Exchange CD to view and manipulate these messages.

Also an unusually high number of these:

Event ID: 4188
Source: MSExchangeIMC
Type: Error
Category: SMTP Interface Events
Description:
Refused to relay <relay@kiberecords.com.br> for 200-140-066-248.gnace7005.dsl.brasiltelecom.net.br (200.140.66.248).


As well as:

Event ID: 4131
Source: MSExchangeIMC
Type: Error
Category: SMTP Interface Events
Description:
The following message could not be delivered to <null@[127.0.0.1]>. The destination server reported: 550 Relaying is prohibited From: <> Subject: Undeliverable: open relay test message

And:

Event ID: 4188
Source: MSExchangeIMC
Type: Error
Category: SMTP Interface Events
Description:
Refused to relay <null@localhost> for localhost (127.0.0.1).

And:

Event ID: 4188
Source: MSExchangeIMC
Type: Error
Category: SMTP Interface Events
Description:
Refused to relay <null@grahamen-gw.customer.dsl.alter.net> for grahamen-GW.CUSTOMER.DSL.ALTER.NET (206.114.240.32).

And:

Event ID: 3004
Source: MSExchangeIMC
Type: Warning
Category: Message Transfer
Description:
An NDR could not be sent. This is most likely because the original message had a blank originating address.  In most cases this is normal behavior, although it can sometimes  indicate a local or remote server configuration problem. If archiving was enabled at the time of failure, you should be able to find the failed message in the file: ..\IMCDATA\IN\ARCHIVE\W70XXLBQ.

Not sure what all this means yet.  Just starting to look at log files & such this morning.
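
When many events pile up, it helps to count them by source and ID before reading each one. A sketch that works on text pasted in the layout used in this thread ("Event ID: ..." followed by "Source: ..."); it does not read the Windows event log directly, and the function name is invented.

```python
import re
from collections import Counter

# Matches "Event ID: 4188" or "Event ID 290", then the Source line after it.
EVENT_RE = re.compile(r"Event ID:?\s*(\d+)\s*\n\s*Source:?\s*([^\n]+)")


def tally_events(text: str) -> Counter:
    """Count pasted events, keyed by (source, event id)."""
    return Counter(
        (source.strip(), int(event_id))
        for event_id, source in EVENT_RE.findall(text)
    )
```

Calling most_common() on the result lists the noisiest events first, which makes an "unusually high number" of a single ID easy to quantify.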
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9765431
What you see there are some relay attacks against your server. The first of the messages appears because you have enabled reverse DNS lookup, which means your server checks the existence of the IP address before accepting mail. That behaviour is OK. The others are the usual relay attacks. Nevertheless, you should have a look at your SMTP log file to get an idea of where they are coming from. As long as your server produces these messages, it is OK. Watch whether you get further messages during the week. If you see some of them only just before your server stops working, it may be that the MTA or IMC is being confused by some illegal mail header contents.
0
 

Author Comment

by:2Geeks
ID: 9810273
OK. I think I have it.  Apparently, a previous consultant here set up a "robocopy" script that runs nightly.  It stops all Exchange services, backs up the mdbdata folder to another drive letter, then restarts the services.  For the last couple weeks, I've disabled this script & as a result, have not had to reboot once (with the exception of the server crash).   So, I'm beginning to think maybe the crash & the email issues were unrelated.  Do you know anything about Robocopy??? or why it would suddenly start causing these symptoms over the last couple months??
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9814606
Oops. I assume that the problem is not really robocopy itself but bad timing; the same copy could also be done with a simple xcopy. It may be that the start or stop sequence runs in the wrong order, or that something was blocked or the script forgot to restart it (like a virus scanner or similar services)?
0
 

Author Comment

by:2Geeks
ID: 9819706
Well, The McAfee services relating to Exchange tipped me off, because they were not running in the morning when email was out (Groupshield & On-Line Update).  Exchange was shutting them down at night but they weren't specified in the copy script to restart.  So I added those lines into the script.  McAfee services would then start back up every evening, but we would still have the mail outages.  All other services "appear" to be normal.
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9821299
Do you also have the McAfee SMTP WebShield installed? Is it restarted? Is the script running on the same machine or from a remote machine? In my experience, the McAfee services need some time to start.
0
 

Author Comment

by:2Geeks
ID: 9837647
The script runs on the same server.  As far as McAfee goes, All that runs on this machine is:

Alert Manager
Groupshield Exchange
Groupshield On-Line Update
McShield
Task Manager

Currently, all appear to be running properly.
0
 
LVL 35

Expert Comment

by:Bembi
ID: 9839198
If you manually stop and restart the services (which is what the script usually does), does your server then run properly? Have you installed the latest version of GroupShield (5.2 SP1)? Have you looked in the McAfee knowledge base for issues about this?
0
 

Author Comment

by:2Geeks
ID: 9839414
I haven't tried recently, but I do remember a couple times, trying a manual restart of all Exchange-related services right after the email outages began occurring.  But that did not bring back the email.  We do not have the latest McAfee installed yet.  Ours is a couple versions old.  That may be something to try.
0
 

Author Comment

by:2Geeks
ID: 10219061
Didn't see anything in McAfee's Knowledge Base.  Will try soon to upgrade to newer version of McAfee.  I revisited this issue last nite.  I had disabled the Robocopy script for awhile since last posting.  Last nite I re-enabled it & it knocked out incoming email by this morning.  It shouldn't be anything to do with the order of services restarting, 'cause that hasn't changed at all.
0
 

Author Comment

by:2Geeks
ID: 10462677
Upgraded McAfee to latest version this morning.  I have also been playing around with Xcopy.  When backing up through Xcopy, I get the same results. I stop all Exchange & related services, copy the database, & restart services.  Then I get no incoming emails.  This time however, stopping all Exchange & related services, & restarting brought it back.  Will test again on Monday now that McAfee has been upgraded.
0
 

Author Comment

by:2Geeks
ID: 10486792
Tested Xcopy again this morning with Groupshield 5.0 sp2 now running.  Same results. Knocking out incoming mail.  I tried stopping & restarting MTA & that seemed to bring the email back up.  Ran Xcopy again & knocked out email one more time.  This time stopping & restarting MTA didn't do anything.  I had to stop & restart all Exchange services. Any other thoughts?  Anyone?
0
 
LVL 35

Expert Comment

by:Bembi
ID: 10797522
2Geeks:
I was out for some time. Have you solved it in the meantime, or is any additional help needed?
0
 

Author Comment

by:2Geeks
ID: 10951989
No, unfortunately, still not solved. At the moment we are in the midst of replacing the DSL connection with a T1 & changing ISPs in the process.  I think I will wait until the change is complete & see if the problem has changed at all.  I don't know if things will be any different or not with a new ISP & DNS, MX etc...  One thing at a time for the moment.  Should know something in a week or two.
0
 

Author Comment

by:2Geeks
ID: 11116981
I have "not" abandoned the question!  As posted above, I have just wrapped up an ISP change.  I am currently retesting and monitoring the above mentioned issues.  
0
 
LVL 35

Expert Comment

by:Bembi
ID: 11121038
2Geeks, simply post a short message here every two weeks to keep the thread open...
0
 

Author Comment

by:2Geeks
ID: 11225587
OK.  For a while things seemed to be working quite well.  I found that if I added a statement to the Robocopy script to stop & restart the SMTP svc after the exchange copy & restarting all other services, the outages appeared to not occur.  I'm beginning to think that I have a larger issue.  This morning the server rebooted itself with no warning.  This happened also back in October of last year.  All that I can see in the event log prior to the shutdown is:

Event ID:  9330
Source:  MSExchangeMTA
Type: Warning
Category: Directory Access
Description:
An error has occurred reading a value from the directory.  A call to DS_WAIT () has returned the error 32. [BASE IL OPERATOR 21 491] (10)

Don't know what this means.  Can't find anything like it on the web either.  Not sure if it is indicative of a greater issue or not. This is the only warning or error just before the reboot.
0
 
LVL 35

Accepted Solution

by:
Bembi earned 500 total points
ID: 11235489
Sounds like a hardware problem? I made the experience in the last days, that one of my server went down without error messages. Installed a new processor cooler and the problem is away. Also RAMs may be a problem as well as the issue of some older boards, that they went instable as a result of cheap components. If your board supports this, you should install a DMI hardware monitor libe ASUS Probe for ASUS boards to see, if there is shown any problem. Such tools are also able to write a logfile, so you can see, what happens when the server goes down. Also the memory dump of windows maybe a help.

Another possibility is that the server is running out of disk space or RAM. Some programs do not handle that well and can bring the server down.

Note that the server may appear to run within its limits for a while and then exceed them, e.g. when online defragmentation is running (or other services such as virus scanners or backup software). See whether you can determine the shutdown time, and check which services were running in that time window.
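
If the event log can be exported to a list of time-stamped entries, narrowing it down to the window before the shutdown could look roughly like this. It is a minimal sketch in Python under assumptions: the `LogEntry` fields, the function name `entries_before` and the 30-minute default are all made up for illustration and do not come from any Windows tool.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    """One exported event log record (hypothetical layout)."""
    when: datetime
    source: str
    message: str


def entries_before(
    entries: Iterable[LogEntry], shutdown: datetime, minutes: int = 30
) -> List[LogEntry]:
    """Return the entries logged in the `minutes` before `shutdown`, oldest first."""
    start = shutdown - timedelta(minutes=minutes)
    hits = [e for e in entries if start <= e.when <= shutdown]
    return sorted(hits, key=lambda e: e.when)
```

The sources that keep showing up in that window across several shutdowns (backup, virus scanner, defragmentation) would be the first candidates to look at.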
0
 

Author Comment

by:2Geeks
ID: 11548216
Bembi,
Sorry for the lack of postings lately.  I've had several other priorities taking up most of my time.  No memory dump this last time.  Compaq utilities don't tell me anything as far as hardware.  Nothing in the logs either to point to any culprit.  I think I'm leaning toward a hardware problem also at this point.  At least the outages are under control to the extent that I know how to bring it back up.  If incoming mail is down, stopping and restarting Microsoft SMTP service does the trick.  It only seems to happen occasionally anymore.  Most of the outages lately seem to happen after a reboot of the server.  I've added stopping and starting of the SMTP service to the nightly Exchange backup script also.  That seems to have taken care of the outages occurring daily after the script runs.  At the moment, unfortunately, because it is somewhat under control, it has sort of taken a backseat to a couple of other projects.  I think that maybe in the long run, replacing the server may be the best thing here for better stability and performance.  It is only a 500MHz unit and, so far, tight budgets have prevented any real discussion of replacing this unit.  However, with these increased annoyances and a few other quirks as of late, I think that the idea of a new server may be revisited with renewed interest.  I will leave it at that for now.  I thank you for the ongoing discussion and ideas.  It has been educational.  I will accept an answer so you are awarded the points for all your time and effort.
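
Since the outages show up to senders as a "550 Unable to relay" rejection, one way to notice them early might be a small probe that asks the server whether it would accept a local recipient, without sending any mail. This is a hedged sketch using Python's standard `smtplib`. The function names and the `probe@example.com` sender are placeholders, and the probe should ideally be run from a host outside the LAN so that it sees what internet senders see.

```python
import smtplib
from typing import Tuple


def is_rejected(code: int) -> bool:
    """True if an SMTP reply code is a permanent rejection (5xx)."""
    return 500 <= code <= 599


def probe_recipient(
    host: str,
    recipient: str,
    sender: str = "probe@example.com",  # placeholder sender address
    timeout: float = 15.0,
) -> Tuple[int, str]:
    """Open an SMTP session on port 25 and return the reply to RCPT TO.

    No message is sent: the session ends right after RCPT TO.
    """
    with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
        smtp.helo()
        smtp.mail(sender)
        code, reply = smtp.rcpt(recipient)
    return code, reply.decode("ascii", errors="replace")
```

If `is_rejected(code)` is true for a mailbox that should exist, that would be the moment to stop and restart the SMTP service.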
0
